Urika CS
Urika-CS nodes
Urika-CS containers can only be used on CS-Storm (clx-ai) and CS500 (clx-21) nodes.
To be able to start a Singularity container, the nodes have to be prepared before your batch job starts. You can do this with a resource request (UNS=True) at batch submission:
qsub -l select=1:node_type=clx-21:UNS=True,walltime=00:30:00 <mybatchjob>
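The same resource request can also be placed inside <mybatchjob> itself as PBS directives. A minimal sketch (node type and walltime here are only illustrative and have to be adapted to your job):
#!/bin/bash
#PBS -l select=1:node_type=clx-21:UNS=True
#PBS -l walltime=00:30:00
# ... the commands of your batch job follow here ...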
The module singularity/3.5.2-cs-patch has some hard-coded settings for the Urika-CS stack.
The Urika-CS User Manual for the installed version can be found here. The latest version will be installed soon.
The only module that needs to be loaded is bigdata/analytics - it will load all dependencies and wrappers.
Quick Start
Install cuDNN
Due to license limitations you have to install cuDNN yourself, provided you accept the cuDNN license.
- Download NVIDIA cuDNN for CUDA 10.0 (tgz version) from https://developer.nvidia.com/cudnn
- Check the license you accepted during download
- Check the license included in the downloaded file
- If you accept them, upload the archive to a login node
Extract the archive
mkdir -p ~/cudnn/7.6.5.32/cuda10.0
tar -xzf ~/cudnn-10.0-linux-x64-v7.6.5.32.tgz --strip-components=1 -C ~/cudnn/7.6.5.32/cuda10.0
Create a link to your cuDNN version
ln -s ~/cudnn/7.6.5.32/cuda10.0 ~/cudnn/default
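To verify the link and the unpacked libraries, a quick check like the following can be used (the exact file names depend on the cuDNN version you downloaded):
ls -l ~/cudnn/default
ls ~/cudnn/default/lib64/   # should list libcudnn.so and related libraries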
Reserve a node (interactive GPU job in this example)
qsub -I -l select=1:node_type=clx-ai:UNS=true -l walltime=01:00:00
Load environment
module load bigdata/analytics
Start interactive Urika-CS container (this will also deploy Spark)
start_analytics --cudnn-libs ~/cudnn/default/lib64/
# Activate Conda environment with Tensorflow
Singularity> source activate py37_tf_gpu
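To check that TensorFlow inside the container actually sees the GPUs, a quick test along the following lines can be run in the activated environment (the exact call depends on the TensorFlow version shipped in py37_tf_gpu; in TensorFlow 2.x, tf.config.list_physical_devices('GPU') can be used instead):
Singularity> python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"   # prints True if a GPU is usable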
Run Urika-CS container in batch mode
run_training --no-node-list --cudnn-libs ~/cudnn/default/lib64/ -n 1 -e py37_tf_gpu "python test.py"
run_training here has the following arguments:
- --no-node-list - do not pass the node list to CMD
- -n 1 - number of MPI processes; this should usually correspond to the number of reserved nodes (run_training executes: mpirun singularity ...)
- -e py37_tf_gpu - this conda environment will be activated in the container
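Putting the pieces together, a complete batch job could look like the following sketch. The node type, walltime, and the training script test.py are placeholders and have to be replaced with your own values:
#!/bin/bash
#PBS -l select=1:node_type=clx-ai:UNS=True
#PBS -l walltime=00:30:00

# change to the directory the job was submitted from
cd $PBS_O_WORKDIR

# load the Urika-CS analytics stack and its wrappers
module load bigdata/analytics

# run the training script inside the Urika-CS container
run_training --no-node-list --cudnn-libs ~/cudnn/default/lib64/ -n 1 -e py37_tf_gpu "python test.py"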