- Infos im HLRS Wiki sind nicht rechtsverbindlich und ohne Gewähr -
- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

Urika CS

Urika-CS nodes

Urika-CS containers can only be used on CS-Storm (clx-ai) and CS500 (clx-21) nodes.

To be able to start a Singularity container, the nodes have to be prepared before your batch job starts. You request this with the resource 'UNS=True' at batch submission:

qsub -l select=1:node_type=clx-21:UNS=True,walltime=00:30:00 <mybatchjob>

Module singularity/3.5.2-cs-patch has some hard-coded settings for the Urika-CS stack.
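
To see exactly what the module changes, you can display its definition (module show is a standard Environment Modules command; shown here only as an example):

module show singularity/3.5.2-cs-patch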

The Urika-CS User Manual for installed versions can be found on the Cray website: v1.2UP00 (https://pubs.cray.com/bundle/CS_Series_Urika-CS_AI_and_Analytics_Applications_Guide_1_2_Nike/page/About_the_CS_Series_Urika-CS_AI_and_Analytics_Applications_Guide.html), v1.3UP00 (https://pubs.cray.com/bundle/CS_Series_Urika-CS_AI_and_Analytics_Applications_Guide_1_3_Nike/page/About_the_CS_Series_Urika-CS_AI_and_Analytics_Applications_Guide.html). The default version is currently 1.4UP00.

The only module you need to load is bigdata/analytics; it loads all dependencies and wrappers.

Quick Start

Install cuDNN

Due to license restrictions, you have to install cuDNN yourself if you accept the cuDNN license.

  1. Download NVIDIA cuDNN for CUDA 10.0 (tgz version) from https://developer.nvidia.com/cudnn
  2. Review the license you accepted during download
  3. Review the license included in the downloaded archive
  4. If you accept them, upload the archive to a login node
  5. Extract the archive

    mkdir -p ~/cudnn/7.6.5.32/cuda10.0
    tar -xzf ~/cudnn-10.0-linux-x64-v7.6.5.32.tgz --strip-components=1 -C ~/cudnn/7.6.5.32/cuda10.0
    
  6. Create a link to your cuDNN version

    ln -s ~/cudnn/7.6.5.32/cuda10.0 ~/cudnn/default
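
  7. Optionally, verify that the link resolves and the libraries are in place (a quick sanity check; the exact file names depend on the cuDNN version):

    ls ~/cudnn/default/lib64/libcudnn*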
    

Reserve a node (interactive GPU job in this example)

qsub -I -l select=1:node_type=clx-ai:UNS=true -l walltime=01:00:00

Load environment

module load bigdata/analytics

Start interactive Urika-CS container (this will also deploy Spark)

start_analytics --cudnn-libs ~/cudnn/default/lib64/

# Activate Conda environment with TensorFlow
Singularity> source activate py37_tf_gpu

# In the latest Urika-CS image, TensorFlow is installed outside of the Conda environments.
# To use TensorFlow, prepend the path of the corresponding version to PYTHONPATH.
# Use one of the commands below:
## either
export PYTHONPATH=/opt/tensorflow_cpu:$PYTHONPATH
## or
export PYTHONPATH=/opt/tensorflow_gpu:$PYTHONPATH
## or
export PYTHONPATH=/opt/tensorflow_2.1.1/tensorflow_cpu:$PYTHONPATH
## or
export PYTHONPATH=/opt/tensorflow_2.1.1/tensorflow_gpu:$PYTHONPATH
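
To check which TensorFlow build is actually picked up inside the container (a minimal sanity check, assuming one of the environments or paths above has been set up):

Singularity> python -c "import tensorflow as tf; print(tf.__version__)"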

Run Urika-CS container in batch mode

run_training --no-node-list --cudnn-libs ~/cudnn/default/lib64/ -n 1 -e py37_tf_gpu --tensorflow-version 2.1.1 "python test.py"

run_training here takes the following arguments (a complete batch-script sketch follows the list):

  • --no-node-list - do not pass the node list to the command (CMD)
  • -n 1 - number of MPI processes; this should usually match the number of reserved nodes (run_training executes: mpirun singularity ...)
  • -e py37_tf_gpu - this Conda environment will be activated in the container
  • --tensorflow-version 2.1.1 - specify the TensorFlow version (optional); the default is version 1.x (e.g. 1.15.2)
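
For a non-interactive run, the pieces above can be combined into a PBS job script. The following is a minimal sketch; the job name, resource values, and test.py are placeholders:

#!/bin/bash
#PBS -N urika-cs-test
#PBS -l select=1:node_type=clx-ai:UNS=true
#PBS -l walltime=01:00:00

# Change to the directory the job was submitted from
cd $PBS_O_WORKDIR

# Load the Urika-CS environment (loads all dependencies and wrappers)
module load bigdata/analytics

# Run the training script inside the Urika-CS container
run_training --no-node-list --cudnn-libs ~/cudnn/default/lib64/ -n 1 -e py37_tf_gpu --tensorflow-version 2.1.1 "python test.py"

Submit the script with qsub, e.g. qsub urika-cs-test.pbs.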