- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -
Urika CS
Latest revision as of 18:57, 7 December 2020
Urika-CS nodes
Urika-CS containers can only be used on CS-Storm (clx-ai) and CS500 (clx-21) nodes.
To start a Singularity container, the nodes have to be prepared before your batch job starts. Request this with the resource UNS=True in your batch submission:
qsub -l select=1:node_type=clx-21:UNS=True,walltime=00:30:00 <mybatchjob>
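Module singularity/3.5.2-cs-patch has some hard-coded settings for the Urika-CS stack.
The Urika-CS User Manual for the installed versions can be found on the Cray website: v1.2UP00 ( https://pubs.cray.com/bundle/CS_Series_Urika-CS_AI_and_Analytics_Applications_Guide_1_2_Nike/page/About_the_CS_Series_Urika-CS_AI_and_Analytics_Applications_Guide.html ), v1.3UP00 ( https://pubs.cray.com/bundle/CS_Series_Urika-CS_AI_and_Analytics_Applications_Guide_1_3_Nike/page/About_the_CS_Series_Urika-CS_AI_and_Analytics_Applications_Guide.html ). The default version is currently 1.4UP00.
The only module you need to load is bigdata/analytics; it loads all dependencies and wrappers.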
Quick Start
Install cuDNN
Because of license restrictions, you have to install cuDNN yourself, provided you accept the cuDNN license.
- Download NVIDIA cuDNN for CUDA 10.0 (tgz version) from https://developer.nvidia.com/cudnn
- Check the license you accepted during download
- Check the license in the downloaded file
- If you accept them, upload the archive to a login node
- Extract the archive
  mkdir -p ~/cudnn/7.6.5.32/cuda10.0
  tar -xzf ~/cudnn-10.0-linux-x64-v7.6.5.32.tgz --strip-components=1 -C ~/cudnn/7.6.5.32/cuda10.0
- Create a link to your cuDNN version
  ln -s ~/cudnn/7.6.5.32/cuda10.0 ~/cudnn/default
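Before passing the directory to --cudnn-libs, it can help to confirm that the extracted archive really contains the cuDNN shared libraries. The following is a minimal sketch; check_cudnn_libs is a hypothetical helper name and not part of the Urika-CS tooling, and it only assumes that the library files are named libcudnn.so* as in the NVIDIA archive.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Urika-CS): succeed if the given
# directory contains at least one cuDNN shared library.
check_cudnn_libs() {
    local libdir="$1"
    local f
    for f in "$libdir"/libcudnn.so*; do
        # The glob stays unexpanded when nothing matches, so test for existence.
        if [ -e "$f" ]; then
            echo "cuDNN library found: $f"
            return 0
        fi
    done
    echo "no libcudnn.so* in $libdir" >&2
    return 1
}

# Typical call after the steps above:
# check_cudnn_libs ~/cudnn/default/lib64
```

If the check fails, the most likely causes are a different --strip-components value or a link that points to the wrong version directory.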
Reserve a node (interactive GPU job in this example)
qsub -I -l select=1:node_type=clx-ai:UNS=true -l walltime=01:00:00
Load environment
module load bigdata/analytics
Start interactive Urika-CS container (this will also deploy Spark)
start_analytics --cudnn-libs ~/cudnn/default/lib64/
# Activate Conda environment with TensorFlow (GPU)
Singularity> source activate py37_tf2.1.1_gpu
# In the latest Urika-CS image, TensorFlow is installed outside of the Conda environment.
# To use TensorFlow, add the path of the desired version to PYTHONPATH
# with one of the commands below
## either
export PYTHONPATH=/opt/tensorflow_cpu:$PYTHONPATH
## or
export PYTHONPATH=/opt/tensorflow_gpu:$PYTHONPATH
## or
export PYTHONPATH=/opt/tensorflow_2.1.1/tensorflow_cpu:$PYTHONPATH
## or
export PYTHONPATH=/opt/tensorflow_2.1.1/tensorflow_gpu:$PYTHONPATH
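The four export lines above follow a simple pattern, which can be captured in a small shell function. This is only a sketch: tf_path is a hypothetical helper, and the assumption that versions other than 2.1.1 follow the same directory layout is not confirmed by this page.

```shell
# Hypothetical helper: print the TensorFlow directory for a variant (cpu|gpu)
# and an optional version. Without a version, the unversioned path is used.
tf_path() {
    variant="$1"
    version="${2:-}"
    case "$variant" in
        cpu|gpu) ;;
        *) echo "variant must be cpu or gpu" >&2; return 1 ;;
    esac
    if [ -n "$version" ]; then
        echo "/opt/tensorflow_${version}/tensorflow_${variant}"
    else
        echo "/opt/tensorflow_${variant}"
    fi
}

# Inside the container, for example:
# export PYTHONPATH="$(tf_path gpu 2.1.1)${PYTHONPATH:+:$PYTHONPATH}"
# python -c 'import tensorflow as tf; print(tf.__version__)'
```

The ${PYTHONPATH:+:$PYTHONPATH} form avoids a trailing colon when PYTHONPATH was previously unset, and the final python call is a quick way to see which TensorFlow version was actually picked up.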
Run Urika-CS container in batch mode
run_training --no-node-list --cudnn-libs ~/cudnn/default/lib64/ -n 1 -e py37_tf_gpu --tensorflow-version 2.1.1 "python test.py"
run_training here takes the following arguments:
- --no-node-list: do not pass the node list to CMD
- -n 1: number of MPI processes; this should usually match the number of reserved nodes (run_training executes: mpirun singularity ...)
- -e py37_tf_gpu: the conda environment that will be activated in the container
- --tensorflow-version 2.1.1: TensorFlow version to use (optional). The default is version 1.x (e.g. 1.15.2)
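For a non-interactive run, the steps from this page can be combined into one PBS job script. The sketch below writes such a script to a file; the file name urika_job.pbs and the job name urika_tf are arbitrary choices, and it assumes that the module command is available in the batch shell and that test.py lies in the submission directory. The value for -n is derived from the number of distinct hosts in $PBS_NODEFILE, following the rule that it should match the number of reserved nodes.

```shell
# Write a hypothetical job script; the file name is an arbitrary choice.
cat > urika_job.pbs <<'EOF'
#!/bin/bash
#PBS -N urika_tf
#PBS -l select=1:node_type=clx-ai:UNS=true
#PBS -l walltime=01:00:00

# Start in the directory the job was submitted from.
cd "$PBS_O_WORKDIR"

module load bigdata/analytics

# One MPI process per reserved node: count the distinct hosts in the node file.
NODES=$(sort -u "$PBS_NODEFILE" | wc -l | tr -d ' ')

run_training --no-node-list --cudnn-libs ~/cudnn/default/lib64/ \
    -n "$NODES" -e py37_tf_gpu --tensorflow-version 2.1.1 "python test.py"
EOF

# Submit with: qsub urika_job.pbs
```

Keeping the resource request in #PBS lines means the same file works for any node count: only the select value has to change, and -n follows automatically.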