- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -
Urika CS
Urika-CS nodes
Urika-CS containers can only be used on CS-Storm (clx-ai) and CS500 (clx-21) nodes.
To start a Singularity container, the nodes have to be prepared before your batch job starts. Request this with the resource 'UNS=True' at batch submission:
qsub -l select=1:node_type=clx-21:UNS=True,walltime=00:30:00 <mybatchjob>
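The resource request can also be assembled by a small shell function that rejects node types other than the two named above. This is only a sketch: `urika_select` is a hypothetical helper, not part of the HLRS or Urika-CS tooling, and it prints the qsub command rather than submitting anything.

```shell
# Hypothetical helper: build the "select=..." request for a Urika-CS job.
urika_select() {
    nodes=$1
    node_type=$2
    case "$node_type" in
        clx-ai|clx-21) ;;   # the only node types that run Urika-CS containers
        *) echo "unsupported node type: $node_type" >&2; return 1 ;;
    esac
    echo "select=${nodes}:node_type=${node_type}:UNS=True"
}

# Print the command instead of running it, so it can be reviewed first.
echo "qsub -l $(urika_select 1 clx-21) -l walltime=00:30:00 <mybatchjob>"
```

Printing the command first makes it easy to spot a mistyped node type before the job reaches the queue.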
Module singularity/3.5.2-cs-patch has some hard-coded settings for the Urika-CS stack.
The Urika-CS User Manuals for the installed versions are available on the Cray website: v1.2UP00, v1.3UP00. The default version is currently 1.4UP00.
The only module that needs to be loaded is bigdata/analytics; it loads all dependencies and wrappers.
Quick Start
Install cuDNN
Due to license restrictions, you have to install cuDNN yourself, provided you accept the cuDNN license.
- Download NVIDIA cuDNN for CUDA 10.0 (tgz version) from https://developer.nvidia.com/cudnn
- Check the license you accepted during the download
- Check the license in the downloaded file
- If you accept both, upload the archive to the login node
Extract the archive
mkdir -p ~/cudnn/7.6.5.32/cuda10.0
tar -xzf ~/cudnn-10.0-linux-x64-v7.6.5.32.tgz --strip-components=1 -C ~/cudnn/7.6.5.32/cuda10.0
Create a link to your cuDNN version
ln -s ~/cudnn/7.6.5.32/cuda10.0 ~/cudnn/default
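Before reserving a node, it can help to confirm that the link resolves and that the lib64 directory used later actually contains a cuDNN library. The following is a minimal sketch: `check_cudnn_dir` is a hypothetical helper, and the `libcudnn*.so*` file name pattern is an assumption about the archive contents.

```shell
# Hypothetical helper: check that a cuDNN directory has a usable lib64.
check_cudnn_dir() {
    dir=$1
    # The link (or directory) must resolve and contain lib64.
    if [ ! -d "$dir/lib64" ]; then
        echo "missing: $dir/lib64" >&2
        return 1
    fi
    # Report the first libcudnn shared object found (name pattern is assumed).
    for lib in "$dir"/lib64/libcudnn*.so*; do
        if [ -e "$lib" ]; then
            echo "found: $lib"
            return 0
        fi
    done
    echo "no libcudnn*.so* in $dir/lib64" >&2
    return 1
}

# Check the default link; print a note instead of aborting if it is missing.
check_cudnn_dir "$HOME/cudnn/default" || echo "cuDNN is not set up yet"
```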
Reserve a node (interactive GPU job in this example)
qsub -I -l select=1:node_type=clx-ai:UNS=true -l walltime=01:00:00
Load environment
module load bigdata/analytics
Start interactive Urika-CS container (this will also deploy Spark)
start_analytics --cudnn-libs ~/cudnn/default/lib64/
# Activate Conda environment with TensorFlow (GPU)
Singularity> source activate py37_tf2.1.1_gpu
# In the latest Urika-CS image, TensorFlow is installed outside of the Conda environment.
# To use TensorFlow, add the path for the desired version to PYTHONPATH.
# Use one of the commands below.
## either
export PYTHONPATH=/opt/tensorflow_cpu:$PYTHONPATH
## or
export PYTHONPATH=/opt/tensorflow_gpu:$PYTHONPATH
## or
export PYTHONPATH=/opt/tensorflow_2.1.1/tensorflow_cpu:$PYTHONPATH
## or
export PYTHONPATH=/opt/tensorflow_2.1.1/tensorflow_gpu:$PYTHONPATH
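Because only one of the four paths should be active at a time, the choice can be wrapped in a small function. This is a sketch, not something the Urika-CS image provides: `tf_path` and the label `default` for the unversioned install are hypothetical names, and only the four paths listed above are recognized.

```shell
# Hypothetical helper: map a version and device to one of the four paths above.
tf_path() {
    case "$1:$2" in
        default:cpu) echo /opt/tensorflow_cpu ;;
        default:gpu) echo /opt/tensorflow_gpu ;;
        2.1.1:cpu)   echo /opt/tensorflow_2.1.1/tensorflow_cpu ;;
        2.1.1:gpu)   echo /opt/tensorflow_2.1.1/tensorflow_gpu ;;
        *) echo "unknown combination: $1 $2" >&2; return 1 ;;
    esac
}

# Prepend exactly one TensorFlow path and keep existing PYTHONPATH entries.
# The ${VAR:+...} form avoids a trailing colon when PYTHONPATH was empty.
TF_DIR=$(tf_path 2.1.1 gpu) && export PYTHONPATH="$TF_DIR${PYTHONPATH:+:$PYTHONPATH}"
echo "$PYTHONPATH"
```

Inside the container, running `python -c "import tensorflow as tf; print(tf.__version__)"` afterwards is one way to confirm which version was picked up.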
Run Urika-CS container in batch mode
run_training --no-node-list --cudnn-libs ~/cudnn/default/lib64/ -n 1 -e py37_tf_gpu --tensorflow-version 2.1.1 "python test.py"
run_training here has the following arguments:
- --no-node-list - do not pass the node list to CMD
- -n 1 - number of MPI processes; this should usually correspond to the number of reserved nodes (run_training executes: mpirun singularity ...)
- -e py37_tf_gpu - this Conda environment will be activated in the container
- --tensorflow-version 2.1.1 - specify the TensorFlow version (optional). The default is version 1.x (e.g. 1.15.2)
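The resource request and the run_training call can be combined in one job script. The sketch below writes such a script to a file. The file name `urika_job.pbs`, the job name, and the assumption that test.py lives in the submission directory are illustrative choices; `PBS_O_WORKDIR` is the standard PBS variable holding that directory.

```shell
# Write a job script (file name and job name are arbitrary for this sketch).
cat > urika_job.pbs <<'EOF'
#!/bin/bash
#PBS -N urika_training
#PBS -l select=1:node_type=clx-ai:UNS=True
#PBS -l walltime=00:30:00

# Start in the directory the job was submitted from (assumed to hold test.py).
cd "$PBS_O_WORKDIR"

# Load the Urika-CS wrappers and their dependencies.
module load bigdata/analytics

run_training --no-node-list --cudnn-libs ~/cudnn/default/lib64/ \
    -n 1 -e py37_tf_gpu --tensorflow-version 2.1.1 "python test.py"
EOF

echo "Submit with: qsub urika_job.pbs"
```

Keeping the request in `#PBS` lines means the UNS=True resource cannot be forgotten on the qsub command line.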