- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -
Urika CS
Urika-CS nodes
Urika-CS containers can only be used on CS-Storm (clx-ai) and CS500 (clx-21) nodes.
To be able to start Singularity containers, the nodes have to be prepared before your batch job starts. You can do this via a resource request ('UNS=True') at batch submission:
qsub -l select=1:node_type=clx-21:UNS=True,walltime=00:30:00 <mybatchjob>
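The same resource request can also go into a non-interactive batch script. A minimal sketch (the job name and the placement of the directives are illustrative; only the select/walltime resources come from the example above):

```shell
#!/bin/bash
#PBS -N urika-cs-job                        # job name (illustrative)
#PBS -l select=1:node_type=clx-21:UNS=True  # UNS=True prepares the node for Singularity
#PBS -l walltime=00:30:00

cd "$PBS_O_WORKDIR"            # run from the submission directory
module load bigdata/analytics  # loads the Urika-CS dependencies and wrappers
# start_analytics / run_training commands go here
```

Submit it with qsub <mybatchjob> as shown above.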
The module singularity/3.5.2-cs-patch has some hard-coded settings for the Urika-CS stack.
The Urika-CS User Manual for the installed version can be found here; the latest version will be installed soon.
The only module you need to load is bigdata/analytics; it loads all dependencies and wrappers.
Quick Start
Install cuDNN
Due to license limitations, you have to install cuDNN yourself if you accept the cuDNN license.
- Download NVIDIA cuDNN for CUDA 10.0 (tgz version) from https://developer.nvidia.com/cudnn
- Check the license you accepted during the download
- Check the license included in the downloaded file
- If you accept them, upload the archive to a login node
Extract the archive
mkdir -p ~/cudnn/7.6.5.32/cuda10.0
tar -xzf ~/cudnn-10.0-linux-x64-v7.6.5.32.tgz --strip-components=1 -C ~/cudnn/7.6.5.32/cuda10.0
Create a link to your cuDNN version
ln -s ~/cudnn/7.6.5.32/cuda10.0 ~/cudnn/default
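To sanity-check the installation, assuming the default paths from the steps above, the link should resolve to the extracted libraries and headers:

```shell
# Verify that the symlink resolves and the cuDNN files are in place
ls -l ~/cudnn/default                    # should point to ~/cudnn/7.6.5.32/cuda10.0
ls ~/cudnn/default/lib64/libcudnn.so*    # shared library passed via --cudnn-libs
ls ~/cudnn/default/include/cudnn.h       # header shipped in the archive
```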
Reserve a node (interactive GPU job in this example)
qsub -I -l select=1:node_type=clx-ai:UNS=true -l walltime=01:00:00
Load environment
module load bigdata/analytics
Start interactive Urika-CS container (this will also deploy Spark)
start_analytics --cudnn-libs ~/cudnn/default/lib64/
# Activate Conda environment with TensorFlow
Singularity> source activate py37_tf_gpu
Run Urika-CS container in batch mode
run_training --no-node-list --cudnn-libs ~/cudnn/default/lib64/ -n 1 -e hpy37_tf_gpu "python test.py"
run_training here takes the following arguments:
--no-node-list - do not pass the node list to CMD
-n 1 - number of MPI processes; this should usually correspond to the number of reserved nodes (run_training executes: mpirun singularity ...)
-e hpy37_tf_gpu - this Conda environment will be activated in the container
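Putting the pieces together, a batch job for run_training might look like the following sketch (resources and the run_training invocation are taken from the examples above; the script name test.py is the placeholder used there, and the two-node variant in the comment assumes -n should match the node count):

```shell
#!/bin/bash
#PBS -l select=1:node_type=clx-ai:UNS=true  # GPU nodes, prepared for Singularity
#PBS -l walltime=01:00:00

cd "$PBS_O_WORKDIR"
module load bigdata/analytics  # loads the Urika-CS wrappers

# One MPI process per reserved node; for select=2 you would use -n 2
run_training --no-node-list --cudnn-libs ~/cudnn/default/lib64/ \
    -n 1 -e hpy37_tf_gpu "python test.py"
```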