- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -
How to use Conda environments on the clusters
This guide shows you how to move a Conda environment from your local machine to the clusters, which have no internet access.
Assumptions:
- You already have a virtual environment called my_env and you want to move this environment.
- The environment can have packages installed with conda and pip.
Warning: Conda/pip download and install precompiled binaries that match the architecture and operating system of the local environment, and might compile packages from source for the local architecture when necessary. Such packages are not guaranteed to work correctly on the target system.
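If you are unsure whether the binaries will be compatible, a simple check is to compare the architecture and OS of both systems, for example by running the following on your local machine and on the cluster and comparing the output:
uname -m && cat /etc/os-release   # shows the CPU architecture and the Linux distribution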
Using conda-pack
Install conda-pack in the base or root environment:
(my_env) $ conda deactivate
(base) $ conda install -c conda-forge conda-pack
Package the environment and transfer the archive to the clusters (e.g., scp):
(base) $ conda pack -n my_env -o my_env.tar.gz
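Optionally, you can inspect the archive before transferring it, e.g., to confirm that it unpacks bin/, lib/, etc. directly at the top level (there is no wrapping my_env/ directory):
(base) $ tar -tzf my_env.tar.gz | head   # list the first few entries of the archive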
Transfer files to a workspace
A workspace can be used to store the compressed Conda environment. You can send your data to an existing workspace using:
scp <file> <destination_host>:<destination_directory>
The <destination_host> can also be replaced with a pre-configured SSH host.
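For illustration, a minimal sketch of the transfer, assuming the HLRS workspace tools (ws_allocate, ws_find) are available on the cluster and that an SSH host alias named hawk is configured on your local machine; the workspace name conda_ws is a placeholder:
# On the cluster: allocate a workspace and print its path.
ws_allocate conda_ws 30        # keep the workspace for 30 days
ws_find conda_ws               # prints the workspace directory
# On your local machine: copy the packed environment into that directory
# (replace <workspace_dir> with the path printed by ws_find).
scp my_env.tar.gz hawk:<workspace_dir>/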
Work interactively on a single node
A large number of files decreases the performance of the parallel file system. You can use the RAM disk instead:
qsub -I -l select=1:node_type=clx-25 -l walltime=00:30:00 # modify this line to work on Hawk, or to select different resources
export ENV_PATH=/run/user/$PBS_JOBID/my_env # Path on the RAM disk where the environment will be extracted.
export ENV_ARCHIVE=/path/to/my_env.tar.gz # Adjust the path.
mkdir -p $ENV_PATH
tar -xzf $ENV_ARCHIVE -C $ENV_PATH # Extract the packages to the RAM disk.
source $ENV_PATH/bin/activate
conda-unpack
# Use the environment here.
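# Optional sanity check (illustrative): the active interpreter should come from the RAM disk copy.
python -c "import sys; print(sys.executable)" # should print a path under $ENV_PATH/bin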
rm -rf $ENV_PATH # It's nice to clean up before you terminate the job.
Notes
- Extracting the environment directly into a workspace results in lower performance than using the RAM disk.
- This guide assumes there is a $ENV_PATH/bin folder when unpacking the Conda environment.
- This was tested on Conda environments created on Linux, using Python version 3.7.9.
Use the environment on multiple nodes
It is best to prepare a batch script to launch a multi-node job. The steps to start a distributed Python application on multiple nodes depend on the third-party library (e.g., Dask, Ray). Independent of the third-party library, if you want to continue using the RAM disk to unpack the virtual environment, you must extract the archive on each node separately, as sketched below. Our documentation provides scripts to launch a Ray cluster using Conda virtual environments.
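For illustration, a minimal sketch of such a batch script, assuming a PBS job, bash as the login shell, and password-less ssh between the allocated nodes; the node type, the archive path, and the application launch command are placeholders you need to adapt:
#!/bin/bash
#PBS -l select=2:node_type=clx-25
#PBS -l walltime=00:30:00

export ENV_ARCHIVE=/path/to/my_env.tar.gz    # adjust: packed environment in your workspace
export ENV_PATH=/run/user/$PBS_JOBID/my_env  # RAM disk location used on every node

# Extract and unpack the environment into the RAM disk on each node of the job.
for node in $(sort -u $PBS_NODEFILE); do
    ssh $node "mkdir -p $ENV_PATH && tar -xzf $ENV_ARCHIVE -C $ENV_PATH && source $ENV_PATH/bin/activate && conda-unpack" &
done
wait

# Activate the environment on the first node and start the distributed
# application; the launch command depends on the library (e.g., Dask, Ray).
source $ENV_PATH/bin/activate
# <launch your multi-node application here>

# Clean up the RAM disk on all nodes before the job ends.
for node in $(sort -u $PBS_NODEFILE); do
    ssh $node "rm -rf $ENV_PATH" &
done
wait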