Julia
Julia is a high-productivity, high-performance programming language. It is especially well-suited for numerical computations and scientific computing, with performance on par with traditional statically-compiled languages. At HLRS, the use of Julia is currently supported on Hawk via the julia module.
Getting started
Create SSH SOCKS proxy to install packages
The compute systems at HLRS do not allow internet access from the login nodes. This prevents the out-of-the-box use of Julia’s package manager Pkg.jl, which handles the installation of packages and their dependencies. To enable the use of Pkg.jl again, you need to enable reverse dynamic forwarding when you log in via SSH. This allows Julia to access the internet via your own SSH session. The following instructions (based on this post: https://discourse.julialang.org/t/installing-packages-via-an-ssh-socks-proxy-on-a-compute-cluster/71735) work on Unix-like systems with a fairly recent version of OpenSSH (>= 7.6, i.e., post-2017).
Log in to Hawk by executing
ssh -R SOCKS_PORT hawk.hww.hlrs.de
where SOCKS_PORT should be a five-digit port number from the range of ephemeral port numbers. On Hawk, set the following environment variables to allow Julia to pick up the correct proxy settings when using Pkg.jl:
export https_proxy=socks5://localhost:SOCKS_PORT
export http_proxy=socks5://localhost:SOCKS_PORT
export JULIA_PKG_USE_CLI_GIT=true # optional
Now, all regular package operations should work as usual. Note: Setting JULIA_PKG_USE_CLI_GIT is optional; in many cases Pkg.jl will work without it, but it also does not hurt.
To make these changes permanent such that you do not have to execute them manually each time, add the lines above to your startup shell file, e.g., ~/.bash_profile on Hawk. Furthermore, add the following entry to your local ~/.ssh/config file:
Host hawk.hww.hlrs.de
RemoteForward SOCKS_PORT
Note that if someone already uses the hardcoded port (even you yourself in a different SSH session), you need to override it by providing the respective SSH arguments on the command line again.
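For example, if the port you hardcoded as SOCKS_PORT is already taken, you can pick a different ephemeral port for that one session and adjust the proxy variables to match (the port number below is an arbitrary example):
ssh -R 23456 hawk.hww.hlrs.de
export https_proxy=socks5://localhost:23456
export http_proxy=socks5://localhost:23456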
A similar dynamic reverse proxy setup also works on Windows when using PuTTY. Instructions (untested) can be found, e.g., at https://www.math.ucla.edu/computing/kb/creating-ssh-proxy-tunnel-putty or https://www.forwardproxy.com/2018/12/using-putty-to-setup-a-quick-socks-proxy/ (ignore the part about configuring the proxy in Firefox).
Load the Julia module
To start using Julia on a login node or on one of the compute nodes, load the julia module by executing
module load julia
Alternatively, if you are on an AI node and would like to use Julia with CUDA support, load the julia/cuda module by running
module load julia/cuda
Note that CUDA-aware MPI is currently only supported when using the MPT MPI implementation (which is the current default on Hawk).
Note that the Julia module automatically sets your Julia depot path to $HOME/.julia/$SITE_NAME/$SITE_PLATFORM_NAME, where SITE_NAME is HLRS and SITE_PLATFORM_NAME is one of hawk, vulcan, or training. This maintains a separate Julia depot for each system, which makes sense considering that they have different hardware.
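If you want to verify which depot is active after loading the module, you can inspect the environment variable set by the module or ask Julia directly (a quick sanity check; the exact path depends on your username and the system you are on):
echo $JULIA_DEPOT_PATH
julia -e 'println(first(DEPOT_PATH))'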
Install MPI.jl and CUDA.jl
To install MPI.jl, execute
julia -e 'using Pkg; Pkg.add("MPI")'
on a login node. This will download and precompile MPI.jl and all of its dependencies. If you find that the installation process is stuck at a very early stage (e.g., after outputting only Updating registry at `~/.julia/registries/General.toml`), it means you have not properly set up your SOCKS proxy or forgot to add the appropriate environment variables.
You can check that MPI.jl was properly configured by executing
julia -e 'using MPI; println(MPI.identify_implementation())'
This should give you an output similar to
(MPI.UnknownMPI, v"0.0.0")
if you use the default MPT MPI backend, and
(MPI.OpenMPI, v"4.0.5")
if you use OpenMPI.
If you also want to use the GPUs with Julia, install CUDA.jl by executing
julia -e 'using Pkg; Pkg.add("CUDA")'
on a login node. Note that you should not attempt to use or test CUDA.jl on the login nodes, since CUDA is not available there (they do not have GPUs) and thus anything CUDA-related will fail.
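Once you are on a GPU node (see below for how to get an interactive session there), a quick sanity check is to ask CUDA.jl whether it considers your installation usable:
julia -e 'using CUDA; @show CUDA.functional()'
On the AI nodes this should report true.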
Verify that MPI works
Start an interactive session on a compute node by executing
qsub -I -l select=1:node_type=rome:ncpus=128:mpiprocs=128 -l walltime=00:20:00
Once your interactive job has been allocated, load the Julia module with module load julia. Then, run a simple test program from the shell by executing
mpirun -np 5 julia mpi_test.jl
The code for mpi_test.jl (based on code found at https://github.com/JuliaParallel/MPI.jl/blob/890ee6e69ed902af2d7db112b601ef3b2744400b/docs/examples/04-sendrecv.jl and https://uni-paderborn.atlassian.net/wiki/spaces/PC2DOK/pages/15728664/Using+NVIDIA+GPUs+with+Julia#CUDA-aware-OpenMPI) is as follows:
# mpi_test.jl
using MPI
MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
size = MPI.Comm_size(comm)
dst = mod(rank+1, size)
src = mod(rank-1, size)
println("rank=$rank, size=$size, dst=$dst, src=$src")
# allocate memory
N = 4
send_mesg = Array{Float64}(undef, N)
recv_mesg = Array{Float64}(undef, N)
fill!(send_mesg, Float64(rank))
# pass buffers into MPI functions
MPI.Sendrecv!(send_mesg, dst, 0, recv_mesg, src, 0, comm)
println("recv_mesg on proc $rank: $recv_mesg")
If everything is working OK, it should give you an output similar to
rank=0, size=5, dst=1, src=4
rank=1, size=5, dst=2, src=0
rank=2, size=5, dst=3, src=1
rank=3, size=5, dst=4, src=2
rank=4, size=5, dst=0, src=3
recv_mesg on proc 2: [1.0, 1.0, 1.0, 1.0]
recv_mesg on proc 1: [0.0, 0.0, 0.0, 0.0]
recv_mesg on proc 3: [2.0, 2.0, 2.0, 2.0]
recv_mesg on proc 0: [4.0, 4.0, 4.0, 4.0]
recv_mesg on proc 4: [3.0, 3.0, 3.0, 3.0]
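Instead of an interactive session, you can also run the same test as a batch job. The following is a minimal sketch of a PBS job script (the job name and script name are arbitrary; the resource selection is copied from the interactive example above):
#!/bin/bash
#PBS -N julia_mpi_test
#PBS -l select=1:node_type=rome:ncpus=128:mpiprocs=128
#PBS -l walltime=00:20:00
cd $PBS_O_WORKDIR
module load julia
mpirun -np 5 julia mpi_test.jl
Submit it with qsub from the directory that contains mpi_test.jl.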
Verify that CUDA works
To test CUDA, you need to leave your interactive session on a CPU node and get an interactive session on a GPU node by running
qsub -I -l select=1:node_type=nv-a100-40gb:mpiprocs=8 -l walltime=00:20:00
The first test is to check whether CUDA.jl can find all relevant drivers and GPUs. Load Julia with CUDA support by executing module load julia/cuda. Then, start the Julia REPL by running julia and execute
julia> using CUDA
julia> CUDA.versioninfo()
to see an output similar to this:
CUDA toolkit 11.4, local installation
NVIDIA driver 470.57.2, for CUDA 11.4
CUDA driver 11.4
Libraries:
- CUBLAS: 11.5.4
- CURAND: 10.2.5
- CUFFT: 10.5.1
- CUSOLVER: 11.2.0
- CUSPARSE: 11.6.0
- CUPTI: 14.0.0
- NVML: 11.0.0+470.57.2
- CUDNN: missing
- CUTENSOR: missing
Toolchain:
- Julia: 1.7.2
- LLVM: 12.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80
Environment:
- JULIA_CUDA_USE_MEMORY_POOL: none
- JULIA_CUDA_USE_BINARYBUILDER: false
8 devices:
0: NVIDIA A100-SXM4-40GB (sm_80, 39.583 GiB / 39.586 GiB available)
1: NVIDIA A100-SXM4-40GB (sm_80, 39.583 GiB / 39.586 GiB available)
2: NVIDIA A100-SXM4-40GB (sm_80, 39.583 GiB / 39.586 GiB available)
3: NVIDIA A100-SXM4-40GB (sm_80, 39.583 GiB / 39.586 GiB available)
4: NVIDIA A100-SXM4-40GB (sm_80, 39.583 GiB / 39.586 GiB available)
5: NVIDIA A100-SXM4-40GB (sm_80, 39.583 GiB / 39.586 GiB available)
6: NVIDIA A100-SXM4-40GB (sm_80, 39.583 GiB / 39.586 GiB available)
7: NVIDIA A100-SXM4-40GB (sm_80, 39.583 GiB / 39.586 GiB available)
As you can see, all 8 NVIDIA A100 GPUs have been correctly detected.
Next, we will test if computing on the GPU is actually faster than on the CPU, to ensure that actual computations work. For this, paste the following snippet (based on code found at https://uni-paderborn.atlassian.net/wiki/spaces/PC2DOK/pages/15728664/Using+NVIDIA+GPUs+with+Julia#CUDA-aware-OpenMPI) in the Julia REPL. Approximate timings are included as a reference for you:
using CUDA
A = rand(2000, 2000);
B = rand(2000, 2000);
@time A*B; # 1.296624 seconds (2.52 M allocations: 155.839 MiB, 23.66% gc time, 65.33% compilation time)
@time A*B; # 0.341631 seconds (2 allocations: 30.518 MiB)
Agpu = CuArray(A); # move matrix to gpu
Bgpu = CuArray(B); # move matrix to gpu
@time Agpu*Bgpu; # 1.544657 seconds (1.54 M allocations: 81.926 MiB, 2.16% gc time, 59.89% compilation time)
@time Agpu*Bgpu; # 0.000627 seconds (32 allocations: 640 bytes)
As you can see, the matrix-matrix multiplication on the GPU is much faster than on the CPU.
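Note that CUDA.jl executes GPU operations asynchronously, so @time may return before the kernel has actually finished and therefore underestimate the runtime. For a more faithful measurement you can synchronize explicitly (a sketch using the CUDA.@sync macro, continuing the session above):
@time CUDA.@sync Agpu*Bgpu;  # wait for the GPU to finish before the timer stops
Even with synchronization, the GPU version should remain much faster than the CPU version for matrices of this size.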
Note: At the moment, CUDA-aware MPI does not work with Julia on Hawk. Support for it is currently being worked on.
Where to go from here?
With the above steps completed, you are ready to run Julia-based programs in parallel on Hawk! In future sessions, you only need to load the appropriate Julia module with module load julia or module load julia/cuda and everything else should be set up for you.
Note: There have been reports that sometimes the MPT MPI implementation causes trouble when interacting with other MPI-aware libraries from within Julia. In this case, try the OpenMPI implementation by loading it with module load openmpi. After switching the MPI backend, you also need to reconfigure MPI.jl by executing julia -e 'using Pkg; Pkg.build("MPI")'.
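A possible command sequence for switching the backend could look like this (a sketch; the exact module names may differ, so check module avail first):
module unload mpt   # module name for MPT is an assumption; skip if it is not loaded
module load openmpi
julia -e 'using Pkg; Pkg.build("MPI")'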
Using IJulia/Jupyter
IJulia.jl is a Julia backend for the Jupyter interactive computing framework. It allows you to use Julia as a kernel in Jupyter notebooks, which combine code, documentation, and media such as images and videos in a single document.
Installation
Log in to Hawk and start the Julia REPL on a login node. Then, obtain IJulia by executing
using Pkg
Pkg.add("IJulia")
In addition to installing the IJulia package, this will put a kernel specification for Julia into $HOME/.local/share/jupyter/kernels.
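To confirm that the kernel specification has been created, you can simply list that directory (the name of the kernel subdirectory depends on your Julia version):
ls $HOME/.local/share/jupyter/kernels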
If you already have a working Jupyter installation on Hawk, you are finished with the installation process. If not, you can also have IJulia get its own Jupyter installation by executing
using IJulia
notebook()
The first time you run notebook(), it will ask you if it should install Jupyter for you. If you confirm, this will download and install everything required for running Jupyter inside your Julia depot path (see the JULIA_DEPOT_PATH environment variable).
In addition to Jupyter, you might also want to get the newer JupyterLab web interface. You can install a Julia-managed version of JupyterLab, similarly to Jupyter itself, by executing
using IJulia
jupyterlab()
and confirming the installation when prompted.
Getting started
To run Jupyter with IJulia on Hawk/Vulcan, you need to use some SSH magic to facilitate the exchange of data between your local browser and the Jupyter/JupyterLab instance running on an HLRS system (for simplicity, we will only refer to JupyterLab from now on, although everything stated also applies to plain Jupyter). Before beginning, you need to pick a port number you want JupyterLab to listen on for incoming requests. It must be in the range 1024-65535 and needs to be unused. In this documentation, we will use 18888.
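If you want to check whether your chosen port is still free on the login node, one quick way (assuming the ss utility is available) is to look for it among the listening sockets:
ss -tln | grep 18888 || echo "port 18888 appears to be free"
If grep prints a matching line instead, the port is already in use and you should pick a different number.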
Login with local port forwarding
First, log in to Hawk via SSH and enable port forwarding for 18888 to the Hawk login node:
ssh -L 18888:localhost:18888 hawk.hww.hlrs.de
This will forward requests to your local port 18888 to the same port number on the Hawk login node.
The next step depends on what kind of computing you would like to perform: If you want to run JupyterLab on a compute node, proceed with the next section. Otherwise, you can skip that step and proceed with Start JupyterLab and open in browser.
Create interactive job with remote port forwarding
Start an interactive job on one compute node, e.g., by executing
qsub -I -l select=1:node_type=rome:ncpus=128:mpiprocs=128 -l walltime=01:00:00
and wait until you have been allocated the requested resources.
Next, you need to make sure that a connection from your local browser can be forwarded to a JupyterLab instance running on the compute node. This is achieved by starting SSH with remote port forwarding to create a tunnel from the login node to the compute node. For this you need to note the exact login node hostname to which you connected in the first step, e.g., hawk-login03, and then use this to create the reverse tunnel:
ssh -R 18888:localhost:18888 hawk-login03 -N -f
The -N -f options will make the SSH session go immediately into the background, yielding control back to your interactive job terminal.
Start JupyterLab and open in browser
Now it is time to start JupyterLab. First, we need to determine the path to the jupyter executable. The binary directory can be found by executing
julia -e 'using Conda; println(Conda.SCRIPTDIR)'
This will yield something like /zhome/academic/HLRS/hlrs/hpcschlo/.julia/HLRS/hawk/conda/3/bin. With this information, you can then start JupyterLab on port 18888 by running
<julia_depot_path>/conda/3/bin/jupyter lab --no-browser --port=18888
Finally, you can open JupyterLab in your browser by using the link shown in the JupyterLab terminal output.
For admins
Installing a new Julia version
The installation of Julia at HLRS, including information on how to add a new Julia version and the corresponding module files, is described in this repository: https://github.com/hlrs-tasc/julia-at-hlrs
CUDA-aware MPI with Julia
MPT
In MPT 2.23 (the current default on Hawk), CUDA-aware MPI is broken for Julia. A fix for MPT is available, but a new version of MPT needs to be installed first.
OpenMPI
At the moment, OpenMPI does not seem to support CUDA-aware MPI on the Hawk AI nodes. Instead, the execution crashes with a segmentation fault. To reproduce, log in to one of the AI nodes and execute
mpirun -np 5 julia cuda_mpi_test.jl
where cuda_mpi_test.jl is given as follows:
# cuda_mpi_test.jl
using MPI
using CUDA
MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
size = MPI.Comm_size(comm)
dst = mod(rank+1, size)
src = mod(rank-1, size)
println("rank=$rank, size=$size, dst=$dst, src=$src")
# allocate memory on the GPU
N = 4
send_mesg = CuArray{Float64}(undef, N)
recv_mesg = CuArray{Float64}(undef, N)
fill!(send_mesg, Float64(rank))
# pass GPU buffers (CuArrays) into MPI functions
MPI.Sendrecv!(send_mesg, dst, 0, recv_mesg, src, 0, comm)
println("recv_mesg on proc $rank: $recv_mesg")
This will crash with an error similar to this:
rank=4, size=5, dst=0, src=3
rank=0, size=5, dst=1, src=4
rank=2, size=5, dst=3, src=1
rank=1, size=5, dst=2, src=0
rank=3, size=5, dst=4, src=2
[hawk-ai01:263609:0:263609] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0xa02000000)
[hawk-ai01:263605:0:263605] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0xa02000000)
[hawk-ai01:263607:0:263607] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0xa02000000)
[hawk-ai01:263606:0:263606] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0xa02000000)
[hawk-ai01:263608:0:263608] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0xa02000000)
==== backtrace (tid: 263606) ====
0 0x00000000000532f9 ucs_debug_print_backtrace() ???:0
1 0x0000000000012b20 .annobin_sigaction.c() sigaction.c:0
2 0x000000000015dd3b __memcpy_avx_unaligned() :0
3 0x0000000000043f4f ucp_wireup_select_sockaddr_transport() ???:0
4 0x00000000000148c9 uct_mm_ep_am_bcopy() ???:0
5 0x0000000000043fcb ucp_wireup_select_sockaddr_transport() ???:0
6 0x000000000003a74a ucp_tag_send_nbr() ???:0
7 0x00000000001c7e4f mca_pml_ucx_send() ???:0
8 0x00000000000bba69 PMPI_Sendrecv() ???:0
9 0x00000000000c4e0a _jl_invoke() /buildworker/worker/package_linux64/build/src/gf.c:2247
10 0x00000000000e3e96 jl_apply() /buildworker/worker/package_linux64/build/src/julia.h:1788
11 0x00000000000e390e eval_value() /buildworker/worker/package_linux64/build/src/interpreter.c:215
12 0x00000000000e46d2 eval_stmt_value() /buildworker/worker/package_linux64/build/src/interpreter.c:166
13 0x00000000000e46d2 eval_stmt_value() /buildworker/worker/package_linux64/build/src/interpreter.c:167
14 0x00000000000e46d2 eval_body() /buildworker/worker/package_linux64/build/src/interpreter.c:587
15 0x00000000000e52f8 jl_interpret_toplevel_thunk() /buildworker/worker/package_linux64/build/src/interpreter.c:731
16 0x00000000001027a4 jl_toplevel_eval_flex() /buildworker/worker/package_linux64/build/src/toplevel.c:885
17 0x00000000001029e5 jl_toplevel_eval_flex() /buildworker/worker/package_linux64/build/src/toplevel.c:830
18 0x000000000010462a jl_toplevel_eval_in() /buildworker/worker/package_linux64/build/src/toplevel.c:944
19 0x000000000115a83b eval() ./boot.jl:373
20 0x000000000115a83b japi1_include_string_40536() ./loading.jl:1196
21 0x00000000000c4e0a _jl_invoke() /buildworker/worker/package_linux64/build/src/gf.c:2247
22 0x000000000124a35b japi1__include_32082() ./loading.jl:1253
23 0x0000000000d67c16 japi1_include_36299() ./Base.jl:418
24 0x00000000000c4e0a _jl_invoke() /buildworker/worker/package_linux64/build/src/gf.c:2247
25 0x00000000012d064c julia_exec_options_33549() ./client.jl:292
26 0x0000000000d8a0f8 julia__start_38731() ./client.jl:495
27 0x0000000000d8a269 jfptr__start_38732.clone_1() text:0
28 0x00000000000c4e0a _jl_invoke() /buildworker/worker/package_linux64/build/src/gf.c:2247
29 0x00000000001282d6 jl_apply() /buildworker/worker/package_linux64/build/src/julia.h:1788
30 0x0000000000128c7d jl_repl_entrypoint() /buildworker/worker/package_linux64/build/src/jlapi.c:701
31 0x00000000004007d9 main() /buildworker/worker/package_linux64/build/cli/loader_exe.c:42
32 0x00000000000237b3 __libc_start_main() ???:0
33 0x0000000000400809 _start() ???:0
=================================
signal (11): Segmentation fault
in expression starting at /zhome/academic/HLRS/hlrs/hpcschlo/cuda_mpi_test.jl:21
__memmove_avx_unaligned at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x147c167e3f4e)
uct_mm_ep_am_bcopy at /lib64/libuct.so.0 (unknown line)
unknown function (ip: 0x147c167e3fca)
ucp_tag_send_nbr at /lib64/libucp.so.0 (unknown line)
mca_pml_ucx_send at /opt/hlrs/non-spack/mpi/openmpi/4.0.5-gcc-9.2.0/lib/libmpi.so (unknown line)
PMPI_Sendrecv at /opt/hlrs/non-spack/mpi/openmpi/4.0.5-gcc-9.2.0/lib/libmpi.so (unknown line)
Sendrecv! at /zhome/academic/HLRS/hlrs/hpcschlo/.julia/HLRS/hawk/packages/MPI/08SPr/src/pointtopoint.jl:380 [inlined]
Sendrecv! at /zhome/academic/HLRS/hlrs/hpcschlo/.julia/HLRS/hawk/packages/MPI/08SPr/src/pointtopoint.jl:389
unknown function (ip: 0x147c1a1062fb)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:126
[...]