- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -
CAE howtos
Licensing
ssh-Tunnel
To use a remote license server, an ssh tunnel can be used. If the tunnel connects a TCP port on the local compute node with the port the license server listens on, the license can be checked out through the local port.
Setup
application node (compute node)
the node on which the license is checked out
ssh server
a proxy between the application node and the license server
- The ssh server has to be reachable from the application node (possibly through a NAT gateway), and the license server has to be reachable from the ssh server. No firewall may block these connections. The firewall of the ssh server, however, only has to permit connections with the application node and to the license server port (and probably with an administration computer or internal network).
- The sshd configuration has to set "AllowTcpForwarding yes" (sshd may also listen on an alternative port instead of port 22).
- The ssh server user does not need a login shell just to establish an ssh tunnel (/bin/false is enough), but
- passwordless access is needed to automate the setup of the ssh tunnel from a job script.
license server
the node that serves the license
Job script example excerpt
# specify license server and port (using a TCP connection)
export LICSERVER=licserver.mydomain.de     # license server
export LICSERVER_PORT=12345                # license server port (use vendor daemon port for flexnet)
echo -e "license server:\t ${LICSERVER}:${LICSERVER_PORT}"
export LICSERVERlocal=localhost            # local license server
#export LICSERVERlocal=`hostname`          # needs ssh * binding address
export LICSERVERlocal_PORT=${LICSERVER_PORT:-12345}   # local license port
echo -e "local license ssh tunnel end:\t${LICSERVERlocal}:${LICSERVERlocal_PORT}"
SSH_userserver="user@sshserver.mydomain.de"   # passwordless ssh access needed!
SSH_PORT=22
SSH_ctrlsocket="sshtunnelCtrlSocket.${PBS_JOBID}"
echo "[`date +%Y-%m-%dT%H:%M:%S`] setting up ssh tunnel through ${SSH_userserver} (control socket: ${SSH_ctrlsocket})"
#rm -f "${SSH_ctrlsocket}"   # removing socket file should not be necessary
# establish ssh tunnel (might add additional options like e.g. -o ServerAliveInterval=60 or -o TCPKeepAlive=yes)
ssh -MS "${SSH_ctrlsocket}" -fNTL ${LICSERVERlocal_PORT}:${LICSERVER}:${LICSERVER_PORT} -p ${SSH_PORT} ${SSH_userserver}
# check ssh tunnel
ssh -S "${SSH_ctrlsocket}" -O check ${SSH_userserver} || (echo "ssh CTRL socket ${SSH_ctrlsocket} check failed - wait some more time..."; sleep 10)
## adjusting license server environment variables to the ssh tunnel end
# e.g. flexnet (using vendor daemon port)
export LM_LICENSE_FILE="${LICSERVERlocal_PORT}@${LICSERVERlocal}"
echo "[`date +%Y-%m-%dT%H:%M:%S`] licensing redirected to ${LM_LICENSE_FILE}"
# alternative check of connection (output redirected to stderr)
nc -zvw4 ${LICSERVERlocal} ${LICSERVERlocal_PORT} 1>&2
# alternatives, e.g.:
##nmap --system-dns -PN -p${LICSERVERlocal_PORT} ${LICSERVERlocal}
if [ $? -ne 0 ]; then
  echo "ERROR reaching ${LICSERVERlocal}:${LICSERVERlocal_PORT}"
else
  echo "test connection to ${LICSERVERlocal}:${LICSERVERlocal_PORT} succeeded"
fi
#
# start simulation...
#
# close connection
ssh -S "${SSH_ctrlsocket}" -O exit ${SSH_userserver}
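The tunnel check in the excerpt above waits only once after a failed check and then continues. A minimal sketch of a bounded retry loop, assuming bash; the helper name retry and its arguments are hypothetical and not part of the original job script:

```shell
# retry <attempts> <delay_seconds> <command...>
# Hypothetical helper: runs the command until it succeeds or the attempts are used up.
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    "$@" && return 0
    echo "attempt ${i}/${attempts} failed: $*" 1>&2
    # no need to sleep after the last attempt
    (( i < attempts )) && sleep "${delay}"
  done
  return 1
}

# Possible use in the job script (variables as defined in the excerpt above):
#   retry 6 10 ssh -S "${SSH_ctrlsocket}" -O check ${SSH_userserver} \
#     || { echo "ssh tunnel could not be established" 1>&2; exit 1; }
```

Aborting the job when the tunnel never comes up is a design choice of this sketch; it avoids starting a simulation that cannot check out its license.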
Improvements
- replacing ssh with autossh to automatically restart the ssh-connection if necessary and improve resiliency
Connectivity checks
Firewalls might block a direct connection. To check whether a connection can be established, some tests can be performed (e.g. from a frontend or within an interactive session):
- check if port can be reached
nc -zvw4 <SERVER> <PORT>
nmap --system-dns -PN -p <PORT> <SERVER>
(nmap is not available on HLRS/HWW systems at the moment.)
- check how far we get (assuming TCP connection)
traceroute -T -p <PORT> <SERVER>
(traceroute is disabled on HLRS/HWW systems at the moment.)
For UDP also tracepath might be used.
- check external IP address
The IP address "seen" from outside might differ from the internal one. Check e.g.
https://websrv.hlrs.de/ipinfo
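For the port check listed above, a fallback may be useful where neither nc nor nmap is installed. The following sketch assumes a bash built with /dev/tcp redirection support and the timeout command from GNU coreutils; the helper name port_open and the variables SERVER and PORT are hypothetical:

```shell
# port_open <host> <port> [timeout_seconds]
# Hypothetical helper: returns 0 if a TCP connection can be opened, non-zero otherwise.
port_open() {
  local host=$1 port=$2 maxwait=${3:-4}
  # bash opens a TCP connection when redirecting to /dev/tcp/<host>/<port>;
  # timeout bounds the attempt in case a firewall silently drops the packets
  timeout "${maxwait}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "${host}" "${port}" 2>/dev/null
}

if port_open "${SERVER:-localhost}" "${PORT:-12345}"; then
  echo "port reachable"
else
  echo "port NOT reachable"
fi
```

Like nc -z, this only shows that something accepts TCP connections on the port, not that a license daemon answers there.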
Jobscripts
Self-initiate termination & more
PBSpro (like other batch systems) sends a SIGTERM to the running jobscript when the job walltime is reached. However, the time left before the job is finally terminated might be too short, so handling this within the jobscript itself is a more flexible alternative. First, the time to wait is calculated (the example assumes a bash jobscript):
timebeforeend=$(( 5*60 )) # 5 min
module load cae
jobremainingwalltime=$(qwtime -r)
remaintingwalltime2stop=$(( jobremainingwalltime-timebeforeend ))
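If the job has less walltime left than the chosen margin, the subtraction above yields a negative number, which sleep rejects. A small sketch that clamps the result to zero, assuming that qwtime -r prints the remaining walltime as an integer number of seconds; the helper name seconds_until_stop is hypothetical:

```shell
# seconds_until_stop <remaining_walltime_s> <margin_s>
# Hypothetical helper: prints the time to wait in seconds, never less than 0.
seconds_until_stop() {
  local remaining=$1 margin=$2
  local left=$(( remaining - margin ))
  (( left < 0 )) && left=0
  echo "${left}"
}

# Possible use in the job script (qwtime is assumed to come from the cae module):
#   remaintingwalltime2stop=$(seconds_until_stop "$(qwtime -r)" "${timebeforeend}")
```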
Send SIGTERM to command after some time
If the command name is known, killall, for example, can send the SIGTERM:
cmd="path/mycommand"
(sleep ${remaintingwalltime2stop}; killall "${cmd}" ) & # start a subshell in the background which will sleep first
$cmd $options # also with e.g. mpirun
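killall addresses processes by name, so it may also hit other instances of the same program owned by the user on that node. As an alternative sketch, the command can be started in the background and signalled through its PID; this assumes bash, and the helper name run_with_deadline is hypothetical. With a launcher such as mpirun, it further assumes that the launcher forwards the signal to the started processes:

```shell
# run_with_deadline <delay_seconds> <command...>
# Hypothetical helper: starts the command, sends SIGTERM to its PID after the delay,
# and returns the exit status of the command.
run_with_deadline() {
  local delay=$1
  shift
  "$@" &
  local pid=$!
  # timer subshell: signals only the process started above
  ( sleep "${delay}"; kill -TERM "${pid}" 2>/dev/null ) &
  local timer=$!
  wait "${pid}"
  local rc=$?
  # the command is done: stop the timer if it is still waiting
  kill "${timer}" 2>/dev/null
  wait "${timer}" 2>/dev/null
  return "${rc}"
}

# Possible use in the job script:
#   run_with_deadline "${remaintingwalltime2stop}" $cmd $options
```

A command terminated by the SIGTERM yields exit status 143 (128 + 15), which the job script can use to tell a forced stop from a regular end.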
LS-Dyna
LS-Dyna checks the existence and content of a file d3kil, which makes it possible to trigger a program termination:
# LS-Dyna sense switches
##Type     Response
# SW1.     A restart file is written and LS-DYNA terminates.
# SW2.     LS-DYNA responds with time and cycle numbers.
# SW3.     A restart file is written and LS-DYNA continues.
# SW4.     A plot state is written and LS-DYNA continues.
# SW5.     Enter interactive graphics phase and real time visualization.
# SW7.     Turn off real time visualization.
# SW8.     Interactive 2D rezoner for solid elements and real time visualization.
# SW9.     Turn off real time visualization (for option SW8).
# SWA.     Flush ASCII file buffers.
# lprint   Enable/Disable printing of equation solver memory, cpu requirements.
# nlprint  Enable/Disable printing of nonlinear equilibrium iteration information.
# iter     Enable/Disable output of binary plot database "d3iter" showing mesh after each equilibrium iteration. Useful for debugging convergence problems.
# conv     Temporarily override nonlinear convergence tolerances.
# stop     Halt execution immediately, closing open files.
##
dumpsenseswitch='SW1'
# see above for the definition of remaintingwalltime2stop
(sleep ${remaintingwalltime2stop}; echo ${dumpsenseswitch} >d3kil ) &
# LS-Dyna will be executed afterwards
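The background subshell above writes d3kil relative to the directory it was started in. A sketch that makes the target directory explicit, assuming LS-Dyna looks for d3kil in the directory it runs in; the helper name schedule_sense_switch is hypothetical:

```shell
# schedule_sense_switch <delay_seconds> <switch> [directory]
# Hypothetical helper: after the delay, writes the sense switch to <directory>/d3kil.
schedule_sense_switch() {
  local delay=$1 switch=$2 dir=${3:-$PWD}
  ( sleep "${delay}"; echo "${switch}" > "${dir}/d3kil" ) &
}

# Possible use before starting LS-Dyna (PBS_O_WORKDIR is the PBS submit directory;
# that LS-Dyna runs there is an assumption of this example):
#   schedule_sense_switch "${remaintingwalltime2stop}" SW1 "${PBS_O_WORKDIR}"
```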
Check free memory
The same technique can also be used to check the free memory periodically every minute after an initial waiting time of 10 s, saving the results in a file in the current directory, e.g. with
(sleep 10; freeavail.sh --periodic 60:`qwtime -r` -n `qjobnodes.sh -n` > "$PWD/freeavail_${PBS_JOBNAME%.*}.${PBS_JOBID%%.*}") &
before starting the program.
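Where freeavail.sh is not at hand, a rough stand-in can be sketched with /proc/meminfo. This assumes bash on a Linux node, covers only the node that runs the job script, and uses a hypothetical helper name log_memavail:

```shell
# log_memavail <interval_seconds> <count>
# Hypothetical helper: prints <count> lines with a timestamp and MemAvailable in kB
# ("NA" if the value cannot be read), waiting <interval_seconds> between samples.
log_memavail() {
  local interval=$1 count=$2 i avail
  for (( i = 0; i < count; i++ )); do
    avail=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo 2>/dev/null)
    echo "$(date +%Y-%m-%dT%H:%M:%S) ${avail:-NA}"
    (( i + 1 < count )) && sleep "${interval}"
  done
  return 0
}

# Possible use before starting the program (file name chosen for illustration):
#   (sleep 10; log_memavail 60 $(( $(qwtime -r) / 60 )) > "$PWD/memavail.${PBS_JOBID%%.*}") &
```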
ISV codes
- also see ISV_Usage