CRAY XE6 notes for the upgraded Batch System

The batch system on CRAY XE6 (hermit) will be upgraded on Tue, 6 May 2014. Most functionality can be used identically to the old version, but some things will change, and users need to modify their batch job submission scripts on hermit after the batch system upgrade has been done. So please take a look at the following points:

=== Jobs with mixed node features ===

See also the old batch system version for comparison (the old example will not work after the upgrade!).

Here is an example batch job requesting mixed node features under the new version of the batch system on hermit:


You need to specify the resource request nodes=<node count>:ppn=<process count per node>:mem64gb+<node count>:ppn=<process count per node>:mem32gb:

 qsub -l nodes=1:ppn=32:mem64gb+64:ppn=32:mem32gb,walltime=3600 my_batchjob_script.pbs

The example above allocates 65 nodes to your job for a maximum time of 3600 seconds and can place 32 processes on the node with 64GB memory and 32 processes on each of the 64 allocated nodes with 32GB memory. The option ppn=32 is important to get all cores of the allocated mpp nodes.

Now you need to select the different allocated node types for the aprun command in your script my_batchjob_script.pbs. Here is an example for the new batch system:

#!/bin/bash
#PBS -N mixed_job
#PBS -l nodes=1:ppn=16:mem64gb+2:ppn=32:mem32gb
#PBS -l walltime=300

### defining the number of PEs (processes per node; max 32 for hermit, max 16 for hornet) ###
# p32: number of PEs (Processing Elements) on 32GB nodes
# p64: number of PEs (Processing Elements) on 64GB nodes
#-------------------------------------------------------
p32=32
p64=16

# Change to the directory the job was submitted from
cd $PBS_O_WORKDIR


### selecting nodes with different memory ###
#---------------------------------------
# 1. getting all nodes of my job
nids=$(/opt/hlrs/system/tools/getjhostlist)

# 2. getting the nodes with feature mem32gb of my job
nid32=$(/opt/hlrs/system/tools/hostlistf mem32gb "$nids")
# how many nodes do I have with mem32gb:
i32=$(/opt/hlrs/system/tools/cntcommastr "$nid32")

# 3. getting the nodes with feature mem64gb of my job
nid64=$(/opt/hlrs/system/tools/hostlistf mem64gb "$nids")
# how many nodes do I have with mem64gb:
i64=$(/opt/hlrs/system/tools/cntcommastr "$nid64")


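# 4. computing the aprun parameters for each node type:
#    P64/P32 = total number of PEs on that node type (nodes * PEs per node)
#    D64/D32 = depth per PE (32 cores per node divided by PEs per node)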
(( P32 = $i32 * $p32 ))
(( P64 = $i64 * $p64 ))
(( D32 = 32 / p32 ))
(( D64 = 32 / p64 ))

# Launch the parallel job to the allocated compute nodes using
# Multi Program, Multi Data (MPMD) mode (see "man aprun")
# -------------------------------------------------
# $nid64 : node list with 64GB memory
# $i64   : number of nodes with 64GB memory
# $p64   : number of PEs per node on nodes with 64GB
# $P64   : total number of PEs (processing elements) on nodes with 64GB
# ----
# $nid32 : node list with 32GB memory
# $i32   : number of nodes with 32GB memory
# $p32   : number of PEs per node on nodes with 32GB
# $P32   : total number of PEs on nodes with 32GB
# ----------
# The "env OMP_NUM_THREADS=...." parts of the aprun command below are only useful for OpenMP (hybrid) programs.
#
aprun -L $nid64 -n $P64 -N $p64 -d $D64 env OMP_NUM_THREADS=$D64 ./my_executable1 : -L $nid32 -n $P32 -N $p32 -d $D32 env OMP_NUM_THREADS=$D32 ./my_executable2

By defining p64 and p32 in the example above you can control the number of processes per node for the different node types (64GB memory and 32GB memory). This corresponds to the qsub job option "-l mppnppn=32" for mono-node-type mpp jobs (see the examples in the previous chapters above). Note that the maximum value is 32, the number of cores of each mpp node.
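As a worked illustration (the numbers follow from the example script above, not from the original page), the request nodes=1:ppn=16:mem64gb+2:ppn=32:mem32gb together with p64=16 and p32=32 yields:

 i64=1, p64=16  =>  P64 = 1*16 = 16,  D64 = 32/16 = 2   (16 PEs, 2 cores each, on the 64GB node)
 i32=2, p32=32  =>  P32 = 2*32 = 64,  D32 = 32/32 = 1   (32 PEs on each of the two 32GB nodes)

so the aprun line starts 16 PEs of my_executable1 on the 64GB node and 64 PEs of my_executable2 on the two 32GB nodes.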

=== Deprecated CRAY qsub syntax using mppwidth, mppnppn, mppdepth, feature ===

The qsub arguments specially available for the CRAY XE6 system (mppwidth, mppnppn, mppdepth, feature) are deprecated in the new batch system version. Most functionality of those qsub arguments is still available in this new batch system version. Nevertheless, we recommend not using these qsub arguments. Please use the following syntax:


 qsub -l nodes=2:ppn=32:mem32gb <myjobscript>

(replaces: qsub -l mppwidth=64,mppnppn=32,feature=mem32gb)

* nodes: replacement for mppwidth/mppnppn
* ppn: replacement for mppnppn
* mem32gb: replacement for feature=mem32gb
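
The list above has no entry for mppdepth. As a hedged sketch (the numbers and the script name my_hybrid_job.pbs are illustrative assumptions, not from the original page), a hybrid MPI+OpenMP request such as the old qsub -l mppwidth=32,mppnppn=8,mppdepth=4,feature=mem32gb could be written as:

 qsub -l nodes=4:ppn=8:mem32gb my_hybrid_job.pbs

with the depth moved to the aprun call inside the job script, analogous to the mixed-node example above:

 aprun -n 32 -N 8 -d 4 env OMP_NUM_THREADS=4 ./my_hybrid_executable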

=== Run job on other Account ID ===

There are Unix groups associated with the project account ID (ACID). To run a job on a non-default project budget, the group name of this project has to be passed in the group_list:

 qsub -W group_list=<groupname> ...
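
The same works as a PBS directive inside the job script. A minimal sketch, assuming a hypothetical group name abc12345:

#!/bin/bash
#PBS -N job_on_other_acid
#PBS -W group_list=abc12345
#PBS -l nodes=1:ppn=32:mem32gb
#PBS -l walltime=600

cd $PBS_O_WORKDIR
aprun -n 32 ./my_executable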

To get your available groups:

 id 
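The output looks similar to the following sketch (user name, IDs, and group names are made-up placeholders):

 uid=12345(myuser) gid=40001(xyz00000) groups=40001(xyz00000),40123(abc12345)

Any of the listed non-primary groups can then be used as groupname.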
Warning: It is no longer possible to use your primary group as groupname!