- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -
NEC Cluster Queue Policies (laki + laki2)
Jobs are submitted to the default queue 'user'. Specifying this queue on the qsub command line is optional:
qsub -q user <...>
Each job is then routed further according to the resources it requests.
Job classes for the NEC-Cluster
Different job classes are available for efficient resource usage.
Users do not need to declare a job class on the qsub command line; jobs are sorted into the right class automatically. The definition of each job class is given below. In general, jobs with a duration of up to 24 hours that use up to half of the available resources can be submitted. Larger jobs are subject to different restrictions, or you have to consult the project team.
All jobs submitted without a specific duration (walltime) are set to the default of 10 minutes.
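A minimal job script sketch that requests the walltime explicitly, so the 10-minute default does not apply. The node count, the 8 cores per node (inferred from the tables, e.g. 2880 ncpus on 360 nodes), the walltime and the job name are illustrative assumptions; with 2 nodes and 2 hours the request falls into the first multi tier listed below. PBS_O_WORKDIR and PBS_NODEFILE are the standard Torque/PBS environment variables; the application line is hypothetical.

```shell
#!/bin/bash
# Illustrative values only: 2 nodes x 8 cores for 2 hours.
#PBS -l nodes=2:ppn=8
#PBS -l walltime=02:00:00
#PBS -N example_job

# Start in the submission directory; fall back to the current
# directory when the script is run outside the batch system.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# PBS_NODEFILE holds one line per allocated core.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
  NCORES=$(wc -l < "$PBS_NODEFILE")
else
  NCORES=1
fi
echo "Running with $NCORES core(s) in $(pwd)"
# mpirun -np "$NCORES" ./my_app   # hypothetical application
```

Submit it with `qsub job.sh`; the file name is arbitrary.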
test
This class is for test jobs with small resource requirements. Jobs in this class are expected to deliver results after a very short time and allow interactive work.
                Minimum    Maximum    Standard
------------------------------------------------
walltime:       00:01:00   00:20:00   00:10:00
nodes:          1          4          1
ncpus:          1          32         1
avail. nodes:   60
------------------------------------------------
28 nodes are reserved for this job class.
single
This class is only for jobs that use a single node.
                Minimum    Maximum    Standard
------------------------------------------------
walltime:       00:01:00   24:00:00
nodes:          1          1
ncpus:          1          8
avail. nodes:   700
------------------------------------------------
multi
This class is for parallel applications that do not fit into the test class. The limits depend on the requested walltime:
                 Minimum    Maximum    Standard
------------------------------------------------
walltime:        00:20:00   04:00:00
nodes:           2          4
ncpus:           2          32
start priority:  2880
avail. nodes:    700
------------------------------------------------

                 Minimum    Maximum    Standard
------------------------------------------------
walltime:        04:00:01   12:00:00
nodes:           2          360
ncpus:           2          2880
start priority:  1440
avail. nodes:    621
------------------------------------------------

                 Minimum    Maximum    Standard
------------------------------------------------
walltime:        12:00:01   24:00:00
nodes:           2          320
ncpus:           2          2560
start priority:  1
avail. nodes:    561
------------------------------------------------
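The limits of the test, single and multi classes can be read as a lookup on walltime, nodes and ncpus. The bash functions below are a hypothetical helper (job_class and to_seconds are not HLRS tools) that reports which class a request fits according to the tables. They check the classes in the order listed; where ranges overlap (e.g. a short one-node job fits both test and single), the real batch system may decide differently.

```shell
#!/bin/bash
# Hypothetical helper mirroring the published limits only;
# the actual routing is done by the batch system.

# Convert HH:MM:SS to seconds; 10# forces base 10 for fields like "08".
to_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Usage: job_class WALLTIME NODES NCPUS
job_class() {
  local wt nodes=$2 ncpus=$3
  wt=$(to_seconds "$1")
  if (( wt >= 60 && wt <= 1200 && nodes <= 4 && ncpus <= 32 )); then
    echo "test"
  elif (( nodes == 1 && ncpus <= 8 && wt >= 60 && wt <= 86400 )); then
    echo "single"
  elif (( nodes >= 2 && nodes <= 4 && ncpus <= 32 && wt >= 1200 && wt <= 14400 )); then
    echo "multi (up to 4 h)"
  elif (( nodes >= 2 && nodes <= 360 && ncpus <= 2880 && wt > 14400 && wt <= 43200 )); then
    echo "multi (up to 12 h)"
  elif (( nodes >= 2 && nodes <= 320 && ncpus <= 2560 && wt > 43200 && wt <= 86400 )); then
    echo "multi (up to 24 h)"
  else
    echo "none"   # outside the regular classes
  fi
}
```

For example, `job_class 08:00:00 100 800` prints `multi (up to 12 h)`, while `job_class 20:00:00 360 2880` prints `none` because the 24-hour tier allows at most 320 nodes.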
Special Queue Classes
To get jobs into one of the following special queue classes, users have to specify the queue name on the qsub command line.
smp
For pre- and postprocessing jobs that need very large memory (up to 1 TB).
                Minimum    Maximum    Standard
------------------------------------------------
walltime:       00:01:00   96:00:00
nodes:          1          1
ncpus:          1          48         1
vmem:                      1000GB     24GB
------------------------------------------------
To submit jobs in this class, you need to specify the queue name on the qsub command line:
qsub -q smp -l nodes=1:smp:ppn=1,vmem=24gb,walltime...
Jobs in this queue get the smp node with up to 1 TB of memory. The node is shared with other jobs at the same time, so it is very important to specify the number of cores (ppn=) and the amount of virtual memory (vmem=) your application needs. Otherwise you get the default values of 1 core and a maximum of vmem=24gb.
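A sketch of an smp job script that states cores, memory and walltime explicitly so the 1-core/24 GB defaults do not apply. The values (8 cores, 200 GB, 12 hours) and the job name are illustrative assumptions within the limits above; setting OMP_NUM_THREADS is only relevant for OpenMP-threaded codes, and the application line is hypothetical.

```shell
#!/bin/bash
# Illustrative request: 8 cores and 200 GB on the shared smp node.
#PBS -q smp
#PBS -l nodes=1:smp:ppn=8,vmem=200gb,walltime=12:00:00
#PBS -N big_mem_postproc

cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# The node is shared: keep threaded codes within the requested cores.
# Keep this value in sync with ppn= above.
export OMP_NUM_THREADS=8
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
# ./my_postprocessing input.dat   # hypothetical application
```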
vis
For pre- and postprocessing jobs that need a node with a graphics card for visualisation.
                  Minimum    Maximum
------------------------------------------------
walltime:         00:01:00   10:00:00
nodes:            1          1
ncpus:            1          8
max. queueable:   1
avail. nodes:     5
------------------------------------------------
To submit jobs in this class, you need to specify the queue name on the qsub command line:
qsub -X -q vis -l nodes=1:vis,walltime...
Only 1 node per job is possible. See also Graphic_Environment.
hero
For parallel jobs with very high resource consumption.
                Minimum    Maximum    Standard
------------------------------------------------
walltime:       00:01:00   06:00:00
nodes:          360        650
ncpus:          2880       5200
------------------------------------------------
To submit jobs in this class, you need to specify the queue name on the qsub command line:
qsub -q hero ...
Jobs in this class are collected by the system and started manually at a suitable point in time. Expect it to take up to several days for these jobs to start. This queue is not available at all times, and not to all users.
Please contact us.
tesla
For jobs using nodes with the Tesla GPU.
                Minimum    Maximum    Standard
------------------------------------------------
walltime:       00:01:00   10:00:00
nodes:          1          20
------------------------------------------------
To submit jobs in this class, you need to specify the queue name on the qsub command line:
qsub -q tesla -l nodes=1:tesla, ...
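A sketch of a tesla job script using the `nodes=N:tesla` form shown above. The node count (2), walltime (4 hours) and job name are illustrative assumptions within the 1 to 20 nodes and 10 hours allowed; the application line is hypothetical. Because PBS_NODEFILE can list a host once per allocated core, the script counts distinct host names.

```shell
#!/bin/bash
# Illustrative request: 2 Tesla nodes for 4 hours.
#PBS -q tesla
#PBS -l nodes=2:tesla,walltime=04:00:00
#PBS -N gpu_job

cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# Count distinct hosts; fall back to 1 outside the batch system.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
  NNODES=$(sort -u "$PBS_NODEFILE" | wc -l)
else
  NNODES=1
fi
echo "Allocated $NNODES distinct node(s)"
# ./my_gpu_app   # hypothetical application
```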
Maximum allowed jobs for each class
The number of jobs each user can have in the different job classes is restricted as follows. Once you reach this number, you can submit further jobs when earlier jobs have ended.
Job-class   Max. number of jobs   (Max. with waiting queue 'user')
------------------------------------------------------------------
default     5                     10
------------------------------------------------------------------
If more jobs are submitted than allowed for one job class, the excess jobs are placed in the dispatcher queue 'user' and move into the proper queue once this user's jobs in the corresponding queue have ended. The waiting queue holds up to 10 jobs per user, which makes it possible to submit jobs ahead of time.
Job Run Limitations
- The maximum time limit for a job is 24 hours.
- User limits:
- limited number of jobs of one user that can run at the same time
- in total, a user can allocate at most 3072 cores or 384 nodes (depending on the requested walltime).
- User Group limits:
- limited number of jobs of users in the same group that can run at the same time
- Batch Queue limits of all user jobs:
- not all nodes or node types are available in every queue (e.g. visualisation nodes cannot be used in multi-node job queues)
Queues with extended wall time limits
Queues with extended walltime limits are not available in general. The queue spec1 is available for jobs that cannot run within the 24-hour timeframe. Access to this queue is only granted after passing an evaluation process. The following rules apply to this queue:
- Jobs may be killed for operation reasons at any time.
- Jobs are accounted for in any case, even if the job has to be terminated for operational reasons.
- Joblimit per Group = 1
- Joblimit per user = 1
- Total number of nodes/cores used for this queue = 64 / 1024
- Node types available: nehalem, mem12gb, mem24gb, sb, mem32gb
- Low scheduling priority
- Max walltime 96h
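A sketch of a spec1 job script within the rules above (at most 96 hours, 64 nodes / 1024 cores for the whole queue). Using "nehalem" as a node property in the `nodes=N:property:ppn=M` form, the counts and the restart file are assumptions for illustration. Since jobs may be killed at any time, the script checks for a checkpoint left by an earlier run; the checkpoint file and the application option are hypothetical.

```shell
#!/bin/bash
# Assumed node property "nehalem"; counts are illustrative.
#PBS -q spec1
#PBS -l nodes=4:nehalem:ppn=8,walltime=96:00:00
#PBS -N long_run

cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# Hypothetical checkpoint written periodically by the application.
CHECKPOINT=restart.dat
if [ -f "$CHECKPOINT" ]; then
  MODE=restart   # resume after an operational termination
else
  MODE=fresh     # first run
fi
echo "Starting in $MODE mode"
# ./my_app --mode "$MODE"   # hypothetical application and option
```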