- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

NEC Cluster QueuePolicies for (laki + laki2)



Users submit jobs to the default queue 'user'. Naming this queue on the qsub command is optional.

 qsub -q user <...>

Jobs are then processed according to the amount of resources they request.

Job classes for the NEC-Cluster

Different job classes are available for efficient resource usage.

Users do not need to declare a job class with the qsub command. Jobs are sorted into the right class automatically. The definition of each job class is given below. In general, jobs with a duration of up to 24 hours that use up to half of the available resources can be submitted. For larger jobs, different restrictions apply, or you have to consult the project team.

All jobs submitted without a specific duration (walltime) are set to the default of 10 minutes.
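Because a job without a walltime falls back to the 10 minute default, it usually helps to request the duration explicitly. The following is a minimal sketch that only prints a qsub command instead of running it. The helper name build_qsub_cmd, the script name my_job.sh and the chosen node and core counts are illustrative assumptions, not part of the HLRS setup.

```shell
#!/bin/sh
# Hypothetical helper: print a qsub command line with an explicit walltime.
# Arguments: nodes, cores per node (ppn), walltime as HH:MM:SS, job script.
build_qsub_cmd() {
    printf 'qsub -l nodes=%s:ppn=%s,walltime=%s %s\n' "$1" "$2" "$3" "$4"
}

# Dry run: show the command for a 2 hour job on one node.
# my_job.sh is a placeholder for your own job script.
build_qsub_cmd 1 8 02:00:00 my_job.sh
```

Printing the command first makes it easy to check the resource request before actually submitting it.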

test

This class is for tests with limited resource needs. Jobs in this class are expected to deliver results after a very short time and should allow interactive work.

               Minimum      Maximum      Standard
 ------------------------------------------------
 walltime:     00:01:00     00:20:00     00:10:00
 nodes:        1            4            1
 ncpus:        1            32           1
 avail. nodes: 60
 ------------------------------------------------

For this job class 28 nodes are reserved.
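The limits in the table above can be checked before submitting. The following sketch tests whether a request stays within the test class limits (1 to 4 nodes, 1 to 32 cores, 1 to 20 minutes). The function names hms_to_seconds and fits_test_class are hypothetical, and the real batch system may apply further rules that are not listed here.

```shell
#!/bin/sh
# Convert HH:MM:SS to seconds (awk reads "08" as decimal 8, not octal).
hms_to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Hypothetical check: succeed when the request fits the test class table.
# Arguments: nodes, ncpus, walltime as HH:MM:SS.
fits_test_class() {
    nodes=$1
    ncpus=$2
    secs=$(hms_to_seconds "$3")
    [ "$nodes" -ge 1 ] && [ "$nodes" -le 4 ] &&
    [ "$ncpus" -ge 1 ] && [ "$ncpus" -le 32 ] &&
    [ "$secs" -ge 60 ] && [ "$secs" -le 1200 ]
}

if fits_test_class 2 16 00:15:00; then
    echo "request fits the test class"
else
    echo "request is outside the test class limits"
fi
```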

single

This class is for jobs that use only one node.

               Minimum      Maximum      Standard
 ------------------------------------------------
 walltime:     00:01:00     24:00:00
 nodes:        1            1
 ncpus:        1            8
 avail. nodes: 700
 ------------------------------------------------

multi

This class is for parallel applications (for jobs that do not fit in the test queue).

                 Minimum      Maximum      Standard
 --------------------------------------------------
 walltime:       00:20:00     04:00:00
 nodes:          2            4
 ncpus:          2            32
 start priority: 2880
 avail. nodes:   700
 --------------------------------------------------
                 Minimum      Maximum      Standard
 --------------------------------------------------
 walltime:       04:00:01     12:00:00
 nodes:          2            360
 ncpus:          2            2880
 start priority: 1440
 avail. nodes:   621
 --------------------------------------------------
                 Minimum      Maximum      Standard
 --------------------------------------------------
 walltime:       12:00:01     24:00:00
 nodes:          2            320
 ncpus:          2            2560
 start priority: 1
 avail. nodes:   561
 --------------------------------------------------
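The three multi tables differ only by walltime range, so the matching row can be looked up from the requested walltime. The sketch below prints the start priority and the maximum node count of the matching row. The function names are hypothetical, the values are copied from the tables above, and how the scheduler actually uses the start priority is not described here.

```shell
#!/bin/sh
# Convert HH:MM:SS to seconds (awk reads "08" as decimal 8, not octal).
hms_to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Hypothetical lookup: print "<start priority> <max nodes>" for the multi
# row that matches the walltime, or "none" outside 00:20:00 to 24:00:00.
multi_tier() {
    secs=$(hms_to_seconds "$1")
    if [ "$secs" -lt 1200 ] || [ "$secs" -gt 86400 ]; then
        echo "none"
    elif [ "$secs" -le 14400 ]; then
        echo "2880 4"
    elif [ "$secs" -le 43200 ]; then
        echo "1440 360"
    else
        echo "1 320"
    fi
}

# Example: a 6 hour job falls into the 04:00:01 to 12:00:00 row.
multi_tier 06:00:00
```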

Special Queue Classes

To get jobs into one of the following special queue classes, users have to specify the queue name on the qsub command.

smp

For pre- and postprocessing jobs that need very large memory (up to 1TB).

               Minimum      Maximum      Standard
 ------------------------------------------------
 walltime:     00:01:00     96:00:00
 nodes:        1            1
 ncpus:        1            48           1
 vmem:                      1000GB       24GB
 ------------------------------------------------

To submit jobs in this class you need to specify the queue name on the qsub command:

 qsub -q smp -l nodes=1:smp:ppn=1,vmem=24gb,walltime...

Jobs in this queue run on the smp node, which has at most 1TB of memory. The node is shared with other jobs at the same time, so it is very important to specify the number of cores (ppn=) and the amount of virtual memory (vmem=) your application needs. Otherwise you will get the default values of 1 core and a maximum of vmem=24gb.
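Since the defaults of 1 core and 24gb are rarely what a large-memory job needs, the request can be assembled and checked against the table limits first. This is a sketch only: the helper name smp_qsub_cmd and the script name postproc.sh are hypothetical, and the command is printed rather than submitted.

```shell
#!/bin/sh
# Hypothetical helper: print a qsub command for the smp queue.
# Arguments: cores (ppn), virtual memory in GB, walltime HH:MM:SS, job script.
smp_qsub_cmd() {
    ppn=$1
    vmem_gb=$2
    # Limits taken from the smp table: 1 to 48 cores, at most 1000GB vmem.
    if [ "$ppn" -lt 1 ] || [ "$ppn" -gt 48 ] ||
       [ "$vmem_gb" -lt 1 ] || [ "$vmem_gb" -gt 1000 ]; then
        echo "request exceeds the smp limits" >&2
        return 1
    fi
    printf 'qsub -q smp -l nodes=1:smp:ppn=%s,vmem=%sgb,walltime=%s %s\n' \
        "$ppn" "$vmem_gb" "$3" "$4"
}

# Dry run: 12 cores and 200GB for 8 hours (postproc.sh is a placeholder).
smp_qsub_cmd 12 200 08:00:00 postproc.sh
```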

vis

For pre- and postprocessing jobs that need a node with a graphics card for visualisation.

                 Minimum      Maximum
 ------------------------------------------------
 walltime:       00:01:00     10:00:00
 nodes:          1            1
 ncpus:          1            8
 max queueable:  1
 avail. nodes:                5
 ------------------------------------------------

To submit jobs in this class you need to specify the queue name on the qsub command:

 qsub -X -q vis -l nodes=1:vis,walltime...

Only 1 node per job is possible. See also Graphic_Environment.

hero

For parallel jobs with very high resource consumption.

               Minimum      Maximum      Standard
 ------------------------------------------------
 walltime:     00:01:00     06:00:00
 nodes:        360          650
 ncpus:        2880         5200
 ------------------------------------------------

To submit jobs in this class you need to specify the queue name on the qsub command:

 qsub -q hero ...

Jobs in this class are accumulated by the system and started manually at a suitable point in time. Expect a wait of up to several days before these jobs start. This queue is not available all the time and not for all users.

Please contact us.


tesla

For jobs using nodes with the Tesla GPU.

               Minimum      Maximum      Standard
 ------------------------------------------------
 walltime:     00:01:00     10:00:00
 nodes:        1            20
 ------------------------------------------------

To submit jobs in this class you need to specify the queue name on the qsub command:

 qsub -q tesla -l nodes=1:tesla, ...

Maximum allowed jobs for each class

The number of jobs per user in the different job classes is restricted as follows. Once you reach this number, further jobs can enter the class only after earlier jobs have ended.

Job-class      Max. number       (Max. with waiting
               of jobs           queue 'user')
---------------------------------------------
default         5                  10
---------------------------------------------

If more jobs are submitted than are allowed for one job class, the surplus jobs are held in the dispatcher queue 'user' and move up into the proper queue once that user's earlier jobs in the corresponding queue have ended. The waiting queue holds up to 10 jobs per user. This makes it possible to submit jobs ahead of time.



Job Run Limitations

  • The maximum time limit for a job is 24 hours.
  • User limits:
    • limited number of jobs of one user that can run at the same time
    • in total a user can only allocate 3072 cores or 384 nodes (it depends on the requested wall time).
  • User Group limits:
    • limited number of jobs of users in the same group that can run at the same time
  • Batch Queue limits of all user jobs:
    • not all nodes / node types are available in each queue (visualisation nodes cannot be used in multi node job queues)
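The per-user limit of 3072 cores or 384 nodes corresponds to 8 cores per node (3072 / 384 = 8). As a rough sketch, a user could check whether one more job would stay within the core limit. The function name within_user_core_limit is hypothetical, the 8 cores per node figure is an assumption derived from that division, and the effective limit depends on the requested wall time, which this sketch ignores.

```shell
#!/bin/sh
# Assumption: 8 cores per node, derived from 3072 cores / 384 nodes.
CORES_PER_NODE=8
MAX_USER_CORES=3072

# Hypothetical check: succeed when a new job keeps the user within the limit.
# Arguments: cores already allocated by the user, nodes requested by the new job.
within_user_core_limit() {
    total=$(( $1 + $2 * $CORES_PER_NODE ))
    [ "$total" -le "$MAX_USER_CORES" ]
}

# Example: 2560 cores are in use and a 64 node job is requested.
if within_user_core_limit 2560 64; then
    echo "new job stays within the per-user core limit"
else
    echo "new job would exceed the per-user core limit"
fi
```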


Queues with extended wall time limits

Queues with extended wall time limits are not available in general. The queue spec1 is available for jobs that cannot run within the 24h time frame. Access to this queue is granted only after passing an evaluation process. The following rules apply to this queue:

  • Jobs may be killed for operational reasons at any time.
  • Jobs will be accounted in any case. This is also true if the job has to be terminated for operational reasons.
  • Job limit per group = 1
  • Job limit per user = 1
  • Total number of nodes/cores used for this queue = 64 / 1024
  • Node types available: nehalem, mem12gb, mem24gb, sb, mem32gb
  • Low scheduling priority
  • Max walltime 96h