- Information in the HLRS Wiki is not legally binding and is provided without warranty -

NEC Cluster QueuePolicies for (laki + laki2)

From HLRS Platforms
Revision as of 16:58, 20 November 2012 by Hpcbk (talk | contribs)


Jobs are submitted by the user to the queue 'user'. Naming this queue explicitly on the qsub command line is optional.

 qsub -q user <...>

The further processing is done according to the requested amount of resources.

Job classes for the Nehalem-Cluster

Different job classes are available for efficient resource usage.

The definition of each job class is given below. In general, jobs with a duration of up to 24 hours that use up to half of the available resources can be submitted. Larger jobs are subject to different restrictions, or you have to consult the project team.

All jobs submitted without a specific duration (walltime) are assigned the default of 10 minutes.
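
The effect of the default can be sketched as follows. This is only an illustration: it assumes the batch system accepts the PBS-style resource request `-l walltime=HH:MM:SS`, which this page does not state, and `qsub_cmd` and `job.sh` are hypothetical names. The function prints a command line instead of running it.

```shell
# Default walltime applied when the user requests none (10 minutes).
DEFAULT_WALLTIME="00:10:00"

# Print (do not run) a qsub command line for the dispatcher queue 'user'.
# Assumption: PBS-style "-l walltime=HH:MM:SS"; job.sh is a placeholder script.
qsub_cmd() {
    script=$1
    walltime=${2:-$DEFAULT_WALLTIME}   # fall back to the default if $2 is empty
    echo "qsub -q user -l walltime=$walltime $script"
}

qsub_cmd job.sh 02:00:00   # prints: qsub -q user -l walltime=02:00:00 job.sh
qsub_cmd job.sh            # prints: qsub -q user -l walltime=00:10:00 job.sh
```

Requesting the walltime explicitly avoids a longer job being cut off at the 10-minute default.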


This class is for tests with restricted resources needs. The jobs in this class are expected to deliver results after very short time and shall allow interactive work.

               Minimum      Maximum      Standard
 walltime:     00:01:00     00:20:00     00:10:00
 nodes:        1            4            1
 ncpus:        1            32           1
 avail. nodes: 60

For this job class, 28 nodes are reserved.


This class is only for jobs that use a single node.

               Minimum      Maximum      Standard
 walltime:     00:01:00     24:00:00
 nodes:        1            1
 ncpus:        1            8
 avail. nodes: 700


This class is for parallel applications (for jobs that do not fit in the test queue).

               Minimum      Maximum      Standard
 walltime:     00:20:00     04:00:00
 nodes:        2            4
 ncpus:        2            32
 priority:     2880
 avail. nodes: 700

               Minimum      Maximum      Standard
 walltime:     04:00:01     12:00:00
 nodes:        2            360
 ncpus:        2            2880
 priority:     1440
 avail. nodes: 621

               Minimum      Maximum      Standard
 walltime:     12:00:01     24:00:00
 nodes:        2            320
 ncpus:        2            2560
 priority:     1
 avail. nodes: 561
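
As a rough illustration, the three walltime ranges above can be checked with a small shell function. This sketch reads the table rows literally; it is not the scheduler's actual routing logic, which this page does not describe. The names `to_seconds` and `multi_priority` are hypothetical.

```shell
# Convert HH:MM:SS to seconds using POSIX parameter expansion only.
to_seconds() {
    h=${1%%:*}; rest=${1#*:}; m=${rest%%:*}; s=${rest#*:}
    # Strip one leading zero so values like "08" are not parsed as octal.
    h=${h#0}; m=${m#0}; s=${s#0}
    echo $(( ${h:-0} * 3600 + ${m:-0} * 60 + ${s:-0} ))
}

# Print the priority value of the table row matching the request,
# or "none" if no row of the parallel class matches.
# $1: number of nodes, $2: walltime as HH:MM:SS
multi_priority() {
    nodes=$1
    secs=$(to_seconds "$2")
    if [ "$nodes" -lt 2 ]; then
        echo none
    elif [ "$secs" -ge 1200 ] && [ "$secs" -le 14400 ] && [ "$nodes" -le 4 ]; then
        echo 2880    # 00:20:00 to 04:00:00, 2 to 4 nodes
    elif [ "$secs" -gt 14400 ] && [ "$secs" -le 43200 ] && [ "$nodes" -le 360 ]; then
        echo 1440    # 04:00:01 to 12:00:00, 2 to 360 nodes
    elif [ "$secs" -gt 43200 ] && [ "$secs" -le 86400 ] && [ "$nodes" -le 320 ]; then
        echo 1       # 12:00:01 to 24:00:00, 2 to 320 nodes
    else
        echo none
    fi
}

multi_priority 4 02:00:00      # prints 2880
multi_priority 128 08:00:00    # prints 1440
multi_priority 128 18:00:00    # prints 1
multi_priority 340 18:00:00    # prints none (more than 320 nodes)
```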


This class is for parallel jobs with very high resource consumption.

               Minimum      Maximum      Standard
 walltime:     00:01:00     06:00:00
 nodes:        360          650
 ncpus:        2880         5200

To submit jobs in this class you need to specify the qname on the qsub command:

 qsub -q hero ...

Jobs in this class are collected by the system and started manually at a suitable time. Expect a delay of up to several days before these jobs start. This queue is not available all the time.

Please contact us.


This class is for jobs using nodes with a Tesla GPU.

               Minimum      Maximum      Standard
 walltime:     00:01:00     10:00:00
 nodes:        1            32

To submit jobs in this class you need to specify the qname on the qsub command:

 qsub -q tesla ...

Maximum allowed jobs for each class

The number of jobs per user in each job class is restricted as follows. Once you reach this number, further jobs can move up only after earlier jobs have ended.

Job-class      Max. number       (Max. with waiting
               of jobs           queue 'user')
default         5                  10

If more jobs are submitted than allowed for a job class, the remaining ones are placed in the dispatcher queue 'user' and move into the proper queue once this user's jobs in the corresponding queue have ended. The waiting queue holds up to 10 jobs per user. This makes it possible to submit jobs ahead of time.

Job Run Limitations

  • The maximum time limit for a job is 24 hours.
  • User limits:
    • limited number of jobs of one user that can run at the same time
    • in total, a user can allocate at most 3072 cores or 384 nodes (depending on the requested wall time).
  • User group limits:
    • limited number of jobs of users in the same group that can run at the same time
  • Batch queue limits of all user jobs:
    • not all nodes / node types are available on each queue (visualisation nodes cannot be used in multi-node job queues)
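
The per-user core limit can be checked with simple arithmetic, sketched below. The sketch assumes 8 cores per node, which is inferred from the single-node class (maximum ncpus 8) and from 384 × 8 = 3072; it covers only the upper bound, because this page does not specify how the limit varies with the requested wall time. The name `within_user_limit` is hypothetical.

```shell
CORES_PER_NODE=8      # assumption: inferred from the single-node class (max ncpus 8)
MAX_USER_CORES=3072   # equals 384 nodes * 8 cores

# Succeed if the user's total allocation stays within the core limit.
# $1: nodes already allocated by the user, $2: nodes requested by the new job
within_user_limit() {
    total_cores=$(( ($1 + $2) * CORES_PER_NODE ))
    [ "$total_cores" -le "$MAX_USER_CORES" ]
}

if within_user_limit 320 64; then echo "ok"; else echo "over limit"; fi   # prints ok
if within_user_limit 320 65; then echo "ok"; else echo "over limit"; fi   # prints over limit
```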

Queues with extended wall time limits

Queues with extended wall time limits are not available in general. This queue is intended for jobs that cannot run within the 24-hour time frame. Access is granted only after passing an evaluation process. The following rules apply to this queue:

  • Jobs may be killed for operational reasons at any time.
  • Jobs will be accounted in any case. This is also true if the job has to be terminated for operational reasons.
  • Job limit per group = 1
  • Job limit per user = 1
  • Total number of nodes/cores used for this queue = 64 / 1024
  • Node types available: nehalem, mem12gb, mem24gb, mem48gb, sb
  • Low scheduling priority
  • Max walltime 96h
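
A request can be checked against the numeric rules above before applying for access, as in the following sketch. It assumes that a single job may use at most the queue's total of 64 nodes and that only jobs longer than 24 hours belong in this queue; the name `fits_extended_queue` is hypothetical, and the evaluation process itself is not modelled.

```shell
# Succeed if a request fits the extended-walltime queue rules listed above.
# $1: number of nodes, $2: walltime in whole hours
fits_extended_queue() {
    nodes=$1
    hours=$2
    [ "$nodes" -ge 1 ] && [ "$nodes" -le 64 ] &&   # queue total: 64 nodes
    [ "$hours" -gt 24 ] && [ "$hours" -le 96 ]     # longer than 24h, at most 96h
}

if fits_extended_queue 16 72; then echo "fits"; else echo "does not fit"; fi    # prints fits
if fits_extended_queue 16 120; then echo "fits"; else echo "does not fit"; fi   # prints does not fit
```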