
NEC Cluster QueuePolicies for (laki + laki2)

From HLRS Platforms
Revision as of 15:59, 22 October 2009 by Hpcbk (talk | contribs) (→‎hero)



Users submit jobs to the queue 'user'. Naming this queue explicitly on the qsub command line is optional.

 qsub -q user <...>

Each job is then processed further according to the amount of resources it requests.

Job classes for the Nehalem-Cluster

Different job classes are available for efficient resource usage.

The definition of each job class is given below. In general, jobs with a duration of up to 24 hours that use up to half of the available resources can be submitted. Larger jobs are subject to different restrictions, or you have to consult the project team.

Jobs submitted without an explicit duration (walltime) are given the default of 10 minutes.
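
To avoid the 10-minute default, the walltime can be requested explicitly at submission. The following is a minimal sketch, assuming a Torque/PBS-style resource list of the form "-l nodes=N,walltime=HH:MM:SS"; the exact resource keywords accepted on this cluster are not stated on this page. The helper name build_qsub and the script name job.sh are hypothetical. The helper only prints the command line, it does not submit anything.

```shell
# Hypothetical helper: print (do not run) a qsub command line with an
# explicit walltime, so the job does not fall back to the 10-minute default.
# Assumption: Torque/PBS-style "-l nodes=N,walltime=HH:MM:SS" resource list.
build_qsub() {
    nodes=$1        # number of nodes requested
    walltime=$2     # requested duration as HH:MM:SS
    script=$3       # job script to submit (hypothetical name below)
    echo "qsub -q user -l nodes=${nodes},walltime=${walltime} ${script}"
}

build_qsub 2 01:30:00 job.sh
```

Printing the command first makes it easy to check the requested resources against the class limits below before submitting.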

test

This class is for tests with limited resource needs. Jobs in this class are expected to deliver results within a very short time and should allow interactive work.

               Minimum      Maximum      Standard
 ------------------------------------------------
 walltime:     00:01:00     00:20:00     00:10:00
 nodes:        1            4            1
 ncpus:        1            32           1
 avail. nodes: 60
 ------------------------------------------------

For this job class 28 nodes are reserved.

single

This class is for jobs that use only one node.

               Minimum      Maximum      Standard
 ------------------------------------------------
 walltime:     00:01:00     24:00:00
 nodes:        1            1
 ncpus:        1            8
 avail. nodes: 700
 ------------------------------------------------

multi

This class is for parallel applications (jobs that do not fit in the test queue).

               Minimum      Maximum      Standard
 ------------------------------------------------
 walltime:     00:20:00     04:00:00
 nodes:        2            4
 ncpus:        2            32
 start-
 priority:     2880
 avail. nodes: 700
 ------------------------------------------------
               Minimum      Maximum      Standard
 ------------------------------------------------
 walltime:     04:00:01     12:00:00
 nodes:        2            360
 ncpus:        2            2880
 start-
 priority:     1440
 avail. nodes: 621
 ------------------------------------------------
               Minimum      Maximum      Standard
 ------------------------------------------------
 walltime:     12:00:01     24:00:00
 nodes:        2            320
 ncpus:        2            2560
 start-
 priority:     1
 avail. nodes: 561
 ------------------------------------------------
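
The three tables above can be read as a lookup from the requested node count and walltime to a start priority. The following sketch reads the tables literally; how the scheduler treats requests that fall between the tables (for example more than 4 nodes for less than 4 hours) is not stated here, so the sketch simply reports no match. The function names to_seconds and multi_start_priority are hypothetical.

```shell
# Convert a walltime given as HH:MM:SS into seconds.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Print the start priority of the "multi" table that matches the request,
# or return 1 if none of the three tables matches (literal reading).
multi_start_priority() {
    nodes=$1
    secs=$(to_seconds "$2")
    if [ "$nodes" -lt 2 ]; then
        return 1    # one-node jobs belong to other classes
    elif [ "$secs" -ge 1200 ] && [ "$secs" -le 14400 ] && [ "$nodes" -le 4 ]; then
        echo 2880   # 00:20:00 to 04:00:00, 2 to 4 nodes
    elif [ "$secs" -gt 14400 ] && [ "$secs" -le 43200 ] && [ "$nodes" -le 360 ]; then
        echo 1440   # 04:00:01 to 12:00:00, 2 to 360 nodes
    elif [ "$secs" -gt 43200 ] && [ "$secs" -le 86400 ] && [ "$nodes" -le 320 ]; then
        echo 1      # 12:00:01 to 24:00:00, 2 to 320 nodes
    else
        return 1
    fi
}

multi_start_priority 4 02:00:00     # prints 2880
multi_start_priority 128 08:00:00   # prints 1440
multi_start_priority 128 18:00:00   # prints 1
```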

hero

For parallel jobs with very high resource consumption.

               Minimum      Maximum      Standard
 ------------------------------------------------
 walltime:  00:01:00      06:00:00
 nodes:     360           650
 ncpus:     2880          5200
 ------------------------------------------------

To submit jobs in this class, specify the queue name on the qsub command line:

 qsub -q hero ...

Jobs in this class are collected by the system and started manually at a suitable time. Expect a wait of up to several days before these jobs start. This queue is not available all the time.

Please contact us.


tesla

For jobs using nodes with the Tesla GPU.

               Minimum      Maximum      Standard
 ------------------------------------------------
 walltime:  00:01:00      06:00:00
 nodes:     1             32
 ------------------------------------------------

To submit jobs in this class, specify the queue name on the qsub command line:

 qsub -q tesla ...

Maximum allowed jobs for each class

The number of jobs per user in each job class is limited as follows. Once you reach the limit, you can submit further jobs after earlier ones have ended.

Job-class      Max. number       (Max. with waiting
               of jobs           queue 'user')
---------------------------------------------
default         5                  10
---------------------------------------------

If more jobs are submitted than a job class allows, the excess jobs are held in the dispatcher queue 'user' and move into the proper queue once earlier jobs of the same user in that queue have ended. The waiting queue holds up to 10 jobs per user. This makes it possible to submit jobs ahead of time.