- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

NEC Cluster QueuePolicies for (laki + laki2)

From HLRS Platforms

Revision as of 16:58, 20 November 2012



Users submit jobs to the queue 'user'. Specifying this queue on the qsub command line is optional:

 qsub -q user <...>

Further processing depends on the amount of resources requested.
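As a hedged illustration of how resources might be requested, the bash sketch below assembles a qsub command line. It assumes Torque/PBS-style `-l nodes=N:ppn=P,walltime=HH:MM:SS` resource syntax and 8 cores per Nehalem node (the ncpus/nodes ratio in the class tables); neither is stated on this page. The helper `build_qsub_cmd` and the script name `job.sh` are hypothetical, and the function only prints the command instead of submitting it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) a qsub command line.
# Assumes Torque/PBS-style "-l nodes=N:ppn=P,walltime=HH:MM:SS" syntax.
build_qsub_cmd() {
  local queue="$1" nodes="$2" ppn="$3" walltime="$4" script="$5"
  printf 'qsub -q %s -l nodes=%s:ppn=%s,walltime=%s %s\n' \
    "$queue" "$nodes" "$ppn" "$walltime" "$script"
}

# Example: 2 nodes x 8 cores for one hour via the dispatcher queue 'user'.
build_qsub_cmd user 2 8 01:00:00 job.sh
```

Printing rather than executing keeps the sketch safe to try on any machine; drop the `printf` wrapper only once the resource syntax has been confirmed for the actual batch system.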

Job classes for the Nehalem-Cluster

Different job classes are available for efficient resource usage.

The definition of each job class is given below. In general, jobs with a duration of up to 24 hours that use up to half of the available resources can be submitted. Larger jobs are subject to different restrictions, or you have to consult the project team.

Jobs submitted without a specific duration (walltime) are given the default walltime of 10 minutes.
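The default rule can be sketched as a small bash function that converts an HH:MM:SS walltime to seconds and falls back to 00:10:00 when no value is given. The function name `walltime_seconds` is hypothetical; it only mirrors the rule stated above and is not part of the batch system.

```shell
#!/usr/bin/env bash
# Convert an HH:MM:SS walltime to seconds; an empty value falls back to
# the documented default of 10 minutes (00:10:00).
walltime_seconds() {
  local wt="${1:-00:10:00}"
  local h m s
  IFS=: read -r h m s <<< "$wt"
  # Force base 10 so values like "08" are not parsed as octal.
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

walltime_seconds ""          # default walltime, prints 600
walltime_seconds 24:00:00    # maximum of the single class, prints 86400
```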

test

This class is for tests with limited resource requirements. Jobs in this class are expected to deliver results after a very short time and are meant to allow interactive work.

               Minimum      Maximum      Standard
 ------------------------------------------------
 walltime:     00:01:00     00:20:00     00:10:00
 nodes:        1            4            1
 ncpus:        1            32           1
 avail. nodes: 60
 ------------------------------------------------

28 nodes are reserved for this job class.

single

This class is only for jobs that use a single node.

               Minimum      Maximum      Standard
 ------------------------------------------------
 walltime:     00:01:00     24:00:00
 nodes:        1            1
 ncpus:        1            8
 avail. nodes: 700
 ------------------------------------------------
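The test and single tables can be read as plain range checks. The bash sketch below encodes them, treating the Minimum and Maximum columns as inclusive bounds (the page does not say so explicitly); the function names are hypothetical, and the sketch only checks a request, it does not reproduce how the batch system actually routes jobs.

```shell
#!/usr/bin/env bash
# Sketch: check a request against the test and single limits listed above.
# Usage: fits_class CLASS WALLTIME NODES NCPUS  (exit status 0 = fits)

to_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

fits_class() {
  local class="$1" wt nodes="$3" ncpus="$4"
  wt=$(to_seconds "$2")
  case "$class" in
    test)   # 00:01:00-00:20:00, 1-4 nodes, 1-32 cpus
      (( wt >= 60 && wt <= 1200 && nodes >= 1 && nodes <= 4 && ncpus >= 1 && ncpus <= 32 )) ;;
    single) # 00:01:00-24:00:00, exactly 1 node, 1-8 cpus
      (( wt >= 60 && wt <= 86400 && nodes == 1 && ncpus >= 1 && ncpus <= 8 )) ;;
    *) return 2 ;;  # unknown class name
  esac
}

fits_class test 00:15:00 2 16 && echo "fits the test class"
```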

multi

This class is for parallel applications (jobs that do not fit into the test queue).

               Minimum      Maximum      Standard
 ------------------------------------------------
 walltime:     00:20:00     04:00:00
 nodes:        2            4
 ncpus:        2            32
 start-
 priority:     2880
 avail. nodes: 700
 ------------------------------------------------
               Minimum      Maximum      Standard
 ------------------------------------------------
 walltime:     04:00:01     12:00:00
 nodes:        2            360
 ncpus:        2            2880
 start-
 priority:     1440
 avail. nodes: 621
 ------------------------------------------------
               Minimum      Maximum      Standard
 ------------------------------------------------
 walltime:     12:00:01     24:00:00
 nodes:        2            320
 ncpus:        2            2560
 start-
 priority:     1
 avail. nodes: 561
 ------------------------------------------------
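The three multi tables differ only in walltime range, maximum node count and start priority; the longer the requested walltime, the lower the listed start-priority value. A hedged bash sketch of that lookup follows, with the function name `multi_tier` hypothetical and the bounds taken as inclusive.

```shell
#!/usr/bin/env bash
# Sketch: map a multi-class walltime to its tier as listed in the tables above.
# Prints "max_nodes start_priority", or "none" outside 00:20:00-24:00:00.

to_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

multi_tier() {
  local wt
  wt=$(to_seconds "$1")
  if   (( wt >= 1200  && wt <= 14400 )); then echo "4 2880"    # 00:20:00-04:00:00
  elif (( wt >= 14401 && wt <= 43200 )); then echo "360 1440"  # 04:00:01-12:00:00
  elif (( wt >= 43201 && wt <= 86400 )); then echo "320 1"     # 12:00:01-24:00:00
  else echo "none"
  fi
}

multi_tier 02:00:00   # prints: 4 2880
multi_tier 18:00:00   # prints: 320 1
```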

hero

For parallel jobs with very high resource consumption.

               Minimum      Maximum      Standard
 ------------------------------------------------
 walltime:     00:01:00     06:00:00
 nodes:        360          650
 ncpus:        2880         5200
 ------------------------------------------------

To submit jobs to this class, you need to specify the queue name on the qsub command line:

 qsub -q hero ...

Jobs in this class are collected by the system and started manually at a suitable point in time. Expect that it may take up to several days for these jobs to start. This queue is not available at all times.

Please contact us.


tesla

For jobs using the nodes equipped with Tesla GPUs.

               Minimum      Maximum      Standard
 ------------------------------------------------
 walltime:     00:01:00     10:00:00
 nodes:        1            32
 ------------------------------------------------

To submit jobs to this class, you need to specify the queue name on the qsub command line:

 qsub -q tesla ...
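Unlike the other classes, hero and tesla are only used when named explicitly with `qsub -q`. The bash sketch below checks a request against the walltime and node ranges listed for these two classes, assuming inclusive bounds; the function name `fits_explicit_class` is hypothetical, and passing the check says nothing about whether the hero queue is currently open.

```shell
#!/usr/bin/env bash
# Sketch: check a request against the hero and tesla limits listed above.
# Usage: fits_explicit_class CLASS WALLTIME NODES  (exit status 0 = fits)

to_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

fits_explicit_class() {
  local class="$1" wt nodes="$3"
  wt=$(to_seconds "$2")
  case "$class" in
    hero)  (( wt >= 60 && wt <= 21600 && nodes >= 360 && nodes <= 650 )) ;;  # up to 06:00:00
    tesla) (( wt >= 60 && wt <= 36000 && nodes >= 1 && nodes <= 32 )) ;;    # up to 10:00:00
    *) return 2 ;;  # not an explicitly selected class
  esac
}

fits_explicit_class tesla 08:00:00 4 && echo "submit with: qsub -q tesla ..."
```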

Maximum allowed jobs for each class

The number of jobs per user in each job class is limited as follows. Once you reach this number, further jobs can only enter the class after earlier jobs have ended.

 Job class      Max. number      (Max. with waiting
                of jobs          queue 'user')
 ---------------------------------------------
 default        5                10
 ---------------------------------------------

If more jobs are submitted to a job class than allowed, the additional jobs are placed in the dispatcher queue 'user' and move up into the proper queue as soon as jobs of this user in the corresponding queue have ended. The waiting queue holds up to 10 jobs per user, which makes it possible to submit jobs ahead of time.
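The dispatcher behaviour can be sketched as two counters. This is a simplified model under stated assumptions: it reads the limits as 5 jobs per class plus up to 10 further jobs held in 'user' (if the table's 10 is instead a combined total, set `MAX_WAITING=5`), and it assumes jobs beyond both limits are refused, which the page does not say. All variable and function names are hypothetical.

```shell
#!/usr/bin/env bash
# Simplified model of the per-user class limit and the dispatcher queue 'user'.
# Assumptions: MAX_IN_CLASS jobs per class, MAX_WAITING jobs held in 'user',
# anything beyond that is refused (not stated on the page).
MAX_IN_CLASS=5
MAX_WAITING=10
in_class=0
waiting=0
refused=0

submit() {
  if (( in_class < MAX_IN_CLASS )); then
    in_class=$(( in_class + 1 ))
  elif (( waiting < MAX_WAITING )); then
    waiting=$(( waiting + 1 ))
  else
    refused=$(( refused + 1 ))
  fi
}

# A job in the class ends: one held job (if any) moves up from 'user'.
job_ended() {
  if (( in_class > 0 )); then
    in_class=$(( in_class - 1 ))
  fi
  if (( waiting > 0 )); then
    waiting=$(( waiting - 1 ))
    in_class=$(( in_class + 1 ))
  fi
}

for _ in {1..16}; do submit; done
echo "in class: $in_class, waiting: $waiting, refused: $refused"
job_ended
echo "in class: $in_class, waiting: $waiting, refused: $refused"
```

After 16 submissions the model holds 5 jobs in the class and 10 in 'user'; when one job ends, a held job moves up, so the class stays full while the waiting count drops to 9.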



Job Run Limitations

  • The maximum time limit for a job is 24 hours.
  • User limits:
    • the number of jobs of one user that can run at the same time is limited
    • in total, a user can allocate at most 3072 cores or 384 nodes (depending on the requested wall time)
  • User group limits:
    • the number of jobs of users in the same group that can run at the same time is limited
  • Batch queue limits for all user jobs:
    • not all nodes / node types are available in each queue (visualisation nodes cannot be used in multi-node job queues)
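The per-user allocation cap can be illustrated with a bash sketch that sums the nodes of a user's running jobs plus a new request and compares the total against 384 nodes (384 × 8 cores = 3072 cores, consistent with 8 cores per node). It ignores the walltime dependence mentioned above, and `within_user_limit` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: would a new job push a user past the total allocation limit of
# 384 nodes (3072 cores)? Walltime-dependent variations are ignored.
MAX_USER_NODES=384

# Usage: within_user_limit NEW_NODES [NODES_OF_RUNNING_JOBS...]
within_user_limit() {
  local total="$1" n
  shift
  for n in "$@"; do
    total=$(( total + n ))
  done
  (( total <= MAX_USER_NODES ))
}

within_user_limit 64 256 32 && echo "allowed"   # 352 nodes in total
```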


Queues with extended wall time limits

Queues with extended wall time limits are not available in general. Such a queue is available for jobs that cannot run within the 24-hour timeframe. Access to this queue is granted only after passing an evaluation process. The following rules apply to this queue:

  • Jobs may be killed for operational reasons at any time.
  • Jobs are always accounted, even if the job has to be terminated for operational reasons.
  • Joblimit per Group = 1
  • Joblimit per user = 1
  • Total number of nodes/cores used for this queue = 64 / 1024
  • Node types available: nehalem, mem12gb, mem24gb, mem48gb, sb
  • Low scheduling priority
  • Max walltime 96h
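The rules for the extended-walltime queue can be sketched as an eligibility check in bash. Because 64 nodes / 1024 cores is a queue-wide total, a single request is only checked against it as an upper bound; walltime is given in whole hours for simplicity. The function name and argument order are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check a request against the extended-walltime queue rules above.
# Usage: fits_extended NODE_TYPE NODES CORES WALLTIME_HOURS USER_JOBS_IN_QUEUE
fits_extended() {
  local type="$1" nodes="$2" cores="$3" hours="$4" user_jobs="$5"
  # Only the listed node types are available in this queue.
  case "$type" in
    nehalem|mem12gb|mem24gb|mem48gb|sb) ;;
    *) return 1 ;;
  esac
  # Queue-wide totals (64 nodes / 1024 cores) bound any single request;
  # max walltime 96h; job limit per user = 1.
  (( nodes <= 64 && cores <= 1024 && hours <= 96 && user_jobs < 1 ))
}

fits_extended sb 16 256 72 0 && echo "request is within the extended-queue rules"
```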