- Information in the HLRS Wiki is not legally binding and is provided without warranty -

Difference between revisions of "CRAY XE6 Batch System Layout and Limits"


Revision as of 14:13, 20 November 2012

Several types of queues are configured on this system. They are used to set appropriate priorities for different jobs and to account for user permissions and resource reservations for different user groups. The configuration is laid out so that, by default, all you need to request is the number of processes (mppwidth), along with the number of processes per node, and the time (walltime) for your job. The scheduling system then places the job in the appropriate queue. Always specify a realistic value for the walltime: jobs with a shorter walltime get a higher priority and may be used for backfilling. Users often specify 24h, which is the max. time limit on HLRS systems. If your job usually runs in 4h 17min and you request 5h, it can be started as soon as nodes are available for that timeframe, even while the job scheduler is still collecting more nodes for a larger job.
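As a rough sketch of such a request, the helper below turns an estimated runtime into a walltime string and prints a matching qsub command line without submitting anything. The function names, the script name job.sh and the per-node resource name mppnppn are assumptions made for illustration; this page only names mppwidth and walltime, so check the local qsub documentation for the exact resource names.

```shell
#!/bin/bash
# Sketch only: function names and job.sh are hypothetical.

# Convert a runtime in minutes to an HH:MM:SS walltime string.
minutes_to_walltime() {
    local minutes=$1
    printf '%02d:%02d:00\n' $((minutes / 60)) $((minutes % 60))
}

# Print (do not submit) a qsub command line.
# Arguments: total processes, processes per node, walltime in minutes.
# mppnppn is an ASSUMED resource name for "processes per node".
print_qsub_cmd() {
    local width=$1 per_node=$2 minutes=$3
    echo "qsub -l mppwidth=${width},mppnppn=${per_node},walltime=$(minutes_to_walltime "$minutes") job.sh"
}

# A job that usually runs 4h 17min, requested with a 5h walltime.
print_qsub_cmd 64 32 300
```

Requesting 5h rather than the 24h maximum is what makes the job a candidate for backfilling, as described above.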


At the moment only two types of queues are defined:

  • mpp, the default job queue for Massively Parallel Processing jobs
  • ccm, the job queue for the Cluster Compatibility Mode (CCM)

If you don't specify a queue on your qsub command, your job will be routed to the mpp queues.

If you use the qsub option "-q ccm", your job will be prepared for the Cluster Compatibility Mode (CCM). CCM is a software solution that provides the services needed to run most cluster-based independent software vendor (ISV) applications out of the box, with some configuration adjustments.
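The two submission variants can be sketched as follows. The function name and the script name job.sh are hypothetical, and the commands are only printed, not executed.

```shell
#!/bin/bash
# Sketch only: print_submit_cmd and job.sh are hypothetical names.

# Print the qsub command for a job script.
# Without a queue argument no -q option is added, so the job is
# routed to the default mpp queues.
print_submit_cmd() {
    local script=$1 queue=${2:-}
    case $queue in
        "")  echo "qsub $script" ;;
        ccm) echo "qsub -q ccm $script" ;;
        *)   echo "unsupported queue: $queue" >&2; return 1 ;;
    esac
}

print_submit_cmd job.sh        # default mpp routing
print_submit_cmd job.sh ccm    # Cluster Compatibility Mode
```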

Note:

  • CCM is only available to some users, not by default!
  • On workdays (Mon-Fri) between 6:00 and 22:00, 15% of the compute nodes are reserved for short jobs with a walltime of less than 4h.
  • The max. walltime for jobs is 24h.
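The two walltime thresholds in the notes above (less than 4h for the short-job reservation, at most 24h overall) can be checked before submitting. The following is a minimal sketch with hypothetical function names; it assumes a strict HH:MM:SS walltime format and does not model the workday and time-of-day conditions of the reservation.

```shell
#!/bin/bash
# Sketch only: the function names are hypothetical.

# Convert a strict HH:MM:SS walltime to seconds; fails on other formats.
walltime_to_seconds() {
    case $1 in
        [0-9][0-9]:[0-9][0-9]:[0-9][0-9]) ;;
        *) echo "expected HH:MM:SS, got '$1'" >&2; return 2 ;;
    esac
    local h=${1%%:*} rest=${1#*:}
    local m=${rest%%:*} s=${rest#*:}
    # Prefix each field with 1 and subtract 100 so that values such
    # as "08" are not interpreted as (invalid) octal numbers.
    echo $(( (1$h - 100) * 3600 + (1$m - 100) * 60 + (1$s - 100) ))
}

# Classify a walltime against the 4h and 24h thresholds.
classify_walltime() {
    local secs
    secs=$(walltime_to_seconds "$1") || return 2
    if [ "$secs" -gt $((24 * 3600)) ]; then
        echo "rejected: exceeds the 24h maximum"
    elif [ "$secs" -lt $((4 * 3600)) ]; then
        echo "short: below 4h, may use the reserved short-job nodes"
    else
        echo "regular"
    fi
}

classify_walltime 03:30:00
classify_walltime 05:00:00
```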

Job Run Limitations

  • User limits:
    • limited number of jobs of one user that can run at the same time
    • in total a user can only allocate 60000 cores or 1875 nodes.
  • User Group limits:
    • limited number of jobs of users in the same group that can run at the same time
  • For CCM jobs the max. mppwidth is 256
  • Batch Queue limits of all user jobs:
    • single node job queue (single): max. 300 nodes in total
    • multi node job queue (multi): max. 3400 nodes in total
    • queue for ccm jobs (ccm_base): max. 1000 nodes in total
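Some of these limits can be checked for a single request before submitting. The sketch below uses hypothetical function names and assumes 32 cores per node, a value derived from the stated limit of 60000 cores or 1875 nodes. It looks at one request in isolation, whereas the real user limit applies to the total allocated across all of a user's running jobs, and the per-queue node totals are not modelled.

```shell
#!/bin/bash
# Sketch only: names are hypothetical; limits are copied from the list above.

CORES_PER_NODE=32      # ASSUMED: derived from 60000 cores / 1875 nodes
MAX_USER_NODES=1875
MAX_USER_CORES=60000
MAX_CCM_WIDTH=256

# Number of nodes needed for a given mppwidth (rounded up).
nodes_needed() {
    local width=$1 per_node=${2:-$CORES_PER_NODE}
    echo $(( (width + per_node - 1) / per_node ))
}

# Check one request. Arguments: queue (mpp or ccm), mppwidth,
# optional processes per node.
check_request() {
    local queue=$1 width=$2 per_node=${3:-$CORES_PER_NODE}
    local nodes
    nodes=$(nodes_needed "$width" "$per_node")
    if [ "$queue" = "ccm" ] && [ "$width" -gt "$MAX_CCM_WIDTH" ]; then
        echo "rejected: ccm jobs are limited to mppwidth $MAX_CCM_WIDTH"
        return 1
    fi
    if [ "$width" -gt "$MAX_USER_CORES" ] || [ "$nodes" -gt "$MAX_USER_NODES" ]; then
        echo "rejected: exceeds the per-user limit of $MAX_USER_CORES cores or $MAX_USER_NODES nodes"
        return 1
    fi
    echo "ok: $nodes node(s)"
}

check_request mpp 4096
check_request ccm 512 || true
```

Running fewer processes per node than there are cores increases the node count for the same mppwidth, which is why the check considers both cores and nodes.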