- Information contained in the HLRS Wiki is not legally binding and HLRS is not responsible for any damages that might result from its use -

NEC Cluster FAQ (laki + laki2)

From HLRS Platforms
Revision as of 13:46, 12 June 2013 by Hpcbk (talk | contribs) (Hpcbk moved page NEC Nehalem Cluster FAQ to NEC Cluster FAQ (laki + laki2))

System usage

What is the minimal command to submit a job to the batch system?

  • Specify the number of nodes, the number of processor cores per node, the type of the nodes (probably 'nehalem') and the desired walltime.
    Example: 4 nodes, 8 processor cores per node for two hours
    qsub -l nodes=4:ppn=8:nehalem,walltime=2:00:00 ./myscript
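The same request can also be written into the job script itself. The following is a minimal sketch, assuming ./myscript is a bash script and that the batch system reads #PBS directive lines and sets PBS_O_WORKDIR and PBS_NODEFILE as Torque usually does; the fallbacks for running outside a job are illustrative additions.

```shell
#!/bin/bash
# Resource request equivalent to: qsub -l nodes=4:ppn=8:nehalem,walltime=2:00:00
#PBS -l nodes=4:ppn=8:nehalem
#PBS -l walltime=2:00:00

# Batch jobs usually start in the home directory; switch to the submission directory.
# The fallback to "." only matters when the script is run outside the batch system.
cd "${PBS_O_WORKDIR:-.}"

# $PBS_NODEFILE normally holds one line per allocated core (4 x 8 = 32 lines here).
# Outside a job it is unset, so fall back to /dev/null and report 0 cores.
ncores=$(wc -l < "${PBS_NODEFILE:-/dev/null}")
echo "Job has ${ncores} cores allocated"
```

With the directives in the script, the submission reduces to qsub ./myscript. Options given on the qsub command line typically take precedence over the directives.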

How can I filter the queue to only show my jobs?

  • The commands showq and qstat display job information. If there are many jobs in the queue, it is more convenient to filter the output by user name.
    showq -u <user-name>
    qstat -u <user-name>

Errors

  • mpirun: spawn failed with errno=-11
    In combination with the openmpi module, the call
    mpirun -np 2 -hostfile $PBS_NODEFILE ./test.out
    fails with an error like
    [n110402:02618] pls:tm: failed to poll for a spawned proc, return status = 17002
    [n110402:02618] [0,0,0] ORTE_ERROR_LOG: In errno in file ../../../../../orte/mca/rmgr/urm/rmgr_urm.c at line 462
    [n110402:02618] mpirun: spawn failed with errno=-11
    
    You simply have to omit the -hostfile option.
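A minimal sketch of a job script that launches the program without -hostfile. It assumes that Open MPI was built with support for the batch system, so that mpirun obtains the allocated nodes by itself. The resource request and the dry-run guard are illustrative assumptions; ./test.out is the program from the example above.

```shell
#!/bin/bash
#PBS -l nodes=1:ppn=2:nehalem,walltime=0:10:00

# Switch to the submission directory (fallback "." for runs outside a job).
cd "${PBS_O_WORKDIR:-.}"

# Launch command without -hostfile: mpirun is expected to take the
# node list from the batch system instead of from $PBS_NODEFILE.
launch=(mpirun -np 2 ./test.out)

# Illustrative guard: launch only if mpirun and the program are available,
# otherwise just print the command that would be executed.
if command -v mpirun >/dev/null 2>&1 && [ -x ./test.out ]; then
    "${launch[@]}"
else
    echo "would run: ${launch[*]}"
fi
```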

Software Development

Where can I find documentation for compilers?

Documentation for MPI libraries

Documentation for numerical libraries

Documentation for the batch system

  • The most important commands are qsub, qdel, qstat.
  • The command showstart, which displays the estimated start time of a job, is of interest to the impatient user.


Support

Development Support Question

I cannot build my application, it is not running, or I did nothing and it does not work anymore. Please help me.

To get useful help with making your application work, you need to provide some information.

If you have problems building your application, please provide the following information:

  • The loaded software modules (output of module list),
  • The calls to tools such as the compiler or linker,
  • The output of these tools.

If you have problems during the execution of your application, please provide the following information:

  • The command you used to submit your job to the execution queue (qsub ...),
  • The loaded software modules (output of module list),
  • The path where you executed your program, the command or script used to execute it, and information about the input files.

If you have not used your application for some time, then recompiled it without changing anything else, and now run into problems, please recompile your application with a command like

 make ... 2>&1 | tee make.log

Please check the output carefully for warnings, which may hint at the problem. If this does not help, include this log together with the information mentioned above in your support question.
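Checking a long build log by eye is error-prone, so a small filter may help. The following is a rough sketch, assuming the tools mark their messages with the words "warning" or "error"; the function name scan_build_log is hypothetical and not part of any installed tool.

```shell
#!/bin/bash
# Hypothetical helper: print every line of a build log that mentions
# a warning or an error, prefixed with its line number in the log.
scan_build_log() {
    local log="${1:-make.log}"
    grep -n -i -E 'warning|error' "$log"
}

# Typical use after the build command shown above has written make.log:
#   scan_build_log make.log
```

grep returns a non-zero exit status when nothing matches, so the function can also be used in a condition to test whether the log is clean.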