Job Scheduling
The acm head node serves many purposes: it is the file server, subnet manager, gateway, and license manager. It is therefore important not to bog it down. Use the head node only for compiling code, and do not run jobs on it. If you need to run jobs, please submit them through SGE to the appropriate queue.
The acm cluster is currently configured to use SGE as its immediate queueing system. The cluster is divided into two sets of queues, acm_tesla and acm_gts, representing the two architectures. Both sets of queues support batch jobs and interactive logins.
The acm cluster also participates in the Condor opportunistic scheduling system. Condor jobs can be submitted from the acm head node and will be scheduled to run on resources as they become available.
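As a rough sketch of what a Condor submission can look like, the bash snippet below writes a minimal submit description file using the standard vanilla universe. The file names (myjob.sh, myjob.sub, and the output/log names) are hypothetical placeholders, and the settings this cluster's Condor pool actually expects may differ.

```shell
#!/bin/bash
# Write a minimal Condor submit description file for a hypothetical myjob.sh.
# Adjust the executable and output file names for your own job.
cat > myjob.sub <<'EOF'
universe   = vanilla
executable = myjob.sh
output     = myjob.out
error      = myjob.err
log        = myjob.log
queue
EOF
```

The file can then be submitted from the head node with condor_submit myjob.sub, and the job's status checked with condor_q.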
Batch Submission
If you have a scriptable job that you would like to run in batch mode, use qsub to queue it. You should also specify which queue you would like the job submitted to, e.g.
qsub -q acm_tesla myjob.sh
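For illustration, here is one possible minimal myjob.sh. SGE reads lines beginning with "#$" as if they were qsub command-line options, so with the -q line present a plain qsub myjob.sh targets the same queue. The job name and the work done at the end are placeholders, not a required layout.

```shell
#!/bin/bash
# myjob.sh: a minimal SGE batch script (name and contents are only an example).
# Lines starting with "#$" are read by qsub as options; bash treats them
# as ordinary comments.
#$ -q acm_tesla
#$ -N myjob
#$ -cwd
#$ -j y

# JOB_ID is set by SGE; fall back to a label when the script is run by hand.
job_id=${JOB_ID:-none}
host=$(uname -n)
echo "Job $job_id running on $host"

# Replace this with the real work, e.g. a hypothetical ./my_program.
date
```

Here -N sets the job name shown by qstat, -cwd runs the job in the directory it was submitted from, and -j y merges standard error into standard output.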
To submit MPI jobs, you need to specify which parallel environment to use and how many slots to request. To see a list of the available parallel environments, type
qconf -spl
At the time of writing, the available parallel environments are:
- make
- make_gts_fu
- make_gts_rr
- make_tesla_fu
- make_tesla_rr
These let you choose which machines are used and how slots are assigned. The make_gts_* and make_tesla_* environments place jobs only on the nodes with GTS cards or Tesla cards, respectively. The _rr environments assign slots in round-robin fashion, trying to put one slot on each node in turn, while the _fu environments assign slots in fill-up fashion, placing as many slots as possible on one machine until it is full before moving on to the next. make is a default parallel environment provided with SGE; it includes all machines (both Tesla and GTS nodes) and assigns slots in round-robin fashion.
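The difference between round-robin and fill-up assignment can be illustrated with a small bash model. This is only a toy sketch: in SGE the behavior is normally set by each parallel environment's allocation_rule (values such as $round_robin and $fill_up), which qconf -sp <pe_name> displays, and the real scheduler also weighs load and other running jobs. The function name, node names, and slot counts below are hypothetical.

```shell
#!/bin/bash
# Toy model of how a parallel environment spreads slots over nodes.
# Node names and slot counts are hypothetical.

# Usage: assign_slots RULE NSLOTS SLOTS_PER_NODE NODE...
# RULE fill_up fills nodes in order; anything else is treated as round_robin.
# Prints "node:count" for each node that received at least one slot.
assign_slots() {
  local rule=$1 want=$2 per_node=$3
  shift 3
  local nodes=("$@")
  local n=${#nodes[@]} i progress
  local -a used=()
  for ((i = 0; i < n; i++)); do used[i]=0; done

  if [[ $rule == fill_up ]]; then
    # Fill each node completely before moving on to the next one.
    for ((i = 0; i < n && want > 0; i++)); do
      while (( used[i] < per_node && want > 0 )); do
        used[i]=$(( used[i] + 1 ))
        want=$(( want - 1 ))
      done
    done
  else
    # Round robin: hand out one slot per node on each pass.
    while (( want > 0 )); do
      progress=0
      for ((i = 0; i < n && want > 0; i++)); do
        if (( used[i] < per_node )); then
          used[i]=$(( used[i] + 1 ))
          want=$(( want - 1 ))
          progress=1
        fi
      done
      (( progress )) || break  # every node is already full
    done
  fi

  for ((i = 0; i < n; i++)); do
    if (( used[i] > 0 )); then
      echo "${nodes[i]}:${used[i]}"
    fi
  done
}
```

For example, assign_slots round_robin 6 4 node01 node02 node03 prints node01:2, node02:2 and node03:2, while assign_slots fill_up 6 4 node01 node02 node03 prints node01:4 and node02:2. Fill-up keeps more processes on the same machine, which can reduce inter-node communication; round-robin spreads the load across machines.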
To submit a job to a parallel environment, add the -pe flag to the qsub command:
qsub -pe <pe_name> <numslots> <job_script>
For example, to submit a job script called myjob.sh to the make_gts_fu environment with 16 slots (processes), type
qsub -pe make_gts_fu 16 myjob.sh
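The same request can also be embedded in the job script itself. The sketch below is a hypothetical MPI version of myjob.sh: it relies on the SGE-provided variables NSLOTS (slots granted) and PE_HOSTFILE (hosts chosen), while ./my_mpi_program is a placeholder and the exact mpirun options depend on which MPI implementation is installed on the cluster.

```shell
#!/bin/bash
# Hypothetical MPI job script, equivalent to
# "qsub -pe make_gts_fu 16 myjob.sh" but with the options embedded.
#$ -N mpi_job
#$ -cwd
#$ -j y
#$ -pe make_gts_fu 16

# SGE sets NSLOTS to the number of slots granted to the job;
# default to 1 so the script can also be tried outside the scheduler.
nslots=${NSLOTS:-1}
echo "Granted $nslots slot(s)"

# PE_HOSTFILE names a file listing each host SGE chose and its slot count.
if [[ -n ${PE_HOSTFILE:-} && -r $PE_HOSTFILE ]]; then
  cat "$PE_HOSTFILE"
fi

# Placeholder program; check the installed MPI's documentation for options.
prog=./my_mpi_program
if [[ -x $prog ]]; then
  mpirun -np "$nslots" "$prog"
else
  echo "Edit this script: $prog not found" >&2
fi
```

With the -pe line inside the script, a plain qsub myjob.sh requests the same environment and slot count.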
Interactive Login
Please do not ssh into nodes to test code or run jobs. Instead, use the interactive login feature so that the scheduler can account for the resources you use. For example, to get a session on a Tesla machine, type
qlogin -q acm_tesla
Further Reading
Please see our tech series on the SGE scheduler to learn more about its commands.
Information on Condor can also be found in our tech series and on the Condor homepage (http://