The ACM headnode serves many purposes, including acting as the gateway, file server, subnet manager, and license manager for the cluster. It is therefore essential that users do NOT bog it down with resource-intensive tasks. Instead, resource-intensive jobs should be submitted through SGE, ACM's preferred queuing system. On ACM, SGE is configured with one queue per GPU architecture, acm_gts and acm_tesla, plus acm_tesla_double for the double-precision Tesla nodes when they are available. These queues support both batch jobs and interactive logins. The ACM cluster also participates in the Condor opportunistic scheduling system: Condor jobs can be submitted from the ACM headnode and are executed as resources become available.
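Condor submission is only mentioned above, not shown. As a minimal sketch, assuming the standard HTCondor tools (condor_submit) are on the headnode's PATH, a vanilla-universe job can be described in a submit file and handed to Condor. The file names myjob.sub, myjob.out, myjob.err, and myjob.log are hypothetical, and ACM may require extra settings (for example, requirements on GPU type) that this guide does not cover.

```bash
#!/bin/sh
# Sketch: submit myjob.sh through Condor from the ACM headnode.
# File names below are hypothetical; adjust them to your job.
cat > myjob.sub <<'EOF'
universe   = vanilla
executable = myjob.sh
output     = myjob.out
error      = myjob.err
log        = myjob.log
queue
EOF

# Submit only if the Condor tools are installed on this machine.
if command -v condor_submit >/dev/null 2>&1; then
  condor_submit myjob.sub
else
  echo "condor_submit not found; wrote myjob.sub without submitting" >&2
fi
```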
Example: Submitting Batch Jobs
Batch jobs are submitted to the appropriate queue with the qsub command. At the time of this writing, the SGE queues are:
acm_gts for machines with the 8500 GTS cards
acm_tesla for machines with the single-precision Tesla cards
acm_tesla_double for machines with the double-precision Tesla cards (when available)
For example, to submit a serial job to the acm_tesla queue:
qsub -q acm_tesla myjob.sh
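qsub options such as the queue can also be embedded in the job script itself, on lines beginning with #$. The myjob.sh below is a minimal sketch of that approach: the job name and the placeholder workload are hypothetical, while -S, -N, -q, -cwd, and -j are standard SGE qsub options.

```bash
#!/bin/sh
# myjob.sh: minimal SGE batch script (job name and workload are hypothetical).
# qsub reads lines starting with "#$" as options; the shell ignores them.
# Run the job under /bin/sh:
#$ -S /bin/sh
# Job name shown by qstat:
#$ -N myjob
# Target queue, same as "qsub -q acm_tesla":
#$ -q acm_tesla
# Start in the directory qsub was called from:
#$ -cwd
# Merge stderr into the stdout log:
#$ -j y

# SGE exports JOB_NAME and JOB_ID; use placeholders when run by hand.
job_label="${JOB_NAME:-myjob}.${JOB_ID:-local}"
echo "Starting $job_label on $(uname -n)"

true   # placeholder for the real workload, e.g. ./my_program
status=$?
echo "Finished $job_label with exit status $status"
```

With the queue set inside the script, `qsub myjob.sh` is enough; an option given on the qsub command line normally takes precedence over the embedded one.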
To submit a parallel (MPI) job, you must specify a parallel environment. To list the available parallel environments, type:
qconf -spl
At the time of this writing, the parallel environments are:
make
make_gts_fu
make_gts_rr
make_tesla_fu
make_tesla_rr
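Each parallel environment records its slot-assignment behaviour in its allocation_rule setting. The sketch below, assuming qconf is available where you run it (normally the headnode), prints that setting for every environment so you can confirm which ones fill nodes up and which spread slots round robin. It relies only on `qconf -spl` (list PE names) and `qconf -sp NAME` (show one PE's configuration); the helper name allocation_rule_of is hypothetical.

```bash
#!/bin/sh
# Print the allocation_rule of every SGE parallel environment.

# Extract the allocation_rule value from a PE configuration read on stdin.
allocation_rule_of() {
  awk '$1 == "allocation_rule" { print $2 }'
}

if command -v qconf >/dev/null 2>&1; then
  for pe in $(qconf -spl); do
    printf '%-16s %s\n' "$pe" "$(qconf -sp "$pe" | allocation_rule_of)"
  done
else
  echo "qconf not found; run this on the ACM headnode" >&2
fi
```

If the environments are configured as their names suggest, the *_fu entries should report `$fill_up` and the *_rr entries `$round_robin`.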
These environments let a user choose a preferred machine architecture and slot assignment strategy. The make_gts* and make_tesla* environments assign slots on the nodes with GTS cards or Tesla cards, respectively. The *rr (round robin) environments attempt to distribute a job's slots evenly across compute nodes, while the *fu (fill-up) environments attempt to fill every slot on one node before moving on to the next.
The make (default) environment includes all machines (both Tesla and GTS nodes) and assigns slots in round-robin fashion.
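To make the difference between the two strategies concrete, the toy model below places a job's slots on a hypothetical set of 4 nodes with 4 slots each. It illustrates the round-robin and fill-up ideas only; it is not SGE's actual scheduler, which may also consider factors such as host load and queue sort order. The function name allocate is hypothetical.

```bash
#!/bin/bash
# Toy model of slot allocation (illustration only, not SGE's scheduler).
# allocate STRATEGY NSLOTS NODES SLOTS_PER_NODE
#   STRATEGY is rr (round robin) or fu (fill-up); prints slots used per node.
allocate() {
  local strategy=$1 nslots=$2 nodes=$3 per_node=$4
  local -a used=()
  local i n=0
  case $strategy in
    rr|fu) ;;
    *) echo "unknown strategy: $strategy" >&2; return 1 ;;
  esac
  if (( nslots > nodes * per_node )); then
    echo "requested $nslots slots but only $((nodes * per_node)) exist" >&2
    return 1
  fi
  for ((i = 0; i < nodes; i++)); do used[i]=0; done
  for ((i = 0; i < nslots; i++)); do
    # Skip past full nodes (wrapping around for round robin).
    while (( used[n] >= per_node )); do n=$(( (n + 1) % nodes )); done
    used[n]=$(( used[n] + 1 ))
    # Round robin moves to the next node after every slot;
    # fill-up stays on the current node until it is full.
    if [[ $strategy == rr ]]; then n=$(( (n + 1) % nodes )); fi
  done
  echo "${used[*]}"
}

# Hypothetical cluster: 4 nodes x 4 slots, job asking for 6 slots.
echo "round robin: $(allocate rr 6 4 4)"   # round robin: 2 2 1 1
echo "fill-up:     $(allocate fu 6 4 4)"   # fill-up:     4 2 0 0
```

Round robin spreads the 6 slots across all four nodes, while fill-up packs them onto two, which keeps more processes on the same node at the cost of sharing that node's resources.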
To submit a job to a parallel environment, add the -pe flag to the qsub command:
qsub -pe [pe_name] [numslots] [jobscript]
For example, to submit a job called myjob.sh to the make_gts_fu environment with 16 processes you would type:
qsub -pe make_gts_fu 16 myjob.sh
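Inside a parallel job, SGE reports the number of granted slots in the NSLOTS environment variable and the assigned hosts in the file named by PE_HOSTFILE. The script below is a minimal sketch of an MPI job built on those variables. The names mpijob.sh and my_mpi_program are placeholders, and whether mpirun also needs an explicit host list depends on how MPI is integrated with SGE on ACM, which this guide does not specify.

```bash
#!/bin/sh
# mpijob.sh: sketch of an SGE parallel job (names are hypothetical).
#$ -S /bin/sh
#$ -N mpijob
#$ -cwd
#$ -j y
# Request 16 slots in the make_gts_fu parallel environment:
#$ -pe make_gts_fu 16

# SGE sets NSLOTS to the number of granted slots; default to 1 by hand.
nslots="${NSLOTS:-1}"
echo "Granted $nslots slot(s)"

# PE_HOSTFILE names a file listing each host and its slot count.
if [ -n "${PE_HOSTFILE:-}" ] && [ -r "$PE_HOSTFILE" ]; then
  cat "$PE_HOSTFILE"
fi

# ./my_mpi_program is a placeholder for your MPI executable.
if command -v mpirun >/dev/null 2>&1 && [ -x ./my_mpi_program ]; then
  mpirun -np "$nslots" ./my_mpi_program
else
  echo "Would run: mpirun -np $nslots ./my_mpi_program"
fi
```

Because the PE request is embedded, `qsub mpijob.sh` submits it as written; passing `-pe make_tesla_rr 8` on the command line should override the embedded environment and slot count.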
Example: Interactive Login
Please do not use ssh to log in to compute nodes directly to test code or run jobs. Instead, use qlogin to start an interactive session so the scheduler can manage resources optimally.
Interactive sessions use the same SGE queues as batch jobs: acm_gts, acm_tesla, and acm_tesla_double (when available).
For example, if you would like a session on a machine with a Tesla GPU, type:
qlogin -q acm_tesla