SLURM Overview
Slurm is an open-source system for cluster management and job scheduling. All RSS clusters use Slurm. This document gives an overview of how to run jobs, check job status, and make changes to submitted jobs. To learn more about specific flags or commands, please visit Slurm's website.
RSS offers a training session about Slurm. Please check our Training to learn more.
All jobs must be run using srun or sbatch to prevent them from running on the Lewis login node. Jobs found running on the login node will be terminated immediately, and a notification email will be sent to the user.
Interactive SLURM job
Interactive jobs typically run for a few minutes. This is a basic example of an interactive job using srun with -n to request one CPU:
srun -n 1 hostname
An example of the command with the output:
[jgotberg@lewis4-r710-login-node223 ~]$ srun -n 1 hostname
lewis4-s2600tpr-hpc4-node249
We can use srun to call programs interactively:
srun -n 1 --mem 4G --pty /bin/bash
module load R
R
To call an interactive program with a GUI, use X11. Verify that X11 forwarding is enabled, or that your ssh session was started with the -Y flag.
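As a sketch, an X11 session might look like the following. The login hostname is a placeholder, and the --x11 flag to srun is an assumption: it requires native X11 support in the site's Slurm build and may not be available everywhere.

```shell
# Connect with X11 forwarding enabled (-Y), then request an interactive shell.
# NOTE: the hostname is a placeholder; --x11 is an assumption about the
# site's Slurm build and may not be supported on every cluster.
ssh -Y username@<lewis-login-node>
srun -n 1 --mem 4G --x11 --pty /bin/bash
module load R
R   # plotting from R can now open an X11 window on your local display
```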
To learn more about module load, please read the Environment Modules User Guide.
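For quick reference, a few common module commands (covered in detail in the Environment Modules User Guide):

```shell
module avail      # list software modules available on the cluster
module list       # show modules currently loaded in your session
module load R     # load R into your environment
module unload R   # remove R from your environment
module purge      # unload all currently loaded modules
```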
Batch SLURM job
Batch jobs can run multiple tasks across multiple nodes, and they typically take a few hours to a few days to complete. Most of the time you will use an SBATCH file to launch your jobs. This example shows how to put our SLURM options in the file saving_the_world.sh and then submit the job to the queue. To learn more about the partitions available for use on Lewis and the specifics of each partition, please read our Partition Policy.
Listing of saving_the_world.sh:
#!/bin/bash
#SBATCH -p Lewis # use the Lewis partition
#SBATCH -J saving_the_world # give the job a custom name
#SBATCH -o results-%j.out # give the job output a custom name (%j expands to the job ID)
#SBATCH -t 0-02:00 # two-hour time limit
#SBATCH -N 2 # number of nodes
#SBATCH -n 2 # number of cores (AKA tasks)
# Commands here run once, on the first node of the allocation
echo "$(hostname), reporting for duty."
# Commands launched with srun run once per task across the allocation
srun echo "Let's save the world!"
srun hostname
Once the SBATCH file is ready to go, start the job with:
sbatch saving_the_world.sh
Output is found in the file results-<job id here>.out. Example below:
[jgotberg@lewis4-r710-login-node223 training]$ sbatch saving_the_world.sh
Submitted batch job 844
[jgotberg@lewis4-r710-login-node223 training]$ cat results-844.out
lewis4-s2600tpr-hpc4-node248, reporting for duty.
Let's save the world!
Let's save the world!
lewis4-s2600tpr-hpc4-node248
lewis4-s2600tpr-hpc4-node249
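After submission, a job can also be cancelled or modified. A brief sketch, using the job ID 844 from the example above (note that scontrol update generally works only on pending jobs, and some changes may require administrator privileges):

```shell
scancel 844                                  # cancel the job
scontrol show job 844                        # show the job's full details
scontrol update JobID=844 TimeLimit=0-04:00  # change the time limit (may require admin rights)
```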
Job Status
Checking the status of your jobs with sacct is good practice. The sacct command has many options for the information it can provide. The following command displays the full list of options and their descriptions:
man sacct
To check the status of your jobs, use:
sacct
To find jobs that were submitted during February 2016, for instance:
sacct -S 2016-02-01 -E 2016-03-01
And to apply custom fields to the February 2016 report:
sacct -S 2016-02-01 -E 2016-03-01 -o JobID,JobName,AllocCPUS,Partition,NNodes
Output:
JobID JobName AllocCPUS Partition NNodes
------------ ---------- ---------- ---------- --------
188716 free 1 HighMem 1
188718 free 1 HighMem 1
188720 free 1 general 1
To see expanded values for a specific job ID, with custom output formatting:
sacct -j 195939 -o User,Acc,AllocCPUS,Elaps,CPUTime,TotalCPU,AveDiskRead,AveDiskWrite,ReqMem,MaxRSS
Output:
User Account AllocCPUS Elapsed CPUTime TotalCPU AveDiskRead AveDiskWrite ReqMem MaxRSS
--------- ---------- ---------- ---------- ---------- ---------- -------------- -------------- ---------- ----------
rcss general 16 00:48:39 12:58:24 01:49.774 66.58M 44.75M 64Gn 216K
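Note that CPUTime is simply AllocCPUS multiplied by Elapsed. We can verify the row above with shell arithmetic:

```shell
# Verify CPUTime = AllocCPUS x Elapsed for the sacct row above
elapsed_s=$((0*3600 + 48*60 + 39))   # Elapsed 00:48:39, in seconds
cputime_s=$((16 * elapsed_s))        # AllocCPUS = 16
printf '%02d:%02d:%02d\n' $((cputime_s/3600)) $((cputime_s%3600/60)) $((cputime_s%60))
# -> 12:58:24, matching the CPUTime column
```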
You can use -o All to see all available fields. Available fields to choose from:
AllocCPUS AllocGRES AllocNodes AllocTRES
Account AssocID AveCPU AveCPUFreq
AveDiskRead AveDiskWrite AvePages AveRSS
AveVMSize BlockID Cluster Comment
ConsumedEnergy ConsumedEnergyRaw CPUTime CPUTimeRAW
DerivedExitCode Elapsed Eligible End
ExitCode GID Group JobID
JobIDRaw JobName Layout MaxDiskRead
MaxDiskReadNode MaxDiskReadTask MaxDiskWrite MaxDiskWriteNode
MaxDiskWriteTask MaxPages MaxPagesNode MaxPagesTask
MaxRSS MaxRSSNode MaxRSSTask MaxVMSize
MaxVMSizeNode MaxVMSizeTask MinCPU MinCPUNode
MinCPUTask NCPUS NNodes NodeList
NTasks Priority Partition QOS
QOSRAW ReqCPUFreq ReqCPUFreqMin ReqCPUFreqMax
ReqCPUFreqGov ReqCPUS ReqGRES ReqMem
ReqNodes ReqTRES Reservation ReservationId
Reserved ResvCPU ResvCPURAW Start
State Submit Suspended SystemCPU
Timelimit TotalCPU TRESAlloc TRESReq
UID User UserCPU WCKey
WCKeyID
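These field names pair well with sacct's machine-readable output when you want to post-process accounting data in a script. A small sketch using fields from the list above:

```shell
# Pipe-delimited output without trailing separators, suitable for awk/cut
sacct -S 2016-02-01 -E 2016-03-01 --parsable2 -o JobID,Elapsed,MaxRSS |
    awk -F'|' 'NR > 1 { print $1, $2, $3 }'
```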
Monitoring a running job
To view information about CPU usage, tasks, nodes, resident set size, and virtual memory for your job, use the sstat command. The information sstat provides can help you tune future jobs that are similar to the current one. In order for this to work, jobs submitted with SBATCH must use the srun command within the job file.
#!/bin/bash
#SBATCH -n 1
#SBATCH -J monitor
echo $HOSTNAME
srun dd if=/dev/urandom bs=4k count=40k | md5sum
Example output:
[user@lewis3]$ sbatch ./monitor.sh
Submitted batch job 214827
[user@lewis3]$ sstat 214827
JobID MaxVMSize MaxVMSizeNode MaxVMSizeTask AveVMSize MaxRSS ...
------------ ---------- -------------- -------------- ---------- ---------- ...
214827.0 205104K c13b-14 0 107940K 848K ...
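By default sstat prints many columns; as with sacct, the -o flag narrows the output to the fields you care about. For example:

```shell
# Show only selected fields for the running job step
sstat -j 214827 -o JobID,AveCPU,AveRSS,MaxRSS,MaxVMSize
```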
Running Job Length
To view your status in the queue, whether your job has started running, and how long your job has been running, use the squeue command. Running squeue by itself lists all jobs submitted to the queue.
Common reasons shown in the NODELIST(REASON) column:

Reason | Meaning
---|---
Priority | Job is waiting to run behind higher-priority jobs, based on fairshare
lewis4-hardware-partition-nodenumber | Job is running on that particular node
QOSGrpNodeLimit | Job's QOS has reached its node limit
PartitionTimeLimit | Requested time is greater than the partition's time limit
Resources | Job is waiting for appropriate compute nodes to become available
Example:
squeue -u your_username_here
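A few other useful squeue variations (the username is a placeholder):

```shell
squeue -u your_username_here              # only your jobs
squeue -u your_username_here -t PENDING   # only your pending jobs
squeue -u your_username_here --start      # estimated start times for pending jobs
```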
If your job needs to run more than two days, please contact us to be added to the long QOS. We will set up a consultation to review your job and verify that no further tuning can be done.