Howto
This document covers the following topics:
- Cite for Publication
- Generate an SSH Key Pair
- SSH Key Troubleshooting
- Transfer Data
- Check Storage Usage (Quotas)
- Check Fairshare and Accounts
- Find Quality of Services (QOS)
- Use GPU Resources
- Use Graphical User Interfaces (GUIs)
- Use Licensed Software
Cite for Publication
When you cite any of the RSS clusters in a publication, we ask that you send an email to muitrss@missouri.edu and share a copy of the publication with us. To cite the use of any of the RSS clusters in a publication, please use:
The computation for this work was performed on the high performance computing infrastructure provided by Research Support Solutions and in part by the National Science Foundation under grant number CNS-1429294 at the University of Missouri, Columbia MO. DOI: https://doi.org/10.32469/10355/69802
Generate an SSH Key Pair
For Windows, open MobaXterm and press "Start local terminal".
For macOS or Linux, open a local terminal.
Then type the following command:
ssh-keygen
When prompted, press Enter to save the key in the default location (/home/<username>/.ssh/id_rsa), then enter a strong passphrase (required) twice.
After you generate your key, you will need to display the public key and include it in the Account Request Form. Type the command:
cat ~/.ssh/id_rsa.pub
The output will be a unique string of random characters similar to this:
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC7iKBE2qsnnR+mguxvvKNyj/IJchBjba4QD39BGMCC
vytPxFOoN9h2WbeIg1cUSpv7sb6STGcrCXnSrthM4fuasFM/KXELnJobq0JBEd6pld72jhBVHzObomrs
ktGSThO3JqSgE2O0elxcfT/0dSn/6t+GY/HvkcgdFnenfW3oHTOdAyWgHXe/0aWSuq60DhLfJGp8mM1N
Ixjagd9s/OLByOSV7GxwUCNl+OD/CsaLkGgrB6jonf01cjfaFYh4iYcBX5s7lZuBCSpFY1+KqeE8ZZ8k
qDdMqgYOmw2SxFpST1iaC9wmL9N4J4Xm <your_user_name_here>@<your_computer_name_here>
We recommend that you copy your SSH key pair to a USB drive so you can log in to Lewis from more than one computer. Replace the path in the example below with the actual path to your USB drive:
mkdir /some/path/to/usb_drive/ssh_key_backup/
cp ~/.ssh/id_rsa* /some/path/to/usb_drive/ssh_key_backup/
ls /some/path/to/usb_drive/ssh_key_backup/
SSH Key Troubleshooting
A common problem is Permission denied. If you run ssh and get Permission denied, please do not generate a new SSH key.
ssh <user-name>@lewis.rnet.missouri.edu
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
Make sure your SSH keys are in the ~/.ssh directory with the right permissions by running ls -la ~/.ssh. For example:
[user@local]$ ls -la ~/.ssh
total 30
drwx------. 2 user user 5 Apr 22 12:32 .
drwx------. 4 user user 45 Jul 16 15:00 ..
-rw-------. 1 user user 1896 Nov 12 2019 id_rsa
-rw-r--r--. 1 user user 414 Nov 12 2019 id_rsa.pub
-rw-r--r--. 1 user user 3970 Jun 28 11:01 known_hosts
If keys are missing, copy your keys into the ~/.ssh directory and set the right permissions: the private key (id_rsa) must be readable only by you (-rw-------), while the public key (id_rsa.pub) can be world-readable (-rw-r--r--). You can use chmod 600 ~/.ssh/id_rsa and chmod 644 ~/.ssh/id_rsa.pub to set them.
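The full permission fix can be sketched as follows; a temporary scratch directory stands in for ~/.ssh so the commands are safe to try anywhere (on your own machine, run them against ~/.ssh itself):

```shell
# Sketch: set the expected permissions on an SSH key pair.
# SSHDIR is a scratch stand-in for ~/.ssh.
SSHDIR=$(mktemp -d)
touch "$SSHDIR/id_rsa" "$SSHDIR/id_rsa.pub"
chmod 700 "$SSHDIR"             # directory: accessible only by you
chmod 600 "$SSHDIR/id_rsa"      # private key: readable/writable only by you
chmod 644 "$SSHDIR/id_rsa.pub"  # public key: world-readable is fine
stat -c '%a %n' "$SSHDIR"/id_rsa*
```

SSH refuses to use a private key that other users can read, so the mode 600 on id_rsa in particular is required.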
If you could not solve the issue, please run the following commands and contact us:
ssh -v <user-name>@lewis.rnet.missouri.edu
ls -la ~/.ssh
Passphrase issue
If you forget your passphrase, you will need to generate a new SSH key pair and email your new public key to muitrss@missouri.edu.
If you lose your SSH passphrase, do not resubmit a Lewis Account Request.
Transfer Data
The data transfer partition (Dtn) includes cluster nodes that are designed for performing large file transfers. To transfer files to/from Lewis, you need to use one of the data transfer nodes:
- DTN0:
lewis4-dtn.rnet.missouri.edu
- DTN1:
lewis4-dtn1.rnet.missouri.edu
Note that in order to use the Dtn partition, you must be part of the dtn QOS. To find out whether you have access to Dtn, you can run the following on Lewis:
sacctmgr show assoc user=$USER format=acc,user,qos
If you cannot find dtn among your QOS list, please contact us.
Transfer from/to local
We recommend that you use the rsync command with the following options to transfer files to Lewis:
rsync -rltvPh /source/path <username>@lewis4-dtn.rnet.missouri.edu:/destination/path
Here is what the various flags mean:
- -r : recursive
- -l : links
- -t : time
- -v : verbose
- -P : progress
- -h : human readable
And the following to transfer files from Lewis:
rsync -rltvPh <username>@lewis4-dtn.rnet.missouri.edu:/source/path /destination/path
Note that you should run this command from a local terminal, not from Lewis.
Transfer from/to server
To transfer files between another server and Lewis, you can submit a batch job on Lewis like the following:
#!/bin/bash
######################### Batch Headers #########################
#SBATCH -p Dtn # use the data transfer partition
#SBATCH --qos dtn # use dtn QOS
#SBATCH -J dtn_transfer # give the job a custom name
#SBATCH -o results-%j.out # give the job output a custom name
#SBATCH -t 08:00:00 # give the job a time (up to one month: 28-00:00:00)
#################################################################
# Set variables
REMOTEUSERNAME="buzz" # Username on the server to download from
REMOTEHOST="lightyear.edu" # Hostname of the server to download from
REMOTEPATH="/home/dataset" # Remote data location
REMOTE="$REMOTEUSERNAME@$REMOTEHOST:$REMOTEPATH"
LOCAL="/storage/hpc/location/" # Local data location
# Do the file transfer
rsync -rltvPh $REMOTE $LOCAL # Download dataset
# To upload data, it would be the opposite (rsync -rltvPh $LOCAL $REMOTE)
Check Storage Usage (Quotas)
You can check your quota and current usage with the following commands:
- Home Storage (/home/$USER):
df -h /home/$USER
- HTC Storage (/storage/htc):
df -h /storage/htc/<folder_name_here>
- GPRS Storage (/gprs):
df -h /gprs/<folder_name_here>
- HPC Storage per User (/data, /scratch):
lfs quota -hg $USER /storage/hpc
- HPC Storage per Group (/group):
lfs quota -hg <group_name_here> /storage/hpc
Hint: Use the id command to see the full names of your groups.
Note: If your files do not have the group ID in the permissions field, they will count against your user quota. Use ls -l to check:
# group quota
$ ls -l testfile_1
-rw-rw-r--. 1 user my-group 0 Jan 12 17:29 testfile_1
# user quota
$ ls -l testfile_2
-rw-rw-r--. 1 user user 0 Jan 12 17:29 testfile_2
You can use chown -R $USER:<group_name_here> <your-directory> to change the group owner of your files in a directory. For example:
$ chown -R user:my-group testfile_2
$ ls -l testfile_2
-rw-rw-r--. 1 user my-group 0 Jan 12 17:30 testfile_2
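To spot files counted against your user quota rather than a group quota, you can search for files group-owned by your personal group; a small sketch (it assumes your user-private group is your primary group, as in the examples above):

```shell
# List files under the current directory whose group owner equals your
# primary group; on Lewis such files count against your user quota
# rather than a shared group quota.
find . -group "$(id -gn)" -ls
```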
Note: The RSS team reserves the right to delete anything in /scratch and /local/scratch at any time for any reason.
Check Fairshare and Accounts
Using the RSS clusters is free of charge for general users, but each month users (accounts) have a certain amount of "cash" to spend on resources, which we call "fairshare". When users request resources, they spend their fairshare until its value reaches zero. Note that it can take up to 28 days to get your fairshare back.
Your fairshare is a number between 0 and 1. All general accounts start each cycle with the same fairshare of 0.55. Whenever jobs are waiting in the queue for resources, users with higher fairshare have higher priority and will get resources faster.
If your fairshare is too low, it means you have used the cluster more than your fair share and will be de-prioritized by the queuing software. In this case, if you belong to multiple accounts, you can switch to another account with more available fairshare.
To check your fairshare and find your accounts run:
sshare -U
For instance, the "rcss" user has 4 accounts with the following fairshares:
Account User RawShares NormShares RawUsage EffectvUsage FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
rcss rcss 1 0.001816 93038 0.001438 0.577615
general rcss 1 0.000132 1974283 0.000223 0.311102
rcss-gpu rcss 1 0.000172 0 0.000000 0.998035
general-gpu rcss 1 0.000345 30 0.000242 0.614355
You can specify your account by using --account <your-account> with srun or adding #SBATCH --account <your-account> to your sbatch script.
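For example, to charge a job to a different one of your accounts, the batch script header would include a line like this (the account name general is illustrative; pick one of the accounts listed by sshare -U):

```shell
#!/bin/bash
#SBATCH --account general   # illustrative: use an account shown by sshare -U
#SBATCH -J myjob            # rest of the job options as usual
```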
To run more jobs with your fairshare, you should increase your jobs' efficiency by requesting resources based on your jobs' actual needs. You can learn how to check your jobs' efficiency here.
Users can also increase their fairshare through investment; review here to learn more.
Find Quality of Services (QOS)
Quality of Services (QOS) define the service levels of RSS resources for different workflows. For example, all general accounts are part of the "dtn" and "interactive" QOS so they can access the data transfer (Dtn) and Interactive partitions. To find the RSS quality of services (QOS), run the following on the cluster:
sacctmgr show qos format=name%20,maxwall,grpjobs,maxjob,maxsubmit,grptres,maxtres,maxtrespu
The output is similar to:
Name MaxWall GrpJobs MaxJobs MaxSubmit GrpTRES MaxTRES MaxTRESPU
---------------- ----------- ------- ------- --------- ---------- ------------- ------------------------------
normal 2000 license/matlab=8,license/sas=2
long 7-00:00:00 500 node=15
biolong 7-00:00:00 2000 node=20
hdfslong 365-00:00:00
seriallong 28-00:00:00 60 4 24 cpu=1 mem=256G
dtn 28-00:00:00 4 24 cpu=1 mem=16G
gpu4 2000
gpu-investor-28d 28-00:00:00 node=1
manyjobs
interactive 25 25
gpu-investor-7d 7-00:00:00 3 1 gres/gpu=1
To find the QOS related to your accounts, you can run:
sacctmgr show assoc user=$USER format=cluster,account,user,share,maxjob,maxsubmit,qos%50
For instance, the following shows the QOS related to each account for the "rcss" user:
Cluster Account User Share MaxJobs MaxSubmit QOS
---------- ----------- ---------- --------- ------- --------- ---------------------------------------
lewis4 general-gpu rcss 1 dtn,interactive,normal
lewis4 rcss-gpu rcss 1 dtn,interactive,normal
lewis4 rcss rcss 1 dtn,interactive,long,normal,seriallong
lewis4 general rcss 1 24 dtn,interactive,long,normal,seriallong
You can specify your QOS by using --qos <your-qos> with srun or adding #SBATCH --qos <your-qos> to your sbatch script.
Use GPU Resources
Use of GPU resources for longer than two hours requires membership in a specific POSIX group. All GPU usage is subject to the GPU Partition Policy.
To list available GPUs, use the following command:
sinfo -p Gpu -o %n,%G
Output:
HOSTNAMES,GRES
lewis4-r730-gpu3-node426,gpu:Tesla K20Xm:1
lewis4-r730-gpu3-node428,gpu:Tesla K20Xm:1
lewis4-r730-gpu3-node429,gpu:Tesla K40m:1
lewis4-r730-gpu3-node430,gpu:Tesla K40m:1
lewis4-r730-gpu3-node431,gpu:Tesla K20Xm:1
lewis4-r730-gpu3-node432,gpu:Tesla K20Xm:1
lewis4-r730-gpu3-node434,gpu:Tesla K20Xm:1
lewis4-r730-gpu3-node435,gpu:Tesla K20Xm:1
lewis4-r730-gpu3-node476,gpu:Tesla K20Xm:1
lewis4-z10pg-gpu3-node599,gpu:GeForce GTX 1080 Ti:4
lewis4-z10pg-gpu3-node600,gpu:GeForce GTX 1080 Ti:3
lewis4-z10pg-gpu3-node601,gpu:GeForce GTX 1080 Ti:4
lewis4-r730-gpu3-node687,gpu:Tesla P100-PCIE-12GB:1
lewis4-r740xd-gpu4-node887,gpu:Tesla V100-PCIE-32GB:3
lewis4-r740xd-gpu4-node888,gpu:Tesla V100-PCIE-32GB:3
lewis4-r740xd-gpu4-node913,gpu:Tesla V100-PCIE-32GB:3
And to see available GPU and CPU resources run:
/group/training/hpc-intro/alias/ncpu.py
To use GPU resources, we need to request generic GPU resources (--gres gpu:<number>, where <number> is the number of required GPUs) in one of the GPU partitions (Gpu, gpu3, and gpu4). Enter the following to use the Gpu partition interactively:
srun --partition Gpu --gres gpu:1 --ntasks-per-node 4 --mem 16G --nodes 1 --pty /bin/bash
Note that in the above line, we have requested 1 GPU, 4 CPUs, and 16G of memory on 1 node in the Gpu partition.
To submit a batch job to the Gpu partition, we can use the following options:
#!/bin/bash
######################### Batch Headers #########################
#SBATCH --partition Gpu # use partition Gpu
#SBATCH --gres gpu:1 # request generic GPU resources
#SBATCH --ntasks-per-node 4 # number of tasks(CPUs)
#SBATCH --mem 16G # memory
#SBATCH --nodes 1 # number of nodes
Note that the gpu4 partition is only available to GPU investors. To use it (--partition gpu4), you should add the corresponding account name with --account <account-name> in your srun/sbatch options.
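Putting this together, a batch job on the investor-only gpu4 partition combines the partition, account, and GPU request like this (a sketch; <account-name> is your investor account):

```shell
#!/bin/bash
#SBATCH --partition gpu4           # investor-only GPU partition
#SBATCH --account <account-name>   # investor account that grants gpu4 access
#SBATCH --gres gpu:1               # request one GPU
```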
Use Graphical User Interfaces (GUIs)
Lewis utilizes the X Window System for Graphical User Interfaces (GUIs). Software that supports rendering with X Windows can forward the GUI to the end user via SSH. This is known as 'X11 Forwarding' or 'X Forwarding in SSH'.
- Windows:
  - Using MobaXterm: X11 forwarding is automatically enabled
  - Using some other terminal emulator: RSS only supports MobaXterm
- Linux: add the -YC switches to your ssh command: ssh -YC <username>@lewis.rnet.missouri.edu
- macOS:
  - Install XQuartz
  - Reboot
  - Start a terminal and add the -YC switches to your ssh command when connecting to Lewis: ssh -YC <username>@lewis.rnet.missouri.edu
Note: The first time you run X11 Forwarding you may see the following warning: /usr/bin/xauth: file /home/<username>/.Xauthority does not exist. This is expected the first time you use the -YC flag. If it appears on subsequent connections to Lewis, it may mean something is not functioning correctly.
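A quick way to confirm that forwarding is active once you are connected is to check the DISPLAY variable, which the SSH server sets when X11 forwarding succeeds; a small sketch:

```shell
# Check whether X11 forwarding is active in the current session.
# DISPLAY is set by the SSH server when forwarding is working.
if [ -n "${DISPLAY:-}" ]; then
    echo "X11 forwarding active (DISPLAY=$DISPLAY)"
else
    echo "DISPLAY is not set: X11 forwarding is not working"
fi
```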
Use Licensed Software
Use of any Licensed Software is subject to the Software Policy and the respective EULA of each product. You may email muitrss@missouri.edu if you have questions about a particular license.
The Lewis cluster uses SLURM to monitor and enforce licensed software limits.
If you are using a software product that has a limited number of concurrent users or nodes (e.g. MATLAB, SAS), you must use the SLURM --licenses switch in your SBATCH script or srun command. Failure to do so will result in an error message like the one below:
ERROR: You must request a license to run matlab
example: srun --licenses=matlab ...
If you have any questions, please contact muitrss@missouri.edu
To request a license for a specific tool, add the following line to your SBATCH script:
#SBATCH --licenses=<software>:<qty>
Example:
#SBATCH --licenses=matlab:1
To see what licenses are available:
scontrol show licenses
See the documentation page for your particular software package for a complete SBATCH file with licensing example.