Howto

This document answers common how-to questions for working with the RCSS clusters.


Cite for Publication

We ask that when you cite any of the RCSS clusters in a publication, you send an email to mudoitrcss@missouri.edu and share a copy of the publication with us. To cite the use of any of the RCSS clusters in a publication, please use:

The computation for this work was performed on the high performance computing infrastructure provided by Research Computing Support Services and in part by the National Science Foundation under grant number CNS-1429294 at the University of Missouri, Columbia MO. DOI: https://doi.org/10.32469/10355/69802


Generate an SSH Key Pair

For Windows, open MobaXterm and click "Start local terminal".


For macOS or Linux, open a local terminal.

Then type the following command:

ssh-keygen

When prompted, press Enter to save the key in the default location (/home/<username>/.ssh/id_rsa), then enter a strong passphrase (required) twice.
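If you want to see what ssh-keygen produces before generating your real key, here is a minimal sketch that creates a throwaway key pair non-interactively. The /tmp path and passphrase are placeholders for illustration only; for your real key, run plain ssh-keygen and type the passphrase when prompted:

```shell
# Generate a demo RSA key pair at a throwaway path (placeholder values).
rm -f /tmp/demo_key /tmp/demo_key.pub          # avoid the overwrite prompt
ssh-keygen -t rsa -b 4096 -f /tmp/demo_key -N 'use-a-strong-passphrase' -q
ls -l /tmp/demo_key /tmp/demo_key.pub          # private key + public key
```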

After you generate your key you will need to display the public key and include it in the Account Request Form. Type the command:

cat ~/.ssh/id_rsa.pub

The output will be a unique string of random characters similar to this:

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC7iKBE2qsnnR+mguxvvKNyj/IJchBjba4QD39BGMCC
vytPxFOoN9h2WbeIg1cUSpv7sb6STGcrCXnSrthM4fuasFM/KXELnJobq0JBEd6pld72jhBVHzObomrs
ktGSThO3JqSgE2O0elxcfT/0dSn/6t+GY/HvkcgdFnenfW3oHTOdAyWgHXe/0aWSuq60DhLfJGp8mM1N
Ixjagd9s/OLByOSV7GxwUCNl+OD/CsaLkGgrB6jonf01cjfaFYh4iYcBX5s7lZuBCSpFY1+KqeE8ZZ8k
qDdMqgYOmw2SxFpST1iaC9wmL9N4J4Xm <your_user_name_here>@<your_computer_name_here>

We recommend that you copy your SSH key pair to a USB drive so you can log in to Lewis from more than one computer. Replace the path in the example below with the actual path to your USB drive:

mkdir /some/path/to/usb_drive/ssh_key_backup/
cp ~/.ssh/id_rsa* /some/path/to/usb_drive/ssh_key_backup/
ls  /some/path/to/usb_drive/ssh_key_backup/
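To restore the backup on another computer, the reverse copy plus the required permissions looks like the following sketch. The USB path is a placeholder, and the copy step is skipped if the backup directory is not mounted:

```shell
# Restore SSH keys from a USB backup (placeholder path) and fix permissions.
BACKUP=/some/path/to/usb_drive/ssh_key_backup
mkdir -p ~/.ssh && chmod 700 ~/.ssh
if [ -d "$BACKUP" ]; then
    cp "$BACKUP"/id_rsa "$BACKUP"/id_rsa.pub ~/.ssh/
fi
# ssh refuses keys with loose permissions, so always set these:
if [ -f ~/.ssh/id_rsa ]; then chmod 600 ~/.ssh/id_rsa; fi
if [ -f ~/.ssh/id_rsa.pub ]; then chmod 644 ~/.ssh/id_rsa.pub; fi
```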

SSH Key Troubleshooting

A common problem is Permission denied. If you run ssh and get Permission denied, please do not generate a new ssh key.

ssh <user-name>@lewis.rnet.missouri.edu
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

Make sure your ssh keys are in the ~/.ssh directory with the right permissions by running ls -la ~/.ssh. For example:

[user@local]$ ls -la ~/.ssh
total 30
drwx------.  2 user user    5 Apr 22 12:32 .
drwx------.  4 user user   45 Jul 16 15:00 ..
-rw-------.  1 user user 1896 Nov 12  2019 id_rsa
-rw-r--r--.  1 user user  414 Nov 12  2019 id_rsa.pub
-rw-r--r--.  1 user user 3970 Jun 28 11:01 known_hosts

If keys are missing, copy your keys into the ~/.ssh directory and set the right permissions: the private key must be readable only by you (-rw-------), while the public key can be world-readable (-rw-r--r--). You can use chmod 600 ~/.ssh/id_rsa and chmod 644 ~/.ssh/id_rsa.pub to set them.

If you cannot solve the issue, please run the following commands and include the output when you contact us:

ssh -v <user-name>@lewis.rnet.missouri.edu
ls -la ~/.ssh

Passphrase issue

If you forget your passphrase you will need to generate a new ssh-key pair and send us your new public key by email to mudoitrcss@missouri.edu.

If you lost your ssh passphrase, do not resubmit a Lewis Account Request.


Transfer Data

The data transfer partition (Dtn) consists of cluster nodes designed for performing large file transfers. To transfer files to or from Lewis, use one of the data transfer nodes:

  • DTN0: lewis4-dtn.rnet.missouri.edu
  • DTN1: lewis4-dtn1.rnet.missouri.edu

Note that in order to use the Dtn partition, you must be part of the dtn QOS. To check whether you have access to Dtn, run the following on Lewis:

sacctmgr show assoc user=$USER format=acc,user,qos

If dtn does not appear in your QOS list, please contact us.

Transfer from/to local

We recommend that you use the rsync command with the following options to transfer files to Lewis:

rsync -rltvPh /source/path <username>@lewis4-dtn.rnet.missouri.edu:/destination/path

Here is what the flags mean:

  • -r: recurse into directories
  • -l: copy symlinks as symlinks
  • -t: preserve modification times
  • -v: verbose output
  • -P: show progress and keep partially transferred files
  • -h: human-readable sizes

And the following to transfer files from Lewis:

rsync -rltvPh <username>@lewis4-dtn.rnet.missouri.edu:/source/path /destination/path

Note that you should run this command from a local terminal, not from Lewis.

Transfer from/to server

To transfer files between another server and Lewis, you can submit a batch job on Lewis such as:

#!/bin/bash
######################### Batch Headers #########################
#SBATCH -p Dtn              # use the data transfer partition
#SBATCH --qos dtn           # use dtn QOS
#SBATCH -J dtn_transfer     # give the job a custom name
#SBATCH -o results-%j.out   # give the job output a custom name
#SBATCH -t 08:00:00         # give the job a time (up to one month: 28-00:00:00)
#################################################################

# Set variables
REMOTEUSERNAME="buzz"                             # Username on the server to download from
REMOTEHOST="lightyear.edu"                        # Hostname of the server to download from
REMOTEPATH="/home/dataset"                        # Remote data location
REMOTE="$REMOTEUSERNAME@$REMOTEHOST:$REMOTEPATH"
LOCAL="/storage/hpc/location/"                    # Local data location

# Do the file transfer
rsync -rltvPh $REMOTE $LOCAL                      # Download dataset
# To upload data, it would be the opposite (rsync -rltvPh $LOCAL $REMOTE)

Check Storage Usage (Quotas)

You can check your quota and current usage with the following commands:

  • Home Storage (/home/$USER): df -h /home/$USER
  • HTC Storage (/storage/htc): df -h /storage/htc/<folder_name_here>
  • GPRS Storage (/gprs): df -h /gprs/<folder_name_here>
  • HPC Storage per User (/data, /scratch): lfs quota -hg $USER /storage/hpc
  • HPC Storage per Group (/group): lfs quota -hg <group_name_here> /storage/hpc

Hint: Use the id command to see the full names of your groups.
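For example, a quick way to list just your group names (a standard option of the id command):

```shell
id -nG    # print the names of all groups you belong to, space separated
```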

Note: If a file's group owner is not set to your group, it will count against your user quota. Use ls -l to check:

# group quota
$ ls -l testfile_1
-rw-rw-r--. 1 user my-group 0 Jan 12 17:29 testfile_1

# user quota
$ ls -l testfile_2
-rw-rw-r--. 1 user user 0 Jan 12 17:29 testfile_2

You can use chown -R $USER:<group_name_here> <your-directory> to change the group owner of the files in a directory. For example:

$ chown -R user:my-group testfile_2
$ ls -l testfile_2
-rw-rw-r--. 1 user my-group 0 Jan 12 17:30 testfile_2

Note: The RCSS team reserves the right to delete anything in /scratch and /local/scratch at any time for any reason.


Check Fairshare and Accounts

Using the RCSS clusters is free of charge for general users, but each month users (accounts) receive a certain amount of "cash" to spend on resources, which we call "fairshare". When users request resources, they spend their fairshare until the value reaches zero. Note that it can take up to 28 days to get your fairshare back.

Your fairshare is a number between 0 and 1. All general accounts have the same fairshare of 0.55 at the beginning of the cycle. While your jobs wait in the queue for resources, users with higher fairshare have higher priority and receive resources sooner.

If your fairshare is too low, it means you have used the cluster more than your fair share and the queuing software will de-prioritize your jobs. In this case, if you belong to multiple accounts, you can switch to another account with more available fairshare.

To check your fairshare and find your accounts run:

sshare -U

For instance, "rcss" user has 4 accounts with the following fairshares:

             Account       User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare 
-------------------- ---------- ---------- ----------- ----------- ------------- ---------- 
rcss                      rcss          1    0.001816       93038      0.001438   0.577615 
general                   rcss          1    0.000132     1974283      0.000223   0.311102 
rcss-gpu                  rcss          1    0.000172           0      0.000000   0.998035 
general-gpu               rcss          1    0.000345          30      0.000242   0.614355 

You can specify your account by using --account <your-account> in srun or #SBATCH --account <your-account> in sbatch commands.

To run more jobs with your fairshare, increase your job efficiency by requesting resources based on your jobs' actual needs. You can learn how to check your job efficiency here.

Users can increase their fairshare by investment; review here to learn more.


Find Quality of Services (QOS)

Quality of Service (QOS) settings define the limits RCSS places on different workflows. For example, all general accounts are part of the "dtn" and "interactive" QOS so they can access the data transfer partition (Dtn) and the Interactive partition. To list the RCSS QOS settings, run the following on the cluster:

sacctmgr show qos format=name%20,maxwall,grpjobs,maxjob,maxsubmit,grptres,maxtres,maxtrespu

The output is similar to:

            Name     MaxWall GrpJobs MaxJobs MaxSubmit    GrpTRES       MaxTRES                      MaxTRESPU 
---------------- ----------- ------- ------- --------- ---------- ------------- ------------------------------ 
          normal                                  2000                          license/matlab=8,license/sas=2 
            long  7-00:00:00                       500    node=15                                              
         biolong  7-00:00:00                      2000    node=20                                              
        hdfslong 365-00:00:00                                                                                  
      seriallong 28-00:00:00      60       4        24                    cpu=1                       mem=256G 
             dtn 28-00:00:00               4        24                    cpu=1                        mem=16G 
            gpu4                                  2000                                                         
gpu-investor-28d 28-00:00:00                               node=1                                              
        manyjobs                                                                                               
     interactive                          25        25                                                         
 gpu-investor-7d  7-00:00:00       3       1                         gres/gpu=1                                

To find the QOS associated with your accounts, run:

sacctmgr show assoc user=$USER format=cluster,account,user,share,maxjob,maxsubmit,qos%50

For instance, the following shows the QOS associated with each account for the "rcss" user:

   Cluster     Account       User     Share MaxJobs MaxSubmit                                     QOS 
---------- ----------- ---------- --------- ------- --------- --------------------------------------- 
    lewis4 general-gpu       rcss         1                                    dtn,interactive,normal 
    lewis4    rcss-gpu       rcss         1                                    dtn,interactive,normal 
    lewis4        rcss       rcss         1                    dtn,interactive,long,normal,seriallong 
    lewis4     general       rcss         1      24            dtn,interactive,long,normal,seriallong

You can specify your QOS by using --qos <your-qos> in srun or #SBATCH --qos <your-qos> in sbatch commands.
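Putting the two together, a job script header that selects both an account and one of its QOS entries might look like this sketch (the account and QOS names here are examples; use the values from your own sacctmgr output):

```shell
#!/bin/bash
#SBATCH --account rcss      # an account from your sacctmgr output (example)
#SBATCH --qos seriallong    # a QOS listed for that account (example)
#SBATCH -t 14-00:00:00      # within the 28-day MaxWall of seriallong
```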


Use GPU Resources

Use of GPU resources for longer than two hours requires membership in a specific POSIX group. All GPU usage is subject to the GPU Partition Policy.

To list available GPUs, use the following command:

sinfo -p Gpu -o %n,%G

Output:

HOSTNAMES,GRES
lewis4-r730-gpu3-node426,gpu:Tesla K20Xm:1
lewis4-r730-gpu3-node428,gpu:Tesla K20Xm:1
lewis4-r730-gpu3-node429,gpu:Tesla K40m:1
lewis4-r730-gpu3-node430,gpu:Tesla K40m:1
lewis4-r730-gpu3-node431,gpu:Tesla K20Xm:1
lewis4-r730-gpu3-node432,gpu:Tesla K20Xm:1
lewis4-r730-gpu3-node434,gpu:Tesla K20Xm:1
lewis4-r730-gpu3-node435,gpu:Tesla K20Xm:1
lewis4-r730-gpu3-node476,gpu:Tesla K20Xm:1
lewis4-z10pg-gpu3-node599,gpu:GeForce GTX 1080 Ti:4
lewis4-z10pg-gpu3-node600,gpu:GeForce GTX 1080 Ti:3
lewis4-z10pg-gpu3-node601,gpu:GeForce GTX 1080 Ti:4
lewis4-r730-gpu3-node687,gpu:Tesla P100-PCIE-12GB:1
lewis4-r740xd-gpu4-node887,gpu:Tesla V100-PCIE-32GB:3
lewis4-r740xd-gpu4-node888,gpu:Tesla V100-PCIE-32GB:3
lewis4-r740xd-gpu4-node913,gpu:Tesla V100-PCIE-32GB:3

To see available GPU and CPU resources, run:

/group/training/hpc-intro/alias/ncpu.py

To use GPU resources, request generic GPU resources (--gres gpu:<number>, where <number> is the number of required GPUs) in one of the GPU partitions (Gpu, gpu3, and gpu4). Enter the following to use the Gpu partition interactively:

srun --partition Gpu --gres gpu:1 --ntasks-per-node 4 --mem 16G --nodes 1 --pty /bin/bash

Note that in the above line, we have requested 1 GPU, 4 CPUs, and 16G of memory on 1 node in the Gpu partition.

To submit a job to the Gpu partition, we can use the following options:

#!/bin/bash
######################### Batch Headers #########################
#SBATCH --partition Gpu        # use partition Gpu
#SBATCH --gres gpu:1           # request generic GPU resources
#SBATCH --ntasks-per-node 4    # number of tasks(CPUs)
#SBATCH --mem 16G              # memory
#SBATCH --nodes 1              # number of nodes 

Note that partition gpu4 is only available to GPU investors. To use it (--partition gpu4), add the corresponding account name with --account <account-name> in your srun/sbatch options.


Use Graphical User Interfaces (GUIs)

Lewis uses the X Window System for Graphical User Interfaces (GUIs). Programs that support software rendering with X Windows can forward the GUI to the end user via SSH. This is known as 'X11 Forwarding' or 'X Forwarding over SSH'.

  • Windows:
    • using MobaXterm: X11 forwarding is automatically enabled
    • using some other terminal emulator: RCSS only supports MobaXterm
  • Linux: add the -YC switches to your ssh command
    • ssh -YC username@lewis.rnet.missouri.edu
  • macOS
    1. Install XQuartz
    2. Reboot
    3. Start a terminal. Add the -YC switch to your ssh command when connecting to Lewis:
      • ssh -YC <username>@lewis.rnet.missouri.edu

Note: The first time you use X11 Forwarding you may see the following warning: /usr/bin/xauth: file /home/<username>/.Xauthority does not exist. This is expected the first time you use the -YC flag. If it appears on subsequent connections to Lewis, it may mean something is not functioning correctly.
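A quick way to check whether forwarding is active once you are logged in is to inspect the standard DISPLAY variable, which SSH sets when X11 forwarding succeeds:

```shell
# X11 forwarding sets DISPLAY in the SSH session; check it:
if [ -n "$DISPLAY" ]; then
    echo "X11 forwarding is active (DISPLAY=$DISPLAY)"
else
    echo "DISPLAY is not set -- reconnect with ssh -YC"
fi
```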


Use Licensed Software

Use of any Licensed Software is subject to the Software Policy and the respective EULA of each product. You may email mudoitrcss@missouri.edu if you have questions about a particular license.

The Lewis cluster uses SLURM to monitor and enforce licensed software limits. If you are using a software product that has a limited number of concurrent users or nodes (e.g. MATLAB, SAS) you must use the SLURM --licenses switch inside of your SBATCH script or srun command. Failure to do so will result in an error message like the one below:

ERROR: You must request a license to run matlab
       example: srun --licenses=matlab ...
       If you have any questions, please contact mudoitrcss@missouri.edu

To request a license for a specific tool add the following line to your SBATCH script:

#SBATCH --licenses=<software>:<qty>

Example:

#SBATCH --licenses=matlab:1

To see what licenses are available:

scontrol show licenses

See the documentation page for your particular software package for a complete SBATCH file with licensing example.
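As a rough sketch, a minimal MATLAB job with a license request might look like the following. The partition name, module name, and script name are placeholders; the software's own documentation page has the authoritative example:

```shell
#!/bin/bash
#SBATCH -p Lewis                 # placeholder partition name
#SBATCH -J matlab_job            # custom job name
#SBATCH -t 02:00:00              # wall time
#SBATCH --licenses=matlab:1      # reserve one MATLAB license

module load matlab               # module name is a placeholder
matlab -nodisplay -r "run('my_script.m'); exit"   # placeholder script
```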