CUDA
CUDA® is a parallel computing platform and programming model invented by NVIDIA. It enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU). With millions of CUDA-enabled GPUs sold to date, software developers, scientists and researchers are using GPU-accelerated computing for broad-ranging applications.
- Software URL: https://developer.nvidia.com/cuda-zone
- Documentation: http://docs.nvidia.com/cuda/index.html
Usage
There are several partitions on Lewis that have GPU nodes. We can see their status
by filtering the output of the sinfo
command like so:
Command:
sinfo|grep -i gpu
Output:
[user@lewis4-r710-login-node223 ~]$ sinfo | grep -i gpu
r730-gpu3 up 2-00:00:00 10 idle lewis4-r730-gpu3-node[426,428-435,476]
gpu3 up 2-00:00:00 10 idle lewis4-r730-gpu3-node[426,428-435,476]
Gpu up 2-00:00:00 10 idle lewis4-r730-gpu3-node[426,428-435,476]
To see which GPU models are available on each node, you can use sinfo
with a format string (%n prints the node hostname and %G its generic resources):
Command:
sinfo -p Gpu -o %n,%G
Output:
[user@lewis4-r710-login-node223 ~]$ sinfo -p Gpu -o %n,%G
HOSTNAMES,GRES
lewis4-r730-gpu3-node426,gpu:Tesla K20Xm:1
lewis4-r730-gpu3-node428,gpu:Tesla K20Xm:1
lewis4-r730-gpu3-node429,gpu:Tesla K40m:1
lewis4-r730-gpu3-node430,gpu:Tesla K40m:1
lewis4-r730-gpu3-node431,gpu:Tesla K20Xm:1
lewis4-r730-gpu3-node432,gpu:Tesla K20Xm:1
lewis4-r730-gpu3-node433,gpu:Tesla K20Xm:1
lewis4-r730-gpu3-node434,gpu:Tesla K20Xm:1
lewis4-r730-gpu3-node435,gpu:Tesla K20Xm:1
lewis4-r730-gpu3-node476,gpu:Tesla K20Xm:1
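If you would rather tally the GPU models than read the list by eye, the GRES column is easy to parse. Here is a quick sketch in Python (the sample text is the sinfo output shown above; the helper name is our own):

```python
from collections import Counter

# Sample output of `sinfo -p Gpu -o %n,%G` (header row plus node lines).
sinfo_output = """HOSTNAMES,GRES
lewis4-r730-gpu3-node426,gpu:Tesla K20Xm:1
lewis4-r730-gpu3-node428,gpu:Tesla K20Xm:1
lewis4-r730-gpu3-node429,gpu:Tesla K40m:1
lewis4-r730-gpu3-node430,gpu:Tesla K40m:1
lewis4-r730-gpu3-node431,gpu:Tesla K20Xm:1
lewis4-r730-gpu3-node432,gpu:Tesla K20Xm:1
lewis4-r730-gpu3-node433,gpu:Tesla K20Xm:1
lewis4-r730-gpu3-node434,gpu:Tesla K20Xm:1
lewis4-r730-gpu3-node435,gpu:Tesla K20Xm:1
lewis4-r730-gpu3-node476,gpu:Tesla K20Xm:1"""

def count_gpu_models(output):
    """Return a Counter mapping GPU model -> number of nodes."""
    models = Counter()
    for line in output.splitlines()[1:]:      # skip the HOSTNAMES,GRES header
        node, gres = line.split(",", 1)       # gres looks like "gpu:Tesla K20Xm:1"
        model = gres.split(":")[1]            # the middle field is the model name
        models[model] += 1
    return models

print(count_gpu_models(sinfo_output))
```

For the listing above this reports 8 Tesla K20Xm nodes and 2 Tesla K40m nodes.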
To get started with CUDA, we first need to request a GPU node
from the cluster. We will use the Gpu
partition in this example.
Command:
srun -p Gpu -N1 -n20 -t 0-02:00 --mem=100G --gres gpu:1 --pty /bin/bash
Output:
[user@lewis4-r710-login-node223 ~]$ srun -p Gpu -N1 -n20 -t 0-02:00 --mem=100G --gres gpu:1 --pty /bin/bash
[user@lewis4-r730-gpu3-node428 ~]$
Notice how the prompt changed? We are now working on a node with a GPU.
Let's find out more about our GPU with the nvidia-smi
command:
Command:
nvidia-smi
Output:
[user@lewis4-r730-gpu3-node427 training]$ nvidia-smi
Fri Feb 10 16:03:18 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48 Driver Version: 367.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K20Xm Off | 0000:03:00.0 Off | 0 |
| N/A 25C P0 62W / 235W | 0MiB / 5699MiB | 99% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Now we will load the CUDA module and run a simple test program.
Command:
module load cuda/cuda-7.5
(No output unless there is an error)
We can double check that the module loaded properly with these commands:
module list
which nvcc
Output:
[user@lewis4-r730-gpu3-node427 training]$ module list
Currently Loaded Modulefiles:
1) cuda/cuda-7.5
[user@lewis4-r730-gpu3-node427 training]$ which nvcc
/usr/local/cuda-7.5/bin/nvcc
With our module loaded we can now try building and running a simple example.
Using your favorite text editor, create a file and paste in the example
code below (the full listing appears under Contents of Example Files). Save it with the name: hello_cuda.cu
Now we are ready to compile the code with nvcc:
nvcc hello_cuda.cu -o hello_cuda
(No output unless there is an error)
We can check that the compiled binary is there with ls:
[user@lewis4-r730-gpu3-node427 cuda]$ nvcc hello_cuda.cu -o hello_cuda
[user@lewis4-r730-gpu3-node427 cuda]$ ls
hello_cuda hello_cuda.cu
Now it is time to execute our code:
./hello_cuda
Output:
[user@lewis4-r730-gpu3-node427 cuda]$ ./hello_cuda
Hello World!
NOTE
If your output is
Hello Hello
the kernel did not run on a GPU: the buffer copied back from the device is unchanged, so the program prints the original "Hello " twice.
Success! The last step is to leave our srun
session and use sbatch
to launch our example. Type exit
into your prompt and notice that we are
taken back to the login node:
exit
Output:
[user@lewis4-r730-gpu3-node427 cuda]$ exit
exit
[user@lewis4-r710-login-node223 training]$
Create another file called cuda_sbatch.sh
and paste in the code below (the full listing appears under Contents of Example Files).
Now we can execute our CUDA example using sbatch:
sbatch cuda_sbatch.sh
Output:
[user@lewis4-r710-login-node223 cuda]$ sbatch cuda_sbatch.sh
Submitted batch job 530035
To see the output, we look for the result file that matches our job ID (in this
example it is 530035) and use the cat
command:
[user@lewis4-r710-login-node223 cuda]$ ls
cuda_sbatch.sh hello_cuda hello_cuda.cu results_cuda-530035.out
[user@lewis4-r710-login-node223 cuda]$ cat results_cuda-530035.out
### Starting at: Fri Feb 10 16:45:05 CST 2017 ###
Currently Loaded Modulefiles:
1) cuda/cuda-7.5
### Starting at: Fri Feb 10 16:45:05 CST 2017
First core reporting from node:
lewis4-r730-gpu3-node427
Currently working in directory:
/home/user/training/cuda
Files in this folder:
total 854
-rw-rw-r--. 1 user group 1342 Feb 10 16:44 cuda_sbatch.sh
-rwxrwxr-x. 1 user group 530368 Feb 10 16:30 hello_cuda
-rw-rw-r--. 1 user group 954 Feb 10 11:30 hello_cuda.cu
-rw-rw-r--. 1 user group 285 Feb 10 16:45 results_cuda-530035.out
Hello World!
### Ending at: Fri Feb 10 16:45:07 CST 2017 ###
[user@lewis4-r710-login-node223 cuda]$
Notice that we get the same result, but now we don't have to be logged in directly to a GPU node.
Contents of Example Files
hello_cuda.cu:
// This is the REAL "hello world" for CUDA!
// It takes the string "Hello ", prints it, then passes it to CUDA with an array
// of offsets. Then the offsets are added in parallel to produce the string "World!"
// By Ingemar Ragnemalm 2010

#include <stdio.h>
#include <stdlib.h> // for EXIT_SUCCESS

const int N = 7;
const int blocksize = 7;

// Each thread adds its offset to one character of the string.
__global__
void hello(char *a, int *b)
{
    a[threadIdx.x] += b[threadIdx.x];
}

int main()
{
    char a[N] = "Hello ";
    int b[N] = {15, 10, 6, 0, -11, 1, 0};

    char *ad;
    int *bd;
    const int csize = N*sizeof(char);
    const int isize = N*sizeof(int);

    printf("%s", a);

    cudaMalloc( (void**)&ad, csize );
    cudaMalloc( (void**)&bd, isize );
    cudaMemcpy( ad, a, csize, cudaMemcpyHostToDevice );
    cudaMemcpy( bd, b, isize, cudaMemcpyHostToDevice );

    dim3 dimBlock( blocksize, 1 );
    dim3 dimGrid( 1, 1 );
    hello<<<dimGrid, dimBlock>>>(ad, bd);
    cudaMemcpy( a, ad, csize, cudaMemcpyDeviceToHost );
    cudaFree( ad );
    cudaFree( bd );

    printf("%s\n", a);
    return EXIT_SUCCESS;
}
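To see why those particular offsets spell "World!", you can reproduce the kernel's per-character arithmetic on the host. This quick Python sketch mirrors what the seven GPU threads do; it is only an illustration of the arithmetic, not part of the CUDA example:

```python
# Host-side re-creation of the hello kernel: each "thread" i adds
# offset b[i] to character a[i].
a = list(b"Hello \x00")            # 7 bytes, matching char a[N] = "Hello ";
b = [15, 10, 6, 0, -11, 1, 0]      # the offsets from hello_cuda.cu

# 'H'+15='W', 'e'+10='o', 'l'+6='r', 'l'+0='l', 'o'-11='d', ' '+1='!'
out = bytes((c + off) % 256 for c, off in zip(a, b))
print(out.rstrip(b"\x00").decode())   # prints: World!
```

This also explains the failure mode noted earlier: if the kernel never runs, the buffer copied back still holds "Hello ", so the program prints it twice.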
cuda_sbatch.sh:
#!/bin/bash
#-------------------------------------------------------------------------------
# SBATCH CONFIG
#-------------------------------------------------------------------------------
## resources
#SBATCH -p gpu3 # partition (which set of nodes to run on)
#SBATCH -N1 # nodes
#SBATCH -n20 # tasks (cores)
#SBATCH --mem=100G # total RAM
#SBATCH -t 0-01:00 # time (days-hours:minutes)
#SBATCH --qos=normal # qos level
#SBATCH --exclusive # reserve entire node
#
## labels and outputs
#SBATCH -J hello_cuda # job name - shows up in sacct and squeue
#SBATCH -o results_cuda-%j.out # filename for the output from this job (%j = job#)
#SBATCH -A general-gpu # investor account
#
## notifications
#SBATCH --mail-user=username@missouri.edu # email address for notifications
#SBATCH --mail-type=END,FAIL # which type of notifications to send
#
#-------------------------------------------------------------------------------
echo "### Starting at: $(date) ###"
# load modules then display what we have
module load cuda/cuda-7.5
module list
# Serial operations - only runs on the first core
echo "### Starting at: $(date)"
echo "First core reporting from node:"
hostname
echo "Currently working in directory:"
pwd
echo "Files in this folder:"
ls -l
# Execute the hello_cuda binary:
./hello_cuda
echo "### Ending at: $(date) ###"