Cutadapt

Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.

Software URL: https://github.com/marcelm/cutadapt/
Documentation: http://cutadapt.readthedocs.io/en/stable/

Single-End Trimming

In this example, we submit an sbatch job file to Slurm that runs cutadapt on a single node, using 28 cores and up to 28G of memory. The example below is for single-end data. To try the example, replace Example_R1.fastq.gz with the name of your own single-end FASTQ file.

sbatch cutadapt_single_sbatch.sh
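
When the submission succeeds, sbatch prints a confirmation line of the form "Submitted batch job <id>", and the "-o cutadapt_job.o_%j" directive in the job file sends standard output to a file named after that ID. The sketch below is one possible convenience, not part of the original workflow: it extracts the ID and derives the log file name. The function names job_id_from and log_name_for are hypothetical.

```shell
# Hypothetical helper: pull the numeric job ID out of sbatch's confirmation line,
# e.g. "Submitted batch job 12345" -> 12345. Prints nothing if the line does not match.
job_id_from() {
    printf '%s\n' "$1" | sed -n 's/^Submitted batch job \([0-9][0-9]*\).*$/\1/p'
}

# Hypothetical helper: name of the log file produced by "#SBATCH -o cutadapt_job.o_%j".
log_name_for() {
    printf 'cutadapt_job.o_%s\n' "$1"
}

# Usage on the cluster (commented out so the sketch runs anywhere):
#   msg="$(sbatch cutadapt_single_sbatch.sh)"
#   id="$(job_id_from "$msg")"
#   squeue -j "$id"                  # is the job pending or running?
#   tail -f "$(log_name_for "$id")"  # follow the job's standard output
```

The same helpers should apply unchanged to the paired-end job file, since both scripts use the same "-o" pattern.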

Contents of Example File

cutadapt_single_sbatch.sh:


#!/bin/bash
#-------------------------------------------------------------------------------
#  SBATCH CONFIG
#-------------------------------------------------------------------------------
#SBATCH -J cutadapt_job             # job name
#SBATCH -o cutadapt_job.o_%j        # standard out
#SBATCH --partition BioCompute
#SBATCH --mem 28G
#SBATCH --ntasks 1
#SBATCH --nodes 1
#SBATCH --cpus-per-task 28          # 28 cores seemed to me to provide a good balance of speed versus core usage. A paired-end
#                                   # set of compressed FASTQ files (~11GB each) finished in less than an hour with 28 cores.
#SBATCH --time 04:00:00             # 4 hour time limit.
#-------------------------------------------------------------------------------

## Load the cutadapt module with its dependencies
module load ircf/ircf-modules
module load pigz/pigz-2.3.4
module load cutadapt/cutadapt-1.15
module list

## Run cutadapt with the following options
## --cores=${SLURM_CPUS_PER_TASK}  use the number of CPU cores requested above (28)
## --trim-n          trim N's from the 5' and 3' ends of each read
## --minimum-length  discard reads shorter than the specified length
## -a 'G{100}'       trim poly-G (literally an adapter made up of 100 G's) from the 3' end (needed for NextSeq or NovaSeq data)
## -a AGATCGGAA...   trim the Illumina adapter from the 3' end
## Note: when several -a adapters are given, cutadapt removes only the best-matching one
## from each read unless --times is increased.

cutadapt -a 'G{100}' -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
    --cores="${SLURM_CPUS_PER_TASK}" --trim-n --minimum-length=10 \
    --output=Example_R1.trimmed.fastq.gz Example_R1.fastq.gz
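
Rather than editing the file name in two places, the input name can be kept in one variable and the output name derived from it. The following is a minimal sketch under stated assumptions: the input is assumed to end in ".fastq.gz", and the helper names trimmed_name and pick_cores are hypothetical. It only prints the cutadapt command (a dry run), so it can be tried outside a Slurm job.

```shell
# Hypothetical helper: Example_R1.fastq.gz -> Example_R1.trimmed.fastq.gz
# Assumes the input name ends in .fastq.gz; other suffixes are left in place.
trimmed_name() {
    # Strip the .fastq.gz suffix, then append the new one.
    printf '%s.trimmed.fastq.gz\n' "${1%.fastq.gz}"
}

# Hypothetical helper: use the Slurm allocation if it is set, otherwise 1 core.
pick_cores() {
    printf '%s\n' "${SLURM_CPUS_PER_TASK:-1}"
}

input="Example_R1.fastq.gz"          # replace with your own file
output="$(trimmed_name "$input")"

# Dry run: print the command that would be executed instead of running it.
echo "cutadapt -a 'G{100}' -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA" \
     "--cores=$(pick_cores) --trim-n --minimum-length=10" \
     "--output=$output $input"
```

Falling back to a single core keeps the command valid when SLURM_CPUS_PER_TASK is unset, for example during a quick interactive test.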

Paired-End Trimming

In this example, we submit an sbatch job file to Slurm that runs cutadapt on a single node, using 28 cores and up to 28G of memory. The example below is for paired-end data. To run it on your own data, replace Example_R1.fastq.gz and Example_R2.fastq.gz with the names of your paired-end FASTQ files.

sbatch cutadapt_paired_sbatch.sh

Contents of Example File

cutadapt_paired_sbatch.sh:

#!/bin/bash
#-------------------------------------------------------------------------------
#  SBATCH CONFIG
#-------------------------------------------------------------------------------
#SBATCH -J cutadapt_job             # job name
#SBATCH -o cutadapt_job.o_%j        # standard out
#SBATCH --partition BioCompute
#SBATCH --mem 28G
#SBATCH --ntasks 1
#SBATCH --nodes 1
#SBATCH --cpus-per-task 28          # 28 cores seemed to me to provide a good balance of speed versus core usage. A paired-end
#                                   # set of compressed FASTQ files (~11GB each) finished in less than an hour with 28 cores.
#SBATCH --time 04:00:00             # 4 hour time limit.

## Load the cutadapt module with its dependencies
module load ircf/ircf-modules
module load pigz/pigz-2.3.4
module load cutadapt/cutadapt-1.15
module list

## Run cutadapt non-interactively
## --cores=${SLURM_CPUS_PER_TASK}  use the number of CPU cores requested above (28)
## --trim-n          trim N's from the 5' and 3' ends of each read
## --minimum-length  discard the pair if either read is shorter than the specified length
## -a 'G{100}'       trim poly-G (literally an adapter made up of 100 G's) from the 3' end of the forward read (for NextSeq/NovaSeq)
## -A 'G{100}'       trim poly-G from the 3' end of the reverse read (necessary for NextSeq/NovaSeq because of artifactual G's)
## -a AGATCGGAA....  trim the Illumina adapter from the 3' end of the forward read
## -A AGATCGGAA....  trim the Illumina adapter from the 3' end of the reverse read
## Note: when several adapters are given for a read, cutadapt removes only the best-matching one
## unless --times is increased.

cutadapt -a 'G{100}' -A 'G{100}' \
    -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
    --cores="${SLURM_CPUS_PER_TASK}" --trim-n --minimum-length=10 \
    --output=Example_R1.trimmed.fastq.gz --paired-output=Example_R2.trimmed.fastq.gz \
    Example_R1.fastq.gz Example_R2.fastq.gz
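
Because the paired-end command needs both mates, it can help to confirm that the two files are present before requesting 28 cores for the job. The sketch below is an illustration under assumptions, not part of the original script: it assumes the mates differ only by an "_R1" / "_R2" tag (as in Example_R1.fastq.gz and Example_R2.fastq.gz), and the helper names mate_name and check_pair are hypothetical.

```shell
# Hypothetical helper: derive the reverse-read file name from the forward-read name.
# Assumes the first "_R1" followed by "." or "_" marks the forward read.
mate_name() {
    printf '%s\n' "$1" | sed 's/_R1\([._]\)/_R2\1/'
}

# Hypothetical helper: succeed only if both mates exist and are non-empty.
check_pair() {
    r2="$(mate_name "$1")"
    # If nothing was substituted, the name carries no _R1 tag to pair on.
    if [ "$r2" = "$1" ]; then
        echo "no _R1 tag found in: $1" >&2
        return 1
    fi
    for f in "$1" "$r2"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
    done
}

# Usage inside the job script, before the cutadapt call (commented out here):
#   check_pair Example_R1.fastq.gz || exit 1
```

Exiting early on a missing mate avoids holding the allocation for a job that cannot succeed.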
