Cutadapt
Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
- Software URL: https://github.com/marcelm/cutadapt/
- Documentation: http://cutadapt.readthedocs.io/en/stable/
Single-End Trimming
In this example we will submit an sbatch job file to Slurm that will run a cutadapt script on a single node, using 28 cores and up to 28G of memory. The example below is for single-end data. To try the example, replace Example_R1.fastq.gz with the name of your own single-end FASTQ file.
sbatch cutadapt_single_sbatch.sh
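On success, sbatch prints a confirmation of the form "Submitted batch job <jobid>", and the "-o cutadapt_job.o_%j" directive in the job file sends standard output to a file named after that job ID. The following is a minimal sketch for working out the log file name from that message, assuming a single-cluster setup where the job ID is the last word of the message. The helper name log_file_for and the job ID 123456 are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: turn sbatch's confirmation message into the log file
# name produced by "#SBATCH -o cutadapt_job.o_%j" (%j expands to the job ID).
log_file_for() {
    message=$1
    job_id=${message##* }   # keep only the text after the last space
    printf 'cutadapt_job.o_%s\n' "$job_id"
}

# Sample message with a made-up job ID, in the format sbatch prints on success.
log_name=$(log_file_for "Submitted batch job 123456")
echo "$log_name"
```

In a real session the message could be captured with message=$(sbatch cutadapt_single_sbatch.sh) and passed to the helper to find the log once the job has started.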
Contents of Example File
cutadapt_single_sbatch.sh:
#!/bin/bash
#-------------------------------------------------------------------------------
# SBATCH CONFIG
#-------------------------------------------------------------------------------
#SBATCH -J cutadapt_job # job name
#SBATCH -o cutadapt_job.o_%j # standard out
#SBATCH --partition BioCompute
#SBATCH --mem 28G
#SBATCH --ntasks 1
#SBATCH --nodes 1
#SBATCH --cpus-per-task 28 # 28 cores appeared to give a good balance of speed versus core usage. A paired-end
# set of compressed FASTQ files (~11 GB each) finished in less than an hour with 28 cores.
#SBATCH --time 04:00:00 # 4 hour time limit.
#-------------------------------------------------------------------------------
## Load the cutadapt module with its dependencies
module load ircf/ircf-modules
module load pigz/pigz-2.3.4
module load cutadapt/cutadapt-1.15
module list
## Run cutadapt with the following options
## --cores=SLURM_CPUS_PER_TASK Use the same number of CPU cores as you selected above (28)
## --trim-n trim away 5' and 3' N's
## --minimum-length discard read if shorter than specified
## -a G{100} trim poly-G (literally an adapter made up of 100 G's) from the 3' end (needed for NextSeq or NovaSeq data)
## -a AGATCGGAA... trim the Illumina adapter from the 3' end
cutadapt \
    -a "G{100}" \
    -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
    --cores="${SLURM_CPUS_PER_TASK}" \
    --trim-n \
    --minimum-length=10 \
    --output=Example_R1.trimmed.fastq.gz \
    Example_R1.fastq.gz
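The script above hard-codes the input and output file names. As a hedged sketch of one way to avoid editing two names by hand, the snippet below derives the trimmed file name from the input name and falls back to a single core when SLURM_CPUS_PER_TASK is not set (for example, when testing outside Slurm). The variable names input, output and cores are illustrative and not part of the original script. The command is only printed, not run.

```shell
#!/bin/bash
# Sketch: derive the output name from the input name (assumes the input
# ends in ".fastq.gz") and pick a core count that also works outside Slurm.
input="Example_R1.fastq.gz"
output="${input%.fastq.gz}.trimmed.fastq.gz"
cores="${SLURM_CPUS_PER_TASK:-1}"   # 1 core if the Slurm variable is unset

# Dry run: echo prints the command instead of executing it.
echo cutadapt \
    -a 'G{100}' \
    -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
    --cores="$cores" \
    --trim-n \
    --minimum-length=10 \
    --output="$output" \
    "$input"
```

Removing the leading echo turns the dry run into the real command.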
Paired-End Trimming
In this example we will submit an sbatch job file to Slurm that will run a cutadapt script on a single node, using 28 cores and up to 28G of memory. The example below is for paired-end data. To run it on your own data, replace Example_R1.fastq.gz and Example_R2.fastq.gz with the names of your paired-end FASTQ files.
sbatch cutadapt_paired_sbatch.sh
Contents of Example File
cutadapt_paired_sbatch.sh:
#!/bin/bash
#-------------------------------------------------------------------------------
# SBATCH CONFIG
#-------------------------------------------------------------------------------
#SBATCH -J cutadapt_job # job name
#SBATCH -o cutadapt_job.o_%j # standard out
#SBATCH --partition BioCompute
#SBATCH --mem 28G
#SBATCH --ntasks 1
#SBATCH --nodes 1
#SBATCH --cpus-per-task 28 # 28 cores appeared to give a good balance of speed versus core usage. A paired-end
# set of compressed FASTQ files (~11 GB each) finished in less than an hour with 28 cores.
#SBATCH --time 04:00:00 # 4 hour time limit.
## Load the cutadapt module with its dependencies
module load ircf/ircf-modules
module load pigz/pigz-2.3.4
module load cutadapt/cutadapt-1.15
module list
## Run cutadapt non-interactively
## --cores=SLURM_CPUS_PER_TASK Use the same number of CPU cores as you selected above (28)
## --trim-n trim away 5' and 3' N's
## --minimum-length discard pair if one read is shorter than specified
## -a G{100} trim poly-G (literally an adapter made up of 100 G's) from the 3' end of the forward read (for NextSeq/NovaSeq)
## -A G{100} trim poly-G from the 3' end of the reverse read (necessary for NextSeq/NovaSeq because of artifactual G's)
## -a AGATCGGAA.... trim the Illumina adapter from the 3' end of the forward read
## -A AGATCGGAA.... trim the Illumina adapter from the 3' end of the reverse read
cutadapt \
    -a "G{100}" \
    -A "G{100}" \
    -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
    -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
    --cores="${SLURM_CPUS_PER_TASK}" \
    --trim-n \
    --minimum-length=10 \
    --output=Example_R1.trimmed.fastq.gz \
    --paired-output=Example_R2.trimmed.fastq.gz \
    Example_R1.fastq.gz Example_R2.fastq.gz
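The paired-end command above needs four file names that differ only in the R1/R2 tag. The sketch below shows one possible way to derive all four from the forward-read name and to check that both inputs exist before a job spends time in the queue. It assumes a hypothetical naming scheme of <prefix>_R1.fastq.gz and <prefix>_R2.fastq.gz; the helper pair_ok and the variable names are illustrative only.

```shell
#!/bin/bash
# Sketch: derive the mate and output names from the forward-read file name
# (assumes names of the form <prefix>_R1.fastq.gz / <prefix>_R2.fastq.gz).
r1="Example_R1.fastq.gz"
prefix="${r1%_R1.fastq.gz}"
r2="${prefix}_R2.fastq.gz"
out1="${prefix}_R1.trimmed.fastq.gz"
out2="${prefix}_R2.trimmed.fastq.gz"

# Hypothetical helper: succeed only if both files exist and are non-empty.
pair_ok() {
    [ -s "$1" ] && [ -s "$2" ]
}

if pair_ok "$r1" "$r2"; then
    echo "Inputs found: $r1 $r2 -> $out1 $out2"
else
    echo "Missing or empty input: $r1 or $r2" >&2
fi
```

The derived names can then be passed to --output, --paired-output and the two input arguments in place of the literal Example_ file names.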