Advanced Bioinformatics Services

Trimming and filtering NGS reads

The cutadapt software provides a convenient and reliable way to:

  • Remove adapters, poly-A tails and any other unwanted sequences from NGS reads
  • Remove a fixed number of bases from either end of NGS reads
  • Remove or trim reads with low-quality bases
  • Extract NGS reads carrying certain sequence motifs
  • Size select NGS reads
  • Rename NGS reads
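As a rough illustration of how the tasks above map onto cutadapt options, the sketch below writes a tiny FASTQ file with two made-up reads and, if cutadapt is on the PATH, runs one command per task. The file names, the reads and the `sampleA_` prefix are assumptions chosen for the demonstration.

```shell
# Build a two-read demo FASTQ; the quality line is derived from the sequence
# so that both always have the same length (made-up data, illustration only).
seq1=ACGTACGTACGTACGTACGTAGATCGGAAGAGCACACGTCT
seq2=ACGTACGT
{
    printf '@read1\n%s\n+\n%s\n' "$seq1" "$(printf '%s' "$seq1" | tr 'ACGT' 'IIII')"
    printf '@read2\n%s\n+\n%s\n' "$seq2" "$(printf '%s' "$seq2" | tr 'ACGT' 'IIII')"
} > demo.fastq

if command -v cutadapt >/dev/null 2>&1; then
    # Remove a 3' adapter and everything that follows it (-a)
    cutadapt -a AGATCGGAAGAGCACACGTCT -o demo.adapter.fastq demo.fastq
    # Remove 3 bases from the 5' end (-u 3); a negative value trims the 3' end
    cutadapt -u 3 -o demo.cut.fastq demo.fastq
    # Trim low-quality bases from the 3' end, Phred cutoff 20 (-q 20)
    cutadapt -q 20 -o demo.quality.fastq demo.fastq
    # Keep only reads in which the sequence was found; the match is still trimmed
    cutadapt -a AGATCGGAAGAGCACACGTCT --discard-untrimmed -o demo.motif.fastq demo.fastq
    # Size select: keep reads of 18 to 30 bases (-m / -M)
    cutadapt -m 18 -M 30 -o demo.size.fastq demo.fastq
    # Rename reads by adding a prefix to every read name (-x)
    cutadapt -x sampleA_ -o demo.renamed.fastq demo.fastq
else
    echo "cutadapt not found on PATH; only demo.fastq was written"
fi
```

Each call writes its own output file, so the results can be compared against demo.fastq one option at a time.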

# Download and unpack cutadapt 3.4:

wget https://files.pythonhosted.org/packages/2f/02/f7550d8f1f53690d0812242800ed0f1f847945d07c0749f830a5f52c79ee/cutadapt-3.4.tar.gz

tar -xzf cutadapt-3.4.tar.gz

# Install the Python 3 setuptools and build tools (cutadapt 3.4 requires Python 3.6 or later):

sudo apt-get install python3-setuptools python3-dev build-essential

# Install cutadapt:

cd cutadapt-3.4

sudo python3 setup.py install
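To confirm that the installation worked, a quick check along these lines can be run. The `cutadapt_status` variable is only a name chosen for this sketch, and the commented pip command is an alternative that assumes pip for Python 3 is available; it is not part of the steps above.

```shell
# Report the installed cutadapt version, or note that it is missing.
if command -v cutadapt >/dev/null 2>&1; then
    cutadapt_status="installed, version $(cutadapt --version)"
else
    cutadapt_status="not found on PATH"
fi
echo "cutadapt: ${cutadapt_status}"

# Alternative without sudo (assumption: pip for Python 3 is available):
# python3 -m pip install --user cutadapt==3.4
```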

Raw read trimming commands for common NGS library prep kits:

CATS RNA and DNA library preparation kits (Diagenode):

# Read 1

cutadapt -u 3 input.fastq | cutadapt -a AAAAAAAA - | cutadapt -a AAAAAAAN$ -a AAAAAAN$ -a AAAAAN$ - | cutadapt -a AGAGCACACGTCTG - | cutadapt -O 8 -g GTTCAGAGTTCTACAGTCCGACGATCNNN - | cutadapt -m 18 -o output.fastq -

# Read 2

cutadapt -a CCCGATCGTCGG read2.fastq | cutadapt -a GGGGATCGTCGG - | cutadapt -m 18 -o output.fastq -

NEBNext® Ultra™ and NEBNext® Ultra™ II DNA Library Prep Kits for Illumina® (NEB):

# Read 1

cutadapt -a GATCGGAAGAGCACACGT input.fastq | cutadapt -m 18 -o output.fastq -

# Read 2

cutadapt -a GATCGGAAGAGCGTCGTGT input.fastq | cutadapt -m 18 -o output.fastq -

NEBNext® Small RNA Library Prep kit (NEB):

# Read 1

cutadapt -a AGATCGGAAGAGCACACGTCT input.fastq | cutadapt -m 18 -o output.fastq -

# Read 2

cutadapt -a GATCGTCGGACTGTAGAACTC input.fastq | cutadapt -m 18 -o output.fastq -

TruSeq Small RNA Library Preparation Kits (Illumina):

# Read 1

cutadapt -a TGGAATTCTCGGGTGCCAAGG input.fastq | cutadapt -m 18 -o output.fastq -

# Read 2

cutadapt -a GATCGTCGGACTGTAGAACTC input.fastq | cutadapt -m 18 -o output.fastq -

TruSeq RNA Library Prep Kit v2, TruSeq Stranded mRNA, TruSeq Stranded Total RNA (Illumina):

# Read 1

cutadapt -a AGATCGGAAGAGCACACGTCT input.fastq | cutadapt -m 18 -o output.fastq -

# Read 2

cutadapt -a AGATCGGAAGAGCGTCGTGTA input.fastq | cutadapt -m 18 -o output.fastq -

ScriptSeq RNA-Seq Library Preparation Kit (Illumina):

# Read 1

cutadapt -a AGATCGGAAGAGCACACGTCT input.fastq | cutadapt -m 18 -o output.fastq -

# Read 2

cutadapt -a AGATCGGAAGAGCGTCGTGTA input.fastq | cutadapt -m 18 -o output.fastq -
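When Read 1 and Read 2 are filtered separately with -m 18, a pair can lose only one of its mates, which leaves the two output files out of sync. cutadapt's paired-end mode avoids this by trimming both files in one call: -A gives the Read 2 adapter and -p the Read 2 output, and by default a pair is discarded when either read falls below the length cutoff. The sketch below uses the TruSeq RNA adapter sequences from above; the function name `trim_pairs` and all file names are assumptions.

```shell
# Hypothetical helper:
# trim_pairs R1_ADAPTER R2_ADAPTER IN_R1 IN_R2 OUT_R1 OUT_R2
trim_pairs() {
    # -a / -A: 3' adapters for Read 1 / Read 2
    # -m 18:   discard the whole pair if either read is shorter than 18 bases
    # -o / -p: output files for Read 1 / Read 2
    cutadapt -a "$1" -A "$2" -m 18 -o "$5" -p "$6" "$3" "$4"
}

# Example run, only if cutadapt and both placeholder input files exist
if command -v cutadapt >/dev/null 2>&1 &&
   [ -f input_R1.fastq ] && [ -f input_R2.fastq ]; then
    trim_pairs AGATCGGAAGAGCACACGTCT AGATCGGAAGAGCGTCGTGTA \
        input_R1.fastq input_R2.fastq output_R1.fastq output_R2.fastq
fi
```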