Trimming and filtering of NGS reads

The cutadapt command-line tool provides a convenient way to:

  • Remove adapters, poly-A tails and any other unwanted sequences from NGS reads
  • Remove a fixed number of bases from either end of NGS reads
  • Remove or trim reads with low-quality bases
  • Extract NGS reads carrying certain sequence motifs
  • Size-select NGS reads
  • Rename NGS reads
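
As a rough orientation, each task in the list above corresponds to one or two cutadapt options. The sketch below runs them on a tiny made-up FASTQ file and skips the cutadapt calls if the tool is not installed yet. The file names, the motif ACGTACGT, the prefix sampleA_ and all thresholds are placeholders chosen for illustration, not recommended settings.

```shell
# Two made-up reads: r1 ends in a short poly-A tail, r2 is very short.
printf '@r1\nACGTACGTACGTAAAAAAAA\n+\nIIIIIIIIIIIIIIIIIIII\n@r2\nACGTAC\n+\nIIIIII\n' > example.fastq

if command -v cutadapt >/dev/null 2>&1; then
    # Remove a 3' adapter or poly-A tail (-a); -g would remove a 5' adapter.
    cutadapt -a "A{10}" -o no_polya.fastq example.fastq

    # Remove a fixed number of bases: 3 from the 5' end and 2 from the 3' end.
    cutadapt -u 3 -u -2 -o clipped.fastq example.fastq

    # Trim low-quality bases (Phred score below 20) from the 3' end.
    cutadapt -q 20 -o qtrimmed.fastq example.fastq

    # Keep only reads that contain a motif; with -g the motif and
    # everything before it are removed from the reads that are kept.
    cutadapt -g ACGTACGT --discard-untrimmed -o motif.fastq example.fastq

    # Size selection: keep reads of 18 to 50 bases.
    cutadapt -m 18 -M 50 -o sized.fastq example.fastq

    # Rename reads by adding a prefix to every read name.
    cutadapt --prefix sampleA_ -o renamed.fastq example.fastq
fi
```

Several of these options can be combined in a single call, but cutadapt applies them in a fixed internal order, which is one reason the kit-specific recipes chain separate calls with pipes.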


# Download and unpack the cutadapt 3.4 source archive:

wget https://files.pythonhosted.org/packages/2f/02/f7550d8f1f53690d0812242800ed0f1f847945d07c0749f830a5f52c79ee/cutadapt-3.4.tar.gz

tar -xzf cutadapt-3.4.tar.gz


# Install the Python 3 setuptools (cutadapt 3.4 requires Python 3.6 or newer):

sudo apt-get install python3-setuptools

# Install cutadapt:

cd cutadapt-3.4

sudo python3 setup.py install
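
A quick way to confirm that the installation succeeded is to ask the shell whether cutadapt is on the PATH and, if so, which version it reports. This is a minimal sketch; the variable name cutadapt_status and the message text are illustrative only.

```shell
# Check whether the cutadapt executable can be found.
if command -v cutadapt >/dev/null 2>&1; then
    cutadapt_status="installed"
    cutadapt --version   # expected to report 3.4 for the archive used above
else
    cutadapt_status="missing"
    echo "cutadapt was not found on the PATH" >&2
fi
```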

Raw read trimming for different NGS kits:

CATS RNA and DNA library preparation kits (Diagenode)

# Read 1

cutadapt -u 3 input.fastq | cutadapt -a AAAAAAAA - | cutadapt -a 'AAAAAAAN$' -a 'AAAAAAN$' -a 'AAAAAN$' - | cutadapt -a AGAGCACACGTCTG - | cutadapt -O 8 -g GTTCAGAGTTCTACAGTCCGACGATCNNN - | cutadapt -m 18 -o output.fastq -
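
The Read 1 command is easier to follow when written with one stage per line. The sketch below wraps the same six stages in a shell function; the function name trim_cats_read1 and its two arguments are hypothetical. The comments describe what each option does in cutadapt; the source does not state the biological rationale for each stage.

```shell
# Usage: trim_cats_read1 INPUT.fastq OUTPUT.fastq
trim_cats_read1() {
    cutadapt -u 3 "$1" |                                    # drop the first 3 bases of every read
    cutadapt -a AAAAAAAA - |                                # remove a poly-A stretch and everything after it
    cutadapt -a 'AAAAAAAN$' -a 'AAAAAAN$' -a 'AAAAAN$' - |  # remove leftover A runs plus one final base, anchored at the 3' end
    cutadapt -a AGAGCACACGTCTG - |                          # remove the 3' adapter
    cutadapt -O 8 -g GTTCAGAGTTCTACAGTCCGACGATCNNN - |      # remove the 5' adapter, requiring at least 8 bases of overlap
    cutadapt -m 18 -o "$2" -                                # discard reads shorter than 18 bases
}

# Example call:
# trim_cats_read1 input.fastq output.fastq
```

The stages stay separate because a single cutadapt call removes at most one adapter per read by default, so chaining calls is a simple way to apply several trimming steps in a defined order.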

# Read 2

cutadapt -a CCCGATCGTCGG read2.fastq | cutadapt -a GGGGATCGTCGG - | cutadapt -m 18 -o output.fastq -

NEBNext® Ultra™ and NEBNext® Ultra™ II DNA Library Prep Kits for Illumina® (NEB)

# Read 1

cutadapt -a GATCGGAAGAGCACACGT input.fastq | cutadapt -m 18 -o output.fastq -

# Read 2

cutadapt -a GATCGGAAGAGCACACGT input.fastq | cutadapt -m 18 -o output.fastq -

NEBNext® Small RNA Library Prep kit (NEB)

# Read 1

cutadapt -a AGATCGGAAGAGCACACGTCT input.fastq | cutadapt -m 18 -o output.fastq -

# Read 2

cutadapt -a GATCGTCGGACTGTAGAACTC input.fastq | cutadapt -m 18 -o output.fastq -

TruSeq Small RNA Library Preparation Kits (Illumina)

# Read 1

cutadapt -a TGGAATTCTCGGGTGCCAAGG input.fastq | cutadapt -m 18 -o output.fastq -

# Read 2

cutadapt -a GATCGTCGGACTGTAGAACTC input.fastq | cutadapt -m 18 -o output.fastq -

TruSeq RNA Library Prep Kit v2, TruSeq Stranded mRNA, TruSeq Stranded Total RNA (Illumina)

# Read 1

cutadapt -a AGATCGGAAGAGCACACGTCT input.fastq | cutadapt -m 18 -o output.fastq -

# Read 2

cutadapt -a AGATCGGAAGAGCGTCGTGTA input.fastq | cutadapt -m 18 -o output.fastq -
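
When Read 1 and Read 2 come from the same paired-end run, filtering the two files separately with -m 18 can drop a read from one file but keep its mate in the other. cutadapt can instead trim both files in one call, with -A giving the Read 2 adapter and -p the Read 2 output. The sketch below uses the two adapters from the commands above; the function name trim_truseq_pair and the file names are hypothetical.

```shell
# Usage: trim_truseq_pair IN_R1.fastq IN_R2.fastq OUT_R1.fastq OUT_R2.fastq
# Trims both reads of each pair and discards the whole pair if either
# read ends up shorter than 18 bases, so the outputs stay in sync.
trim_truseq_pair() {
    cutadapt \
        -a AGATCGGAAGAGCACACGTCT \
        -A AGATCGGAAGAGCGTCGTGTA \
        -m 18 \
        -o "$3" -p "$4" \
        "$1" "$2"
}

# Example call:
# trim_truseq_pair input_R1.fastq input_R2.fastq output_R1.fastq output_R2.fastq
```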

ScriptSeq RNA-Seq Library Preparation Kit (Illumina)

# Read 1

cutadapt -a AGATCGGAAGAGCACACGTCT input.fastq | cutadapt -m 18 -o output.fastq -

# Read 2

cutadapt -a AGATCGGAAGAGCGTCGTGTA input.fastq | cutadapt -m 18 -o output.fastq -
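
After running any of the commands above, it can be useful to check that the -m 18 filter had the intended effect. The helper below is a small sketch that uses only awk; the function name fastq_length_summary is hypothetical, and it assumes an uncompressed FASTQ file with exactly four lines per record.

```shell
# Print the number of reads and the shortest and longest read length.
fastq_length_summary() {
    awk 'NR % 4 == 2 {
             n++
             len = length($0)
             if (n == 1 || len < min) min = len
             if (n == 1 || len > max) max = len
         }
         END { printf "reads=%d min=%d max=%d\n", n, min, max }' "$1"
}

# Example call:
# fastq_length_summary output.fastq
```

For a file produced with -m 18, the reported minimum should be 18 or greater.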