QC and analysis of reads in FASTQ files

FastQC (Babraham Bioinformatics) is a convenient tool for:

  • Performing a preliminary quality check of raw sequence data from Illumina and Ion Torrent platforms
  • Calculating the percentage of duplicated reads (duplication rate)
  • Visualizing the distribution of read lengths and nucleotide content
  • Calculating and visualizing the GC content distribution
  • Identifying overrepresented sequences in any FASTQ file
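
To make the duplication rate, read length, GC content and overrepresented-sequence metrics concrete, here is a minimal shell/awk sketch that computes simple versions of them directly from an uncompressed FASTQ file. It assumes standard four-line records (no wrapped sequence lines), and the function names fastq_stats and top_sequences are hypothetical. It counts exact duplicates over all reads, while FastQC uses its own estimation method, so the numbers will not necessarily match a FastQC report.

```shell
# Hypothetical helpers for illustration only; they are not part of FastQC.

# fastq_stats [FILE]  (reads stdin when FILE is omitted)
# Prints read count, distinct sequences, duplication %, mean read length and GC %.
fastq_stats() {
  awk '
    # Line 2 of every 4-line FASTQ record holds the sequence
    NR % 4 == 2 {
      total++
      if (!seen[$0]++) distinct++        # first time this exact sequence appears
      bases += length($0)
      s = $0
      gc += gsub(/[GCgc]/, "", s)        # gsub returns the number of G/C matches
    }
    END {
      if (total == 0) { print "no reads found" > "/dev/stderr"; exit 1 }
      printf "reads\t%d\n", total
      printf "distinct\t%d\n", distinct
      printf "duplication_pct\t%.2f\n", 100 * (total - distinct) / total
      printf "mean_length\t%.2f\n", bases / total
      printf "gc_pct\t%.2f\n", (bases > 0 ? 100 * gc / bases : 0)
    }
  ' "${1:--}"
}

# top_sequences [FILE] [N]: the N most frequent exact sequences (default 10) with counts
top_sequences() {
  awk 'NR % 4 == 2' "${1:--}" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}
```

For gzip-compressed input, pipe the file in instead, e.g. zcat reads.fastq.gz | fastq_stats.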

# Download and unzip FastQC (this example uses v0.11.9; newer releases may be listed on the FastQC website):

wget https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.9.zip

unzip fastqc_v0.11.9.zip

# Make the fastqc launcher script executable:

cd FastQC

chmod 755 fastqc

# Create a symbolic link to it in a directory on your $PATH (e.g. /usr/local/bin):

sudo ln -s /path/to/FastQC/fastqc /usr/local/bin/fastqc
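
Because FastQC is a Java application, a Java runtime must also be available on the $PATH. The following sketch (check_fastqc is a hypothetical helper name) checks both prerequisites and then prints the installed version with fastqc --version:

```shell
# check_fastqc: hypothetical helper that verifies the installation steps above.
check_fastqc() {
  # FastQC runs on Java, so a Java runtime has to be resolvable first
  if ! command -v java >/dev/null 2>&1; then
    echo "java not found: install a Java runtime first" >&2
    return 1
  fi
  # The symlink in /usr/local/bin should make fastqc resolvable from any directory
  if ! command -v fastqc >/dev/null 2>&1; then
    echo "fastqc not found on \$PATH" >&2
    return 1
  fi
  # Print the installed version as a final sanity check
  fastqc --version
}
```

A non-zero exit status from check_fastqc means that one of the two commands could not be found.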

# Launch the interactive interface from any location (run fastqc without arguments):

fastqc

# Or, run it on one or more FASTQ files from any directory:

fastqc file1.fastq file2.fastq .... fileN.fastq
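
When a sequencing run produces many FASTQ files, it can be convenient to collect all reports in one directory and process several files in parallel. The sketch below relies on FastQC's --outdir option (the directory must already exist, because FastQC does not create it) and its --threads option (the number of files processed simultaneously). The helper name run_fastqc_batch, the file patterns and the default values are hypothetical choices for illustration.

```shell
# run_fastqc_batch IN_DIR [OUT_DIR] [THREADS]
# Hypothetical helper: runs FastQC on every *.fastq, *.fastq.gz, *.fq and *.fq.gz file in IN_DIR.
# The body runs in a subshell ( ... ) so its variables and "exit" stay local.
run_fastqc_batch() (
  in_dir=${1:?usage: run_fastqc_batch IN_DIR [OUT_DIR] [THREADS]}
  out_dir=${2:-fastqc_reports}
  threads=${3:-4}

  # Rebuild the positional parameters as the list of matching files.
  # An unmatched glob stays as a literal pattern, so keep only real files.
  set --
  for f in "$in_dir"/*.fastq "$in_dir"/*.fastq.gz "$in_dir"/*.fq "$in_dir"/*.fq.gz; do
    if [ -f "$f" ]; then
      set -- "$@" "$f"
    fi
  done

  if [ "$#" -eq 0 ]; then
    echo "no FASTQ files found in $in_dir" >&2
    exit 1
  fi

  # FastQC will not create the output directory itself, so make it first
  mkdir -p "$out_dir" || exit 1

  # --threads sets how many files FastQC processes at the same time
  fastqc --threads "$threads" --outdir "$out_dir" "$@"
)
```

For example, run_fastqc_batch raw_reads qc_reports 8 writes one HTML report and one .zip archive per input file into qc_reports.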