Advanced Bioinformatics Services

Extracting raw read counts

With eXpress

# Download and unzip the latest binary files for eXpress software:

wget https://pachterlab.github.io/eXpress/downloads/express-1.5.1/express-1.5.1-linux_x86_64.tgz

tar -xvzf express-1.5.1-linux_x86_64.tgz

# Copy the binary to a directory in your $PATH (e.g. /usr/local/bin):

cd express-1.5.1-linux_x86_64

sudo cp express /usr/local/bin/

# Calculate raw read counts using a BAM file and the corresponding reference transcriptome file:

## For read1 of a stranded library (accept only forward single-end alignments):

express --f-stranded --no-bias-correct reference.fa INPUT_over_reference.bam

mv results.xprs INPUT_over_reference.xprs

## For read2 of a stranded library (accept only reverse single-end alignments):

express --r-stranded --no-bias-correct reference.fa INPUT_over_reference.bam

mv results.xprs INPUT_over_reference.xprs
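
Because every run writes results.xprs into the current directory, processing several samples in a row means the rename step must happen before the next run starts. The loop below is a minimal sketch of one way to batch this, not a definitive implementation. The function name quantify_bams and the EXPRESS_BIN variable are hypothetical, and the sketch assumes the eXpress -o (--output-dir) option so that each sample gets its own output directory.

```shell
# Hypothetical helper: run eXpress once per BAM file without overwriting results.xprs.
# EXPRESS_BIN is an assumed variable pointing at the eXpress executable.
EXPRESS_BIN="${EXPRESS_BIN:-express}"

# Usage: quantify_bams STRAND_FLAG REFERENCE_FASTA BAM [BAM ...]
# STRAND_FLAG is --f-stranded or --r-stranded, as in the commands above.
quantify_bams() (
    strand_flag="$1"
    reference="$2"
    shift 2
    for bam in "$@"; do
        sample="$(basename "$bam" .bam)"
        mkdir -p "${sample}_express"
        # -o tells eXpress where to write its output files (assumed option)
        "$EXPRESS_BIN" "$strand_flag" --no-bias-correct -o "${sample}_express" "$reference" "$bam" || exit 1
        # Keep a per-sample copy named like the manual "mv" step above
        cp "${sample}_express/results.xprs" "${sample}.xprs"
    done
)
```

For example, quantify_bams --f-stranded reference.fa INPUT1_over_reference.bam INPUT2_over_reference.bam would leave INPUT1_over_reference.xprs and INPUT2_over_reference.xprs in the working directory.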

Important notes:

- eXpress works with sorted BAM files generated by most aligners

- eXpress will not work on sorted BAM files generated by the bowtie or bowtie2 aligner if the -k option was used for mapping

# Combine separate XPRS files into counts matrices:

sort -k2,2 INPUT1_over_reference.xprs > INPUT1_over_reference_sorted.xprs

sort -k2,2 INPUT2_over_reference.xprs > INPUT2_over_reference_sorted.xprs

sort -k2,2 INPUT3_over_reference.xprs > INPUT3_over_reference_sorted.xprs

sort -k2,2 INPUT4_over_reference.xprs > INPUT4_over_reference_sorted.xprs

sort -k2,2 INPUTN_over_reference.xprs > INPUTN_over_reference_sorted.xprs

cut -f2,3,5 INPUT1_over_reference_sorted.xprs > INPUT1_over_reference_sorted_ss.xprs

cut -f2,3,5 INPUT2_over_reference_sorted.xprs > INPUT2_over_reference_sorted_ss.xprs

cut -f2,3,5 INPUT3_over_reference_sorted.xprs > INPUT3_over_reference_sorted_ss.xprs

cut -f2,3,5 INPUT4_over_reference_sorted.xprs > INPUT4_over_reference_sorted_ss.xprs

cut -f2,3,5 INPUTN_over_reference_sorted.xprs > INPUTN_over_reference_sorted_ss.xprs

paste INPUT1_over_reference_sorted_ss.xprs INPUT2_over_reference_sorted_ss.xprs INPUT3_over_reference_sorted_ss.xprs INPUT4_over_reference_sorted_ss.xprs INPUTN_over_reference_sorted_ss.xprs > INPUT1234N_over_reference_temp.xprs

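## Keep the ID and length columns once, then the counts column of each sample (columns 3, 6, 9, ..., 3*N for N samples):

cut -f1,2,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,48,51,3*N INPUT1234N_over_reference_temp.xprs > INPUT1234N_over_reference.xprs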
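
The sort, cut and paste steps above can also be wrapped in a loop so that the column list does not have to be typed by hand. The following is a minimal sketch under stated assumptions: every XPRS file starts with a single header line, uses the same column order (column 2 is the target ID, column 3 the length, column 5 the counts) and lists the same set of targets. The function name combine_xprs and the output file name are hypothetical.

```shell
# Hypothetical helper: merge any number of XPRS files into one counts matrix.
# Usage: combine_xprs OUTPUT_FILE SAMPLE1.xprs [SAMPLE2.xprs ...]
combine_xprs() (
    out="$1"; shift
    tmpdir="$(mktemp -d)" || exit 1
    first=1
    for f in "$@"; do
        # Drop the header line, sort by column 2 (target ID), keep ID, length, counts
        tail -n +2 "$f" | LC_ALL=C sort -k2,2 | cut -f2,3,5 > "$tmpdir/cur.tsv"
        if [ "$first" -eq 1 ]; then
            cp "$tmpdir/cur.tsv" "$tmpdir/matrix.tsv"
            first=0
        else
            # Later samples contribute only their counts column
            cut -f3 "$tmpdir/cur.tsv" | paste "$tmpdir/matrix.tsv" - > "$tmpdir/next.tsv"
            mv "$tmpdir/next.tsv" "$tmpdir/matrix.tsv"
        fi
    done
    {
        # Header: ID, length, then one column per sample named after its file
        printf 'target_id\tlength'
        for f in "$@"; do printf '\t%s' "$(basename "$f" .xprs)"; done
        printf '\n'
        cat "$tmpdir/matrix.tsv"
    } > "$out"
    rm -rf "$tmpdir"
)
```

A call such as combine_xprs counts_matrix.tsv INPUT1_over_reference.xprs INPUT2_over_reference.xprs writes one row per target and one counts column per sample. Unlike the manual steps, this sketch removes the original header lines and writes a single new header instead.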

With RSEM

# Download the latest source files for RSEM software, unzip, compile and install:

wget -O RSEM-1.3.1.tar.gz https://github.com/deweylab/RSEM/archive/v1.3.1.tar.gz

tar -xvzf RSEM-1.3.1.tar.gz

cd RSEM-1.3.1

make

sudo make install #RSEM executables will be installed to /usr/local/bin

# Preparing transcriptome reference sequences

## For transcriptome reference

rsem-prepare-reference reference.fa reference.fa

## For genome reference

rsem-prepare-reference --gtf path_to_gtf_file/genome_reference.gtf path_to_fasta_file/genome_reference.fa genome_reference.fa

# Perform read counting

## For strand-specific reads mapped to transcriptome reference

rsem-calculate-expression --bam --no-bam-output -p [insert number of threads] --strand-specific INPUT_over_reference.bam path_to_RSEM_refs/RSEM_refs outputfilename

## For strand-specific reads mapped to genome reference

rsem-calculate-expression --bam --no-bam-output -p [insert number of threads] --strand-specific INPUT_over_reference.bam path_to_RSEM_refs/RSEM_refs outputfilename
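
RSEM writes its estimates to outputfilename.genes.results and outputfilename.isoforms.results, and the read counts are in the expected_count column (these values can be fractional). RSEM ships rsem-generate-data-matrix for collecting that column across samples. As an illustration of the same idea, the sketch below does it with awk, assuming the fifth tab-separated column is expected_count and the first column is the feature ID. The function name rsem_expected_counts is hypothetical.

```shell
# Hypothetical helper: collect the expected_count column (assumed to be column 5)
# from several RSEM *.results files into one tab-separated matrix on stdout.
# Usage: rsem_expected_counts SAMPLE1.genes.results [SAMPLE2.genes.results ...]
rsem_expected_counts() {
    awk -F '\t' -v OFS='\t' '
        # First line of each file is its header: start a new sample column
        FNR == 1 { nfile++; names[nfile] = FILENAME; next }
        {
            # Remember the feature order from the first file only
            if (nfile == 1) order[++n] = $1
            count[$1, nfile] = $5
        }
        END {
            printf "feature_id"
            for (j = 1; j <= nfile; j++) printf "%s%s", OFS, names[j]
            printf "\n"
            for (i = 1; i <= n; i++) {
                printf "%s", order[i]
                for (j = 1; j <= nfile; j++) printf "%s%s", OFS, count[order[i], j]
                printf "\n"
            }
        }' "$@"
}
```

For example, rsem_expected_counts INPUT1.genes.results INPUT2.genes.results > counts_matrix.tsv. Rows are matched by feature ID, so the files do not need to be sorted, but a feature missing from a later sample is left as an empty cell.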