Many FASTA reference files (e.g. those downloaded from the UCSC and NCBI FTP servers) contain duplicated sequences. Such duplicates not only decrease the number of uniquely mapped reads but may also interfere with downstream processing by other software packages (e.g. read quantification with eXpress).
Currently, the faFilter software offers the most reliable way to remove duplicated reference sequences from a FASTA file.
# Download faFilter software:
# Create a symbolic link in a directory on your $PATH (e.g. /usr/local/bin):
sudo ln -s /path/to/faFilter/faFilter /usr/local/bin/faFilter
# Apply to a FASTA reference file (-uniq keeps only the first record for each sequence ID):
faFilter -uniq reference.fa reference_no_duplicates.fa
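To check whether a reference actually contains duplicates, and to confirm that the filtered file is clean, the sequence IDs can be listed and counted. The following is a minimal sketch, assuming that duplicates are identified by sequence ID (the first whitespace-delimited token of each header line); the function name list_duplicate_ids is hypothetical and not part of faFilter.

```shell
# Hypothetical helper: print every sequence ID that occurs more than once
# in the FASTA file given as the first argument.
list_duplicate_ids() {
    # Header lines start with '>'; strip it and keep the first token as the ID.
    # uniq -d prints one copy of each ID that is repeated in the sorted list.
    awk '/^>/ {sub(/^>/, "", $1); print $1}' "$1" | sort | uniq -d
}

# Usage (file names as in the commands above):
#   list_duplicate_ids reference.fa                 # lists repeated IDs, if any
#   list_duplicate_ids reference_no_duplicates.fa   # should print nothing
```

Empty output for the filtered file indicates that no sequence ID occurs twice. This check does not detect records that share identical sequence content under different IDs.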