Extracting specific sequences from FASTA files

The faFilter tool offers a reliable way to extract specific sequences from a FASTA reference file based on the information in the header (sequence ID). For instance, faFilter can generate separate FASTA reference files for particular RNA types (rRNA, tRNA, snRNA, snoRNA, miRNA, etc.) from a FASTA file containing the total reference transcriptome. faFilter can also remove sequences with unwanted IDs from any FASTA file.

# Download the faFilter binary and make it executable:

wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/faFilter

chmod +x faFilter

# Create a link in a directory on your $PATH (e.g. /usr/local/bin):

sudo ln -s /path/to/faFilter/faFilter /usr/local/bin/faFilter

# The following command extracts only the sequences whose IDs start with "hg38" (the pattern is quoted so that the shell does not expand the asterisk):

faFilter -name='hg38*' original_fasta.fa fasta_containing_only_sequences_with_IDs_starting_with_hg38.fa
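
A quick sanity check after filtering is to count and list the headers, since every FASTA record starts with a ">" line. The sketch below builds a tiny hypothetical demo file so that it runs without faFilter; in practice, point grep at original_fasta.fa and at the filtered output instead.

```shell
# Hypothetical demo file standing in for a filtered FASTA.
demo_fa=$(mktemp)
cat > "$demo_fa" <<'EOF'
>hg38_chr1 demo
ACGT
>hg38_chr2 demo
GGCC
EOF

# Count records (one ">" header line per record).
n_records=$(grep -c '^>' "$demo_fa")
echo "records: $n_records"

# List the IDs only: strip ">" and everything after the first space.
ids=$(grep '^>' "$demo_fa" | sed 's/^>//' | cut -d' ' -f1)
echo "$ids"
```

Comparing the record count of the input with that of the output shows at a glance whether the pattern matched as many sequences as expected.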

# The following command extracts only the sequences whose IDs match an entry in a custom list (one ID pattern per line):

faFilter -namePatList=ID_list.txt original_fasta.fa fasta_containing_only_sequences_with_IDs_from_ID_list.fa
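
Removing sequences with unwanted IDs is the inverse of the extraction above. The faFilter usage message lists a -v option that inverts the match; run faFilter without arguments to confirm that your build has it. As an alternative that needs no extra binary, here is a minimal awk sketch. It assumes exact IDs (no wildcards) and takes the ID to be the header text between ">" and the first whitespace. All file names are hypothetical demo files.

```shell
set -eu
workdir=$(mktemp -d)

# Small demo FASTA with three records.
cat > "$workdir/original_fasta.fa" <<'EOF'
>seq1 first record
ACGTACGT
ACGT
>seq2 second record
GGGGCCCC
>seq3 third record
TTTTAAAA
EOF

# IDs to remove, one per line.
printf 'seq2\n' > "$workdir/unwanted_IDs.txt"

# With faFilter the equivalent would presumably be (not run here, check the usage message):
#   faFilter -v -namePatList=unwanted_IDs.txt original_fasta.fa filtered.fa

# First file: load the unwanted IDs. Second file: on each header line,
# decide whether to keep the record; sequence lines follow their header.
awk '
    FILENAME == ARGV[1] { if ($1 != "") unwanted[$1] = 1; next }
    /^>/                { id = substr($1, 2); keep = !(id in unwanted) }
    keep                { print }
' "$workdir/unwanted_IDs.txt" "$workdir/original_fasta.fa" > "$workdir/filtered.fa"

# List the headers that survived the filter.
grep '^>' "$workdir/filtered.fa"
```

The same script keeps only the listed IDs if the condition is flipped to keep = (id in unwanted).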
