Converting GTF files into BED files



Both GENCODE and Ensembl provide GTF and GFF3 annotation files with genome coordinates, but no BED files. BED, however, is the format expected by some software (e.g. bedtools), and it is convenient for extracting specific regions of the genome (e.g. promoters or transcription start sites, TSS). You can generate BED files (here from the GTF file of Ensembl release 93) by running the following commands in a Linux shell:

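# For genes
# Output columns: chrom, start, end, gene_id, score ("."), strand, gene_name, gene_source, gene_biotype
# (assumes the Ensembl attribute order gene_id, gene_version, gene_name, gene_source, gene_biotype).
# GTF starts are 1-based and BED starts are 0-based, hence $2-1.
grep -P "\tgene\t" Homo_sapiens.GRCh38.93.gtf | cut -f1,4,5,7,9 | \
sed 's/[[:space:]]/\t/g' | sed 's/[;|"]//g' | \
awk -F $'\t' 'BEGIN { OFS=FS } { print $1,$2-1,$3,$6,".",$4,$10,$12,$14 }' | \
sort -k1,1 -k2,2n > Homo_sapiens.GRCh38.93.gene.bed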
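
Since promoters and TSS are named above as typical regions of interest, here is a minimal sketch of how they could be derived from a BED file that carries the strand in column 6, such as the gene file produced above. The function name bed_to_promoter and the default window sizes are assumptions made for illustration, not part of the original script. Window ends are not clipped to chromosome lengths.

```shell
# Hypothetical helper: turn BED intervals into strand-aware windows around the TSS.
# Reads BED (strand in column 6) from stdin, writes the same columns to stdout.
bed_to_promoter() {
    up="${1:-1000}"    # bases upstream of the TSS (assumed default)
    down="${2:-100}"   # bases downstream of the TSS (assumed default)
    awk -F '\t' -v up="$up" -v down="$down" '
        BEGIN { OFS = "\t" }
        {
            if ($6 == "-") {
                tss = $3 - 1          # last base of the interval (0-based)
                lo = tss - down       # downstream means lower coordinates
                hi = tss + up + 1     # BED end is exclusive
            } else {
                tss = $2              # first base of the interval (0-based)
                lo = tss - up
                hi = tss + down + 1
            }
            if (lo < 0) lo = 0        # do not run off the chromosome start
            $2 = lo; $3 = hi
            print
        }
    '
}
```

For example, `bed_to_promoter 2000 500 < Homo_sapiens.GRCh38.93.gene.bed` would write windows of 2000 bases upstream and 500 bases downstream of each gene start, while `bed_to_promoter 0 0` reduces every interval to its single TSS base.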

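# For transcripts
# Output columns: chrom, start, end, transcript_id, score ("."), strand, gene_name, gene_source, gene_biotype
# (assumes the Ensembl attribute order gene_id, gene_version, transcript_id, transcript_version, gene_name, gene_source, gene_biotype).
# GTF starts are 1-based and BED starts are 0-based, hence $2-1.
grep -P "\ttranscript\t" Homo_sapiens.GRCh38.93.gtf | cut -f1,4,5,7,9 | \
sed 's/[[:space:]]/\t/g' | sed 's/[;|"]//g' | \
awk -F $'\t' 'BEGIN { OFS=FS } { print $1,$2-1,$3,$10,".",$4,$14,$16,$18 }' | \
sort -k1,1 -k2,2n > Homo_sapiens.GRCh38.93.transcript.bed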
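
The commands above pick attributes by their position in column 9, so a record that lacks one attribute (for instance a gene without a gene_name) would shift the remaining fields. As a possible alternative, the sketch below looks attributes up by key and writes a plain six-column BED. The function name gtf_to_bed and its two arguments are hypothetical, and the sketch assumes that attribute values contain no semicolons.

```shell
# Hypothetical helper: convert one GTF feature type to BED6, reading GTF from stdin.
# Usage: gtf_to_bed FEATURE ID_KEY   (e.g. gtf_to_bed gene gene_id)
gtf_to_bed() {
    feature="$1"   # value expected in GTF column 3, e.g. gene or transcript
    id_key="$2"    # attribute used as the BED name, e.g. gene_id or transcript_id
    awk -F '\t' -v feature="$feature" -v id_key="$id_key" '
        BEGIN { OFS = "\t" }
        /^#/ { next }                      # skip header comments
        $3 == feature {
            id = "."                       # fallback when the key is missing
            n = split($9, attrs, ";")      # assumes no ";" inside values
            for (i = 1; i <= n; i++) {
                a = attrs[i]
                sub(/^ +/, "", a)          # trim leading spaces
                sp = index(a, " ")
                if (sp == 0) continue      # empty or malformed entry
                key = substr(a, 1, sp - 1)
                val = substr(a, sp + 1)
                gsub(/"/, "", val)         # drop the surrounding quotes
                if (key == id_key) id = val
            }
            print $1, $4 - 1, $5, id, ".", $7   # 1-based GTF start to 0-based BED start
        }
    ' | sort -k1,1 -k2,2n
}
```

It could be called as `gtf_to_bed transcript transcript_id < Homo_sapiens.GRCh38.93.gtf > Homo_sapiens.GRCh38.93.transcript.bed`. Further columns such as gene_name can be added by capturing more keys inside the loop.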

SciBerg e.Kfm
