Advanced Bioinformatics Services

Converting GTF into BED

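Both GENCODE and Ensembl provide genome annotations as GTF/GFF3 files, but not as BED files. BED files are, however, required by some software (e.g. bedtools) and can be used to extract specific regions of the genome, such as promoters or transcription start sites (TSSs). Two details matter in the conversion. First, GTF coordinates are 1-based and inclusive whereas BED coordinates are 0-based and half-open, so the start is decremented by 1 and the end is kept unchanged. Second, the attribute values are picked out by their position, so the awk column numbers below assume the attribute order used in Ensembl GTF files (GENCODE GTFs order their attributes differently). You can generate BED files from, for example, the Ensembl release 93 GTF by running the following commands in a Linux shell: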

# For genes
# Output columns: chrom, start (0-based), end, gene_id, ".", strand, gene_name, gene_source, gene_biotype
grep -P "\tgene\t" Homo_sapiens.GRCh38.93.gtf | cut -f1,4,5,7,9 | \
sed 's/[[:space:]]/\t/g' | sed 's/[;|"]//g' | \
awk -F $'\t' 'BEGIN { OFS=FS } { print $1,$2-1,$3,$6,".",$4,$10,$12,$14 }' | \
sort -k1,1 -k2,2n > Homo_sapiens.GRCh38.93.gene.bed
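To see what the gene command produces, here is a minimal sketch that runs the same steps on a two-line toy GTF. Both records are made up (the IDs ENSGTOY0001, ENSTTOY0001 and the name TOY1 are hypothetical) but follow the Ensembl attribute order assumed above. The field separator is set inside awk (`FS = OFS = "\t"`), which is equivalent to `-F $'\t'` but also works in shells without `$'...'` quoting. GNU grep (for `-P`) and GNU sed (for `\t` in the replacement) are assumed.

```shell
#!/bin/sh
# Minimal sketch: run the gene pipeline on a hypothetical two-line GTF
# (one gene record, one transcript record) in Ensembl attribute order.
# Assumes GNU grep (-P) and GNU sed (\t in the replacement).
set -eu

printf '1\thavana\tgene\t1001\t2000\t.\t+\t.\tgene_id "ENSGTOY0001"; gene_version "1"; gene_name "TOY1"; gene_source "havana"; gene_biotype "protein_coding";\n' > toy.gtf
printf '1\thavana\ttranscript\t1001\t2000\t.\t+\t.\tgene_id "ENSGTOY0001"; gene_version "1"; transcript_id "ENSTTOY0001"; transcript_version "1"; gene_name "TOY1"; gene_source "havana"; gene_biotype "protein_coding";\n' >> toy.gtf

# Same steps as the gene command; FS/OFS are set inside awk instead of via -F.
grep -P "\tgene\t" toy.gtf | cut -f1,4,5,7,9 | \
sed 's/[[:space:]]/\t/g' | sed 's/[;|"]//g' | \
awk 'BEGIN { FS = OFS = "\t" } { print $1,$2-1,$3,$6,".",$4,$10,$12,$14 }' | \
sort -k1,1 -k2,2n > toy.gene.bed

cat toy.gene.bed
# prints (tab-separated): 1 1000 2000 ENSGTOY0001 . + TOY1 havana protein_coding
```

Only the gene record passes the grep, and the GTF start 1001 becomes 1000 in the BED file while the end stays 2000, which is the 1-based to 0-based shift described above.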

# For transcripts
# Output columns: chrom, start (0-based), end, transcript_id, ".", strand, gene_name, gene_source, gene_biotype
grep -P "\ttranscript\t" Homo_sapiens.GRCh38.93.gtf | cut -f1,4,5,7,9 | \
sed 's/[[:space:]]/\t/g' | sed 's/[;|"]//g' | \
awk -F $'\t' 'BEGIN { OFS=FS } { print $1,$2-1,$3,$10,".",$4,$14,$16,$18 }' | \
sort -k1,1 -k2,2n > Homo_sapiens.GRCh38.93.transcript.bed
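The resulting BED files can then be used to extract regions such as TSSs and promoters. The sketch below shows one strand-aware way to do this with awk, assuming a BED6+ input with the strand in column 6 (as in the files above). The window sizes UP=1000 and DOWN=100 and the toy genes G1 to G3 are hypothetical; promoters are clipped at 0 but not at chromosome ends.

```shell
#!/bin/sh
# Sketch: derive strand-aware TSS and promoter intervals from a BED6+ file
# (strand in column 6). UP/DOWN are assumed, illustrative window sizes.
set -eu

UP=1000    # bases upstream of the TSS (hypothetical value)
DOWN=100   # bases downstream of the TSS (hypothetical value)

# Toy input with the same leading columns as the gene BED (hypothetical genes).
printf 'chr1\t3000\t4000\tG1\t.\t+\nchr1\t5000\t6000\tG2\t.\t-\nchr1\t200\t800\tG3\t.\t+\n' > genes.toy.bed

# TSS: first transcribed base, i.e. start on "+" and end-1 on "-" (0-based, half-open).
awk 'BEGIN { FS = OFS = "\t" }
{
  tss = ($6 == "-") ? $3 - 1 : $2
  $2 = tss; $3 = tss + 1
  print
}' genes.toy.bed | sort -k1,1 -k2,2n > tss.toy.bed

# Promoter: UP bases upstream to DOWN bases downstream of the TSS base, clipped at 0.
awk -v up="$UP" -v down="$DOWN" 'BEGIN { FS = OFS = "\t" }
{
  if ($6 == "-") { tss = $3 - 1; s = tss - down; e = tss + 1 + up }
  else           { tss = $2;     s = tss - up;   e = tss + 1 + down }
  if (s < 0) s = 0
  $2 = s; $3 = e
  print
}' genes.toy.bed | sort -k1,1 -k2,2n > promoters.toy.bed

cat tss.toy.bed promoters.toy.bed
```

On the minus strand "upstream" means higher coordinates, which is why the roles of UP and DOWN swap there. The output is re-sorted because shifting intervals can change their order. As an alternative, bedtools `slop` or `flank` with `-s` and a genome file also clips intervals at chromosome ends.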