Converting GTF files into BED files

Both GENCODE and Ensembl provide GTF / GFF3 files with genome coordinates, but not BED files. Some software, however, expects BED input or works most naturally with it (e.g. bedtools), and BED files are convenient for extracting specific regions of the genome (e.g. promoters or transcription start sites, TSSs). You can generate BED files from a GTF file (here, the one from the Ensembl v93 release) by running the following commands in a Linux shell:

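# For genes
# Output columns: chrom, start (0-based), end, gene_id, ".", strand, gene_name, gene_source, gene_biotype
# $2-1 converts the 1-based GTF start to the 0-based BED start; field positions assume the Ensembl attribute order
grep -P "\tgene\t" Homo_sapiens.GRCh38.93.gtf | cut -f1,4,5,7,9 | \
sed 's/[[:space:]]/\t/g' | sed 's/[;|"]//g' | \
awk -F $'\t' 'BEGIN { OFS=FS } { print $1,$2-1,$3,$6,".",$4,$10,$12,$14 }' | \
sort -k1,1 -k2,2n > Homo_sapiens.GRCh38.93.gene.bed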
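To see how the fields map onto BED columns, the same steps can be run on a single record. The sketch below uses a made-up gene line (the ID, name and coordinates are hypothetical) laid out in the Ensembl attribute order, and assumes GNU sed so that \t in the replacement is a tab:

```shell
# Hypothetical GTF gene record (not taken from the real file), tab-separated,
# with attributes in the Ensembl order: gene_id, gene_version, gene_name,
# gene_source, gene_biotype.
line=$(printf '1\thavana\tgene\t1000\t2000\t.\t-\t.\tgene_id "ENSG00000000001"; gene_version "1"; gene_name "GENE1"; gene_source "havana"; gene_biotype "protein_coding";')

# Same steps as the gene command: keep chrom/start/end/strand/attributes,
# split the attributes on whitespace, strip ; | and ", then reorder.
bed_line=$(printf '%s\n' "$line" | cut -f1,4,5,7,9 | \
  sed 's/[[:space:]]/\t/g' | sed 's/[;|"]//g' | \
  awk -F '\t' 'BEGIN { OFS=FS } { print $1,$2-1,$3,$6,".",$4,$10,$12,$14 }')

printf '%s\n' "$bed_line"
```

The printed line has nine tab-separated columns: 1, 999, 2000, ENSG00000000001, ., -, GENE1, havana, protein_coding. Because the attributes are picked by position, a record that lacks one of them (or a GTF with a different attribute order) would shift the columns, so it is worth checking a few output lines by eye.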

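# For transcripts
# Output columns: chrom, start (0-based), end, transcript_id, ".", strand, gene_name, gene_source, gene_biotype
# $2-1 converts the 1-based GTF start to the 0-based BED start; field positions assume the Ensembl attribute order
grep -P "\ttranscript\t" Homo_sapiens.GRCh38.93.gtf | cut -f1,4,5,7,9 | \
sed 's/[[:space:]]/\t/g' | sed 's/[;|"]//g' | \
awk -F $'\t' 'BEGIN { OFS=FS } { print $1,$2-1,$3,$10,".",$4,$14,$16,$18 }' | \
sort -k1,1 -k2,2n > Homo_sapiens.GRCh38.93.transcript.bed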
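As an illustration of extracting specific regions from the resulting BED files, the sketch below derives 1-bp TSS intervals from gene records. It is a minimal example, not part of the original commands: the two input records are hypothetical, and the TSS is taken as the first base of the gene on the + strand and the last base on the - strand.

```shell
# Two hypothetical gene records in the 9-column layout produced by the gene command.
genes=$(printf '1\t999\t2000\tENSG00000000001\t.\t+\tGENE1\thavana\tprotein_coding\n1\t4999\t6000\tENSG00000000002\t.\t-\tGENE2\thavana\tprotein_coding\n')

# BED intervals are 0-based and half-open, so the TSS of a + strand gene is
# [start, start+1) and the TSS of a - strand gene is [end-1, end).
tss=$(printf '%s\n' "$genes" | awk -F '\t' 'BEGIN { OFS=FS }
  $6 == "+" { print $1, $2, $2+1, $4, $5, $6 }
  $6 == "-" { print $1, $3-1, $3, $4, $5, $6 }')

printf '%s\n' "$tss"
```

On a real file, the same awk program could read Homo_sapiens.GRCh38.93.gene.bed directly and its output could be re-sorted with sort -k1,1 -k2,2n. Promoter windows could then be obtained by extending these positions upstream, for example with bedtools slop.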