snakemake-workflows / rna-seq-star-deseq2

RNA-seq workflow using STAR and DESeq2


annotation.bed file size zero

jy-nam opened this issue

After the QC process was added, I noticed that annotation.bed (from the gtf2bed module) is now included.
However, my pipeline fails at the QC step.

I noticed that annotation.bed has a size of zero, but annotation.db (a temporary file) does not.

Which GTF file should I use? Mine was downloaded from Illumina iGenomes (rat).

Thank you.

JY

@sschmeier seems like there might be a problem with the gtf2bed script.
@jy-nam can you give us the URL to the particular GTF used?

Hi,

I solved this issue. It worked when I changed my GTF file from NCBI to Ensembl.

Below are my two GTF files.

[[NCBI]]

[jynam@hipm01 Rattus_norvegicus]$ head NCBI/Rnor_6.0/Annotation/Genes/genes.gtf

1 Gnomon exon 56471 56705 . - . gene_id "LOC103690911"; gene_name "LOC103690911"; transcript_id "rna0"; tss_id "TSS27591";

[[Ensembl]]

[jynam@hipm01 Rattus_norvegicus]$ head Ensembl/Rnor_6.0/Annotation/Genes/genes.gtf

1 ensembl CDS 396700 396905 . + 0 exon_number "1"; gene_biotype "protein_coding"; gene_id "ENSRNOG00000046319"; gene_name "Vom2r3"; gene_source "ensembl"; gene_version "3"; p_id "P17198"; protein_id "ENSRNOP00000065675"; protein_version "1"; transcript_biotype "protein_coding"; transcript_id "ENSRNOT00000072186"; transcript_name "Vom2r3-202"; transcript_source "ensembl"; transcript_version "1"; tss_id "TSS21219";

Thanks,
JY
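The `head` output above shows only the first record of each file. A quick way to see whether a GTF contains explicit `gene` and `transcript` records, or only exon/CDS-level lines, is to tally the feature-type column (column 3). This is a minimal Python sketch, not part of the workflow; `feature_type_counts` is a hypothetical helper name, and it assumes a tab-separated, 9-column GTF.

```python
from collections import Counter
from typing import Iterable


def feature_type_counts(lines: Iterable[str]) -> Counter:
    """Tally the feature type (column 3) of every record in a GTF file.

    Hypothetical diagnostic helper; assumes tab-separated, 9-column GTF lines.
    """
    counts: Counter = Counter()
    for line in lines:
        # Skip blank lines and comment/header lines such as "#!genome-build".
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        counts[fields[2]] += 1
    return counts
```

For example, `with open("genes.gtf") as fh: print(feature_type_counts(fh).most_common())` lists each feature type with its count; a `transcript` count of 0 means the file has no explicit transcript records.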

Switching to a better GTF annotation works. If you are stuck with GTF annotations such as the ones that come with Illumina iGenomes, you can instead fix this by modifying the gtf2bed.py script to set disable_infer_genes=False and disable_infer_transcripts=False, so that gffutils infers gene and transcript records instead of relying on them being present in the file.
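With disable_infer_transcripts=False, gffutils derives each transcript record from the features that share a transcript_id. The stdlib sketch below illustrates that idea for exons only: it groups exon lines by transcript_id and writes one BED12 line per transcript. It is an illustration under assumptions, not the workflow's gtf2bed.py (which uses gffutils): `exons_to_bed12` is a hypothetical name, a tab-separated GTF is assumed, and CDS/codon records are ignored, so no thick (coding) region is reported.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def exons_to_bed12(lines: Iterable[str]) -> List[str]:
    """Infer one transcript per transcript_id from its exon records and
    return BED12 lines. Hypothetical sketch: CDS is ignored, so
    thickStart = thickEnd = chromStart (no thick part)."""
    exons: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    info: Dict[str, Tuple[str, str]] = {}  # transcript_id -> (chrom, strand)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        tx_id = attrs.get("transcript_id")
        if tx_id is None:
            continue
        # GTF is 1-based and fully closed; BED is 0-based and half-open.
        start, end = int(fields[3]) - 1, int(fields[4])
        exons[tx_id].append((start, end))
        info[tx_id] = (fields[0], fields[6])

    bed_lines = []
    for tx_id, blocks in exons.items():
        blocks.sort()
        chrom, strand = info[tx_id]
        tx_start = blocks[0][0]
        tx_end = max(end for _, end in blocks)
        sizes = ",".join(str(end - start) for start, end in blocks)
        starts = ",".join(str(start - tx_start) for start, _ in blocks)
        bed_lines.append("\t".join([
            chrom, str(tx_start), str(tx_end), tx_id, "0", strand,
            str(tx_start), str(tx_start), "0", str(len(blocks)),
            sizes + ",", starts + ",",
        ]))
    return bed_lines
```

An exon-only file like the iGenomes NCBI GTF above would still yield one line per transcript_id here, whereas a BED built only from explicit `transcript` records would stay empty. The real gffutils inference also considers other features sharing the transcript_id, so extents may differ from this sketch.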

As this appears to have worked with Ensembl GTF files, this commit introducing a standard Ensembl GTF download should have fixed this. If this reoccurs, please feel free to reopen.