griffithlab / rnaseq_tutorial

Informatics for RNA-seq: A web resource for analysis on the cloud. Educational tutorials and working pipelines for RNA-seq analysis including an introduction to: cloud computing, critical file formats, reference genomes, gene annotation, expression, differential expression, alternative splicing, data visualization, and interpretation.

Differential expression on STAR throwing an error

pfaucon opened this issue · comments

I've been following the guide for STAR from the start, but I seem to have an issue at the cuffmerge stage:

cd $RNA_HOME/expression/star_cufflinks/ref_only/
ls -1 _Rep_ERCC*/transcripts.gtf > assembly_GTF_list.txt
cuffmerge -p 8 -o merged -g $RNA_HOME/refs/hg19/genes/genes_chr22_ERCC92.gtf -s $RNA_HOME/refs/hg19/bwt/chr22_ERCC92/ assembly_GTF_list.txt

I get the error:

[Sat Jan 2 15:23:16 2016] Beginning transcriptome assembly merge

[Sat Jan 2 15:23:16 2016] Preparing output location merged/
[Sat Jan 2 15:23:17 2016] Converting GTF files to SAM
[15:23:17] Loading reference annotation.
Error: duplicate GFF ID 'ENST00000400518' encountered!
[FAILED]
Error: could not execute gtf_to_sam

If I dump the contents of the .gtf file, it looks like there are multiple hits for that ID:

grep 'ENST00000400518' $RNA_HOME/refs/hg19/genes/genes_chr22_ERCC92.gtf > hits.txt
hits.txt

This is the second time I have tried to replicate it, so I don't have any of the tophat alignments in this version, but I don't think they should be needed. Do you have any suggestions?
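For anyone checking the same thing: in a GTF file a transcript normally spans several lines (one per exon, CDS, and so on), so multiple grep hits for one transcript ID are not by themselves a sign of a broken annotation. A minimal sketch for summarising those hits by feature type follows; the function name `features_for_transcript` is made up for illustration, and it assumes a tab-separated GTF with the feature type in column 3 and the attributes in column 9.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the tutorial): count the lines per
# feature type (column 3) that carry a given transcript_id in a GTF file.
features_for_transcript() {
    local gtf="$1" id="$2"
    awk -F'\t' -v id="$id" '
        # Match the quoted transcript_id attribute exactly (column 9).
        index($9, "transcript_id \"" id "\"") { count[$3]++ }
        END { for (f in count) print f "\t" count[f] }
    ' "$gtf" | sort
}

# Example call, using the path and ID from the post above:
# features_for_transcript "$RNA_HOME/refs/hg19/genes/genes_chr22_ERCC92.gtf" ENST00000400518
```

If the summary shows only the usual per-exon records, the annotation itself is probably fine and the problem lies elsewhere.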

Are you trying this on the demonstration data (and all other example files) provided along with the tutorial? It worked in November 2015, the last time we ran through the course. Are you also using the tool versions indicated in the tutorials? Is it possible that a more recent version introduced an issue?

That seems to have been the problem. I wasn't paying enough attention: the samtools package in the Ubuntu repositories is 0.1.19 instead of 1.19. I reran it from scratch with minimal problems. I'm not sure what the output should be (I got a few errors at the cuffmerge phase), but the output does appear reasonable.
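A quick way to catch this kind of mismatch before starting is to compare the installed version against the one the tutorial asks for. The sketch below is only an illustration: `version_at_least` is a made-up helper, `REQUIRED` is a placeholder to be replaced with the version named in the tutorial, and the comparison relies on GNU `sort -V`. It reads the `Version:` line that samtools prints in its usage text when run without arguments.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 is at least version $2.
# Assumes GNU sort with -V (version sort), as shipped on Ubuntu.
version_at_least() {
    local have="$1" want="$2"
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Placeholder value: substitute the version the tutorial specifies.
REQUIRED="1.0"

if command -v samtools >/dev/null 2>&1; then
    # Running samtools with no arguments prints usage text that includes
    # a "Version:" line; take the second field of that line.
    INSTALLED=$(samtools 2>&1 | awk '/^Version:/ {print $2; exit}')
    if [ -n "$INSTALLED" ] && version_at_least "$INSTALLED" "$REQUIRED"; then
        echo "samtools $INSTALLED looks recent enough (>= $REQUIRED)"
    else
        echo "samtools '${INSTALLED:-unknown}' is older than $REQUIRED" >&2
    fi
else
    echo "samtools not found on PATH" >&2
fi
```

The same pattern should work for the other tools in the pipeline, as long as each one prints its version in a form that can be parsed.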

I made a few changes where spaces were used instead of tabs, so the code blocks should be solid now.

I also made a change to the Differential Expression page (I tried to make a pull request, but that failed, so I edited the actual page instead). The change replaces:

cuffmerge -p 8 -o merged -g $RNA_HOME/refs/hg19/genes/genes_chr22_ERCC92.gtf -s $RNA_HOME/refs/hg19/bwt/chr22_ERCC92/ assembly_GTF_list.txt

with:

cuffmerge -p 8 -o merged -g $RNA_HOME/refs/hg19/genes/genes_chr22_ERCC92.gtf -s $RNA_HOME/refs/hg19/fasta/chr22_ERCC92/ assembly_GTF_list.txt

If you don't do the tophat alignment, the bwt directory doesn't exist, but the .fasta files are all in the fasta directory.
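If the tutorial needs to work for both the tophat and STAR routes, one option is to pick whichever reference directory actually exists before calling cuffmerge. This is only a sketch: `first_existing_dir` is a made-up helper and not something from the tutorial or from Cufflinks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first argument that is an existing
# directory, or return non-zero if none of them exists.
first_existing_dir() {
    local d
    for d in "$@"; do
        if [ -d "$d" ]; then
            printf '%s\n' "$d"
            return 0
        fi
    done
    return 1
}

# Example use with the two paths from this thread (left commented out so
# that nothing runs until the paths have been checked):
# REF_DIR=$(first_existing_dir \
#     "$RNA_HOME/refs/hg19/fasta/chr22_ERCC92/" \
#     "$RNA_HOME/refs/hg19/bwt/chr22_ERCC92/") \
#   && cuffmerge -p 8 -o merged \
#        -g "$RNA_HOME/refs/hg19/genes/genes_chr22_ERCC92.gtf" \
#        -s "$REF_DIR" assembly_GTF_list.txt
```

Listing the fasta directory first means it is preferred when both exist.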

Also, thanks a ton for creating this document. I think it is a huge asset for people getting started in RNA-Seq analysis.

Glad you got it working and found it useful. And, thanks for the edits!