GatherGeneGCLength breaks on transcript writing
jamesnemesh opened this issue
Affected tool(s)
GatherGeneGCLength
At Broad:
Exception in thread "main" java.lang.IllegalStateException: the input sequence name 'S100A1' has already been added
	at htsjdk.samtools.reference.FastaReferenceWriter.startSequence(FastaReferenceWriter.java:408)
	at htsjdk.samtools.reference.FastaReferenceWriter.startSequence(FastaReferenceWriter.java:364)
	at org.broadinstitute.dropseqrna.utils.FastaSequenceFileWriter.writeSequence(FastaSequenceFileWriter.java:68)
	at org.broadinstitute.dropseqrna.annotation.GatherGeneGCLength.writeTranscriptSequence(GatherGeneGCLength.java:219)
	at org.broadinstitute.dropseqrna.annotation.GatherGeneGCLength.doWork(GatherGeneGCLength.java:138)
	at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
	at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
	at org.broadinstitute.dropseqrna.cmdline.DropSeqMain.main(DropSeqMain.java:42)
Affected version(s)
- Latest development/master branch as of 5/9/2022
Description
When GatherGeneGCLength is asked to emit transcript sequences (OUTPUT_TRANSCRIPT_SEQUENCES), the program fails with an IllegalStateException.
Steps to reproduce
Run on our human metadata:
/broad/mccarroll/software/dropseq/priv/GatherGeneGCLength \
  ANNOTATIONS_FILE=/broad/mccarroll/software/metadata/individual_reference/GRCh38.89/GRCh38.gtf \
  REFERENCE_SEQUENCE=/broad/mccarroll/software/metadata/individual_reference/GRCh38.89/GRCh38.fasta.gz \
  O=GRCh38.gc.txt \
  OUTPUT_TRANSCRIPT_LEVEL=true \
  OUTPUT_TRANSCRIPT_SEQUENCES=GRCh38_maskedAlt.transcript_sequences.txt \
  VALIDATION_STRINGENCY=LENIENT
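Note: the failure comes from the code path enabled by OUTPUT_TRANSCRIPT_SEQUENCES. If that argument is removed, the program runs successfully. The exception message indicates that the FASTA writer was handed the sequence name 'S100A1' more than once.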
Expected behavior
The transcript file should be written.
Actual behavior
The program aborts with the IllegalStateException shown in the stack trace above.
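Possible cause (a sketch, not a confirmed diagnosis)
The exception message says a sequence name was added twice. One plausible explanation, which is an assumption and has not been checked against the GatherGeneGCLength source, is that each transcript is written under its gene name, so a gene with several transcripts produces a repeated name. The standalone Java sketch below reproduces that failure mode with a small stand-in for a writer that rejects repeated names, and shows one way to avoid it by combining gene name and transcript identifier. The class, the sequenceName helper, the "|" separator, and the transcript identifiers TX_A and TX_B are all hypothetical; none of this is htsjdk or Drop-seq code.

```java
import java.util.HashSet;
import java.util.Set;

public class TranscriptNameSketch {

    /** Hypothetical stand-in for a FASTA writer that refuses a repeated sequence name. */
    static final class StrictNameRegistry {
        private final Set<String> names = new HashSet<>();

        void startSequence(String name) {
            // Set.add returns false when the name is already present.
            if (!names.add(name)) {
                throw new IllegalStateException(
                        "the input sequence name '" + name + "' has already been added");
            }
        }

        int size() {
            return names.size();
        }
    }

    /** Hypothetical helper: build a per-transcript name so transcripts of one gene do not collide. */
    static String sequenceName(String gene, String transcriptId) {
        return gene + "|" + transcriptId;
    }

    public static void main(String[] args) {
        // Assumed failure mode: two transcripts of one gene written under the gene name alone.
        StrictNameRegistry byGene = new StrictNameRegistry();
        byGene.startSequence("S100A1");
        try {
            byGene.startSequence("S100A1");
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }

        // Possible fix: include the transcript identifier in the sequence name.
        StrictNameRegistry byTranscript = new StrictNameRegistry();
        byTranscript.startSequence(sequenceName("S100A1", "TX_A"));
        byTranscript.startSequence(sequenceName("S100A1", "TX_B"));
        System.out.println("Accepted names: " + byTranscript.size());
    }
}
```

If the real cause turns out to be different (for example, the same transcript being emitted twice), the naming change above would not help, and the duplicate would need to be filtered before writing instead.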