GatherGeneGCLength breaks on transcript writing

jamesnemesh opened this issue 3 years ago

Affected tool(s)

GatherGeneGCLength

At Broad:
Exception in thread "main" java.lang.IllegalStateException: the input sequence name 'S100A1' has already been added
    at htsjdk.samtools.reference.FastaReferenceWriter.startSequence(FastaReferenceWriter.java:408)
    at htsjdk.samtools.reference.FastaReferenceWriter.startSequence(FastaReferenceWriter.java:364)
    at org.broadinstitute.dropseqrna.utils.FastaSequenceFileWriter.writeSequence(FastaSequenceFileWriter.java:68)
    at org.broadinstitute.dropseqrna.annotation.GatherGeneGCLength.writeTranscriptSequence(GatherGeneGCLength.java:219)
    at org.broadinstitute.dropseqrna.annotation.GatherGeneGCLength.doWork(GatherGeneGCLength.java:138)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at org.broadinstitute.dropseqrna.cmdline.DropSeqMain.main(DropSeqMain.java:42)
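The top frame shows that htsjdk's FastaReferenceWriter refuses to start a sequence whose name it has already written, and because nothing catches the exception it ends the whole run. The Java sketch below is a simplified stand-in for that guard, written only to illustrate the failure mode; the class UniqueNameFastaWriter is hypothetical and is not htsjdk's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Illustrative stand-in (NOT htsjdk code) for a FASTA writer that, like the
 * one in the trace, rejects a second sequence with an already-used name.
 */
public class UniqueNameFastaWriter {
    // LinkedHashMap keeps records in the order add() was called.
    private final Map<String, String> records = new LinkedHashMap<>();

    public void add(String name, String bases) {
        if (records.containsKey(name)) {
            // Same condition as the reported error: each name may be added once.
            throw new IllegalStateException(
                "the input sequence name '" + name + "' has already been added");
        }
        records.put(name, bases);
    }

    /** Renders each record as a '>' header line followed by one sequence line. */
    public String toFasta() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : records.entrySet()) {
            sb.append('>').append(e.getKey()).append('\n')
              .append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        UniqueNameFastaWriter writer = new UniqueNameFastaWriter();
        writer.add("S100A1", "ACGT");
        try {
            writer.add("S100A1", "GGCC"); // second record reusing the same name
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

Running main prints the same message as the first line of the trace; the made-up bases are placeholders only.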

Affected version(s)

Latest development/master branch as of 5/9/2022

Description

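When GatherGeneGCLength is asked to emit transcript sequences (OUTPUT_TRANSCRIPT_SEQUENCES), the program fails with an IllegalStateException because the sequence name 'S100A1' is written to the output more than once.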

Steps to reproduce

Run on our human metadata:

/broad/mccarroll/software/dropseq/priv/GatherGeneGCLength \
    ANNOTATIONS_FILE=/broad/mccarroll/software/metadata/individual_reference/GRCh38.89/GRCh38.gtf \
    REFERENCE_SEQUENCE=/broad/mccarroll/software/metadata/individual_reference/GRCh38.89/GRCh38.fasta.gz \
    O=GRCh38.gc.txt \
    OUTPUT_TRANSCRIPT_LEVEL=true \
    OUTPUT_TRANSCRIPT_SEQUENCES=GRCh38_maskedAlt.transcript_sequences.txt \
    VALIDATION_STRINGENCY=LENIENT

Note: the failure is specific to OUTPUT_TRANSCRIPT_SEQUENCES. With that argument removed, the program runs successfully.
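One possible explanation, not confirmed in this issue, is that a gene name such as S100A1 is attached to more than one gene_id in the GTF, so output keyed by gene name would receive it twice. The Java sketch below is a hypothetical diagnostic (DuplicateGeneNames is not part of Drop-seq tools) that assumes a plain-text GTF and lists gene_name values shared by more than one gene_id among "gene" records.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical diagnostic: gene_name values carried by 2+ gene_ids in a GTF. */
public class DuplicateGeneNames {
    // Matches GTF attribute pairs such as: gene_name "S100A1";
    private static final Pattern ATTR = Pattern.compile("(\\S+) \"([^\"]*)\"");

    public static Map<String, String> parseAttributes(String field) {
        Map<String, String> attrs = new HashMap<>();
        Matcher m = ATTR.matcher(field);
        while (m.find()) {
            attrs.put(m.group(1), m.group(2));
        }
        return attrs;
    }

    /** Returns gene_name -> gene_ids, keeping only names with two or more ids. */
    public static Map<String, Set<String>> findSharedNames(Iterable<String> gtfLines) {
        Map<String, Set<String>> idsByName = new TreeMap<>();
        for (String line : gtfLines) {
            if (line.startsWith("#")) continue;            // header comments
            String[] cols = line.split("\t");
            if (cols.length < 9 || !cols[2].equals("gene")) continue; // gene records only
            Map<String, String> attrs = parseAttributes(cols[8]);
            String name = attrs.get("gene_name");
            String id = attrs.get("gene_id");
            if (name == null || id == null) continue;
            idsByName.computeIfAbsent(name, k -> new TreeSet<>()).add(id);
        }
        idsByName.values().removeIf(ids -> ids.size() < 2);
        return idsByName;
    }

    public static void main(String[] args) throws IOException {
        // Streams the file line by line instead of loading it all into memory.
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(args[0]))) {
            Iterable<String> lines = () -> reader.lines().iterator();
            findSharedNames(lines).forEach((name, ids) -> System.out.println(name + "\t" + ids));
        }
    }
}
```

Run as `java DuplicateGeneNames GRCh38.gtf`; each output line is a gene name followed by the gene_ids that share it. If S100A1 does not appear, this explanation is unlikely and the duplicate must arise elsewhere.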

Expected behavior

The transcript sequence file (OUTPUT_TRANSCRIPT_SEQUENCES) should be written.

Actual behavior

The program exits with the IllegalStateException stack trace shown above.