broadinstitute / Drop-seq

Java tools for analyzing Drop-seq data

Exception in thread "main" org.broadinstitute.dropseqrna.TranscriptomeException: Base [13] was requested, but the read isn't long enough [GGAC]

CherryX727 opened this issue · comments

Hi, I'm trying to run the TagBamWithReadSequenceExtended step, but I get the following error, and the output file size is 0. I don't know how to solve it.

org.broadinstitute.dropseqrna.utils.TagBamWithReadSequenceExtended done. Elapsed time: 33.82 minutes.
Runtime.totalMemory()=3821535232
Exception in thread "main" org.broadinstitute.dropseqrna.TranscriptomeException: Base [13] was requested, but the read isn't long enough [GGAC]
        at org.broadinstitute.dropseqrna.utils.BaseQualityFilter.scoreBaseQuality(BaseQualityFilter.java:45)
        at org.broadinstitute.dropseqrna.utils.TagBamWithReadSequenceExtended.processSingleRead(TagBamWithReadSequenceExtended.java:164)
        at org.broadinstitute.dropseqrna.utils.TagBamWithReadSequenceExtended.doWork(TagBamWithReadSequenceExtended.java:132)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
        at org.broadinstitute.dropseqrna.cmdline.DropSeqMain.main(DropSeqMain.java:42)

This is the command I used:

./TagBamWithReadSequenceExtended -INPUT /home/data/t050205/software/DropSeq_tools/test_data/GSM1544798_SpeciesMix_ThousandSTAMPs.bam \
-OUTPUT /home/data/t050205/software/DropSeq_tools/test_data/GSM1544798_SpeciesMix_ThousandSTAMPs_tagged_CellMolecular.bam \
-SUMMARY /home/data/t050205/software/DropSeq_tools/test_data/GSM1544798_SpeciesMix_ThousandSTAMPs_tagged_Molecular.bam_summary.txt \
-TMP_DIR /home/data/t050205/software/DropSeq_tools/test_data/ \
-BASE_RANGE 13-20 \
-BASE_QUALITY 10 \
-BARCODED_READ 1 \
-DISCARD_READ True \
-TAG_NAME XM \
-NUM_BASES_BELOW_QUALITY 1

Thanks for your answer.

commented

Hi @CherryX727 ,

Every read 1 in your input BAM needs to be at least 20 bases long. It appears that for at least one read pair, read 1 is only 4 bases long: GGAC
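To make the length requirement concrete, here is a minimal Python sketch of the check. It is not the Drop-seq implementation; `required_read_length` and `is_long_enough` are hypothetical helper names, and the colon-separated multi-range form (for example `1-4:9-12`) is an assumption about the accepted `BASE_RANGE` syntax.

```python
def required_read_length(base_range: str) -> int:
    """Return the highest 1-based position named in a range spec.

    Accepts a single range such as '13-20'. Colon-separated ranges such as
    '1-4:9-12' are handled too, on the assumption that this syntax is allowed.
    """
    highest = 0
    for part in base_range.split(":"):
        start, _, end = part.partition("-")
        # A bare position like '7' has no end, so fall back to the start.
        highest = max(highest, int(end or start))
    return highest


def is_long_enough(read_bases: str, base_range: str) -> bool:
    """True if the read has at least as many bases as the range needs."""
    return len(read_bases) >= required_read_length(base_range)


# The read from the error message has 4 bases, but BASE_RANGE 13-20 needs 20.
print(required_read_length("13-20"))
print(is_long_enough("GGAC", "13-20"))
```

With `BASE_RANGE 13-20`, the first requested base is position 13, which is why the exception names base 13 for the 4-base read `GGAC`.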

I'm guessing there is a problem with the process that produced the input to this program. I think you need to investigate that process. Note that you can use samtools view to examine the input BAM.
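One way to do that inspection is to scan the text output of `samtools view` for short sequences. The following is a sketch only: `find_short_reads` is a hypothetical helper, the two sample records are made up for illustration, and the column positions follow the standard SAM layout (QNAME first, FLAG second, SEQ tenth).

```python
def find_short_reads(sam_lines, min_length):
    """Yield (name, flag, sequence) for records shorter than min_length.

    sam_lines is any iterable of SAM text lines, for example sys.stdin when
    the script is fed from `samtools view`.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        # '*' means the sequence is not stored, so there is nothing to measure.
        if seq != "*" and len(seq) < min_length:
            yield name, flag, seq


# Made-up records: one 4-base read and one 25-base read.
sample = [
    "@HD\tVN:1.5\n",
    "read_a\t77\t*\t0\t0\t*\t*\t0\t0\tGGAC\tAAAA\n",
    "read_b\t77\t*\t0\t0\t*\t*\t0\t0\t" + "A" * 25 + "\t" + "F" * 25 + "\n",
]
for name, flag, seq in find_short_reads(sample, 20):
    print(name, flag, seq)
```

To run it on a real file, pass `sys.stdin` instead of `sample` and pipe the output of `samtools view` into the script.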

Regards, Alec

Hi @alecw
Thank you for your answer. You are right. I used samtools to view the input BAM, and some reads are only 4 bases long. What should I do with these reads?

The BAM I used is the supplementary file of GSM1544798 (SpeciesMix_ThousandSTAMPs_50cellspermicroliter). I checked the size of the downloaded file and it matches the size listed on the website. Is there a problem with the data download process? If not, how can I make sure that the input BAM is compliant?

commented

Hi @CherryX727 ,

This BAM has already been processed by the Drop-seq tools. The cellular and molecular barcodes have already been extracted from read 1 and applied as tags, and the reads are aligned. It doesn't make sense to run TagBamWithReadSequenceExtended on this BAM.

I looked at the first read in the BAM:

% samtools view GSM1544798_SpeciesMix_ThousandSTAMPs.bam | head -1
NS500217:67:H14GMBGXX:3:22409:13341:7514	0	HUMAN_1	14283	0	42M8S	*	0	0	GGTCAGCTGGGAGCTTCTGCCCCCACTGCCTAAAAACCAACAAAAACAAA	A<AAAAAAF))AAA<FF<FA)7<7AA..AF7...)FF.<FAAA..)A7.<	XC:Z:AGGCAATAGAAC	XF:Z:INTERGENIC	PG:Z:STAR	RG:Z:A	NH:i:6	NM:i:3	XM:Z:GATGCCTT	UQ:i:34	AS:i:35

As you can see, cellular and molecular indices are in XC and XM tags, there is alignment information on the read, and it is no longer a paired read.
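The same three observations (barcode tags present, read aligned, read not paired) can be checked programmatically before running the tagging step. This is a rough sketch, not part of the Drop-seq tools: `summarize_sam_record` is a hypothetical helper, and the record below is an abbreviated stand-in for the one shown above, with shortened SEQ, QUAL, and CIGAR values.

```python
def summarize_sam_record(line: str) -> dict:
    """Report pairing, alignment, and barcode-tag status for one SAM line."""
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    # Optional fields start at column 12 and look like TAG:TYPE:VALUE.
    tags = {field.split(":", 2)[0] for field in fields[11:]}
    return {
        "paired": bool(flag & 0x1),      # SAM flag bit 0x1: read is paired
        "aligned": not (flag & 0x4),     # SAM flag bit 0x4: read is unmapped
        "has_cell_barcode": "XC" in tags,
        "has_molecular_barcode": "XM" in tags,
    }


# Abbreviated stand-in record: same flag, reference, and barcode tags as the
# record above, with placeholder SEQ/QUAL/CIGAR values.
record = "\t".join([
    "NS500217:67:H14GMBGXX:3:22409:13341:7514", "0", "HUMAN_1", "14283", "0",
    "10M", "*", "0", "0", "GGTCAGCTGG", "AAAAAAAAAA",
    "XC:Z:AGGCAATAGAAC", "XF:Z:INTERGENIC", "XM:Z:GATGCCTT",
])
print(summarize_sam_record(record))
```

A record that is unpaired, aligned, and already carries both `XC` and `XM` tags suggests the file has been through barcode tagging and alignment already.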

Closed as there was no further response.