broadinstitute / Drop-seq

Java tools for analyzing Drop-seq data

polyAtrimming NullPointerException

Justgitacc opened this issue · comments

Hi,

I've recently run into the issue below while attempting to run PolyATrimmer per the cookbook workflow:
org.broadinstitute.dropseqrna.readtrimming.PolyATrimmer done. Elapsed time: 0.00 minutes.
Exception in thread "main" java.lang.NullPointerException

The output file generated is 0 kB. I've checked recent posts about this issue, and I've made sure that the XC and XM tags are present:
SRR11862674.1 77 * 0 0 * * 0 0 GTCGGNTGAACCGGAGATCT AAAAA#EEEEEEEEEEEEEE RG:Z:A
SRR11862674.10 77 * 0 0 * * 0 0 CATGTNGTGTCAGAGCCGAC AAAAA#EEEEEEEAEEEEEE RG:Z:A
SRR11862674.100 77 * 0 0 * * 0 0 GTGTCGGCTTTGCCTGAGAA AAAAAEEEEEEEEEEEEEEE RG:Z:A
SRR11862674.100 141 * 0 0 * * 0 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN ################################################################ XC:Z:GTGTCGGCTTTG RG:Z:A XM:Z:CCTGAGAA
SRR11862674.1000 77 * 0 0 * * 0 0 ACAAAGGCTTGGGTTTGACT /AAAAEEEEEEEEEEEEAEE RG:Z:A
SRR11862674.100000 141 * 0 0 * * 0 0 GCCTTCCTTCTTCTCTATCGTATCCTCTGGCATCCATGAAACTACATTCAATTGGATGATGAAA /AAA666A//6//6///A///////////6/A/<A/</6A/6</A/6<///6<///66/6//// XC:Z:CAGGGCTCACAT RG:Z:A XM:Z:TTTTGAAC
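For reference, I spot-checked the tags with a rough count like the one below, run against the tagged BAM from my tagging commands further down this thread (each genomic read record should carry both tags):
samtools view read_Cell_Molecular.bam | grep -c 'XC:Z:'
samtools view read_Cell_Molecular.bam | grep -c 'XM:Z:'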

Prior to this, I had also mistakenly processed Read 1 and Read 2 of the Drop-seq run separately (all the way from the original FASTQ files split via fastq-dump --split-files through to the DGE), and that produced reasonable counts. But I figured that was wrong after reading the cookbook more closely, so I've now combined Read 1 (the barcode read) and Read 2 (the biological read) via FastqToSam prior to the alignment pipeline.

So I have a few questions regarding this issue:

  1. Am I interpreting the procedure correctly? A downloaded Drop-seq run comes as two separate reads that should be combined via FastqToSam before following the alignment cookbook?
    Or should the two reads from a single Drop-seq run be processed separately until STAR alignment and merged afterwards?

  2. If they should be combined via FastqToSam and processed through the alignment pipeline together, what is causing the error in PolyATrimmer? (I've also tried running PolyATrimmer without USE_NEW_TRIMMER=true, which works but causes another error at the SamToFastq step that follows.)

I apologize for the lengthy questions, and I greatly appreciate any information.

Hi James,

Thanks for the workflow explanation; that is indeed how I've processed my data so far, using the commands below.

Combine
java -jar resources/picard.jar FastqToSam -F1 read1.fastq -F2 read2.fastq -O read.bam -SO queryname -SM MOE
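As a quick sanity check that the two FASTQs ended up as read pairs in the unmapped BAM (flags 77/141 for unmapped R1/R2, as in the dump above), something like:
samtools view read.bam | head -n 4 | cut -f 1,2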

Tagging
java -jar resources/dropseq.jar TagBamWithReadSequenceExtended INPUT=read.bam OUTPUT=read_Cell.bam SUMMARY=read_Cellular.bam_summary.txt BASE_RANGE=1-12 BASE_QUALITY=10 BARCODED_READ=1 DISCARD_READ=False TAG_NAME=XC NUM_BASES_BELOW_QUALITY=1 ;
java -jar resources/dropseq.jar TagBamWithReadSequenceExtended INPUT=read_Cell.bam OUTPUT=read_Cell_Molecular.bam SUMMARY=read_Molecular.bam_summary.txt BASE_RANGE=13-20 BASE_QUALITY=10 BARCODED_READ=1 DISCARD_READ=False TAG_NAME=XM NUM_BASES_BELOW_QUALITY=1 ;

Trimming
java -jar resources/dropseq.jar FilterBam TAG_REJECT=XQ INPUT=read_Cell_Molecular.bam OUTPUT=read_filtered.bam ;
java -jar resources/dropseq.jar TrimStartingSequence INPUT=read_filtered.bam OUTPUT=read_trimmed_smart.bam OUTPUT_SUMMARY=read_trimming_report.txt SEQUENCE=AAGCAGTGGTATCAACGCAGAGTGAATGGG MISMATCHES=0 NUM_BASES=5 ;
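The step that then fails is PolyATrimmer, which I'm invoking essentially as in the cookbook (the output file names here are my own placeholders):
java -jar resources/dropseq.jar PolyATrimmer INPUT=read_trimmed_smart.bam OUTPUT=read_polyA_trimmed.bam OUTPUT_SUMMARY=polyA_trimming_report.txt MISMATCHES=0 NUM_BASES=6 USE_NEW_TRIMMER=true ;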

I've also tried your suggestion: I extracted the header plus the first 10/100 records of the BAM file and ran PolyATrimmer on each subset.
{ samtools view -H read_trimmed_smart.bam; samtools view read_trimmed_smart.bam | head -n 10; } | samtools view -b - > test.bam
{ samtools view -H read_trimmed_smart.bam; samtools view read_trimmed_smart.bam | head -n 100; } | samtools view -b - > test100.bam

While test.bam ran through PolyATrimmer without an error, test100.bam with the first 100 records hit the same NullPointerException.
GitHub doesn't support attaching .bam files, so I can't upload them here.
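In case it helps to narrow this down further, here is a rough bisection sketch along the same lines (untested; the PolyATrimmer parameters assume the cookbook defaults):
for n in 20 40 60 80 100; do
  { samtools view -H read_trimmed_smart.bam; samtools view read_trimmed_smart.bam | head -n "$n"; } | samtools view -b - > "test${n}.bam"
  # stop at the first subset that reproduces the NullPointerException
  java -jar resources/dropseq.jar PolyATrimmer INPUT="test${n}.bam" OUTPUT="trim${n}.bam" OUTPUT_SUMMARY="report${n}.txt" MISMATCHES=0 NUM_BASES=6 USE_NEW_TRIMMER=true || { echo "failure introduced within records $((n - 20))..$n"; break; }
done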

Yes, I caught that error earlier and started the pipeline again. It worked all the way!
Thank you so much!