broadinstitute / Drop-seq

Java tools for analyzing Drop-seq data

Adaptor/PolyA Trimmer

v-mahughes opened this issue

I run the TrimStartingSequence command with no errors; however, the summary files look like this:

METRICS CLASS org.broadinstitute.dropseqrna.readtrimming.TrimStartingSequence$TrimMetric
mean stdev
? -0

Then, when I pass the outputs to the polyA trimmer, I get the following error:

13:40:16 [main] WARN com.intel.gkl.NativeLibraryLoader - Unable to load libgkl_compression.so from native/libgkl_compression.so (No such file or directory)
13:40:16 [main] WARN com.intel.gkl.NativeLibraryLoader - Unable to load libgkl_compression.so from native/libgkl_compression.so (No such file or directory)
[Fri Sep 16 13:40:16 PDT 2022] PolyATrimmer INPUT=/home/v-mahughes/ex_vivo/preprocess/dropseq_inputs/unaligned_tagged_trimmed_smart_bams/Four_S2_L001_unaligned_tagged_trimmed_smart.bam OUTPUT=/home/v-mahughes/ex_vivo/preprocess/dropseq_inputs/unaligned_tagged_polyA_trimmed_bams/Four_S2_L001_trimmed.bam OUTPUT_SUMMARY=/home/v-mahughes/ex_vivo/preprocess/dropseq_inputs/unaligned_tagged_polyA_trimmed_bams/summary_files/Four_S2_L001_polyA_trimming_report.txt USE_NEW_TRIMMER=true MISMATCHES=0 NUM_BASES=6 VALIDATION_STRINGENCY=SILENT TRIM_TAG=ZP ADAPTER=XMXC MAX_ADAPTER_ERROR_RATE=0.1 MIN_ADAPTER_MATCH=4 MIN_POLY_A_LENGTH=20 MIN_POLY_A_LENGTH_NO_ADAPTER_MATCH=6 DUBIOUS_ADAPTER_MATCH_LENGTH=6 MAX_POLY_A_ERROR_RATE=0.1 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Fri Sep 16 13:40:16 PDT 2022] Executing as NORTHAMERICA.v-mahughes@GCRSANDBOX309 on Linux 5.4.0-125-generic amd64; OpenJDK 64-Bit Server VM 11.0.16+8-post-Ubuntu-0ubuntu120.04; Deflater: Jdk; Inflater: Jdk; Provider GCS is not available; Picard version: 2.5.1(680c2ea_1642084299)
13:40:16 [main] WARN com.intel.gkl.compression.IntelInflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
13:40:16 [main] WARN com.intel.gkl.compression.IntelDeflaterFactory - Intel Deflater not supported, using Java.util.zip.Deflater
[Fri Sep 16 13:40:16 PDT 2022] org.broadinstitute.dropseqrna.readtrimming.PolyATrimmer done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2058354688
Exception in thread "main" java.lang.NullPointerException
at htsjdk.samtools.util.SequenceUtil.reverseComplement(SequenceUtil.java:887)
at htsjdk.samtools.util.SequenceUtil.reverseComplement(SequenceUtil.java:123)
at org.broadinstitute.dropseqrna.readtrimming.AdapterDescriptor$TagAdapterElement.getSequence(AdapterDescriptor.java:74)
at org.broadinstitute.dropseqrna.readtrimming.AdapterDescriptor.getAdapterSequence(AdapterDescriptor.java:121)
at org.broadinstitute.dropseqrna.readtrimming.PolyAWithAdapterFinder.getPolyAStart(PolyAWithAdapterFinder.java:71)
at org.broadinstitute.dropseqrna.readtrimming.PolyATrimmer.doWork(PolyATrimmer.java:142)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)

What does this mean?

Here is the output of the samtools command:
aligned_tagged_trimmed_smart.bam |head -n 6
NB501935:1009:HLJLJBGXJ:1:11101:10000:10887 77 * 0 0 * * 0 0 NNNNNNNNNNNNNNNNNNNNN ##################### RG:Z:A
NB501935:1009:HLJLJBGXJ:1:11101:10000:10940 77 * 0 0 * * 0 0 NNNNNNNNNNNNNNNNNNNNN ##################### RG:Z:A
NB501935:1009:HLJLJBGXJ:1:11101:10000:12295 77 * 0 0 * * 0 0 NNNNNNNNNNNNNNNNNNNNN ##################### RG:Z:A
NB501935:1009:HLJLJBGXJ:1:11101:10000:14274 77 * 0 0 * * 0 0 NNNNNNNNNNNNNNNNNNNNN ##################### RG:Z:A
NB501935:1009:HLJLJBGXJ:1:11101:10000:15227 77 * 0 0 * * 0 0 NNNNNNNNNNNNNNNNNNNNN ##################### RG:Z:A
NB501935:1009:HLJLJBGXJ:1:11101:10000:5045 77 * 0 0 * * 0 0 NNNNNNNNNNNNNNNNNNNNN ##################### RG:Z:A

I appreciate your help. Let me know if you need anything else from any of the previous steps!
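The records shown above can be summarized mechanically. The following is a minimal sketch, assuming SAM text as printed by `samtools view`; the function name `summarize_sam_records` is hypothetical. It reports whether each read's sequence is entirely N and which tags are absent. The choice of `XC` and `XM` as the tags to look for is an assumption based on `ADAPTER=XMXC` in the log and the `TagAdapterElement.getSequence` frame in the stack trace; the thread does not confirm that missing tags cause the NullPointerException.

```python
def summarize_sam_records(lines, required_tags=("XC", "XM")):
    """Return a list of (read_name, seq_all_n, missing_tags) for SAM text lines.

    required_tags is an assumption for illustration, not a documented
    requirement of PolyATrimmer.
    """
    summary = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("@"):  # skip blank and header lines
            continue
        # samtools separates fields with tabs; splitting on any whitespace also
        # handles the space-separated excerpt pasted in this thread, at the
        # cost of mis-splitting tag values that contain spaces.
        fields = line.split()
        if len(fields) < 11:  # a SAM record has 11 mandatory fields
            continue
        name, seq = fields[0], fields[9]
        # Optional fields have the form TAG:TYPE:VALUE; keep only the tag name.
        present = {field.split(":", 1)[0] for field in fields[11:]}
        missing = [tag for tag in required_tags if tag not in present]
        seq_all_n = len(seq) > 0 and set(seq.upper()) == {"N"}
        summary.append((name, seq_all_n, missing))
    return summary


# One record copied from the samtools output in this thread.
record = (
    "NB501935:1009:HLJLJBGXJ:1:11101:10000:10887 77 * 0 0 * * 0 0 "
    "NNNNNNNNNNNNNNNNNNNNN ##################### RG:Z:A"
)
for name, all_n, missing in summarize_sam_records([record]):
    print(name, "all_N" if all_n else "has_bases", "missing:", ",".join(missing) or "none")
```

For the record above, the sketch flags the sequence as all N and lists both assumed tags as missing, since `RG` is the only tag present.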

I now see that the issue may originate all the way back at the step where I convert the BCL files to FASTQs. The command I run to convert these files is:

nohup /home/v-mahughes/bcl2fastq2-v2.20.0.x/bin/bcl2fastq --input-dir /home/v-mahughes/ex_vivo/preprocess/data/Sequencing_Run/Data/Intensities/BaseCalls/ --output-dir out --sample-sheet /home/v-mahughes/ex_vivo/preprocess/data/SampleSheet.csv

and the first 20 lines of one of my FASTQs read:

@NB501935:1009:HLJLJBGXJ:1:11101:3713:1047 1:N:0:TACATCCG
NNNNNNNNNNNNNNNNNNNNN
+
#####################
@NB501935:1009:HLJLJBGXJ:1:11101:19202:1047 1:N:0:CCTGGCGA
NNNNNNNNNNNNNNNNNNNNN
+
#####################
@NB501935:1009:HLJLJBGXJ:1:11101:8478:1047 1:N:0:TTCATCAG
NNNNNNNNNNNNNNNNNNNNN
+
#####################
@NB501935:1009:HLJLJBGXJ:1:11101:11732:1047 1:N:0:TGCAGCCG
NNNNNNNNNNNNNNNNNNNNN
+
#####################
@NB501935:1009:HLJLJBGXJ:1:11101:9010:1048 1:N:0:TACATCCT
NNNNNNNNNNNNNNNNNNNNN
+
#####################

commented

Hi Madeline,

Your reads appear to be all Ns (no-calls), so I'm not surprised that the polyA trimmer chokes on this input. Maybe a junk flowcell? I think you need to get some help with bcl2fastq, which we can't provide.

Regards, Alec
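
Before rerunning any downstream step, it can help to measure how many reads in a FASTQ are entirely no-calls. The following is a minimal sketch, assuming four-line FASTQ records (no wrapped sequences); the helper names `count_all_n_reads` and `open_fastq` are hypothetical and are not part of Drop-seq or bcl2fastq.

```python
import gzip
import io


def count_all_n_reads(handle):
    """Return (total_reads, all_n_reads) for a 4-line-per-record FASTQ stream."""
    total = 0
    all_n = 0
    for i, line in enumerate(handle):
        if i % 4 != 1:  # the sequence is the second line of each record
            continue
        seq = line.strip().upper()
        total += 1
        if seq and set(seq) == {"N"}:  # every base is a no-call
            all_n += 1
    return total, all_n


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


# Two records copied from the FASTQ excerpt in this thread.
example = io.StringIO(
    "@NB501935:1009:HLJLJBGXJ:1:11101:3713:1047 1:N:0:TACATCCG\n"
    "NNNNNNNNNNNNNNNNNNNNN\n+\n#####################\n"
    "@NB501935:1009:HLJLJBGXJ:1:11101:19202:1047 1:N:0:CCTGGCGA\n"
    "NNNNNNNNNNNNNNNNNNNNN\n+\n#####################\n"
)
total, all_n = count_all_n_reads(example)
print(f"{all_n} of {total} reads are all N")
```

On a real file, the same count can be obtained with `with open_fastq(path) as fh: count_all_n_reads(fh)`. A few all-N reads at the start of a file are not unusual, so it is worth counting over the whole file, not just the first records.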