broadinstitute / Drop-seq

Java tools for analyzing Drop-seq data

DigitalExpression v 2.0.0

maggieMCKO opened this issue · comments

Thank you for developing the Drop-seq tool.

We ran a mixed-cell experiment and followed the instructions in the Drop-seq_Alignment_Cookbook (version 2.0.0).

The Drop-seq_Alignment pipeline seems to work for us, but we had issues when we were trying to run DigitalExpression using the following command:
Drop-seq_tools-2.0.0/DigitalExpression I=final_hs.bam O=final_hs_dge2.txt.gz SUMMARY=final_hs_dge2_summary.txt OUTPUT_LONG_FORMAT=final_hs_dge2_long.txt.gz CELL_BC_FILE=out_cell_barcode.txt

The output final_hs_dge2.txt.gz has "unexpected end of file", and the program didn't produce the summary file and long-format output.

The out_cell_barcode.txt we supplied was produced by running BamTagHistogram and extracting the barcode column, like this:
TTGACCTCACTT
CTATTCTCCGTC
GGCTTGTCAGAA
TGGCAAAATATC
TACCGCCGCGCC
GTATGGAGAACT

Please help us to identify the issue and advance the analysis. Thank you very much.

We are running on a cluster "Scientific Linux 7.3 (Nitrogen)".
Below is our stdout.

INFO 2018-11-13 22:58:07 DigitalExpression

********** NOTE: Picard's command line syntax is changing.


********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)


********** The command line looks like this in the new syntax:


********** DigitalExpression -I final_hs.bam -O final_hs_dge2.txt.gz -SUMMARY final_hs_dge2_summary.txt -OUTPUT_LONG_FORMAT final_hs_dge2_long.txt.gz -CELL_BC_FILE out_cell_barcode.txt


22:58:08.462 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/mpg08/mko/Tools/DropSeq/Drop-seq_tools-2.0.0/jar/lib/picard-2.18.14.jar!/com/intel/gkl/native/libgkl_compression.so
22:58:08.501 WARN NativeLibraryLoader - Unable to load libgkl_compression.so from native/libgkl_compression.so (No such file or directory)
22:58:08.502 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/mpg08/mko/Tools/DropSeq/Drop-seq_tools-2.0.0/jar/lib/picard-2.18.14.jar!/com/intel/gkl/native/libgkl_compression.so
22:58:08.502 WARN NativeLibraryLoader - Unable to load libgkl_compression.so from native/libgkl_compression.so (No such file or directory)
[Tue Nov 13 22:58:08 CET 2018] DigitalExpression SUMMARY=final_hs_dge2_summary.txt OUTPUT=final_hs_dge2.txt.gz OUTPUT_LONG_FORMAT=final_hs_dge2_long.txt.gz INPUT=final_hs.bam CELL_BC_FILE=out_cell_barcode.txt OUTPUT_READS_INSTEAD=false CELL_BARCODE_TAG=XC MOLECULAR_BARCODE_TAG=XM EDIT_DISTANCE=1 READ_MQ=10 MIN_BC_READ_THRESHOLD=0 USE_STRAND_INFO=true RARE_UMI_FILTER_THRESHOLD=0.0 GENE_NAME_TAG=gn GENE_STRAND_TAG=gs GENE_FUNCTION_TAG=gf STRAND_STRATEGY=SENSE LOCUS_FUNCTION_LIST=[CODING, UTR] VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Tue Nov 13 22:58:08 CET 2018] Executing as mko@gwdu103 on Linux 3.10.0-693.11.6.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_192-ea-b04; Deflater: Jdk; Inflater: Jdk; Provider GCS is not available; Picard version: 2.0.0(1ef3a59_1539205128)
INFO 2018-11-13 22:58:09 BarcodeListRetrieval Found 1934882 cell barcodes in file
INFO 2018-11-13 22:58:09 DigitalExpression Calculating digital expression for [1934882] cells.
22:58:12.052 WARN IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
[Tue Nov 13 22:58:30 CET 2018] org.broadinstitute.dropseqrna.barnyard.DigitalExpression done. Elapsed time: 0.37 minutes.
Runtime.totalMemory()=2144337920
Exception in thread "main" htsjdk.samtools.util.RuntimeIOException: java.nio.file.NoSuchFileException: /broad/hptmp/mko/sortingcollection.3534001006564764904.tmp
at htsjdk.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:268)
at htsjdk.samtools.util.SortingCollection.add(SortingCollection.java:183)
at org.broadinstitute.dropseqrna.utils.SortingIteratorFactory.create(SortingIteratorFactory.java:67)
at org.broadinstitute.dropseqrna.utils.readiterators.SamRecordSortingIteratorFactory.create(SamRecordSortingIteratorFactory.java:57)
at org.broadinstitute.dropseqrna.utils.readiterators.UMIIterator.<init>(UMIIterator.java:133)
at org.broadinstitute.dropseqrna.utils.readiterators.UMIIterator.<init>(UMIIterator.java:70)
at org.broadinstitute.dropseqrna.barnyard.DigitalExpression.digitalExpression(DigitalExpression.java:188)
at org.broadinstitute.dropseqrna.barnyard.DigitalExpression.doWork(DigitalExpression.java:154)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:295)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at org.broadinstitute.dropseqrna.cmdline.DropSeqMain.main(DropSeqMain.java:42)
Caused by: java.nio.file.NoSuchFileException: /broad/hptmp/mko/sortingcollection.3534001006564764904.tmp
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
at java.nio.file.Files.newByteChannel(Files.java:361)
at java.nio.file.Files.createFile(Files.java:632)
at java.nio.file.TempFileHelper.create(TempFileHelper.java:138)
at java.nio.file.TempFileHelper.createTempFile(TempFileHelper.java:161)
at java.nio.file.Files.createTempFile(Files.java:852)
at htsjdk.samtools.util.IOUtil.newTempPath(IOUtil.java:328)
at htsjdk.samtools.util.SortingCollection.newTempFile(SortingCollection.java:279)
at htsjdk.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:250)
... 10 more

This is certainly interesting. The program is saying it can't find the temp directory it needs to write its temporary files. Could you try making a directory somewhere (e.g. /my/temp/dir) and passing the parameter TMP_DIR=/my/temp/dir? Run the program again and let me know.
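
A minimal sketch of that suggestion, reusing the command from the report with TMP_DIR appended. The directory name `dropseq_tmp` is a hypothetical choice, and the wrapper path is the one from the report, so adjust both to your setup.

```shell
# Hypothetical location; pick any writable directory with enough free space.
tmp_dir="$PWD/dropseq_tmp"
mkdir -p "$tmp_dir"

# Path to the wrapper as used in the report; adjust to your install.
dge="Drop-seq_tools-2.0.0/DigitalExpression"

if [ ! -w "$tmp_dir" ]; then
  echo "cannot write to $tmp_dir" >&2
elif [ ! -x "$dge" ]; then
  echo "DigitalExpression not found at $dge" >&2
else
  # Same arguments as the original command, plus an explicit TMP_DIR.
  "$dge" I=final_hs.bam O=final_hs_dge2.txt.gz \
    SUMMARY=final_hs_dge2_summary.txt \
    OUTPUT_LONG_FORMAT=final_hs_dge2_long.txt.gz \
    CELL_BC_FILE=out_cell_barcode.txt \
    TMP_DIR="$tmp_dir"
fi
```

Checking that the directory is writable before starting avoids finding out about a bad path only when the sort first spills to disk.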

Btw, you're selecting a HUGE number of cells [1934882]. I'd guess you don't have that many cells in any experiment, and you want to revisit cell selection.
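
One simple way to revisit cell selection is to keep only the barcodes with the most reads. The sketch below assumes the BamTagHistogram output has two tab-separated columns (read count, then barcode) and that the cutoff is chosen by hand. The helper `top_barcodes`, the demo file names and the demo read counts are made up for illustration.

```shell
# Print the N barcodes with the highest read counts.
# Reads a "count<TAB>barcode" histogram on stdin (assumed column order).
top_barcodes() {
  n="$1"
  grep '^[0-9]' |       # keep only lines that start with a read count
    sort -k1,1nr |      # highest read count first
    head -n "$n" |      # keep the top N rows
    cut -f2             # emit only the barcode column
}

# Demo with invented counts for three barcodes.
printf '500\tTTGACCTCACTT\n12\tCTATTCTCCGTC\n9000\tGGCTTGTCAGAA\n' > demo_hist.txt
top_barcodes 2 < demo_hist.txt > demo_cell_barcodes.txt
cat demo_cell_barcodes.txt
```

On real data the histogram is typically gzipped, so something like `gzip -cd out_cell_readcounts.txt.gz | top_barcodes 2000 > out_cell_barcode.txt` would apply, where both the input file name and the cutoff of 2000 are placeholders.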

Hi James,

Yes! After adding the TMP_DIR argument, DigitalExpression worked as expected! Thank you very much!
And thanks for the hint. You're right, most of the cells were essentially empty...

Hi James,

I think that if no TMP_DIR is provided, DigitalExpression falls back to this default:

if [ -z "$TMPDIR" ]
then export TMPDIR=/broad/hptmp/$USER
fi

which will cause a <htsjdk.samtools.util.RuntimeIOException: java.nio.file.NoSuchFileException> runtime error on any system where that directory does not exist.

Hi @jopeptid ,

Thanks for pointing this out. This is a vestige of the fact that these scripts started as Broad-only tools. I'll fix this before too long, but until then, just make sure the TMPDIR environment variable is set to something reasonable.

-Alec
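
As a sketch of that workaround, TMPDIR can be set and checked in the shell before the Drop-seq scripts are called. The `/tmp/<user>` fallback here is an assumption chosen for illustration, not the default the scripts use after the fix.

```shell
# Keep an existing TMPDIR; otherwise fall back to a per-user directory
# under /tmp (hypothetical choice, pick whatever suits your cluster).
if [ -z "$TMPDIR" ]; then
  TMPDIR="/tmp/${USER:-$(id -un)}"
fi
export TMPDIR

# Create the directory if needed and warn early if it is unusable.
mkdir -p "$TMPDIR"
if [ ! -d "$TMPDIR" ] || [ ! -w "$TMPDIR" ]; then
  echo "TMPDIR=$TMPDIR is not a writable directory" >&2
fi
```

Exporting the variable matters because the Drop-seq scripts run as child processes and only see exported variables.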

Hi @jopeptid ,

This has been fixed: #64

-Alec