broadinstitute / Drop-seq

Java tools for analyzing Drop-seq data

TagBamWithReadSequenceExtended Error:reads not properly paired

dwkarjosukarso opened this issue

I ran the following command and got an error message that one read was not properly paired.

/home/karjosukarso/Drop-seq_tools-2.3.0/Drop-seq_alignment.sh -g /home/karjosukarso/Nadia_DK43/STAR/ -r /home/karjosukarso/Nadia_DK43/hg19.fasta.gz -o /home/karjosukarso/DK-51/ALIGNED_INV_BAM/ -s /home/karjosukarso/anaconda3/envs/Dropseq/bin/STAR /home/karjosukarso/DK-51/INV_BAM/INV_DK-51-mRNA-57.bam
Using temporary directory /tmp/tmp.KQGeoD3jv9
INFO 2019-10-06 14:18:26 TagBamWithReadSequenceExtended

********** NOTE: Picard's command line syntax is changing.


********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)


********** The command line looks like this in the new syntax:


********** TagBamWithReadSequenceExtended -SUMMARY /home/karjosukarso/DK-51/ALIGNED_INV_BAM//unaligned_tagged_Cellular.bam_summary.txt -BASE_RANGE 1-12 -BASE_QUALITY 10 -BARCODED_READ 1 -DISCARD_READ false -TAG_NAME XC -NUM_BASES_BELOW_QUALITY 1 -INPUT /home/karjosukarso/DK-51/INV_BAM/INV_DK-51-mRNA-57.bam -OUTPUT /tmp/tmp.KQGeoD3jv9/unaligned_tagged_Cell.bam


14:18:26.865 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/karjosukarso/Drop-seq_tools-2.3.0/jar/lib/picard-2.18.14.jar!/com/intel/gkl/native/libgkl_compression.so
14:18:26.878 WARN NativeLibraryLoader - Unable to load libgkl_compression.so from native/libgkl_compression.so (No such file or directory)
14:18:26.880 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/karjosukarso/Drop-seq_tools-2.3.0/jar/lib/picard-2.18.14.jar!/com/intel/gkl/native/libgkl_compression.so
14:18:26.880 WARN NativeLibraryLoader - Unable to load libgkl_compression.so from native/libgkl_compression.so (No such file or directory)
[Sun Oct 06 14:18:26 CEST 2019] TagBamWithReadSequenceExtended INPUT=/home/karjosukarso/DK-51/INV_BAM/INV_DK-51-mRNA-57.bam OUTPUT=/tmp/tmp.KQGeoD3jv9/unaligned_tagged_Cell.bam SUMMARY=/home/karjosukarso/DK-51/ALIGNED_INV_BAM/unaligned_tagged_Cellular.bam_summary.txt BASE_RANGE=1-12 BARCODED_READ=1 DISCARD_READ=false BASE_QUALITY=10 NUM_BASES_BELOW_QUALITY=1 TAG_NAME=XC TAG_BARCODED_READ=false HARD_CLIP_BASES=false TAG_QUALITY=XQ VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Sun Oct 06 14:18:26 CEST 2019] Executing as karjosukarso@mb07 on Linux 3.17.8-gentoo-r1 amd64; OpenJDK 64-Bit Server VM 11.0.1-internal+0-adhoc..src; Deflater: Jdk; Inflater: Jdk; Provider GCS is not available; Picard version: 2.3.0(34e6572_1555443285)
14:18:26.894 WARN IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
14:18:26.923 WARN IntelDeflaterFactory - IntelDeflater is not supported, using Java.util.zip.Deflater
INFO 2019-10-06 14:18:34 TagBamWithReadSequenceExtended Processed 1,000,000 records. Elapsed time: 00:00:07s. Time for last 1,000,000: 7s. Last read position: /
INFO 2019-10-06 14:18:40 TagBamWithReadSequenceExtended Processed 2,000,000 records. Elapsed time: 00:00:13s. Time for last 1,000,000: 5s. Last read position: /
INFO 2019-10-06 14:18:46 TagBamWithReadSequenceExtended Processed 3,000,000 records. Elapsed time: 00:00:19s. Time for last 1,000,000: 6s. Last read position: /
INFO 2019-10-06 14:18:53 TagBamWithReadSequenceExtended Processed 4,000,000 records. Elapsed time: 00:00:26s. Time for last 1,000,000: 6s. Last read position: /
INFO 2019-10-06 14:18:58 TagBamWithReadSequenceExtended Processed 5,000,000 records. Elapsed time: 00:00:31s. Time for last 1,000,000: 5s. Last read position: /
INFO 2019-10-06 14:19:04 TagBamWithReadSequenceExtended Processed 6,000,000 records. Elapsed time: 00:00:37s. Time for last 1,000,000: 5s. Last read position: /
INFO 2019-10-06 14:19:09 TagBamWithReadSequenceExtended Processed 7,000,000 records. Elapsed time: 00:00:42s. Time for last 1,000,000: 5s. Last read position: /
INFO 2019-10-06 14:19:14 TagBamWithReadSequenceExtended Processed 8,000,000 records. Elapsed time: 00:00:47s. Time for last 1,000,000: 5s. Last read position: /
INFO 2019-10-06 14:19:19 TagBamWithReadSequenceExtended Processed 9,000,000 records. Elapsed time: 00:00:53s. Time for last 1,000,000: 5s. Last read position: /
INFO 2019-10-06 14:19:25 TagBamWithReadSequenceExtended Processed 10,000,000 records. Elapsed time: 00:00:58s. Time for last 1,000,000: 5s. Last read position: /
INFO 2019-10-06 14:19:30 TagBamWithReadSequenceExtended Processed 11,000,000 records. Elapsed time: 00:01:03s. Time for last 1,000,000: 5s. Last read position: /
INFO 2019-10-06 14:19:35 TagBamWithReadSequenceExtended Processed 12,000,000 records. Elapsed time: 00:01:08s. Time for last 1,000,000: 5s. Last read position: /
INFO 2019-10-06 14:19:40 TagBamWithReadSequenceExtended Processed 13,000,000 records. Elapsed time: 00:01:13s. Time for last 1,000,000: 5s. Last read position: /
INFO 2019-10-06 14:19:45 TagBamWithReadSequenceExtended Processed 14,000,000 records. Elapsed time: 00:01:19s. Time for last 1,000,000: 5s. Last read position: /
INFO 2019-10-06 14:19:51 TagBamWithReadSequenceExtended Processed 15,000,000 records. Elapsed time: 00:01:24s. Time for last 1,000,000: 5s. Last read position: /
INFO 2019-10-06 14:19:56 TagBamWithReadSequenceExtended Processed 16,000,000 records. Elapsed time: 00:01:29s. Time for last 1,000,000: 5s. Last read position: /
INFO 2019-10-06 14:20:01 TagBamWithReadSequenceExtended Processed 17,000,000 records. Elapsed time: 00:01:34s. Time for last 1,000,000: 5s. Last read position: /
INFO 2019-10-06 14:20:06 TagBamWithReadSequenceExtended Processed 18,000,000 records. Elapsed time: 00:01:39s. Time for last 1,000,000: 5s. Last read position: /
INFO 2019-10-06 14:20:11 TagBamWithReadSequenceExtended Processed 19,000,000 records. Elapsed time: 00:01:44s. Time for last 1,000,000: 5s. Last read position: /
ERROR 2019-10-06 14:20:14 TagBamWithReadSequenceExtended Reads not properly paired! R1: NS500173:533:HHFJCBGXC:4:11401:10003:1098 R2: NS500173:533:HHFJCBGXC:4:11401:10003:1098

I tried sorting the fastq file and using TrimGalore, but the error persists. I also tried removing the specific read, but then the same error appears with another read. Running TagBamWithReadSequenceExtended separately from the bash script did not help either. I checked samtools flagstat, but I see no difference from the files that I previously processed successfully into count tables.

I would appreciate your input in this.

Best wishes,
Dyah Karjosukarso

commented

Hi Dyah,

I'm puzzled. The read names for R1 and R2 match. Looking at the code, I would expect an additional error message with more detail about the problem. Could you send the SAM records for the read name mentioned in the error message? You can generate these with samtools, e.g.

samtools view /home/karjosukarso/DK-51/INV_BAM/INV_DK-51-mRNA-57.bam | grep NS500173:533:HHFJCBGXC:4:11401:10003:1098

or with Picard:

java -jar /home/karjosukarso/Drop-seq_tools-2.3.0/3rdParty/picard/picard.jar ViewSam -I /home/karjosukarso/DK-51/INV_BAM/INV_DK-51-mRNA-57.bam | grep NS500173:533:HHFJCBGXC:4:11401:10003:1098

Note that there are a few additional command line arguments Picard needs. I forget what they are but it should be obvious.

Regards, Alec

Hi Alec,
Thank you for your quick response. The SAM records for the read name mentioned are below:

NS500173:533:HHFJCBGXC:4:11401:10003:1098 77 * 0 0 * * 0 0 AACTTCGGACTTATCCAGATTTTTTTTTTTTTTTTTTT AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE RG:Z:A
NS500173:533:HHFJCBGXC:4:11401:10003:1098 77 * 0 0 * * 0 0 AACTTCGGACTTATCCAGATTTTTTTTTTTTTTTTTTT AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE RG:Z:A
NS500173:533:HHFJCBGXC:4:11401:10003:1098 141 * 0 0 * * 0 0 GTGTGTGCAAGGCACAGAACACCTGGGGCTGCGGGAAC AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEA RG:Z:A
NS500173:533:HHFJCBGXC:4:11401:10003:1098 141 * 0 0 * * 0 0 GTGTGTGCAAGGCACAGAACACCTGGGGCTGCGGGAAC AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEA RG:Z:A

Best,
Dyah

commented

Hi Dyah,

I'm not sure how this happened, but as you can see, you have two copies of read 1 and two copies of read 2. I don't know why there isn't a clearer error message. Looking at the code, you should see a message in addition to the one you are seeing that says:

" passed as 1st read of pair, should be the second"

I believe that if you run Picard ValidateSamFile, it will report this problem. I think you need to look back at the process that produced the input BAM and try to figure out what introduced the duplication.
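
To see how widespread the duplication is, one option is to count how often each read name occurs as first-in-pair and as second-in-pair. The following is only a rough sketch, not part of Drop-seq tools or Picard: `find_mispaired` is a hypothetical helper, and it assumes the input is the text output of `samtools view` (one record per line, read name in field 1, flag in field 2). It keeps every read name in memory, so for a large BAM it may be worth running it on a subset first.

```python
from collections import Counter

# Bits of the SAM FLAG field that mark the two mates of a pair.
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80


def find_mispaired(sam_lines):
    """Return {read_name: (n_first, n_second)} for every read name that is
    not seen exactly once as first-in-pair and once as second-in-pair.

    Hypothetical helper: expects SAM text records, e.g. from `samtools view`.
    Records with neither mate bit set (unpaired reads) are ignored.
    """
    first = Counter()
    second = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.split()
        name, flag = fields[0], int(fields[1])
        if flag & FIRST_IN_PAIR:
            first[name] += 1
        if flag & SECOND_IN_PAIR:
            second[name] += 1
    names = sorted(set(first) | set(second))
    return {n: (first[n], second[n]) for n in names
            if (first[n], second[n]) != (1, 1)}
```

For the four records posted above, this would report the read name with counts `(2, 2)`. The function accepts any iterable of lines, so a small script could pass it `sys.stdin` and be fed from `samtools view`.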

BTW, you can use Explain SAM Flags to interpret the second field in each record (77 and 141).
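
If you prefer to decode the flags locally, the bits are defined in the SAM specification. Here is a minimal sketch of such a decoder; `explain_flag` and the label strings are my own naming for illustration, not an API of samtools or Picard.

```python
# Bit values from the SAM specification's FLAG field; labels are informal.
SAM_FLAG_BITS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "read fails platform/vendor quality checks"),
    (0x400, "read is PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def explain_flag(flag):
    """Return the labels of all bits that are set in a SAM flag value."""
    return [label for bit, label in SAM_FLAG_BITS if flag & bit]


print(explain_flag(77))
# ['read paired', 'read unmapped', 'mate unmapped', 'first in pair']
print(explain_flag(141))
# ['read paired', 'read unmapped', 'mate unmapped', 'second in pair']
```

So 77 marks an unaligned first read of a pair and 141 an unaligned second read, which is what you would expect in an unaligned paired-end BAM. The problem is only that each of them appears twice.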

Regards, Alec