ChaissonLab / danbing-tk

Toolkit for VNTR genotyping and repeat-pan genome graph construction

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

danbing-tk align empty *.kmer output

cappadoc opened this issue · comments

I conducted danbing-tk align with my short read-based bam file, which had been aligned against the GRCh37 reference using bwa mem.
The command, in the working directory containing the presence of the RPGG pan files, completed without any errors, but the generated kmers output file (NA12878.tr.kmers) was empty and the file size of aln.gz file was only 1.3 MB. Is there any solution for this trouble?

My command and output on the console are the follows:

samtools fasta -@ 2 -n NA12878.rmdup.bam | danbing-tk -gc 80 -ae -kf 4 1 -cth 45 -o NA12878 -k 21 -qs pan -fai /dev/stdin -p 6 | gzip > NA12878.aln.gz

use baitDB: 0
extract fasta: 0
interleaved: 1
sim mode: 0
trim mode: 0
augmentation mode: 0
graph threading mode: 1
output alignment: 1
output successfully aligned reads only: 1
k: 21
#of subsampled kmers in pre-filtering: 4
minimal # of matches in pre-filtering: 1
Cthreshold: 45
Rthreshold: 0.5
threading Cthreshold: 80
Running both step1 (kmer-based filtering) and step2 (threading)
fastx: /dev/stdin
query: pan.(tr/ntr).kmers

total number of loci in pan.tr.kmers: 80518
deserializing kmerDBi.umap
deserializing kmerDBi.vv
deserializing graph.umap
reading kmers from pan.tr.kmers
deserialized graph/index and read tr.kmers in 127 sec.
#unique kmers in kmerDBi: 147568763
creating data for each process...
threads created
Buffered reading 300000 300000 0
Buffered reading 300000 600000 0
Buffered reading 300000 900000 0
Buffered reading 300000 1200000 0
Buffered reading 300000 1500000 0
Buffered reading 300000 1800000 0

Thank you for your help.

Hi! Do you have an example of what the output of samtools fasta -@ 2 -n NA12878.rmdup.bam looks like?

It looks like the following:

>A00217:76:HFLT3DSXX:4:2608:14073:14810
AGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGATATCGGAAGAGCTTCGTGTAGGGAAAGAGTGTTAATTAGTGTGTAGTTATCGGATTTCGAAATATAAATAAAAAAATTAAATAAAAGAGAAAAAGTAATTATA
>A00296:43:HCLHLDSXX:3:1140:23457:10520
AGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTAAGGGTTAGGGTTAGGGTTATCGGAAGAGAACACGTAAGAACTCCAGTAAGTTGCCTGGATCGGGTAAGCGGTCTGGTGCGGGTGAGGGGTGGGGGT
>A00217:77:HFJWFDSXX:3:1215:18774:33411
GGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGATAGGGATATCGGAAGAGAAAACTAAAGAAATCCAGTAACTTTACTAGATATCGTAATGCGTCTTGTGCTTGTAATGGGTGGGGGGGGGGGGGG
>A00217:77:HFJWFDSXX:3:1358:27046:24471
AGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGATATCGGAAGAGAAAACGACAGAAATCCAGTCAAATTACTAGATCGCGTAAGACGACTTCTGCTAGAAAAGGGTGGGGGGGGGGGGGGGGGG
>A00217:77:HFJWFDSXX:4:2669:5828:29371
AGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTATCGGAAGAGAAATCTTAAGAGAAACTGTCAGGGTACTAGATAGCGTAAGGCTTCTTGTGAGTGAT
>A00217:77:HFJWFDSXX:3:2201:16233:34992
GGGTTAGGGTTAGGGGTAGGGTTAGGGTTAGGGTTATCGGAAGTGATCACGTCAGAAGTACAGTAACTTTGCTTGTTCGGGTTAGGGGTCTTGTAGGGTAAGGGTAGGGGGTGGGGTCGGGGTAGGGTGGGGGGTGGGGGTGGGGGTGGG
.
.
The bam has 675776582 reads in total.
I got a successful result when using the sample bam file (HG00514.filtered.reads.bam) contained in the danbing-tk package.

The reads look fine to me at first glance. Just checking -- is this sample paired-end sequencing? The read names of a read pair have to be the same for danbing-tk to map paired-end reads.

Yes, this data is paired-end reads with the same name for each pair, and the bam is sorted by coordinate.
For example, reads of a pair are output as follows:

>A00296:43:HCLHLDSXX:3:2254:22779:3521
CTTGCTTACTGTATAGTGGTGGCACGCCGCCTGCTGGCAGCTAGGGACATTGCAGGGTCCTCTTGCTCAAGGTGTAGTGGCAGCACGCCCACCTGCTGGCAGCTGGGGACACTGCCGGGCCCTCTTGCTCCAACAGTACTGGCGGATTAT
.
.
>A00296:43:HCLHLDSXX:3:2254:22779:3521
CTCCAGGGCTTCACCTGTTAGACTGGCAAAAATCCAAAAGTAAACACTTTGTGGAGAAACAGGCATTCCTAGACATTGCTGGTGGGATACAGAACAGTACAATTCTGATGGTAATCAGTTCACAAATTAAACATATTTATTTTTTACTTT

Hi, sorry for the huge delay on following up. Have you figured out the reason why the mapping is not working? Let me know if I can help with debugging. Feel free to drop a test bam file here. -Tony

I reinstalled the latest version and run the job. The NA12878.tr.kmers and NA12878.aln.gz files seemed to be normally created. Thank you for your help.