ChaissonLab / danbing-tk

Toolkit for VNTR genotyping and repeat-pan genome graph construction

danbing-tk build doesn't appear to work in v1.3

ASLeonard opened this issue

I'm running the build pipeline on several assemblies, currently without the prune option, so my genome.bam.tsv looks like

OxO
BSW3
BSW4

The pipeline goes ahead fine until the GenPanGenomeGraph step, where it segfaults almost immediately.
The main issue seems to be that "0 loci" are reported when my data is loaded. On inspection, the ..kmers files are all just a single-column list of numbers, as far as I can tell. This doesn't match the listed file format, which looks like

>locus i
kmer0	kmer_count0

so I am assuming something has gone wrong causing there to be no loci found. However, in the stderr of the GenRawGenomeGraph there is "Using orthology map, total number of loci: 13703".
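For anyone hitting the same thing, a quick way to confirm the symptom is to count header lines and k-mer records in a .kmers file. This is only a minimal sketch written for this thread, not part of danbing-tk: the function name `summarize_kmers` is made up, and it assumes the format is exactly a `>` header per locus followed by whitespace-separated `kmer count` rows.

```python
def summarize_kmers(lines):
    """Count locus headers, well-formed k-mer records, and other lines.

    Assumes (not confirmed against the danbing-tk source) that a valid
    file has '>' header lines followed by two-field 'kmer count' rows.
    """
    loci = records = other = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            loci += 1
        elif len(line.split()) == 2:
            records += 1
        else:
            other += 1  # e.g. a bare count with no k-mer column
    return loci, records, other


# A single-column file like the one described above has no loci at all.
single_column = ["1", "1", "1"]
print(summarize_kmers(single_column))  # (0, 0, 3)

# To check a real file: summarize_kmers(open("sample.kmers"))
```

A result with zero loci and only "other" lines would match the "0 loci" message, which points at the file being written wrongly rather than at the orthology map.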

I'm currently rerunning the pipeline using v1.1 to see if it is a recent change, so will update on that later.

The pipeline appears to work fine under the tag "manuscript-1" (908d6b5).

Thanks for catching the bug and coming up with a quick workaround. This appears to be the same issue as #15. Unfortunately I won't be able to get back to this issue until later this week. In the meantime, it should be safe to use the RPGG you have in combination with danbing-tk v1.3 for VNTR genotyping.

No rush, I'm just testing the rpgg out for future work on a small test set.

I'm not sure this is the same issue. It is hard to say since that log is so malformed, but that looks like an error in GenRawGenomeGraph, probably related to pruning or to an earlier version, as a lot of the steps don't look familiar (bam2pe, bam2pe, etc.)

My segfault seems to be from the function mapKmersFile2DB in genPanKmers.cpp, probably because the line splitting leads to undefined behaviour as the file isn't properly formatted.
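To illustrate what I mean by the splitting going wrong, here is a rough sketch of a reader that checks each line before splitting it and fails with a line number instead of crashing. It is written in Python for brevity and is not how mapKmersFile2DB is implemented; `parse_kmers` is a hypothetical name, and the two-field `kmer count` layout is assumed from the documented format.

```python
def parse_kmers(lines):
    """Parse '>' headers followed by 'kmer count' rows into one dict per locus.

    Raises ValueError on any row that does not have exactly two fields,
    or on a record that appears before the first header.
    """
    loci = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            loci.append({})  # start a new locus
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 'kmer count', got {line!r}")
        if not loci:
            raise ValueError(f"line {lineno}: k-mer record before any '>' header")
        loci[-1][int(fields[0])] = int(fields[1])
    return loci


try:
    parse_kmers(["1", "1", "1"])  # single-column file, as in my output
except ValueError as err:
    print("malformed file:", err)
```

With a check like this, a single-column file would be rejected on its first line rather than read past the end of a too-short split.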

Sorry for the late reply.

I just updated the example configuration file pipeline/goodPanGenomeGraph.json. Could you edit accordingly and let me know if it works? It seems to me that your pipeline is still running the pruning step. I might need log files to look into it if you still get errors.

Same issue, and the pruning step wasn't being run previously or now.
An example output for a sample.rawPB.tr.kmers is

1
1
1
1
1

In v1.0, which does work, the same file looks like

>0
517855109956    1
1023086787026   1
2262966179741   1
629512815861    1

So the issue appears to be related to writeKmers, which between the working and broken versions was changed to writeKmersWithName. There is an -on flag in the query program, but not in the VNTR one, which is causing the issue.

I haven't extensively tested it, but I'm pretty sure #18 fixes this behaviour. There may be other pipelines or parameters which will need other changes, but this at least finishes building without pruning on v1.3.

Thanks for keeping the pruning step from being obsolete. I tended to skip that step when using HGSVC and HPRC assemblies. I'm also planning to check whether pruning preserves path contiguity so that reads can be aligned reliably in the threading step.