danbing-tk build doesn't appear to work in v1.3

ASLeonard opened this issue 2 years ago.

I'm running the build pipeline on several assemblies, currently without the prune option, so my genome.bam.tsv looks like:

OxO
BSW3
BSW4

The pipeline runs fine until the GenPanGenomeGraph step, where it segfaults almost immediately.
The main issue seems to be that "0 loci" are reported when my data is loaded. On inspection, the *.kmers files are all just a single-column list of numbers, as far as I can tell. This doesn't match the documented file format:

>locus i
kmer0	kmer_count0

so I assume something has gone wrong and no loci are being found. However, the stderr of the GenRawGenomeGraph step reports "Using orthology map, total number of loci: 13703".
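A quick way to confirm the symptom is to tally header lines against single-column lines in one of the .kmers files. The sketch below is a hypothetical standalone checker (KmersSummary and summariseKmers are illustrative names, not part of danbing-tk). It splits on any whitespace, since the documented format uses a tab while the v1.0 output quoted later in this thread shows spaces.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical diagnostic (not part of danbing-tk): tally the line types in a
// .kmers stream that is expected to follow the ">locus" / "kmer count" layout.
struct KmersSummary {
    std::size_t loci = 0;          // header lines starting with '>'
    std::size_t kmerLines = 0;     // lines with exactly two fields (kmer, count)
    std::size_t singleColumn = 0;  // lines with a single field (the v1.3 symptom)
    std::size_t other = 0;         // lines with three or more fields
};

KmersSummary summariseKmers(std::istream& in) {
    KmersSummary s;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line[0] == '>') {
            ++s.loci;
            continue;
        }
        // Split on any whitespace so both tab- and space-separated columns count
        std::istringstream fields(line);
        std::string tok;
        std::size_t n = 0;
        while (fields >> tok) ++n;
        if (n == 0) continue;  // skip blank lines
        if (n == 2) ++s.kmerLines;
        else if (n == 1) ++s.singleColumn;
        else ++s.other;
    }
    return s;
}
```

A file that gives loci == 0 together with a non-zero singleColumn count matches what I'm seeing, even though GenRawGenomeGraph reported 13703 loci.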

I'm currently rerunning the pipeline with v1.1 to see whether this is a recent change, and will update on that later.

Alex Leonard · Answer 1 · Tue Mar 08 2022 21:05:15 GMT+0800 (China Standard Time)

The pipeline appears to work fine under the tag "manuscript-1" (908d6b5).

Tsung-Yu Lu · Answer 2 · Wed Mar 09 2022 02:36:40 GMT+0800 (China Standard Time)

Thanks for catching the bug and coming up with a quick workaround. This appears to be the same issue as #15. Unfortunately I won't be able to get back to this issue until later this week. In the meantime, it should be safe to use the RPGG you have in combination with danbing-tk v1.3 for VNTR genotyping.

Alex Leonard · Answer 3 · Wed Mar 09 2022 04:35:17 GMT+0800 (China Standard Time)

No rush, I'm just testing the rpgg out for future work on a small test set.

I'm not sure this is the same issue. It is hard to say since that log is so malformed, but it looks like an error in GenRawGenomeGraph, probably related to pruning or to an earlier version, as a lot of the steps don't look familiar (bam2pe, etc.).

My segfault seems to be from the function mapKmersFile2DB in genPanKmers.cpp, probably because the line splitting leads to undefined behaviour as the file isn't properly formatted.
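To illustrate how a loader can reject a malformed line instead of running into undefined behaviour, here is a minimal per-line parser sketch. parseKmerLine and parseField are hypothetical names; this is not the actual mapKmersFile2DB code, whose splitting logic isn't shown in this thread. It assumes kmers and counts are unsigned decimal integers, as in the v1.0 output.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper: read one unsigned decimal field, advancing p past it.
// Returns false for a missing field, a sign character, or overflow.
static bool parseField(const char*& p, std::uint64_t& out) {
    while (*p == ' ' || *p == '\t') ++p;     // skip separators
    if (*p < '0' || *p > '9') return false;  // empty field, '>' header, or sign
    char* end = nullptr;
    errno = 0;
    out = std::strtoull(p, &end, 10);
    if (errno == ERANGE) return false;       // value does not fit
    p = end;
    return true;
}

// Hypothetical helper (not danbing-tk's mapKmersFile2DB): parse a
// "kmer<whitespace>count" line, returning std::nullopt for malformed input
// such as the single-column lines produced in v1.3.
std::optional<std::pair<std::uint64_t, std::uint64_t>>
parseKmerLine(const std::string& line) {
    const char* p = line.c_str();
    std::uint64_t kmer = 0;
    std::uint64_t count = 0;
    if (!parseField(p, kmer) || !parseField(p, count)) return std::nullopt;
    while (*p == ' ' || *p == '\t' || *p == '\r') ++p;  // allow trailing whitespace
    if (*p != '\0') return std::nullopt;                // reject extra fields
    return std::make_pair(kmer, count);
}
```

With a check like this, a caller could stop with a clear message on the first bad line rather than segfaulting later.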

Tsung-Yu Lu · Answer 4 · Thu Mar 31 2022 09:09:15 GMT+0800 (China Standard Time)

Sorry for the late reply.

I just updated the example configuration file pipeline/goodPanGenomeGraph.json. Could you edit your configuration accordingly and let me know if it works? It seems to me that your pipeline is still running the pruning step. I might need log files to look into it if you still get errors.

Alex Leonard · Answer 5 · Thu Mar 31 2022 17:32:46 GMT+0800 (China Standard Time)

Same issue, and the pruning step wasn't being run before and isn't now.
An example of the output in a sample.rawPB.tr.kmers file is:

whereas in v1.0, which does work, the same file looks like:

>0
517855109956    1
1023086787026   1
2262966179741   1
629512815861    1

Alex Leonard · Answer 6 · Thu Mar 31 2022 18:15:25 GMT+0800 (China Standard Time)

So the issue appears to be related to writeKmers, which was changed to writeKmersWithName between the working and broken versions. The query program has an -on flag, but the VNTR program does not, and that is what causes the issue.
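For reference, producing the documented layout only requires a header line per locus before its kmers. The sketch below is a hypothetical writer (writeKmersTable and LocusKmers are illustrative names); it does not reproduce what writeKmers or writeKmersWithName actually do, and it assumes loci are numbered by their index, as in the v1.0 output above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-locus kmer table: (kmer, count) pairs in insertion order.
using LocusKmers = std::vector<std::pair<std::uint64_t, std::uint32_t>>;

// Hypothetical writer (not danbing-tk's writeKmers/writeKmersWithName):
// emit ">i" for each locus i, then one "kmer<TAB>count" line per kmer.
void writeKmersTable(std::ostream& out, const std::vector<LocusKmers>& loci) {
    for (std::size_t i = 0; i < loci.size(); ++i) {
        out << '>' << i << '\n';  // locus header; omitting it yields 0 loci on load
        for (const auto& kc : loci[i]) {
            out << kc.first << '\t' << kc.second << '\n';
        }
    }
}
```

Empty loci still get a header line here, so locus indices stay aligned with their position in the input.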

I haven't extensively tested it, but I'm fairly sure #18 fixes this behaviour. Other pipelines or parameters may need further changes, but with it the build at least finishes without pruning on v1.3.

Tsung-Yu Lu · Answer 7 · Sat Mar 04 2023 03:25:53 GMT+0800 (China Standard Time)

Thanks for keeping the pruning step from becoming obsolete. I tended to skip that step when using HGSVC and HPRC assemblies. I'm also planning to check whether pruning preserves path contiguity so that reads can be aligned reliably in the threading step.