GaetanBenoitDev / metaMDBG

MetaMDBG: a lightweight assembler for long and accurate metagenomics reads.

Changing parameters for metaMDBG polish

jadeaver opened this issue

I was trying to use Racon but kept running into memory issues (even on my university's HPC cluster) when I came across metaMDBG and its standalone polisher. It's a great tool, and I have already run it successfully without the memory issues I was having with Racon. Awesome, thank you!

My questions -

  1. I am polishing an ONT long-read assembly with Illumina short reads. However, I noticed that this polisher runs minimap2 with the parameter -x map-hifi. I installed metaMDBG from source (using conda) and edited the file ContigPolisher.hpp to replace -x map-hifi with -x map-ont. When I run the polisher, the log file still shows minimap2 being run with -x map-hifi. Did I edit the right file, or is there another file I need to edit to change this parameter?

  2. I am also seeing almost the same number of contigs after polishing, but ~40% fewer base pairs and fewer long contigs (see below). I know this can be normal with Racon because unpolished contigs aren't returned, but Racon has the -u option to include them. Is there a similar flag for this polisher?

| Contig length threshold | Before polishing | After polishing |
| --- | --- | --- |
| contigs (>= 0 bp) | 35804 | 35798 |
| contigs (>= 1000 bp) | 35718 | 35442 |
| contigs (>= 5000 bp) | 34601 | 27176 |
| contigs (>= 10000 bp) | 25810 | 16420 |
| contigs (>= 25000 bp) | 10904 | 5982 |
| contigs (>= 50000 bp) | 4117 | 1705 |
| Total length (>= 0 bp) | 1.02E+09 | 5.67E+08 |
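
To put a number on the drop shown in the table above, here is a minimal Python sketch that computes the relative change for each row. The values are copied from the table; the `stats` dictionary and the `percent_change` helper are names made up for this illustration.

```python
# (before, after) values copied from the table above.
stats = {
    "contigs (>= 0 bp)": (35804, 35798),
    "contigs (>= 1000 bp)": (35718, 35442),
    "contigs (>= 5000 bp)": (34601, 27176),
    "contigs (>= 10000 bp)": (25810, 16420),
    "contigs (>= 25000 bp)": (10904, 5982),
    "contigs (>= 50000 bp)": (4117, 1705),
    "Total length (>= 0 bp)": (1.02e9, 5.67e8),
}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


for label, (before, after) in stats.items():
    print(f"{label}: {percent_change(before, after):+.1f}%")
```

With these numbers the total length falls by roughly 44%, while the overall contig count changes by less than 0.1%.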

The polisher options are a bit limited, and unfortunately I won't be able to update it soon. Editing the minimap2 command should work. If you compile from source, you then need to run the rebuilt executable, ./bin/metaMDBG in the build folder, for the change to take effect.
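
One way to check which executable is actually being run is sketched below in Python. It assumes the rebuilt binary sits at `./bin/metaMDBG` relative to the current directory, as described above; the helper names `same_executable` and `describe_binaries` are hypothetical and not part of metaMDBG.

```python
import os
import shutil


def same_executable(path_a, path_b):
    """Return True if both paths resolve to the same file on disk."""
    return os.path.realpath(path_a) == os.path.realpath(path_b)


def describe_binaries(built_path="./bin/metaMDBG", name="metaMDBG"):
    """Report whether `name` on PATH is the binary at `built_path`."""
    on_path = shutil.which(name)  # None if `name` is not on PATH
    if on_path is None:
        return f"{name} is not on PATH; run {built_path} directly"
    if same_executable(on_path, built_path):
        return f"PATH already points at the rebuilt binary: {on_path}"
    # Assumption: a different binary on PATH was built before the source edit.
    return f"PATH resolves to {on_path}, not {built_path}; source edits may not be reflected"


print(describe_binaries())
```

If the last message appears, calling `./bin/metaMDBG` explicitly from the build folder avoids picking up the other installation.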

If you polish with short reads, you should use minimap2's -x sr option instead. Just note that the polisher won't handle short reads optimally, because it doesn't use the pairing information of short reads.
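
As a rough illustration of the preset choice, the Python sketch below maps a read type to a minimap2 `-x` preset and assembles a command line. This is a standalone example, not how metaMDBG builds its command internally; the `PRESETS` table, the `minimap2_command` helper, and the file names are hypothetical.

```python
# minimap2 presets mentioned in this thread, keyed by a made-up read-type label.
PRESETS = {
    "hifi": "map-hifi",  # PacBio HiFi reads
    "ont": "map-ont",    # Oxford Nanopore reads
    "short": "sr",       # short reads such as Illumina
}


def minimap2_command(read_type, contigs, reads, threads=4):
    """Build a minimap2 argument list for the given read type."""
    if read_type not in PRESETS:
        raise ValueError(f"unknown read type: {read_type!r}")
    return ["minimap2", "-x", PRESETS[read_type], "-t", str(threads), contigs, reads]


# Hypothetical file names, for illustration only.
print(" ".join(minimap2_command("short", "contigs.fasta", "reads.fastq")))
```

The resulting list can be passed to `subprocess.run` if minimap2 is installed. Note that this maps reads as single-end, which matches the caveat above about pairing information being ignored.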