Other Landau-Vishkin implementations



RagnarGrootKoerkamp opened this issue 3 years ago.

Ragnar Groot Koerkamp commented 3 years ago

Are you aware of any other implementation of the basic Landau-Vishkin algorithm, which WFA extends, that is competitive or comparable in speed?

I tried looking for one but couldn't find any, and since you don't compare against one either, perhaps none exists?
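For readers less familiar with the "basic Landau-Vishkin algorithm" mentioned above, here is a minimal sketch of the diagonal-transition idea for unit-cost (Levenshtein) global edit distance. It is not code from lv89, wfalm, WFA, or WFA2, and the names `lv_edit_distance` and `lv_slide` are hypothetical. For each score s it keeps, per diagonal d = j - i, the furthest row i reachable with at most s edits, takes the best of a substitution, insertion, or deletion from the previous wavefront, and then slides along exact matches. As an assumption of this sketch, extension compares characters one by one, whereas the original Landau-Vishkin paper uses constant-time suffix-tree queries for that step.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Follow diagonal d (j = i + d) from row i while a[i] == b[j]. */
static int lv_slide(const char *a, int n, const char *b, int m, int d, int i) {
    while (i < n && i + d < m && a[i] == b[i + d]) i++;
    return i;
}

/* Hypothetical helper: unit-cost edit distance between a[0..n) and b[0..m)
 * using a Landau-Vishkin-style diagonal transition. Returns -1 if
 * allocation fails. */
int lv_edit_distance(const char *a, int n, const char *b, int m) {
    assert(a != NULL && b != NULL && n >= 0 && m >= 0);
    const int NEG = INT_MIN / 2;  /* marks a diagonal not reached yet */
    const int off = n + 1;        /* diagonal d is stored at index d + off */
    const int size = n + m + 3;   /* covers diagonals -n-1 .. m+1 */
    const int target = m - n;     /* the end cell (n, m) lies on this diagonal */
    int *prev = malloc(sizeof(int) * (size_t)size);
    int *cur = malloc(sizeof(int) * (size_t)size);
    int result = -1;
    if (prev == NULL || cur == NULL) goto done;
    for (int t = 0; t < size; t++) prev[t] = cur[t] = NEG;

    /* Score 0: only diagonal 0, extended by matches from (0, 0). */
    prev[off] = lv_slide(a, n, b, m, 0, 0);
    if (target == 0 && prev[off] == n) { result = 0; goto done; }

    for (int s = 1;; s++) {
        int lo = -s < -n ? -n : -s;  /* diagonals reachable with s edits */
        int hi = s < m ? s : m;
        for (int d = lo; d <= hi; d++) {
            int i = prev[d + off] + 1;                   /* substitution */
            if (prev[d - 1 + off] > i) i = prev[d - 1 + off];   /* insertion: j+1 */
            if (prev[d + 1 + off] + 1 > i) i = prev[d + 1 + off] + 1; /* deletion: i+1 */
            /* Clamp to the last cell on diagonal d; this stays within s edits
             * because neighbouring DP cells differ by at most one edit. */
            int bound = n < m - d ? n : m - d;
            if (i > bound) i = bound;
            if (i < 0 || i + d < 0) { cur[d + off] = NEG; continue; }
            i = lv_slide(a, n, b, m, d, i);
            cur[d + off] = i;
            if (d == target && i == n) { result = s; goto done; }
        }
        int *tmp = prev; prev = cur; cur = tmp;  /* reuse the two wavefronts */
    }
done:
    free(prev);
    free(cur);
    return result;
}
```

For example, `lv_edit_distance("flaw", 4, "lawn", 4)` should return 2. Keeping only two wavefronts makes memory linear in n + m, while the naive extension makes the worst-case time proportional to (n + m) times the final score.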

Santiago Marco-Sola · Answer 1 · Tue Mar 22 2022 21:33:24 GMT+0800 (China Standard Time)

I'm not sure what you are looking for. I understand that you are not looking for classical pairwise alignment methods but for diagonal-transition algorithms (like Landau-Vishkin, O(ND), and WFA). Right?

In that case, I might suggest looking into lv89 from @lh3, wfalm from @jeizenga, and DALIGNER from G. Myers.

I hope this helps.
Cheers,

Ragnar Groot Koerkamp · Answer 2 · Wed Mar 23 2022 19:41:08 GMT+0800 (China Standard Time)

The reason I asked is because I was expecting a comparison to one such tool in your paper, since this is the algorithm you extend. The fact that you did not include one seems to indicate that no (competitive) implementation of the diagonal-transition method was available at the time.

lv89 seems to be a toy project (and is very recent)
wfalm I may include, but its own experiments suggest it is slower than standard WFA (although its lower memory use may be useful, as WFA does run out of memory on our larger tests; I'll also run WFA2 with the less memory options).
DALIGNER seems to be mostly for local alignments.

Anyway, I just noticed the sentence saying:

We discarded other methods from the evaluation as their running time was exceedingly long or because their recall was substantially below par.

So I suppose that answers my question :)

Santiago Marco-Sola · Answer 3 · Wed Mar 23 2022 20:13:21 GMT+0800 (China Standard Time)

Agreed.

I'll also run WFA2 with the less memory options.

Yes, please!

Let me know if you have any other questions or remarks. These are highly appreciated.

Cheers,

Heng Li · Answer 4 · Wed Mar 23 2022 23:52:00 GMT+0800 (China Standard Time)

I forwarded @RagnarGrootKoerkamp to WFA2. As I was mentioned here, I need to clarify that lv89 is a spinoff of my gwfa. I called it a toy because of its simplicity. It is fairly efficient, though probably not as efficient as WFA2. ~~I failed to get the correct output from WFA2. Will create a separate issue for that.~~

PS: the wrong result was due to enabling heuristics. It has been fixed on my end. See #7.

Santiago Marco-Sola · Answer 5 · Thu Mar 24 2022 00:53:19 GMT+0800 (China Standard Time)

I forwarded @RagnarGrootKoerkamp to WFA2.

Thanks for the forwarding. Much appreciated.

[...] I need to clarify that lv89 is a spinoff of my gwfa. I said it is a toy due to its simplicity. It is fairly efficient, though probably not as efficient as WFA2.

Well, I think it is a good idea. Simplicity is often preferred over complex, hard-to-integrate tools and libraries. I would really love to offer WFA2 as a single header, but obviously, I can't.

I failed to get the correct output from WFA2. Will create a separate issue for that.

Please, do.

Jordan Eizenga · Answer 6 · Thu Mar 24 2022 01:17:48 GMT+0800 (China Standard Time)

Perhaps of interest: I implemented an adaptive version of the low memory WFA algorithm in wfalm that decides between three WFA variants on-line, which should have largely eliminated the run time difference between wfalm's implementations of standard and low-memory WFA. That said, I have neither rigorously benchmarked it nor compared the speed to implementations in other repositories.

Ragnar Groot Koerkamp · Answer 7 · Thu Mar 24 2022 02:29:12 GMT+0800 (China Standard Time)

@jeizenga I may also include wfalm in my benchmark. I haven't looked much into your repo yet, though (it would be simpler if you provided something similar to tools/align-benchmark in this repo ;).

Also, now that we're all here: would you be interested in a Slack/Discord where we could chat more? Once the preprint of my own aligner is out, I'd love to collaborate.

Jordan Eizenga · Answer 8 · Thu Mar 24 2022 03:31:02 GMT+0800 (China Standard Time)

I'd be happy to chat further :)

In the meantime, I do have a benchmarking utility in the wfalm repository (in the test directory), although it is very rudimentary compared to @smarco's. Maybe still useful though.

Santiago Marco-Sola · Answer 9 · Thu Mar 24 2022 15:31:59 GMT+0800 (China Standard Time)

Sure, happy to keep on talking.

Kristoffer · Answer 10 · Thu Mar 24 2022 18:01:46 GMT+0800 (China Standard Time)

Well, I think it is a good idea. Simplicity is often preferred over complex, hard-to-integrate tools and libraries. I would really love to offer WFA2 as a single header, but obviously, I can't.

Is a header-only solution possible but would require significant work, or are there other factors preventing it?

Santiago Marco-Sola · Answer 11 · Thu Mar 24 2022 19:06:22 GMT+0800 (China Standard Time)

To answer the question properly: you can always put all the WFA2 sources in a single .h/.hpp file, and this can be done easily. However, I am reluctant to merge >10K lines of code into a single file. First, I need to maintain and debug this code on my own, so I need it well structured and modularized to keep the complexity as low as possible. Second, I think that compiling against a library and using its API/bindings (C/C++/Python) is within reach of most programmers.

So I think the point is not offering a single file, but producing simple code (i.e., a few lines) that can do the job in a single header. In fact, I have a reduced version of WFA-edit (a single C file) that I usually use for educational purposes. That is why I think lv89 from @lh3 is a good idea: it is simple and easy to understand.

I hope this clarifies the question.

Kristoffer · Answer 12 · Thu Mar 24 2022 19:11:44 GMT+0800 (China Standard Time)

Yes, it very much does. Thank you for the detailed answer!