GATB / dsk

k-mer counting software

Home page: https://gatb.inria.fr/software/dsk/

Help: DSK configuration

taranglute opened this issue:

Hi

We are analyzing bioinformatics tools for k-mer counting, and DSK is one of them. We have gone through the README file and found it very helpful. Since this is an analysis, we want to be sure about the details. It would be great if you could help us validate the details below.

Data structure and sorting algorithm: hash table / hashing
Approach: two-disk based
Limit on k-mer size: arbitrarily large k-mer lengths (any desired length)
Supports online k-mer frequency retrieval: No
Supports compressed file processing: Yes

Thanks
Tarang

Hello,

The method is indeed disk-based (though it does not require two disks). We use the minimizer approach to partition k-mers and write superkmers to disk. Counting is then done preferably by sorting (with a kind of radix sort), with a fallback to counting with a hash table if a bucket is too large to fit in RAM.
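As a rough illustration of the partitioning step described above, here is a minimal Python sketch. It is not DSK's implementation (DSK is written in C++ on top of GATB). It assumes lexicographic minimizers of length m, ignores reverse complements and canonical k-mers, and picks a partition with CRC32 of the minimizer; `minimizer`, `superkmers`, `partition_to_disk` and the `part_N.txt` file layout are hypothetical names chosen for the example. Consecutive k-mers that share a minimizer are merged into one superkmer, so an overlapping run is written to disk once rather than k-mer by k-mer, and all copies of a given k-mer land in the same partition.

```python
import os
import tempfile
import zlib


def minimizer(kmer, m):
    """Lexicographically smallest m-mer of a k-mer (simplified: no reverse complements)."""
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def superkmers(seq, k, m):
    """Split seq into maximal runs of consecutive k-mers sharing one minimizer.

    Returns a list of (minimizer, superkmer) pairs.
    """
    if len(seq) < k:
        return []
    result = []
    start = 0                        # index of the first k-mer in the current run
    current = minimizer(seq[:k], m)
    for i in range(1, len(seq) - k + 1):
        mz = minimizer(seq[i:i + k], m)
        if mz != current:
            # The run holds k-mers start..i-1, i.e. the substring seq[start : i - 1 + k].
            result.append((current, seq[start:i - 1 + k]))
            start, current = i, mz
    result.append((current, seq[start:]))  # last run extends to the end of seq
    return result


def partition_to_disk(reads, k, m, num_partitions, out_dir):
    """Write every superkmer to the partition file selected by hashing its minimizer."""
    paths = [os.path.join(out_dir, f"part_{p}.txt") for p in range(num_partitions)]
    handles = [open(path, "w") for path in paths]
    try:
        for read in reads:
            for mz, sk in superkmers(read, k, m):
                # CRC32 is deterministic across runs, unlike Python's salted str hash.
                p = zlib.crc32(mz.encode()) % num_partitions
                handles[p].write(sk + "\n")
    finally:
        for handle in handles:
            handle.close()
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in partition_to_disk(["ACGTACGTTGCA"], k=5, m=3, num_partitions=4, out_dir=tmp):
            with open(path) as fh:
                print(os.path.basename(path), fh.read().split())
```

Each partition file can then be loaded and counted independently, which is what keeps the memory footprint bounded by the largest partition rather than by the whole dataset.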

Data structure and sorting algorithm: hash table and radix sort
Approach: disk-based
Rest is correct.
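The corrected answer names a hash table and radix sort as the counting structures. The following is a minimal Python sketch of counting one on-disk partition that way, assuming the partition has been read back as a list of superkmer strings over the alphabet ACGT. It radix-sorts 2-bit-encoded k-mers and counts runs of equal values, and falls back to a hash table (`collections.Counter`) when the partition holds more k-mers than a hypothetical `max_sort_items` budget. It ignores canonical k-mers and DSK's real memory accounting; `encode_kmer`, `radix_sort`, `kmers_of`, `count_partition` and `max_sort_items` are illustrative names, not DSK's API.

```python
from collections import Counter
from itertools import groupby

ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmer(kmer):
    """Pack a k-mer into an int, 2 bits per base (Python ints have no size limit)."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODE[base]
    return value


def radix_sort(values, key_bits):
    """LSD radix sort of non-negative ints, one byte per stable bucketing pass."""
    for shift in range(0, key_bits, 8):
        buckets = [[] for _ in range(256)]
        for v in values:
            buckets[(v >> shift) & 0xFF].append(v)
        values = [v for bucket in buckets for v in bucket]
    return list(values)


def kmers_of(superkmers, k):
    """Yield the encoded k-mers contained in each superkmer."""
    for sk in superkmers:
        for i in range(len(sk) - k + 1):
            yield encode_kmer(sk[i:i + k])


def count_partition(superkmers, k, max_sort_items):
    """Count one partition: sort if it fits the (hypothetical) budget, else hash."""
    total = sum(max(len(sk) - k + 1, 0) for sk in superkmers)
    if total <= max_sort_items:
        ordered = radix_sort(list(kmers_of(superkmers, k)), 2 * k)
        # Equal k-mers are now adjacent, so each run length is that k-mer's count.
        return "sort", {v: sum(1 for _ in run) for v, run in groupby(ordered)}
    # Fallback: a hash table stores only distinct k-mers, one counter each.
    return "hash", dict(Counter(kmers_of(superkmers, k)))
```

In this sketch the sort path must hold every k-mer occurrence at once, while the hash path holds one entry per distinct k-mer, which is one plausible reason to prefer sorting for small buckets and hashing for oversized ones.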

Guillaume