Help: DSK configuration
taranglute opened this issue
Hi
We are analyzing bioinformatics tools for k-mer counting, and DSK is one of them. We have gone through the README file, and it is very helpful. Since this is for an analysis, we want to be sure about the details. It would be great if you could help us validate the details below.
Data structure and sorting algorithm: Hash table / hashing
Approach: Two-disk based
Limit on k: Arbitrarily large k-mer lengths (any length)
Supports online k-mer frequency retrieval: No
Supports compressed file processing: Yes
Thanks
Tarang
Hello,
The method is indeed disk-based (but it does not necessarily use two disks). We use the minimizer approach to partition k-mers and write super-k-mers to disk. Counting is then done preferably by sorting (with a kind of radix sort), with a fallback to counting with a hash table if a bucket is too large to fit in RAM.
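To make the partitioning step concrete, here is a minimal Python sketch of the general minimizer idea. It is not DSK's actual code: the function names, the lexicographic minimizer ordering, the CRC32-based partition choice, and the in-memory lists standing in for on-disk partition files are all assumptions made for illustration. It also ignores reverse complements.

```python
import zlib
from collections import defaultdict


def minimizer(kmer: str, m: int) -> str:
    """Return the lexicographically smallest m-mer of a k-mer.

    Assumption: plain lexicographic order; real tools may use another ordering.
    """
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def super_kmers(read: str, k: int, m: int):
    """Yield (minimizer, super-k-mer) pairs for one read.

    A super-k-mer is a maximal run of consecutive k-mers in the read
    that all share the same minimizer, stored as a single substring.
    """
    start = 0
    current = None
    for i in range(len(read) - k + 1):
        mz = minimizer(read[i:i + k], m)
        if current is None:
            current = mz
        elif mz != current:
            # k-mers at positions start..i-1 share `current`;
            # together they cover read[start : (i - 1) + k].
            yield current, read[start:i - 1 + k]
            start, current = i, mz
    if current is not None:
        yield current, read[start:]


def partition_reads(reads, k: int, m: int, n_parts: int):
    """Group super-k-mers into partitions keyed by their minimizer.

    Hypothetical stand-in for writing each super-k-mer to a partition file.
    """
    parts = defaultdict(list)
    for read in reads:
        for mz, sk in super_kmers(read, k, m):
            # Deterministic hash so equal minimizers land in the same partition.
            parts[zlib.crc32(mz.encode()) % n_parts].append(sk)
    return parts


if __name__ == "__main__":
    for mz, sk in super_kmers("ACGTACGT", k=4, m=2):
        print(mz, sk)
```

Because every occurrence of a given k-mer has the same minimizer, all of its occurrences end up in the same partition, so each partition can be counted independently.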
Data structure and sorting algorithm: Hash table and radix sort
Approach: Disk-based
The rest is correct.
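The counting step for a single partition can be sketched in the same hedged spirit. This is only an illustration of "sort when the bucket fits, otherwise fall back to a hash table": the function names and the `max_kmers_to_sort` threshold are hypothetical, and Python's built-in `sorted` (Timsort) is swapped in for the radix-style sort mentioned above.

```python
from collections import Counter
from itertools import groupby


def kmers_of(super_kmer_list, k: int):
    """Expand each super-k-mer back into its individual k-mers."""
    for sk in super_kmer_list:
        for i in range(len(sk) - k + 1):
            yield sk[i:i + k]


def count_by_sorting(super_kmer_list, k: int) -> dict:
    """Sort all k-mers, then count the length of each run of equal k-mers."""
    ordered = sorted(kmers_of(super_kmer_list, k))  # stand-in for a radix sort
    return {kmer: sum(1 for _ in run) for kmer, run in groupby(ordered)}


def count_by_hashing(super_kmer_list, k: int) -> dict:
    """Count with a hash table; memory grows with distinct k-mers only."""
    return dict(Counter(kmers_of(super_kmer_list, k)))


def count_partition(super_kmer_list, k: int, max_kmers_to_sort: int) -> dict:
    """Prefer sorting; fall back to hashing if the partition is too large.

    `max_kmers_to_sort` is a hypothetical proxy for "fits in RAM".
    """
    n_kmers = sum(len(sk) - k + 1 for sk in super_kmer_list)
    if n_kmers <= max_kmers_to_sort:
        return count_by_sorting(super_kmer_list, k)
    return count_by_hashing(super_kmer_list, k)


if __name__ == "__main__":
    partition = ["ACGT", "GTACGT"]
    print(count_partition(partition, k=4, max_kmers_to_sort=10))
```

Both paths return the same counts; they differ only in how much memory they need for a given partition.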
Guillaume