PNNL-CompBio / Snekmer

Pipeline to apply encoded Kmer analysis to protein sequences

Memory usage

biodataganache opened this issue

Snekmer model still uses a lot of memory for fairly straightforward jobs, even after the dev-em fix. Part of this comes from duplicated data in the score and model rules, which can probably be cleaned up pretty easily. The rest comes from the fact that ALL the kmer matrices are loaded into memory for each rule, for every thread. I think this can be addressed in several ways. I'm starting another issue to handle one important enhancement that would accomplish this (but also do more).
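To illustrate the loading pattern described above, here is a minimal sketch that contrasts holding every matrix at once with streaming them one at a time. It is not Snekmer's code: the CSV layout (rows are sequences, columns are kmers), the file names, and the functions `load_matrix`, `load_all`, `iter_matrices`, and `kmer_totals` are all hypothetical and only stand in for whatever the rules actually read.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, Iterator, List, Tuple

Matrix = List[List[int]]


def load_matrix(path: Path) -> Matrix:
    """Read one kmer count matrix from a CSV file (hypothetical layout)."""
    with open(path, newline="") as handle:
        return [[int(value) for value in row] for row in csv.reader(handle)]


def load_all(paths: Iterable[Path]) -> Dict[str, Matrix]:
    """Eager pattern: every matrix stays in memory at the same time."""
    return {path.stem: load_matrix(path) for path in paths}


def iter_matrices(paths: Iterable[Path]) -> Iterator[Tuple[str, Matrix]]:
    """Lazy pattern: yield one matrix at a time so earlier ones can be freed."""
    for path in paths:
        yield path.stem, load_matrix(path)


def kmer_totals(paths: Iterable[Path]) -> List[int]:
    """Sum each kmer column across all files while holding a single matrix."""
    totals: List[int] = []
    for _, matrix in iter_matrices(paths):
        for row in matrix:
            if not totals:
                totals = [0] * len(row)
            for index, value in enumerate(row):
                totals[index] += value
    return totals
```

With the lazy version, peak memory is bounded by the largest single matrix rather than by the sum of all of them, which matters more when several threads each run the same rule.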

Fixed this issue by revising how the vectorize rule works in a couple of ways. The search rule is still somewhat bloated, but it seems to work OK and doesn't use enormous amounts of memory.