1e0ng / simhash

A Python Implementation of Simhash Algorithm

Serialization and deserialization support

731935354 opened this issue · comments

commented

Hi,
There is no doubt that this project has helped me a lot with deduplicating 130,000 wiki docs. However, re-running the simhash building process every time is a headache, even with multiprocessing. Is anyone planning to add serialization and deserialization support so the fingerprints can be saved and reused?

It seems that ZODB may be a suitable backend. (I will certainly give it a try.)
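
Until something like that exists, one possible stopgap is to store the fingerprints themselves rather than the library's objects. The sketch below is only an illustration, not part of this project: it assumes each fingerprint is available as a plain integer (as far as I can tell, `Simhash(...).value` holds it, but please verify that against the source), and the helper names `save_fingerprints`, `load_fingerprints`, and `hamming_distance` are hypothetical. It uses only the standard-library `pickle` module, not ZODB.

```python
import pickle


def save_fingerprints(fingerprints, path):
    """Write a {doc_id: fingerprint_int} mapping to disk.

    Hypothetical helper: `fingerprints` is assumed to be a plain dict of
    integers, not the library's own objects.
    """
    with open(path, "wb") as fh:
        pickle.dump(fingerprints, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_fingerprints(path):
    """Read the mapping back. Only unpickle files you created yourself."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def hamming_distance(a, b):
    """Count the bits that differ between two integer fingerprints."""
    return bin(a ^ b).count("1")
```

With this, the expensive hashing pass runs once, and later runs start from `load_fingerprints(path)`. Whether an index can be rebuilt cheaply from the stored integers depends on the library's constructors, which I have not checked here.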

commented

Thanks, Zhu. Have you tried out ZODB? How does it look?