vigna / Sux4J

Sux4J is an effort to bring succinct data structures to Java.

Can you show how to run an example?

Eilowangfang opened this issue

Hi.
I have read your paper "Monotone Minimal Perfect Hashing: Searching a Sorted Table with O(1) Accesses". It is excellent work.

Can you give me some instructions on how to run an example of monotone minimal perfect hashing with this code (e.g., which file to compile)?
I have no idea how the project is organized.

It depends, there are different techniques in the paper and I have no idea which one you have in mind.

The classes are all in the it.unimi.dsi.sux4j.mph package. For example, LcpMonotoneMinimalPerfectHashFunction builds an MMPHF based on longest common prefixes. Run it with --help and it will explain how to use it.
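As a rough illustration of what "based on longest common prefixes" means (a toy sketch, not Sux4J's code): split the sorted keys into buckets of a fixed size b, record the longest common prefix (LCP) of each bucket, and compute a key's rank as bucket index × b + offset inside the bucket. The sketch assumes distinct non-negative `long` keys given in increasing order and read as 64-bit strings; the class and helper names are hypothetical, and plain `HashMap`s stand in for the compact static functions a real MMPHF would use, so it shows the decomposition, not the space savings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy class illustrating LCP bucketing; not Sux4J code.
// Keys: distinct non-negative longs in increasing order, read as 64-bit strings (MSB first).
// The HashMaps stand in for compact static functions, so no space is saved here.
final class LcpBucketingSketch {
    private final int bucketSize;
    // key -> {LCP length of the key's bucket, offset of the key inside its bucket}
    private final Map<Long, int[]> keyToLcpAndOffset = new HashMap<>();
    // (LCP bits, LCP length) -> bucket index; with sorted distinct keys, buckets have distinct LCPs
    private final Map<String, Integer> lcpToBucket = new HashMap<>();

    LcpBucketingSketch(long[] sortedKeys, int bucketSize) {
        this.bucketSize = bucketSize;
        for (int start = 0; start < sortedKeys.length; start += bucketSize) {
            int end = Math.min(start + bucketSize, sortedKeys.length) - 1;
            // In a sorted bucket, the LCP of all keys is the LCP of its first and last key.
            int lcpLength = Long.numberOfLeadingZeros(sortedKeys[start] ^ sortedKeys[end]);
            lcpToBucket.put(prefixId(sortedKeys[start], lcpLength), start / bucketSize);
            for (int i = start; i <= end; i++) {
                keyToLcpAndOffset.put(sortedKeys[i], new int[] {lcpLength, i - start});
            }
        }
    }

    // Identifies the first `length` bits of `key` (a Java shift by 64 is a no-op, hence the special case).
    private static String prefixId(long key, int length) {
        long prefix = length == 0 ? 0L : key >>> (64 - length);
        return prefix + "/" + length;
    }

    // Rank of a key from the original set; for other keys this sketch throws,
    // whereas a real MMPHF would return an arbitrary value.
    long rank(long key) {
        int[] lcpAndOffset = keyToLcpAndOffset.get(key);
        int bucket = lcpToBucket.get(prefixId(key, lcpAndOffset[0]));
        return (long) bucket * bucketSize + lcpAndOffset[1];
    }

    public static void main(String[] args) {
        long[] keys = {1, 3, 4, 9, 12, 17, 30};
        LcpBucketingSketch f = new LcpBucketingSketch(keys, 3);
        for (long k : keys) System.out.println(k + " -> " + f.rank(k));
    }
}
```

The point of the decomposition is that each map only needs to distinguish keys or prefixes, never to preserve order, which is what lets a real implementation replace them with small non-monotone functions.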

Thanks. I'm interested in support for predecessor search (i.e., returning the position of the largest key not greater than a search key x).
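For reference, the predecessor query described above can be stated precisely as a plain binary search over a static sorted array. This is only a baseline with O(log n) probes, not a z-fast trie; the class name is hypothetical and the keys are assumed to be strictly increasing `long`s.

```java
// Hypothetical baseline: predecessor search by binary search on a sorted array.
final class PredecessorSearch {
    // Returns the position of the largest key not greater than x,
    // or -1 if every key is greater than x. Assumes strictly increasing keys.
    static int predecessor(long[] sortedKeys, long x) {
        int lo = 0, hi = sortedKeys.length - 1, answer = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedKeys[mid] <= x) {
                answer = mid;  // candidate found; look for a larger one on the right
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return answer;
    }

    public static void main(String[] args) {
        long[] keys = {2, 5, 9, 14};
        System.out.println(predecessor(keys, 10)); // 2 (the key 9)
        System.out.println(predecessor(keys, 1));  // -1 (no key <= 1)
    }
}
```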

According to my understanding, only the z-fast trie supports this kind of search (the longest-common-prefix approach cannot). I will try to understand the z-fast trie code (e.g., ZFastTrieDistributorMonotoneMinimalPerfectHashFuncti).

BTW, I also have two questions:

  1. Do the techniques in your paper work only for integers/binary numbers? Do they also work for a dataset of strings stored in lexicographical order?
    I ask because all the examples in your paper are binary vectors. Moreover, several theorems apply only when the data consists of binary vectors.

Whether the techniques apply to string datasets is important to me. (Although strings can be converted to binary vectors, I think the search would be slow as the number of map functions increases.)

  2. Can you give me some clues about how fast predecessor search is? A rough performance number or statistic is fine (such as "for a 100 MB dataset, one predecessor query takes 100 µs"), from your memory of this work from years ago.
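On the first question, one simple way to turn strings into binary vectors is to concatenate the 16-bit UTF-16 code units of each string, most significant bit first. With this encoding, comparing the bit strings gives the same order as `String.compareTo`, so a lexicographically sorted string set stays sorted. This is a hypothetical helper, not Sux4J's own transformation; structures of this kind often also require the bit strings to be prefix-free (e.g., by appending a terminator), which this sketch does not do.

```java
// Hypothetical helper, not Sux4J's transformation strategy: encodes a string as the
// concatenation of its UTF-16 code units, 16 bits each, most significant bit first.
final class StringToBits {
    static String toBits(String s) {
        StringBuilder bits = new StringBuilder(16 * s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            for (int b = 15; b >= 0; b--) {
                bits.append(((c >>> b) & 1) == 0 ? '0' : '1');
            }
        }
        return bits.toString();
    }

    // True if comparing the bit strings gives the same order as String.compareTo.
    static boolean sameOrder(String a, String b) {
        return Integer.signum(a.compareTo(b)) == Integer.signum(toBits(a).compareTo(toBits(b)));
    }

    public static void main(String[] args) {
        System.out.println(toBits("A"));                 // 0000000001000001
        System.out.println(sameOrder("apple", "apply")); // true
        System.out.println(sameOrder("ab", "abc"));      // true: a prefix sorts first in both orders
    }
}
```

Note that "ab" encodes to a prefix of "abc", which is exactly the situation a prefix-free encoding would rule out.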

It is an excellent paper. I have been reading it for a week, but I still have trouble fully understanding it due to my weak background.
I would really appreciate it if you could answer these questions.

So, the paper does not talk about predecessor search. For that you might want to use a dynamic z-fast trie; that's the ZFastTrie class. If you're looking for speed, though, as you noticed, the fact that everything (even integers) is first turned into a bit vector is not good. However, in my experience the number of cache misses entirely dominates the lookup time, so you might want to give it a try even in this form.
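To make "even integers are first turned into a bit vector" concrete, here is one standard order-preserving way to view a signed `long` as a 64-bit string, most significant bit first: flip the sign bit, so that lexicographic order of the bit strings matches numeric order. This is an illustrative assumption, not necessarily the encoding Sux4J uses internally, and the class name is hypothetical.

```java
// Hypothetical helper: order-preserving 64-bit string view of a signed long (MSB first).
// Not necessarily the encoding Sux4J uses.
final class LongToBits {
    static String toBits(long x) {
        // Flipping the sign bit maps signed order onto unsigned (i.e., bit-string) order.
        long flipped = x ^ Long.MIN_VALUE;
        StringBuilder bits = new StringBuilder(64);
        for (int b = 63; b >= 0; b--) {
            bits.append(((flipped >>> b) & 1L) == 0 ? '0' : '1');
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        long[] values = {Long.MIN_VALUE, -1, 0, 1, Long.MAX_VALUE};
        for (long v : values) System.out.println(toBits(v) + "  " + v);
    }
}
```

Every key, however small, becomes a full 64-bit string, which is the overhead the comment above refers to.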

Note that z-fast tries are competitive with, say, binary trees only for very large data sets.

If you are interested in predecessor search on static sets, you could put together a static z-fast trie for that purpose, but I don't have any code available.

Thanks for your comments and suggestions.

I appreciate your help. BTW, if you need medical supplies such as masks, I may be able to help (for free). I'm in Hong Kong, China, and the mask supply here is sufficient now.
I'm sorry about the COVID-19 situation in Italy. I would like to help if I can.
Be safe.