rkapsi / patricia-trie

Practical Algorithm to Retrieve Information Coded in Alphanumeric (PATRICIA)

Home Page: http://code.google.com/p/patricia-trie

Longest Prefix Match

phreed opened this issue

Given a trie with keys 'abc', 'abcde' and 'abcdefgh',
I wish to have a method which, when given 'abcdefg',
returns the entry whose key is 'abcde'.
That is, the entry whose key is the longest prefix of the given string.
I have to do a bit of testing, but I believe the method already exists:
AbstractPatriciaTrie.getNearestEntryForKey(Object key);
However, it is declared package-private rather than public.
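
To pin down the behaviour being asked for, here is a minimal sketch that does not use the trie at all. It assumes plain String keys in a java.util.Map, and the class and method names (LongestPrefixSketch, longestPrefixKey) are made up for illustration; they are not part of patricia-trie. It simply tries the query and then ever shorter prefixes of it, so it costs one lookup per character of the query.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of patricia-trie.
final class LongestPrefixSketch {

    // Returns the longest stored key that is a prefix of query, or null if no key is.
    static String longestPrefixKey(Map<String, ?> map, String query) {
        // Try the whole query first, then drop one trailing character at a time.
        for (int end = query.length(); end >= 0; end--) {
            String candidate = query.substring(0, end);
            if (map.containsKey(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new TreeMap<>();
        map.put("abc", 1);
        map.put("abcde", 2);
        map.put("abcdefgh", 3);
        System.out.println(longestPrefixKey(map, "abcdefg")); // prints "abcde"
    }
}
```

A trie-based method should return the same key as this brute-force version, which makes it usable as a reference when testing.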

It's floorEntry(K key) that gives the right answer, not getNearestEntryForKey.

Example:
Take a trie with 'L' and 'LL'.
If you search for 'LD', floorEntry returns 'L', while getNearestEntryForKey and selectKey return 'LL'.

floorEntry is also package protected but part of the NavigableMap interface.
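
The floor part of that example can be reproduced with java.util.TreeMap, which also implements NavigableMap. This is only a stand-in under the assumption that the keys sort in natural String order; it says nothing about what getNearestEntryForKey or selectKey return, and the class name FloorEntryDemo is invented for the sketch.

```java
import java.util.TreeMap;

// Illustration with java.util.TreeMap as a stand-in; not the trie itself.
final class FloorEntryDemo {

    // Builds the two-key map from the example: 'L' and 'LL'.
    static TreeMap<String, String> sample() {
        TreeMap<String, String> map = new TreeMap<>();
        map.put("L", "value for L");
        map.put("LL", "value for LL");
        return map;
    }

    public static void main(String[] args) {
        TreeMap<String, String> map = sample();
        // The floor is the greatest key less than or equal to the query.
        // "L" < "LD" < "LL" in String order, so the floor of "LD" is "L".
        System.out.println(map.floorKey("LD")); // prints "L"
    }
}
```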

Calling that function actually causes a crash, because it modifies the map. Does anybody have another function which is safe to use instead?

I implemented it like this, but I'm not sure it's right. If somebody can check my work, I'm going to see about contributing it to the Apache version.

  public TrieEntry<K,V> getFloor(K key) {    
    int lengthInBits = lengthInBits(key);

    if (lengthInBits == 0) {
      if (!root.isEmpty()) {
        return root;
      } else {
        return null;
      }
    }

    TrieEntry<K, V> found = getNearestEntryForKey(key);
    if (compareKeys(key, found.key)) {
      return found;
    }

    int bitIndex = bitIndex(key, found.key);
    if (Tries.isValidBitIndex(bitIndex) || Tries.isEqualBitKey(bitIndex)) {
      return found;
    } else if (Tries.isNullBitKey(bitIndex)) {
      if (!root.isEmpty()) {
        return root;
      } else {
        return null;
      }
    }

    // we should have exited above.
    throw new IllegalStateException("invalid lookup: " + key);
  }

That code is definitely not equivalent to floorEntry(K key).

From what I remember, using floorEntry didn't cause a crash, but yes, it modifies the trie (so it's unsafe with concurrent access).

Nice to see that people are using it. We contributed an earlier version/fork to Apache Commons and it got accepted in v4.

http://commons.apache.org/proper/commons-collections/javadocs/api-release/index.html?org/apache/commons/collections4/trie/package-summary.html

If you're looking for a general purpose (Patricia) Trie then I'd look into that.

I saw that. But I didn't see any methods for longest-prefix match.

On Sat, Mar 8, 2014 at 12:41 PM, Roger Kapsi notifications@github.com wrote:

> Nice to see that people are using it. We contributed an earlier
> version/fork to Apache Commons and it got accepted in v4.
>
> http://commons.apache.org/proper/commons-collections/javadocs/api-release/index.html?org/apache/commons/collections4/trie/package-summary.html
>
> If you're looking for a general purpose (Patricia) Trie then I'd look into
> that.



In case anyone else is wondering how to get a longest prefix match with a PatriciaTrie, the methods you are looking for are:

  • public K selectKey(final K key)
  • public V selectValue(final K key)

They are unfortunately only declared on AbstractPatriciaTrie and are not part of the general Trie interface. Also, if the desired prefix is not part of the trie, these methods will return "the value whose key is closest in a bitwise XOR metric". So if you are looking for an exact prefix match, you also need to compare the returned key against the search key.

In my tests, floorEntry doesn't return the longest prefix either. For example, if the trie contains com.example. and com., floorEntry("com.google.") returns com.example. rather than com.

EDIT: I am talking about the code in Apache Commons (4.1), but from a quick look I think the code is the same.

floorEntry has nothing in common with longest prefix; it can be used only as an initial approximation.
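
One way to turn that initial approximation into an actual longest-prefix lookup is sketched below. It is written against java.util.NavigableMap with natural String ordering (a TreeMap here), not against the trie, and the names FloorRefinementSketch, longestPrefixKey and commonPrefix are invented for the example. The idea: take the floor of the query; if it is not a prefix, no stored prefix can be longer than what the floor and the query have in common, so shorten the query to that common prefix and take the floor again.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch; assumes keys are ordered by natural String order.
final class FloorRefinementSketch {

    // Returns the longest stored key that is a prefix of query, or null if none is.
    static String longestPrefixKey(NavigableMap<String, ?> map, String query) {
        String probe = query;
        while (true) {
            String floor = map.floorKey(probe);
            if (floor == null) {
                return null; // nothing at or below the probe, so no prefix is stored
            }
            if (probe.startsWith(floor)) {
                return floor; // the floor is a prefix of the probe, hence of the query
            }
            // Any stored prefix must fit inside the part shared with the floor.
            probe = commonPrefix(floor, probe);
        }
    }

    // Longest common leading substring of a and b.
    static String commonPrefix(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return a.substring(0, i);
    }

    public static void main(String[] args) {
        NavigableMap<String, Integer> map = new TreeMap<>();
        map.put("com.", 1);
        map.put("com.example.", 2);
        System.out.println(map.floorKey("com.google."));          // prints "com.example."
        System.out.println(longestPrefixKey(map, "com.google.")); // prints "com."
    }
}
```

Each round strictly shortens the probe, so the loop ends after at most one floor lookup per character of the query. Whether the same refinement carries over to the trie's own floorEntry depends on how the trie orders its keys, which this sketch does not check.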