rurban / marisa-trie

marisa-trie

Home Page:https://code.google.com/p/marisa-trie/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Project name

marisa-trie

New: https://github.com/rurban/marisa-trie

Old: http://code.google.com/p/marisa-trie/

Project summary

MARISA: Matching Algorithm with Recursively Implemented StorAge

Version

0.2.4

Description

Matching Algorithm with Recursively Implemented !StorAge (MARISA) is a static and space-efficient trie data structure. And libmarisa is a C++ library to provide an implementation of MARISA. Also, the package of libmarisa contains a set of command line tools for building and operating a MARISA-based dictionary.

A MARISA-based dictionary supports not only lookup but also reverse lookup, common prefix search and predictive search.

  • Lookup is to check whether or not a given string exists in a dictionary.
  • Reverse lookup is to restore a key from its ID.
  • Common prefix search is to find keys from prefixes of a given string.
  • Predictive search is to find keys starting with a given string.

The biggest advantage of libmarisa is that its dictionary size is considerably more compact than others. See below for the dictionary size of other implementations.

  • Input
  • Source: enwiki-20121101-all-titles-in-ns0.gz
  • Contents: all page titles of English Wikipedia (Nov. 2012)
  • Number of keys: 9,805,576
  • Total size: 200,435,403 bytes (plain) / 54,933,690 bytes (gzipped)
Implementation Size (bytes) Remarks
darts-clone 376,613,888 Compacted double-array trie
tx-trie 127,727,058 LOUDS-based trie
marisa-trie 50,753,560 MARISA trie

Version control system

git

(formerly Subversion on google-code)

Source code license

  • The BSD 2-clause License
  • The LGPL 2.1 or any later version

Project labels

  • Patricia
  • Trie
  • Static
  • Dictionary
  • CPlusPlus

See also

https://github.com/s-yata/marisa2

About

marisa-trie

https://code.google.com/p/marisa-trie/

License:Other


Languages

Language:C++ 47.1%Language:Shell 43.7%Language:Makefile 7.4%Language:Python 0.6%Language:Perl 0.5%Language:M4 0.3%Language:C 0.3%Language:Ruby 0.1%