yinqunjun / Ferret

An optimized substring search engine written in Go

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Ferret: An optimized substring search engine written in Go.
Makes use of a combination of an Inverted Index and a Suffix Array to allow log-time lookups with a relatively small memory footprint.
Also incorporates error-correction (Levenshtein distance 1) and simple Unicode-to-ASCII conversion.
Allows for arbitrary sorting functions
Allows you to map arbitrary data to your results, and quickly update this data.

License: CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0/)
You are free to share and remix this work, as long as you attribute the original author, and share it under a similar license
Author: Mark Canning

To Install: go get github.com/argusdusty/Ferret
To Update: go get -u github.com/argusdusty/Ferret
To Use: import "github.com/argusdusty/Ferret"
To try the examples: go build example.go && go build dictionaryexample.go

Check out example.go and dictionaryexample.go for example usage. The code is meant to be as fast as possible for a substring dictionary search, and as such is best suited for large dictionaries with >1,000,000 total characters. I've timed 10s initialization for 3.5 million characters (350,000 entries) on a modern CPU, and 15us search time (4000us with error-correction).

Uses linear memory (approximately 10 times the size of the dictionary)
Searches performed in log time with the number of characters in the dictionary.
Sorted searches can be slow, taking ~linear time with the number of matches, rather than linear time with the results limit.

About

An optimized substring search engine written in Go