kasraavand / SuffixTree

Optimized implementation of suffix tree in python using Ukkonen's algorithm.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

This module is an optimized implementation of Ukkonen's suffix tree algorithm in python which will be having most of the important text processing functionalities such as:

Search for strings:

Check if a string P of length m is a substring in O(m) time.
Find the first occurrence of the patterns P1,... ,Pq of total length m as substrings in O(m) time.

Find all z occurrences of the patterns P1,... ,Pq of total length m as substrings in O(m+z) time.

  • Search for a regular expression P in time expected sublinear in n
  • Find for each suffix of a pattern P the length of the longest match between a prefix of P[i... m] and a substring in D in image time. This is termed the matching statistics for P

Find properties of the strings:

  • Find the longest common substrings of the string Si and Sj in image time.
  • Find all maximal pairs, maximal repeats or supermaximal repeats in image time.
  • Find the Lempel–Ziv decomposition in image time.[10]
  • Find the longest repeated substrings in image time.
  • Find the most frequently occurring substrings of a minimum length in image time.
  • Find the shortest strings from image that do not occur in D in O(n+z) time, if there are z such strings.
  • Find the shortest substrings occurring only once in image time.
  • Find, for each i the shortest substrings of Si not occurring elsewhere in D in image time.

sources:

About

Optimized implementation of suffix tree in python using Ukkonen's algorithm.


Languages

Language:Python 100.0%