epinal / pybkt

Python3 Implementation of Burkhard-Keller tree.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

pybkt

pybkt is a pure python3 implementation of Burkhard-Keller tree data structure.

This implementation is based on:

Both lookup methods from above implementations are included:

  • query(): recursive
  • find(): not recursive

Example usage:

Create the tree using Linux English words dictionary and levenshtein distance function.

    >>> tree = BKTree(levenshtein, dict_words())    
    >>> tree.query('book', 1)  # Query the tree to find words at most 1 distance from 'book'
    [(0, 'book'), (1, 'Cook'), (1, 'boo'), (1, 'boob'), (1, 'books'), (1, 'boom'), (1, 'boon'), (1, 'boor'), (1, 'boos'), (1, 'boot'), (1, 'brook'), (1, 'cook'), (1, 'gook'), (1, 'hook'), (1, 'kook'), (1, 'look'), (1, 'nook'), (1, 'rook'), (1, 'took')]
    
    # calling ``find()`` which is not recursive gives the same results
    >>> tree.find('book', 1)
    [(0, 'book'), (1, 'Cook'), (1, 'boo'), (1, 'boob'), (1, 'books'), (1, 'boom'), (1, 'boon'), (1, 'boor'), (1, 'boos'), (1, 'boot'), (1, 'brook'), (1, 'cook'), (1, 'gook'), (1, 'hook'), (1, 'kook'), (1, 'look'), (1, 'nook'), (1, 'rook'), (1, 'took')]

Creating the BkTree data structure using big amounts of data take some time so you can save it into a file and load it after.

    >>> tree = BKTree(levenshtein, dict_words())    
    >>> tree.safe_to_file('tree.json')  # Safe the tree structure in a file, default 'tree.json' 
    
    # Load the tree
    >>> tree1 = BKTree(levenshtein)
    >>> tree1.load('tree.json')
    >>> tree1.find('book', 1)
    [(0, 'book'), (1, 'Cook'), (1, 'boo'), (1, 'boob'), (1, 'books'), (1, 'boom'), (1, 'boon'), (1, 'boor'), (1, 'boos'), (1, 'boot'), (1, 'brook'), (1, 'cook'), (1, 'gook'), (1, 'hook'), (1, 'kook'), (1, 'look'), (1, 'nook'), (1, 'rook'), (1, 'took')]

Measure BKTree time vs brute force

    # Load the saved structure
    >>> tree = BKTree(levenshtein)
    >>> tree.load('tree.json')    
    >>> time_of(brute_query, 'book', list(dict_words()), levenshtein, 1)    
    Time:  5.613326072692871
    >>> time_of(tree.query, 'book', 1)  
    Time:  0.056838035583496094

TODO: Add python2 support.

About

Python3 Implementation of Burkhard-Keller tree.

License:MIT License


Languages

Language:Python 100.0%