joaquimg / TextModel.jl

TextModel.jl

TextModel.jl is a package for creating vector representations of text in a way that is largely independent of the language. It is intended to be used with SimilaritySearch.jl, but can be used independently if needed.

For generic text analysis you should use other packages like TextAnalysis.jl.

It supports a number of simple text preprocessing functions and three kinds of tokenizers: word n-grams, character q-grams, and skip-grams. It supports creating multisets of tokens, commonly called bags of words (BOW). TextModel.jl can produce sparse vector representations based on term-weighting schemes like TF, IDF, and TFIDF. It also supports term-weighting schemes designed to cope with text classification tasks, mostly based on distributional representations.
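To make the terms above concrete, here is a minimal, self-contained sketch of the three tokenizers, a bag of words, and one common TF-IDF variant, written in plain Julia. All function names here (`normalize_text`, `word_ngrams`, `char_qgrams`, `skipgrams`, `bow`, `tfidf`) are hypothetical and chosen for illustration; they are not the TextModel.jl API, and the package's own preprocessing and weighting formulas may differ.

```julia
# Hypothetical helpers for illustration only; NOT the TextModel.jl API.

# Simple preprocessing: lowercase and collapse runs of whitespace.
normalize_text(s::AbstractString) = join(split(lowercase(s)), " ")

# Word n-grams: every run of n consecutive words.
function word_ngrams(s::AbstractString, n::Int)
    w = split(normalize_text(s))
    [join(w[i:i+n-1], " ") for i in 1:length(w)-n+1]
end

# Character q-grams: every run of q consecutive characters.
function char_qgrams(s::AbstractString, q::Int)
    c = collect(normalize_text(s))
    [String(c[i:i+q-1]) for i in 1:length(c)-q+1]
end

# Skip-grams (assumed form): pairs of words separated by exactly `skip` words.
function skipgrams(s::AbstractString, skip::Int)
    w = split(normalize_text(s))
    [string(w[i], " ", w[i+skip+1]) for i in 1:length(w)-skip-1]
end

# Bag of words: a multiset mapping each token to its count.
function bow(tokens)
    counts = Dict{String,Int}()
    for t in tokens
        counts[t] = get(counts, t, 0) + 1
    end
    counts
end

# TF-IDF over a corpus of bags (assumed variant):
# tf = count / bag size, idf = log(N / document frequency).
function tfidf(bags::Vector{Dict{String,Int}})
    N = length(bags)
    df = Dict{String,Int}()
    for b in bags, t in keys(b)
        df[t] = get(df, t, 0) + 1
    end
    map(bags) do b
        total = sum(values(b))
        Dict(t => (c / total) * log(N / df[t]) for (t, c) in b)
    end
end

# Usage: one sparse vector (token => weight) per document.
docs = ["The quick brown fox", "the lazy dog", "The quick dog"]
bags = [bow(word_ngrams(d, 1)) for d in docs]
vecs = tfidf(bags)
```

In this sketch each sparse vector is a `Dict` from token to weight, so only tokens present in a document take up space. A token that occurs in every document, such as "the" above, gets an IDF of zero and so contributes nothing to the representation.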

About

License:Other


Languages

Language:Julia 100.0%