Patrick Frey's repositories
strusPattern
Implements token pattern matching on documents, where the tokens are produced by regular expressions matched on text using the Intel Hyperscan library.
strusWebService
Web service (HTTP/JSON) that exposes the strus API as a service
strusAnalyzer
Library for document analysis (segmentation, tokenization, normalization, aggregation) with the goal of producing a set of items that can be inserted into a strus storage. It also provides some functions for analysing tokens or phrases of a strus query.
strusBindings
Language bindings (Java, Python, PHP, etc.) for strus
strusUtilities
A set of command line programs to access the strus information retrieval engine
strusWikipediaSearch
Search engine for Wikipedia (strus demo project)
textwolf
textwolf is a C++ template library for processing XML in various character set encodings. It has interfaces for iterating on Unicode characters, XML elements and elements typed by XML path selection expression matches. It supports chunk-wise processing of input, and its performance is competitive with the fastest open source XML processors.
CompactNodeTrie
Implements a trie with nodes compacted depending on the number of successor nodes
strusModule
Provides a module loader and an interface for defining loadable modules built from libraries, making strus extensible
strusTrace
Generates method call traces as an aspect of strus. Creates a proxy for each class that logs all method calls.
strusVector
Provides a mapping of vectors to features for strus
strusDocker
Docker images for strus
strusPrototypeModuleV1
An example weighting schema for the strus system implementing a simple query language
strusTutorials
Tutorials for the strus search engine
word2vec
This tool provides an efficient implementation of the continuous bag-of-words and skip-gram architectures for computing vector representations of words. These representations can be subsequently used in many natural language processing applications and for further research.