There are 2 repositories under the tokenize topic.
CommonMark compliant markdown parser in Rust with ASTs and extensions
Ungreedy subword tokenizer and vocabulary trainer for Python, Go & JavaScript
snapdragon is an extremely pluggable, powerful and easy-to-use parser-renderer factory.
mdast utility to parse markdown
NLP Functions for amplifying negations, managing elisions, creating ngrams, stems, phonetic codes to tokens and more.
Tokenize2 is a plugin that lets your users select multiple items from a predefined list or via Ajax, using autocompletion as they type to find each item. You may have seen a similar type of text entry when filling in the recipients field while sending messages on Facebook or adding tags on Tumblr.
Example scripts that showcase how to use Private AI Text to de-identify, redact, hash, tokenize, mask and synthesize PII in text.
Extract JavaScript code comments from a string or glob of files.
bKash payment gateway integration in Flutter
Lexers, tokenizers, parsers, compilers, renderers, stringifiers... What's the difference, and how do they work?
Korean text data preprocessing toolkit for NLP
Uses babel to extract JavaScript code comments from a string. Returns an array of comment objects, with line, column, index, comment type and comment string.
Uses snapdragon to tokenize a single JavaScript block comment into an object, with description, tags, and code example sections that can be passed to any other comment parsers for further parsing.
Transformer NN block implemented for machine translation, text classification, natural language inference, and machine reading comprehension models.
More detailed documentation for the Python tokenize module
Transforms tokens into original source code (while preserving whitespace)
A Python library for interacting with TI-(e)z80 (82/83/84 series) calculator files
Easily scan a string with an object of regex patterns to produce an array of tokens. ~100 sloc.
Python3 module to tokenize English sentences.
Sentiment analysis for Amazon product reviews using NLTK, Scikit-Learn, and Keras. Using hyperparameter search and LSTM, our best model achieves ~96% accuracy.
Splits a JSON string into an annotated list of tokens
Snapdragon utility for creating a stack.
Modular TypeScript template engine
A PHP Library to extract n-grams from a text. Simple preprocessing tools (cleaning, tokenizing) included.
Simple Go lexer: lex your own syntax and read it from a file.
SerialConfigCommand lets users easily issue commands, with or without values, via the Serial Monitor. Example: "LED=255", "Lock=1", "Start". Compatible with the Arduino String() class and character arrays.
Simple regex for correcting punctuation
Adds a location object to a snapdragon token or AST node.
Tokenize a string into an array of string parts and format identifier objects.