yukio808 / text2token

break down a corpus of text into lines and tokens

Home Page: https://www.npmjs.com/package/text2token

text2token

text2token is a Node.js module that breaks down a corpus of text into lines and tokens.

Usage

The module has one method, text2token. As the example below shows, it is called with the path to a text file and returns an object with two properties: lines, a list of each line in the file, and tokens, a list of all unique tokens.

$ node
> var lib = require('text2token');
> var converted = lib.text2token('./src/bigtext.txt');

> converted.tokens
[ '©',
  '2015',
  'GitHub,',
  'Inc.',
  'Terms',
  'Privacy',
  'Security',
  ...

> converted.lines
[ '© 2015 GitHub, Inc. Terms Privacy Security Contact Help',
  'Status API Training Shop Blog About Pricing',
  'The quick brown fox jumped over the lazy dog',
  ...
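
To make the shape of the returned object concrete, here is a minimal sketch of how a string could be split into lines and unique tokens. It is not the text2token source: the function name splitLinesAndTokens, the choice to take a string rather than a file path, the dropping of blank lines, and the whitespace-only token splitting are all assumptions, picked to match the sample output above (where punctuation stays attached, as in 'GitHub,').

```javascript
// Hypothetical sketch, not the text2token implementation.
// splitLinesAndTokens and its string input are assumptions for illustration.
function splitLinesAndTokens(text) {
  // Split on Unix or Windows line endings and drop blank lines.
  var lines = text.split(/\r?\n/).filter(function (line) {
    return line.trim().length > 0;
  });

  // Split each line on runs of whitespace. A Set discards repeated
  // tokens while preserving the order in which they were first seen.
  var seen = new Set();
  lines.forEach(function (line) {
    line.trim().split(/\s+/).forEach(function (token) {
      seen.add(token);
    });
  });

  return { lines: lines, tokens: Array.from(seen) };
}

var sample = '© 2015 GitHub, Inc.\nThe quick brown fox\nThe lazy dog';
var result = splitLinesAndTokens(sample);
console.log(result.lines);
console.log(result.tokens);
```

In this sketch 'The' appears in two lines of the sample but only once in result.tokens, which is the "unique tokens" behavior described above. Whether the real module lowercases tokens, strips punctuation, or keeps blank lines is not stated in this README.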

MIT License 2015 © Andy Craze
