tdm-teeft

tdm-teeft is a tdm module for term extraction from unstructured text. It can be used to get the keywords of a document.

Installation

Using npm :

$ npm i -g tdm-teeft
$ npm i --save tdm-teeft

Using Node :

/* require of Teeft module */
const Teeft = require('tdm-teeft');

/* Build new Instance of Tagger */
let tagger = new Teeft.Tagger();

/* Build new Instance of Filter */
let filter = new Teeft.Filter();

/* Build new Instance of Indexator */
let indexator = new Teeft.Indexator();

/* Build new Instance of TermExtraction */
let termextraction = new Teeft.TermExtraction();

Launch tests

$ npm run test

Build documentation

$ npm run docs

API Documentation

Classes

Filter
Indexator
Tagger
TermExtractor

Filter

Kind: global class

Filter
- new Filter([options])
- .call(occur, strength) ⇒ Boolean
- .configure(length) ⇒ Number

new Filter([options])

Returns: Filter - An instance of Filter

Param	Type	Description
[options]	`Object`	Options of constructor
[options.minOccur]	`Number`	Minimal number of occurrences
[options.noLimitStrength]	`Number`	Strength limit
[options.lengthSteps]	`Object`	Length steps (see the structure in the example below)

Example (Example usage of 'constructor' (with parameters))

let options = {
  // Assigns a 'value' depending on the length of the indexed text (number of tokens)
  'lengthSteps': {
    'values': [ // intermediate steps are stored here
      { // here: value 4 is used for text length > 1000 tokens && text length <= 3000 tokens
        'lim': 3000, // this property must be > 'lengthSteps.min.lim' && < 'lengthSteps.max.lim'
        'value': 4
      },
      { // here: value 5 is used for text length > 3000 tokens && text length <= 4000 tokens
        'lim': 4000, // this property must be > 'lengthSteps.min.lim' && < 'lengthSteps.max.lim'
        'value': 5
      }
    ],
    'min': { // 'value' used up to the minimum 'lim' (here: value 1 is used for text length <= 1000 tokens)
      'lim': 1000,
      'value': 1
    },
    'max': { // 'value' used beyond the maximum 'lim' (here: value 7 is used for text length > 6000 tokens)
      'lim': 6000,
      'value': 7
    }
  },
  'minOccur': 3, // Default minimal number of occurrences (of tokens): here 3. This value is updated according to the length of the indexed text when 'configure' is called
  'noLimitStrength': 2 // Strength limit
  },
  defaultFilter = new Filter(options);
// returns an instance of Filter with properties:
// - minOccur : 3
// - noLimitStrength : 2
// - lengthSteps : {'values': [{'lim': 3000, 'value': 4}, {'lim': 4000, 'value': 5}], 'min': {'lim': 1000, 'value': 1}, 'max': {'lim': 6000, 'value': 7}}

Example (Example usage of 'constructor' (with default values))

let defaultFilter = new Filter();
// returns an instance of Filter with properties :
// - minOccur : 7
// - noLimitStrength : 2
// - lengthSteps : {'values': [{'lim': 3000, 'value': 4}], 'min': {'lim': 1000, 'value': 1}, 'max': {'lim': 6000, 'value': 7}}

filter.call(occur, strength) ⇒ `Boolean`

Check values against the filter conditions

Kind: instance method of Filter
Returns: Boolean - true if the conditions are met

Param	Type	Description
occur	`Number`	Occurrence value
strength	`Number`	Strength value

Example (Example usage of 'call' function)

let defaultFilter = new Filter();
defaultFilter.configure(500);
defaultFilter.call(1, 1); // returns true
defaultFilter.configure(5000);
defaultFilter.call(1, 1); // returns false
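
The two results above are consistent with a rule in which weak terms must occur at least `minOccur` times, while terms whose strength reaches `noLimitStrength` always pass. The following is a minimal sketch of such a rule, inferred only from the documented examples; it is not the module's source, and `passesFilter` is a hypothetical helper name.

```javascript
// Hypothetical stand-in for Filter#call, inferred from the documented results.
// Assumption: terms whose strength reaches noLimitStrength are always kept;
// weaker terms are kept only if they occur at least minOccur times.
function passesFilter(occur, strength, { minOccur, noLimitStrength }) {
  return strength >= noLimitStrength || occur >= minOccur;
}

// After configure(500), the default filter is documented to use minOccur = 1.
console.log(passesFilter(1, 1, { minOccur: 1, noLimitStrength: 2 })); // true
// After configure(5000), the default filter is documented to use minOccur = 7.
console.log(passesFilter(1, 1, { minOccur: 7, noLimitStrength: 2 })); // false
```

Under this assumption, a multi-word term (strength 2 or more) would be kept even if it occurs only once.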

filter.configure(length) ⇒ `Number`

Configure the filter according to lengthSteps

Kind: instance method of Filter
Returns: Number - The configured minOccur value

Param	Type	Description
length	`Number`	Text length

Example (Example usage of 'configure' function)

let defaultFilter = new Filter();
defaultFilter.configure(500); // returns 1
defaultFilter.configure(5000); // returns 7
defaultFilter.configure('test'); // returns null
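
The mapping from text length to `minOccur` can be sketched as a lookup over the `lengthSteps` structure. This is a sketch under assumptions, not the module's implementation: `pickMinOccur` is a hypothetical name, and the choice to apply the `max` value to any length past the last intermediate step is inferred from the documented `configure(5000)` result with the default steps.

```javascript
// Hypothetical stand-in for Filter#configure, inferred from the documented results.
const defaultLengthSteps = {
  values: [{ lim: 3000, value: 4 }],
  min: { lim: 1000, value: 1 },
  max: { lim: 6000, value: 7 }
};

function pickMinOccur(length, steps = defaultLengthSteps) {
  // Assumption: anything that is not a finite number yields null.
  if (typeof length !== 'number' || !Number.isFinite(length)) return null;
  // Short texts use the 'min' value.
  if (length <= steps.min.lim) return steps.min.value;
  // Intermediate steps are assumed to be sorted by ascending 'lim'.
  for (const step of steps.values) {
    if (length <= step.lim) return step.value;
  }
  // Assumption: past the last intermediate step, the 'max' value applies.
  return steps.max.value;
}

console.log(pickMinOccur(500));    // 1
console.log(pickMinOccur(2000));   // 4
console.log(pickMinOccur(5000));   // 7
console.log(pickMinOccur('test')); // null
```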

Indexator

Kind: global class

Indexator
- new Indexator([options])
- instance
  - .tokenize(text) ⇒ Array
  - .translateTag(tag) ⇒ String
  - .sanitize(terms) ⇒ Array
  - .lemmatize(terms) ⇒ Array
  - .index(data) ⇒ Object
- static
  - .compare(a, b) ⇒ Number

new Indexator([options])

Returns: Indexator - An instance of Indexator

Param	Type	Description
[options]	`Object`	Options of constructor
[options.filter]	`Filter`	Filter given to the extractor of this instance of Indexator
[options.lexicon]	`Object`	Lexicon used by the tagger of this instance of Indexator
[options.stopwords]	`Object`	Stopwords used by this instance of Indexator
[options.lemmatizer]	`Object`	Lemmatizer used by the tagger of this instance of Indexator
[options.stemmer]	`Object`	Stemmer used by this instance of Indexator
[options.dictionary]	`Object`	Dictionary used by this instance of Indexator

Example (Example usage of 'constructor' (with parameters))

let options = {
    'filter': customFilter // Assuming customFilter contains your custom settings
  },
  indexator = new Indexator(options);
// returns an instance of Indexator with custom Filter

Example (Example usage of 'constructor' (with default values))

let indexator = new Indexator();
// returns an instance of Indexator with default options

indexator.tokenize(text) ⇒ `Array`

Extract tokens from a text

Kind: instance method of Indexator
Returns: Array - Array of tokens

Param	Type	Description
text	`String`	Fulltext

Example (Example usage of 'tokenize' function)

let indexator = new Indexator();
indexator.tokenize('my sample sentence'); // return ['my', 'sample', 'sentence']

indexator.translateTag(tag) ⇒ `String`

Translate a Tagger tag into a Lemmatizer tag

Kind: instance method of Indexator
Returns: String - The matching Lemmatizer tag (or false)

Param	Type	Description
tag	`String`	Tag given by Tagger

Example (Example usage of 'translateTag' function)

let indexator = new Indexator();
indexator.translateTag('RB'); // returns 'adv'
indexator.translateTag('JJ'); // returns 'adj'
indexator.translateTag('NN'); // returns 'noun'
indexator.translateTag('NNP'); // returns 'noun'
indexator.translateTag('VBG'); // returns 'verb'
indexator.translateTag('VBN'); // returns 'verb'
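
All six documented pairs agree with a lookup on the first two letters of the Penn Treebank tag. The sketch below illustrates that idea only; `toLemmatizerTag` and the prefix table are hypothetical, and the module may use a different or larger mapping.

```javascript
// Hypothetical stand-in for Indexator#translateTag.
// Assumption: the first two letters of the tagger's tag select the lemmatizer tag.
const TAG_PREFIXES = new Map([
  ['RB', 'adv'],
  ['JJ', 'adj'],
  ['NN', 'noun'],
  ['VB', 'verb']
]);

function toLemmatizerTag(tag) {
  if (typeof tag !== 'string') return false;
  // The documentation states that false is returned when nothing matches.
  return TAG_PREFIXES.get(tag.slice(0, 2)) || false;
}

console.log(toLemmatizerTag('NNP')); // noun
console.log(toLemmatizerTag('VBG')); // verb
console.log(toLemmatizerTag('DT'));  // false
```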

indexator.sanitize(terms) ⇒ `Array`

Sanitize a list of terms (with some filters)

Kind: instance method of Indexator
Returns: Array - List of sanitized terms

Param	Type	Description
terms	`Array`	List of terms

Example (Example usage of 'sanitize' function)

let indexator = new Indexator();
indexator.sanitize([ { term: 'this', tag: 'DT', lemma: 'this', stem: 'this' },
  { term: 'is', tag: 'VBZ' },
  { term: 'a', tag: 'DT' },
  { term: 'sample', tag: 'NN', lemma: 'sample', stem: 'sampl' },
  { term: 'test', tag: 'NN', lemma: 'test', stem: 'test' } ]);
// return [ { term: 'this', tag: 'DT', lemma: 'this', stem: 'this' },
//   { term: '#', tag: '#' },
//   { term: '#', tag: '#' },
//   { term: 'sample', tag: 'NN', lemma: 'sample', stem: 'sampl' },
//   { term: 'test', tag: 'NN', lemma: 'test', stem: 'test' } ]

indexator.lemmatize(terms) ⇒ `Array`

Lemmatize a list of tagged terms (add a property lemma & stem)

Kind: instance method of Indexator
Returns: Array - List of tagged terms with a lemma

Param	Type	Description
terms	`Array`	List of tagged terms

Example (Example usage of 'lemmatize' function)

let indexator = new Indexator();
indexator.lemmatize([ { term: 'this', tag: 'DT', lemma: 'this', stem: 'this' },
  { term: 'is', tag: 'VBZ' },
  { term: 'a', tag: 'DT' },
  { term: 'sample', tag: 'NN', lemma: 'sample', stem: 'sampl' },
  { term: 'test', tag: 'NN', lemma: 'test', stem: 'test' } ]);
// return [ { term: 'this', tag: 'DT', lemma: 'this', stem: 'this' },
//   { term: '#', tag: '#' },
//   { term: '#', tag: '#' },
//   { term: 'sample', tag: 'NN', lemma: 'sample', stem: 'sampl' },
//   { term: 'test', tag: 'NN', lemma: 'test', stem: 'test' } ]

indexator.index(data) ⇒ `Object`

Index a fulltext

Kind: instance method of Indexator
Returns: Object - A representation of the fulltext (indexation plus additional information/statistics about tokens/terms)

Param	Type	Description
data	`String`	Fulltext to be indexed

Example (Example usage of 'index' function)

let indexator = new Indexator();
indexator.index('This is a sample sentence'); // return an object representation of indexation

Indexator.compare(a, b) ⇒ `Number`

Compare the specificity of two objects

Kind: static method of Indexator
Returns: Number - -1, 1, or 0

Param	Type	Description
a	`Object`	First object
b	`Object`	Second object

Example (Example usage of 'compare' function)

Indexator.compare({ 'term': 'a', 'specificity': 1 }, { 'term': 'b', 'specificity': 2 }); // return 1
Indexator.compare({ 'term': 'a', 'specificity': 1 }, { 'term': 'b', 'specificity': 1 }); // return 0
Indexator.compare({ 'term': 'a', 'specificity': 2 }, { 'term': 'b', 'specificity': 1 }); // return -1
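
The return values above follow the comparator contract of `Array.prototype.sort` for a descending order, so the most specific terms come first. The sketch below reproduces that behaviour with a hypothetical `compareBySpecificity` function; it is an illustration based on the documented results, not the module's source.

```javascript
// Hypothetical stand-in for Indexator.compare: higher specificity sorts first.
function compareBySpecificity(a, b) {
  if (a.specificity > b.specificity) return -1;
  if (a.specificity < b.specificity) return 1;
  return 0;
}

const terms = [
  { term: 'a', specificity: 1 },
  { term: 'b', specificity: 2 },
  { term: 'c', specificity: 1.5 }
];

// Sort a copy so the input array is left untouched.
const ranked = terms.slice().sort(compareBySpecificity).map((t) => t.term);
console.log(ranked); // [ 'b', 'c', 'a' ]
```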

Tagger

Kind: global class

Tagger
- new Tagger([options])
- .tag(terms) ⇒ Array

new Tagger([options])

Returns: Tagger - An instance of Tagger

Param	Type	Description
[options]	`Object`	Options of constructor

Example (Example usage of 'constructor' (with parameters))

let lexicon = { ... },
  tagger = new Tagger(lexicon);
// returns an instance of Tagger with custom lexicon

Example (Example usage of 'constructor' (with default values))

let tagger = new Tagger();
// returns an instance of Tagger with default lexicon

tagger.tag(terms) ⇒ `Array`

Tag terms

Kind: instance method of Tagger
Returns: Array - List of tagged terms

Param	Type	Description
terms	`Array`	List of terms

Example (Example usage of 'tag' function)

let tagger = new Tagger();
tagger.tag(['this', 'is', 'a', 'test']); // return [{ 'term': 'this', 'tag': 'DT' }, { 'term': 'is', 'tag': 'VBZ' }, { 'term': 'a', 'tag': 'DT' }, { 'term': 'test', 'tag': 'NN' }]

TermExtractor

Kind: global class

TermExtractor
- new TermExtractor([options])
- .extract(taggedTerms) ⇒ Object
- ._startsWith(str, prefix) ⇒ Boolean

new TermExtractor([options])

Returns: TermExtractor - An instance of TermExtractor

Param	Type	Description
[options]	`Object`	Options of constructor
[options.tagger]	`Tagger`	An instance of Tagger
[options.filter]	`Filter`	An instance of Filter

Example (Example usage of 'constructor' (with parameters))

let myTagger = new Tagger(), // Assuming myTagger contains your custom settings
  myFilter = new Filter(), // Assuming myFilter contains your custom settings
  termExtractor = new TermExtractor({ 'tagger': myTagger, 'filter': myFilter });
// returns an instance of TermExtractor with custom options

Example (Example usage of 'constructor' (with default values))

let termExtractor = new TermExtractor();
// returns an instance of TermExtractor with default options

termExtractor.extract(taggedTerms) ⇒ `Object`

Extract terms

Kind: instance method of TermExtractor
Returns: Object - Return all extracted terms

Param	Type	Description
taggedTerms	`Array`	List of tagged terms

Example (Example usage of 'extract' function)

let termExtractor = new TermExtractor(),
  myDefaultTagger = new Tagger(),
  taggedTerms = myDefaultTagger.tag('This is a sample test for this module. It index any fulltext. It is a sample test.');
termExtractor.extract(taggedTerms);
// return
// { 'sample': { 'frequency': 2, 'strength': 1 }, 'test': { 'frequency': 2, 'strength': 1 },
// 'sample test': { 'frequency': 2, 'strength': 2 },
// 'module': { 'frequency': 1, 'strength': 1 },
// 'index': { 'frequency': 1, 'strength': 1 },
// 'fulltext': { 'frequency': 1, 'strength': 1 }
// };
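
In the result above, each key is a term and `strength` matches the number of words in that term. As an illustration of how such a result might be consumed, the sketch below ranks the documented output with a hypothetical `rankTerms` helper; the helper and its `minFrequency` parameter are assumptions and not part of the module.

```javascript
// The result documented above, copied as plain data.
const extracted = {
  'sample': { frequency: 2, strength: 1 },
  'test': { frequency: 2, strength: 1 },
  'sample test': { frequency: 2, strength: 2 },
  'module': { frequency: 1, strength: 1 },
  'index': { frequency: 1, strength: 1 },
  'fulltext': { frequency: 1, strength: 1 }
};

// Hypothetical helper: keep terms seen at least minFrequency times,
// then order by strength (multi-word terms first) and by frequency.
function rankTerms(result, minFrequency = 1) {
  return Object.entries(result)
    .filter(([, stats]) => stats.frequency >= minFrequency)
    .sort(([, a], [, b]) => b.strength - a.strength || b.frequency - a.frequency)
    .map(([term]) => term);
}

console.log(rankTerms(extracted, 2)); // [ 'sample test', 'sample', 'test' ]
```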

termExtractor._startsWith(str, prefix) ⇒ `Boolean`

Check whether the given string starts with the given prefix

Kind: instance method of TermExtractor
Returns: Boolean - true if the string starts with the prefix, else false

Param	Type	Description
str	`String`	String in which the prefix is searched
prefix	`String`	Prefix to search for
