tdm-teeft is a tdm module for term extraction from unstructured text. It can be used to get the keywords of a document.
Using npm:
$ npm i -g tdm-teeft
$ npm i --save tdm-teeft
Using Node:
/* require of Teeft module */
const Teeft = require('tdm-teeft');
/* Build new Instance of Tagger */
let tagger = new Teeft.Tagger();
/* Build new Instance of Filter */
let filter = new Teeft.Filter();
/* Build new Instance of Indexator */
let indexator = new Teeft.Indexator();
/* Build new Instance of TermExtraction */
let termextraction = new Teeft.TermExtraction();
$ npm run test
$ npm run docs
Kind: global class
- Filter
  - new Filter([options])
  - .call(occur, strength) ⇒ Boolean
  - .configure(length) ⇒ Number
Returns: Filter - An instance of Filter
Param | Type | Description |
---|---|---|
[options] | Object | Options of constructor |
[options.minOccur] | Number | Minimum number of occurrences |
[options.noLimitStrength] | Number | Strength limit |
[options.lengthSteps] | Object | Steps length |
Example (Example usage of 'constructor' (with parameters))
let options = {
// Will allow to assign a 'value' depending on the length of indexed text (nb of tokens)
'lengthSteps': {
'values': [ // store intermediate steps here,
{ // here : value '4' will be used for text length > 1000 tokens && text length <= 3000 tokens
'lim': 3000, // this property must be > 'lengthSteps.min.lim' && must be < 'lengthSteps.max.lim'
'value': 4
},
{ // here : value '5' will be used for text length > 3000 tokens && text length <= 4000 tokens
'lim': 4000, // this property must be > 'lengthSteps.min.lim' && must be < 'lengthSteps.max.lim'
'value': 5
}
],
'min': { // 'value' depending of minimum 'lim' length of text (here : value '1' will be used for text length <= 1000 tokens)
'lim': 1000,
'value': 1
},
'max': { // 'value' depending of maximum 'lim' length of text (here : value '7' will be used for text length > 6000 tokens)
'lim': 6000,
'value': 7
}
},
'minOccur': 3, // Minimum number of occurrences (of tokens) used by default: here 3. This value is updated according to the length of the indexed text when the 'configure' function is called
'noLimitStrength': 2 // Strength limit
},
defaultFilter = new Filter(options);
// returns an instance of Filter with properties :
// - minOccur : 3
// - noLimitStrength : 2
// - lengthSteps : {'values': [{'lim': 3000, 'value': 4}, {'lim': 4000, 'value': 5}], 'min': {'lim': 1000, 'value': 1}, 'max': {'lim': 6000, 'value': 7}}
Example (Example usage of 'constructor' (with default values))
let defaultFilter = new Filter();
// returns an instance of Filter with properties :
// - minOccur : 7
// - noLimitStrength : 2
// - lengthSteps : {'values': [{'lim': 3000, 'value': 4}], 'min': {'lim': 1000, 'value': 1}, 'max': {'lim': 6000, 'value': 7}}
Check values against the filter conditions
Kind: instance method of Filter
Returns: Boolean - Returns true if the conditions are met
Param | Type | Description |
---|---|---|
occur | Number | Occurrence value |
strength | Number | Strength value |
Example (Example usage of 'call' function)
let defaultFilter = new Filter();
defaultFilter.configure(500);
defaultFilter.call(1, 1); // returns true
defaultFilter.configure(5000);
defaultFilter.call(1, 1); // returns false
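The two results above are consistent with a rule where a term passes when it occurs often enough or is strong enough. The following standalone sketch illustrates that reading only; it is not the module's source. The `passes` helper and the choice of `>=` comparisons are assumptions picked to reproduce the documented return values.

```javascript
// Hypothetical helper (not part of tdm-teeft): one possible reading of the filter rule.
// Assumption: a term is kept when it occurs at least `minOccur` times,
// or when its strength reaches `noLimitStrength` whatever its occurrence count.
function passes(filterState, occur, strength) {
  return occur >= filterState.minOccur || strength >= filterState.noLimitStrength;
}

// State matching configure(500) with the default options (minOccur becomes 1)
const shortText = { minOccur: 1, noLimitStrength: 2 };
// State matching configure(5000) with the default options (minOccur becomes 7)
const longText = { minOccur: 7, noLimitStrength: 2 };

console.log(passes(shortText, 1, 1)); // true
console.log(passes(longText, 1, 1));  // false
console.log(passes(longText, 1, 2));  // true
```

Under this assumption, a rare but strong term would still pass on a long text, which is what the name 'noLimitStrength' suggests.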
Configure the filter according to lengthSteps
Kind: instance method of Filter
Returns: Number - Returns the configured minOccur value
Param | Type | Description |
---|---|---|
length | Number | Text length |
Example (Example usage of 'configure' function)
let defaultFilter = new Filter();
defaultFilter.configure(500); // returns 1
defaultFilter.configure(5000); // returns 7
defaultFilter.configure('test'); // returns null
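The step lookup behind these return values can be sketched as a small standalone function. This is an illustration under assumptions, not the module's implementation: `pickMinOccur` is a hypothetical name, the steps in 'values' are assumed to be sorted by ascending 'lim', and any length above the last intermediate step is assumed to fall through to 'max.value' (which matches the documented result for 5000 with the default options).

```javascript
// Hypothetical helper (not part of tdm-teeft): choose a minOccur value from lengthSteps.
function pickMinOccur(lengthSteps, length) {
  // Non-numeric input yields null, as in the documented example
  if (typeof length !== 'number' || Number.isNaN(length)) return null;
  // Short texts use the 'min' value
  if (length <= lengthSteps.min.lim) return lengthSteps.min.value;
  // Intermediate steps, assumed sorted by ascending 'lim'
  for (const step of lengthSteps.values) {
    if (length <= step.lim) return step.value;
  }
  // Anything beyond the last intermediate step uses the 'max' value
  return lengthSteps.max.value;
}

// Default lengthSteps as listed in the constructor example
const defaults = {
  values: [{ lim: 3000, value: 4 }],
  min: { lim: 1000, value: 1 },
  max: { lim: 6000, value: 7 }
};

console.log(pickMinOccur(defaults, 500));    // 1
console.log(pickMinOccur(defaults, 2000));   // 4
console.log(pickMinOccur(defaults, 5000));   // 7
console.log(pickMinOccur(defaults, 'test')); // null
```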
Kind: global class
- Indexator
  - new Indexator([options])
  - instance
    - .tokenize(text) ⇒ Array
    - .translateTag(tag) ⇒ String
    - .sanitize(terms) ⇒ Array
    - .lemmatize(terms) ⇒ Array
    - .index(data) ⇒ Object
  - static
    - .compare(a, b) ⇒ Number
Returns: Indexator - An instance of Indexator
Param | Type | Description |
---|---|---|
[options] | Object | Options of constructor |
[options.filter] | Filter | Options given to extractor of this instance of Indexator |
[options.lexicon] | Object | Lexicon used by tagger of this instance of Indexator |
[options.stopwords] | Object | Stopwords used by this instance of Indexator |
[options.lemmatizer] | Object | Lemmatizer used by tagger of this instance of Indexator |
[options.stemmer] | Object | Stemmer used by this instance of Indexator |
[options.dictionary] | Object | Dictionary used by this instance of Indexator |
Example (Example usage of 'constructor' (with parameters))
let options = {
'filter': customFilter // Assuming customFilter contains your custom settings
},
indexator = new Indexator(options);
// returns an instance of Indexator with custom Filter
Example (Example usage of 'constructor' (with default values))
let indexator = new Indexator();
// returns an instance of Indexator with default options
Extract tokens from a text
Kind: instance method of Indexator
Returns: Array - Array of tokens
Param | Type | Description |
---|---|---|
text | String | Full text |
Example (Example usage of 'tokenize' function)
let indexator = new Indexator();
indexator.tokenize('my sample sentence'); // return ['my', 'sample', 'sentence']
Translate a Tagger tag into a Lemmatizer tag
Kind: instance method of Indexator
Returns: String - Tag that matches a Lemmatizer tag (or false)
Param | Type | Description |
---|---|---|
tag | String | Tag given by Tagger |
Example (Example usage of 'translateTag' function)
let indexator = new Indexator();
indexator.translateTag('RB'); // return 'adv';
indexator.translateTag('JJ'); // return 'adj';
indexator.translateTag('NN'); // return 'noun';
indexator.translateTag('NNP'); // return 'noun';
indexator.translateTag('VBG'); // return 'verb';
indexator.translateTag('VBN'); // return 'verb';
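The mapping shown in this example can be pictured as a lookup table from Tagger tags to Lemmatizer categories. The sketch below is standalone and hypothetical: `translateTagSketch` and `TAG_TO_CATEGORY` are illustrative names, and only the six tags listed in the example are included. The real method may recognise more tags.

```javascript
// Hypothetical lookup table limited to the tags documented in the example.
const TAG_TO_CATEGORY = new Map([
  ['RB', 'adv'],
  ['JJ', 'adj'],
  ['NN', 'noun'],
  ['NNP', 'noun'],
  ['VBG', 'verb'],
  ['VBN', 'verb']
]);

// Returns the matching category, or false when the tag is not in the table
function translateTagSketch(tag) {
  return TAG_TO_CATEGORY.has(tag) ? TAG_TO_CATEGORY.get(tag) : false;
}

console.log(translateTagSketch('RB'));  // 'adv'
console.log(translateTagSketch('NNP')); // 'noun'
console.log(translateTagSketch('XYZ')); // false
```

Note that the tags are passed as strings. A bare identifier such as RB would be an undefined variable in JavaScript.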
Sanitize a list of terms (by applying some filters)
Kind: instance method of Indexator
Returns: Array - List of sanitized terms
Param | Type | Description |
---|---|---|
terms | Array | List of terms |
Example (Example usage of 'sanitize' function)
let indexator = new Indexator();
indexator.sanitize([ { term: 'this', tag: 'DT', lemma: 'this', stem: 'this' },
{ term: 'is', tag: 'VBZ' },
{ term: 'a', tag: 'DT' },
{ term: 'sample', tag: 'NN', lemma: 'sample', stem: 'sampl' },
{ term: 'test', tag: 'NN', lemma: 'test', stem: 'test' } ]);
// return [ { term: 'this', tag: 'DT', lemma: 'this', stem: 'this' },
// { term: '#', tag: '#' },
// { term: '#', tag: '#' },
// { term: 'sample', tag: 'NN', lemma: 'sample', stem: 'sampl' },
// { term: 'test', tag: 'NN', lemma: 'test', stem: 'test' } ]
Lemmatize a list of tagged terms (adds the properties lemma & stem)
Kind: instance method of Indexator
Returns: Array - List of tagged terms with a lemma
Param | Type | Description |
---|---|---|
terms | Array | List of tagged terms |
Example (Example usage of 'lemmatize' function)
let indexator = new Indexator();
indexator.lemmatize([ { term: 'this', tag: 'DT', lemma: 'this', stem: 'this' },
{ term: 'is', tag: 'VBZ' },
{ term: 'a', tag: 'DT' },
{ term: 'sample', tag: 'NN', lemma: 'sample', stem: 'sampl' },
{ term: 'test', tag: 'NN', lemma: 'test', stem: 'test' } ]);
// return [ { term: 'this', tag: 'DT', lemma: 'this', stem: 'this' },
// { term: '#', tag: '#' },
// { term: '#', tag: '#' },
// { term: 'sample', tag: 'NN', lemma: 'sample', stem: 'sampl' },
// { term: 'test', tag: 'NN', lemma: 'test', stem: 'test' } ]
Index a full text
Kind: instance method of Indexator
Returns: Object - Returns a representation of the full text (indexation and further information/statistics about tokens/terms)
Param | Type | Description |
---|---|---|
data | String | Full text to be indexed |
Example (Example usage of 'index' function)
let indexator = new Indexator();
indexator.index('This is a sample sentence'); // return an object representation of indexation
Compare the specificity of two objects
Kind: static method of Indexator
Returns: Number - -1, 1, or 0
Param | Type | Description |
---|---|---|
a | Object | First object |
b | Object | Second object |
Example (Example usage of 'compare' function)
Indexator.compare({ 'term': 'a', 'specificity': 1 }, { 'term': 'b', 'specificity': 2 }); // return 1
Indexator.compare({ 'term': 'a', 'specificity': 1 }, { 'term': 'b', 'specificity': 1 }); // return 0
Indexator.compare({ 'term': 'a', 'specificity': 2 }, { 'term': 'b', 'specificity': 1 }); // return -1
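The return values above follow the comparator convention of Array.prototype.sort, with higher specificity ordered first. The sketch below uses a stand-in function, `compareBySpecificity`, written to reproduce the three documented results. It is an assumption about the behaviour, not the module's code.

```javascript
// Stand-in comparator reproducing the documented return values:
// the object with the higher specificity sorts first.
function compareBySpecificity(a, b) {
  if (a.specificity > b.specificity) return -1;
  if (a.specificity < b.specificity) return 1;
  return 0;
}

const terms = [
  { term: 'a', specificity: 1 },
  { term: 'b', specificity: 3 },
  { term: 'c', specificity: 2 }
];

// Copy before sorting so the original array is left untouched
const ranked = [...terms].sort(compareBySpecificity);
console.log(ranked.map((t) => t.term)); // [ 'b', 'c', 'a' ]
```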
Kind: global class
- Tagger
  - new Tagger([options])
  - .tag(terms) ⇒ Array
Returns: Tagger - An instance of Tagger
Param | Type | Description |
---|---|---|
[options] | Object | Options of constructor |
Example (Example usage of 'constructor' (with parameters))
let lexicon = { ... },
tagger = new Tagger(lexicon);
// returns an instance of Tagger with custom lexicon
Example (Example usage of 'constructor' (with default values))
let tagger = new Tagger();
// returns an instance of Tagger with default lexicon
Tag terms
Kind: instance method of Tagger
Returns: Array - List of tagged terms
Param | Type | Description |
---|---|---|
terms | Array | List of terms |
Example (Example usage of 'tag' function)
let tagger = new Tagger();
tagger.tag(['this', 'is', 'a', 'test']); // return [{ 'term': 'this', 'tag': 'DT' }, { 'term': 'is', 'tag': 'VBZ' }, { 'term': 'a', 'tag': 'DT' }, { 'term': 'test', 'tag': 'NN' }]
Kind: global class
- TermExtractor
  - new TermExtractor([options])
  - .extract(taggedTerms) ⇒ Object
  - ._startsWith(str, prefix) ⇒ Boolean
Returns: TermExtractor - An instance of TermExtractor
Param | Type | Description |
---|---|---|
[options] | Object | Options of constructor |
[options.tagger] | Tagger | An instance of Tagger |
[options.filter] | Filter | An instance of Filter |
Example (Example usage of 'constructor' (with parameters))
let myTagger = new Tagger(), // Assuming myTagger contains your custom settings
myFilter = new Filter(), // Assuming myFilter contains your custom settings
termExtractor = new TermExtractor({ 'tagger': myTagger, 'filter': myFilter });
// returns an instance of TermExtractor with custom options
Example (Example usage of 'constructor' (with default values))
let termExtractor = new TermExtractor();
// returns an instance of TermExtractor with default options
Extract terms
Kind: instance method of TermExtractor
Returns: Object - Returns all extracted terms
Param | Type | Description |
---|---|---|
taggedTerms | Array | List of tagged terms |
Example (Example usage of 'extract' function)
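let termExtractor = new TermExtractor(),
myDefaultTagger = new Tagger(),
indexator = new Indexator(),
// 'tag' expects an array of terms, so the text is tokenized first
tokens = indexator.tokenize('This is a sample test for this module. It index any fulltext. It is a sample test.'),
taggedTerms = myDefaultTagger.tag(tokens);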
termExtractor.extract(taggedTerms);
// return
// { 'sample': { 'frequency': 2, 'strength': 1 }, 'test': { 'frequency': 2, 'strength': 1 },
// 'sample test': { 'frequency': 2, 'strength': 2 },
// 'module': { 'frequency': 1, 'strength': 1 },
// 'index': { 'frequency': 1, 'strength': 1 },
// 'fulltext': { 'frequency': 1, 'strength': 1 }
// };
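Since the result is a plain object keyed by term, it can be turned into a ranked list with standard JavaScript. The sketch below takes the documented output as literal data and sorts it by strength, then by frequency. The ranking criteria are an illustrative choice, not something the module prescribes.

```javascript
// Result object copied from the documented 'extract' output
const extracted = {
  'sample': { frequency: 2, strength: 1 },
  'test': { frequency: 2, strength: 1 },
  'sample test': { frequency: 2, strength: 2 },
  'module': { frequency: 1, strength: 1 },
  'index': { frequency: 1, strength: 1 },
  'fulltext': { frequency: 1, strength: 1 }
};

// Flatten to an array, then sort by strength and frequency (both descending).
// Array.prototype.sort is stable, so ties keep their original order.
const rankedTerms = Object.entries(extracted)
  .map(([term, stats]) => ({ term, ...stats }))
  .sort((x, y) => y.strength - x.strength || y.frequency - x.frequency);

console.log(rankedTerms.map((t) => t.term));
// [ 'sample test', 'sample', 'test', 'module', 'index', 'fulltext' ]
```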
Check whether the given string starts with the given prefix
Kind: instance method of TermExtractor
Returns: Boolean - Returns true if the string starts with the prefix, else false
Param | Type | Description |
---|---|---|
str | String | String in which the prefix is searched |
prefix | String | Prefix to search for |
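No example is given for this method, so here is a minimal standalone sketch of a prefix check with the same signature. `startsWithSketch` is a hypothetical stand-in and may differ from the module's private helper, for instance in how it treats empty strings.

```javascript
// Hypothetical stand-in: true when `str` begins with `prefix`
function startsWithSketch(str, prefix) {
  return str.slice(0, prefix.length) === prefix;
}

console.log(startsWithSketch('NNP', 'NN')); // true
console.log(startsWithSketch('VBG', 'NN')); // false
console.log(startsWithSketch('N', 'NN'));   // false

// The built-in String.prototype.startsWith gives the same answers here
console.log('NNP'.startsWith('NN'));        // true
```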