Polyglot is a natural language pipeline that supports massively multilingual applications.
- Free software: GPLv3 license
- Documentation: http://polyglot.readthedocs.org.
- Tokenization (165 Languages)
- Language detection (196 Languages)
- Named Entity Recognition (40 Languages)
- Part of Speech Tagging (16 Languages)
- Sentiment Analysis (136 Languages)
- Word Embeddings (137 Languages)
- Morphological analysis (135 Languages)
- Transliteration (69 Languages)
- Rami Al-Rfou @ rmyeid gmail com
Language Detected: Code=fr, Name=French
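A line like the one above can be produced from the detected language's code and name. The following is a minimal sketch, assuming the `polyglot.text.Text` interface described in the project documentation; the helper names `format_detection` and `detect_language` are hypothetical and not part of polyglot.

```python
def format_detection(code, name):
    """Build the one-line summary in the format shown above."""
    return "Language Detected: Code={}, Name={}".format(code, name)


def detect_language(text):
    """Return (code, name) for the language polyglot detects in `text`.

    The import is deferred so this module still loads on machines
    where polyglot and its native dependencies are not installed.
    """
    from polyglot.text import Text  # requires polyglot to be installed

    language = Text(text).language
    return language.code, language.name
```

With polyglot installed, `print(format_detection(*detect_language("Bonjour, Mesdames.")))` should print a line of this shape. The sample sentence is an assumption chosen for illustration.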
[u'Beautiful', u'is', u'better', u'than', u'ugly', u'.', u'Explicit', u'is', u'better', u'than', u'implicit', u'.', u'Simple', u'is', u'better', u'than', u'complex', u'.']
[Sentence("Beautiful is better than ugly."), Sentence("Explicit is better than implicit."), Sentence("Simple is better than complex.")]
O               DET
primeiro        ADJ
uso             NOUN
de              ADP
desobediência   NOUN
civil           ADJ
em              ADP
massa           NOUN
ocorreu         ADJ
em              ADP
setembro        NOUN
de              ADP
1906            NUM
.               PUNCT
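The tagger output above alternates tokens and part-of-speech tags. As a rough sketch using only the standard library, the pairs can be recovered from such whitespace-separated output and printed in aligned columns. The function name `pair_tokens_with_tags` is hypothetical, and the approach assumes no token contains whitespace.

```python
def pair_tokens_with_tags(flat_output):
    """Split 'word TAG word TAG ...' into a list of (word, tag) tuples."""
    fields = flat_output.split()
    if len(fields) % 2 != 0:
        raise ValueError("expected an even number of fields (word/tag pairs)")
    # Even positions hold tokens, odd positions hold their tags.
    return list(zip(fields[0::2], fields[1::2]))


tagged = pair_tokens_with_tags(
    "O DET primeiro ADJ uso NOUN de ADP desobediência NOUN civil ADJ "
    "em ADP massa NOUN ocorreu ADJ em ADP setembro NOUN de ADP 1906 NUM . PUNCT"
)

# Print one aligned word/tag pair per line.
for word, tag in tagged:
    print("{:<16}{}".format(word, tag))
```

Keeping the result as a list of tuples makes simple filters easy, for example collecting every token tagged `NOUN`.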
[I-LOC([u'Gro\xdfbritannien']), I-PER([u'Gandhi'])]
Bush Reagan Clinton Ahmadinejad Nixon Karzai McCain Biden Huckabee Lula
The first 10 of the 256 dimensions:
[-2.57382345 1.52175975 0.51070285 1.08678675 -0.74386948 -1.18616164 2.92784619 -0.25694436 -1.40958667 -2.39675403]
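The list of names above reads like the nearest neighbors of a query word, and the numbers are the start of one word's embedding vector. Neighbor lists of this kind are commonly built by ranking other words by similarity in the embedding space. The sketch below illustrates the idea with cosine similarity on made-up 3-dimensional toy vectors; the words, the vectors, and the choice of metric are all assumptions for illustration, and the metric polyglot itself uses is not stated here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, vectors, top_k=2):
    """Rank every other word by cosine similarity to `query`, best first."""
    scored = [
        (cosine_similarity(vectors[query], vec), word)
        for word, vec in vectors.items()
        if word != query
    ]
    scored.sort(reverse=True)
    return [word for _, word in scored[:top_k]]


# Toy data, invented for this example (real embeddings here have 256 dimensions).
toy_vectors = {
    "paris": [1.0, 0.2, 0.0],
    "london": [0.9, 0.3, 0.1],
    "banana": [0.0, 0.1, 1.0],
}

print(nearest_neighbors("paris", toy_vectors, top_k=2))
```

Cosine similarity ignores vector length and compares direction only, which is one common choice for word embeddings; Euclidean distance is another.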
[u'Pre', u'process', u'ing']
препрокессинг