BabakBar / NLP-Base

Practicing NLP and related libraries, this time in Python: regular expressions, tokenization, topic identification, NER, and classifiers.

NLP-Base

Practicing NLP and related libraries, this time in Python.

  • Regular expressions & word tokenization: basic NLP concepts such as word tokenization and regular expressions for parsing text, plus how to handle non-English text and trickier tokenization cases.
  • Topic identification: identify topics in texts based on term frequencies. We experiment with and compare two simple methods, bag-of-words and tf-idf, using NLTK and the Gensim library.
  • Named-entity recognition: identify the who, what, and where of our texts using pre-trained models on English and non-English text, and add polyglot and spaCy to the NLP toolbox.
  • Fake news classifier: combining these basics with supervised machine learning, we build a "fake news" detector.
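A minimal sketch of regex-based word tokenization, using only Python's standard `re` module rather than NLTK. The pattern and the function name `simple_tokenize` are illustrative assumptions, not code from the notebooks. The pattern keeps contractions, hashtags, and @-mentions as single tokens and emits each punctuation mark separately. Because `\w` matches Unicode letters by default in Python 3, accented Spanish words stay intact while "¿" is split off as punctuation.

```python
import re

# Alternative 1: an optional # or @, a run of word characters, and an optional
# apostrophe suffix ("can't"). Alternative 2: any single non-space,
# non-word character (punctuation such as "," "!" "¿").
TOKEN_PATTERN = re.compile(r"[#@]?\w+(?:'\w+)?|[^\w\s]")

def simple_tokenize(text):
    """Split text into word and punctuation tokens with one regular expression."""
    return TOKEN_PATTERN.findall(text)

tokens = simple_tokenize("I can't wait for #NLP, @babak! ¿Dónde está?")
print(tokens)
# → ['I', "can't", 'wait', 'for', '#NLP', ',', '@babak', '!', '¿', 'Dónde', 'está', '?']
```

Putting the word alternative first means `findall` prefers whole words and only falls back to single punctuation characters. Whitespace never matches either alternative, so it is skipped.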
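To contrast bag-of-words with tf-idf, here is a standard-library sketch assuming the plain textbook weighting tf × ln(N / df) with no smoothing or normalization. It is not the NLTK or Gensim API, and the three-sentence corpus is invented for illustration. Bag-of-words simply counts tokens. Tf-idf scales each count by how rare the term is across the corpus, so words shared by every document drop out.

```python
import math
from collections import Counter

def bag_of_words(tokens):
    """Raw term counts for one tokenized document."""
    return Counter(tokens)

def tf_idf(corpus):
    """Weight each term by tf * ln(N / df) for every document in the corpus."""
    n_docs = len(corpus)
    # Document frequency: the number of documents containing each term at least once.
    df = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        tf = Counter(doc)
        weights.append({term: count * math.log(n_docs / df[term])
                        for term, count in tf.items()})
    return weights

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]

bow = bag_of_words(corpus[0])
print(bow.most_common(2))  # "the" dominates the raw counts

weights = tf_idf(corpus)
top_terms = [max(doc, key=doc.get) for doc in weights]
print(top_terms)  # → ['mat', 'log', 'chased']
```

Because "the" appears in all three documents, its idf is ln(3/3) = 0. Tf-idf therefore zeroes it out even though it is the most frequent word in the bag-of-words counts. Gensim's `TfidfModel` applies its own defaults for log base and normalization, so its numbers will differ from this sketch, but the ranking idea is the same.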
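The classifier idea can be sketched as a hand-rolled multinomial Naive Bayes over bag-of-words counts with add-one (Laplace) smoothing. This is an assumption chosen for illustration: the notebooks may use a different model or library. The class name `NaiveBayesText` and the four training headlines are hypothetical and invented here. They are not the repository's code or dataset.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesText:
    """Multinomial Naive Bayes over token counts, with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)    # documents per class, for the prior
        self.word_counts = defaultdict(Counter)  # token counts per class
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            # Log prior plus the smoothed log likelihood of each known token.
            score = math.log(n_label / n_docs)
            for w in tokens:
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[w] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical toy headlines, invented for illustration only.
train = [
    ("shocking secret cure doctors hate", "FAKE"),
    ("you won't believe this shocking miracle", "FAKE"),
    ("senate passes budget bill after debate", "REAL"),
    ("central bank holds interest rates steady", "REAL"),
]
docs = [text.split() for text, _ in train]
labels = [label for _, label in train]

model = NaiveBayesText().fit(docs, labels)
print(model.predict("shocking miracle cure".split()))  # → FAKE
print(model.predict("bank passes budget".split()))     # → REAL
```

Scores are summed in log space so that multiplying many small probabilities does not underflow. The +1 smoothing keeps a single unseen-in-class word from forcing that class's probability to zero.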

Languages

Jupyter Notebook 99.9%, Python 0.1%