- The sys module is always available and provides access to variables and functions that interact with the interpreter.
- The re module provides operations for regular expression matching, useful for pattern and string search.
- pandas is one of the most widely used open-source tools for data manipulation and analysis. Developed in 2008, pandas provides an incredibly fast and efficient object with integrated indexing, called DataFrame. It comes with tools for reading and writing data from and to files and SQL databases. It can manipulate, reshape, filter, aggregate, merge, join and pivot large datasets and is highly optimized for performance.
- matplotlib is an easy-to-use, popular and comprehensive library in Python for creating visualizations. It supports basic plots (like line, bar, scatter, etc.), plots of arrays & fields, statistical plots (like histogram, boxplot, violin, etc.), and plots with unstructured coordinates.
- The Natural Language Toolkit, commonly known as NLTK, is a comprehensive open-source platform for building applications to process human language data. It comes with powerful text processing libraries for typical Natural Language Processing (NLP) tasks like cleaning, parsing, stemming, tagging, tokenization, classification, semantic reasoning, etc. NLTK has user-friendly interfaces to several popular corpora and lexical resources Word2Vec, WordNet, VADER Sentiment Lexicon, etc.
- This project uses the VADER lexicon with NLTK’s
SentimentIntensityAnalyzer
class to assign a sentiment score to each comment in the demo dataset. Valence Aware Dictionary and Sentiment Reasoner (VADER) is a lexicon and rule-based sentiment analysis toolset with a focus on sentiments contained in general text applications like online comments, social media posts, and survey responses. Please follow this link to learn more VADER and SentimentIntensityAnalyzer modules of NLTK.