Emotion Analyzer is a Python-based tool designed to identify emotional tones in text. Utilizing a lexicon-based approach, it incorporates booster words and negations to offer a nuanced analysis of emotional content. The tool supports various file formats such as CSV, JSON, JSONL, and plain text, and is optimized for performance through multi-threading. Its capabilities can be further enhanced when used in conjunction with an emotion classifier.
- Lexicon-based emotion analysis
- Customizable lexicon and encoding
- Booster and negation word handling
- Ambiguity threshold for detecting mixed emotions
- Supports multiple input formats: CSV, JSON, JSONL, TXT
- Multi-threading support
- Export results to CSV, JSON, JSONL, TXT
The lexicon used in this project originates from the NRC-Emotion-Intensity-Lexicon-v1. It has undergone extensive curation and deduplication, and has been augmented with the lemminflect library to align with six fundamental emotions: "sadness," "love," "joy," "anger," "fear," and "surprise." The lexicon, which contains 9,762 entries, is functional but remains a work in progress.
By default, the program looks for a lexicon.csv
file in the same directory as the script. You can override this by specifying a different path using the --lexicon_path
command-line argument.
- Python 3.6+
Clone the repository:
git clone https://github.com/your-username/emotion-analyzer.git
Navigate into the project directory:
cd emotion-analyzer
The script can be run from the command line using argparse
.
python main.py --lexicon_path=path/to/lexicon.csv --input_file=path/to/input.txt --output=output_file_name --format=csv
--lexicon_path
: Path to the lexicon file. (Required)--max_threads
: Maximum number of threads for parallel processing. (Default: 4)--encoding
: File encoding for the lexicon (default is utf-8).--ambiguity_threshold
: Threshold for ambiguous emotions. (Default: 0.1)--input_file
: Input file to read texts from (CSV, TXT, JSON, JSONL supported).--json_key
: Key to read texts from when the input file is JSON or JSONL. (Default: 'text')--output
: Base name of the output file to save the results. (Default: 'output')--format
: Output format: csv, txt, json, jsonl. (Default: csv)--text
: Text to analyze if not reading from a file.
You can also use Emotion Analyzer in your Python project. Import the EmotionAnalyzer
and ResultsSaver
classes from the code and use as shown in the examples in the source code.
MIT License. See LICENSE for more details.
If you'd like to contribute, please fork the repository and use a feature branch. Pull requests are warmly welcome.