SmartMaatt / nlp-script-test

This repository contains scripts for analyzing and processing text data. It consists of Python scripts that use Natural Language Processing (NLP) to analyze articles. The project uses libraries such as NLTK, pandas, scikit-learn, and lxml for text data processing and analysis.

NLP script test

Overview | Project Structure | Requirements | Installation | License

Overview

The nlp-script-test repository contains scripts for analyzing and processing text data. It consists of two main Python scripts (manage_classes.py and key_terms.py), which use Natural Language Processing (NLP) to analyze articles. The project utilizes libraries such as NLTK, pandas, scikit-learn, and lxml for text data processing and analysis.
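To give a feel for the word-level processing these scripts perform, here is a minimal sketch of tokenizing a text, dropping stopwords, and counting the most common words. It is not the code from manage_classes.py: it uses only the Python standard library as a stand-in so that it runs without downloading any NLTK data. The function name, the tiny stopword set, and the sample sentence are assumptions made for illustration. Lemmatization is left out because it needs a dictionary resource; NLTK provides word_tokenize, a stopwords corpus, and WordNetLemmatizer for a fuller version.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword set for this sketch;
# NLTK's English stopword list is much longer.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}


def most_common_words(text, top_n=3):
    """Tokenize, lowercase, drop stopwords, and return the top_n (word, count) pairs."""
    # Simple regex tokenizer: runs of letters, optionally with one apostrophe part.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    kept = [token for token in tokens if token not in STOPWORDS]
    return Counter(kept).most_common(top_n)


sample = (
    "The brain is a network of neurons, and neurons signal "
    "to other neurons in the brain."
)
print(most_common_words(sample, top_n=2))
# → [('neurons', 3), ('brain', 2)]
```

A regex tokenizer and a hand-written stopword set keep the sketch dependency-free; swapping in NLTK's tokenizer and stopword list would change only the two lines that build `tokens` and `kept`.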

Project Structure

  • manage_classes.py: Contains class definitions and methods for processing text data, including tokenization, lemmatization, filtering stopwords, and calculating most common words.
  • key_terms.py: The main script that runs the analysis of text data loaded from an XML file. It uses scikit-learn's TfidfVectorizer to score how important each word is within each document.
  • news.xml: An XML file containing text data for analysis.

Requirements

The project requires Python version 3.8 or newer, along with the following libraries:

  • pandas
  • nltk
  • scikit-learn
  • lxml
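Before running the scripts, it can help to confirm that the interpreter and libraries listed above are available. The following is a small, hypothetical helper (it is not part of the repository) that checks the Python version and looks up each library by its import name using only the standard library. Note that scikit-learn is imported as `sklearn`.

```python
import importlib.util
import sys

# Import names for the libraries listed above; scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ["pandas", "nltk", "sklearn", "lxml"]
MIN_PYTHON = (3, 8)


def missing_modules(names):
    """Return the names that cannot be located by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if not python_is_supported():
    print("Python 3.8 or newer is required.")
missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries were found.")
```

`find_spec` only locates a module without importing it, so the check stays fast and has no side effects from loading the libraries themselves.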

Installation

  1. Create and activate a Python virtual environment (for example, with python -m venv .venv).
  2. Install the required dependencies by running the following command in the terminal:
pip install -r requirements.txt
  3. Run the script with the command:
python key_terms.py

License

This project is licensed under the MIT License - see the LICENSE file for details.


© 2023 Mateusz Płonka (SmartMatt). All rights reserved.

Languages

Python: 100.0%