rupeshghimire7 / Text-Processing-Modeling

NLP related tasks including text scraping, preprocessing, topic modeling & NER.

Text-Processing-Modeling

Welcome to the Text-Processing-Modeling repository! This series of notebooks walks you through core Natural Language Processing (NLP) tasks, from text extraction and cleaning to more advanced tasks such as Named Entity Recognition (NER) and Topic Modeling. Whether you're a beginner or an experienced practitioner, each directory offers practical, hands-on examples.

Directory Structure:

1. Extract_Clean:

This directory covers the first steps of text processing: scraping, extracting, and cleaning text. The goal is to turn raw text into a structured format, such as a Document-Term Matrix (DTM), that later steps can analyze.

2. Explore:

This directory covers exploratory analysis of text data. Use visualization techniques to uncover patterns, trends, and anomalies in your corpus, building a solid understanding of its characteristics before modeling.
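
As one example of exploration, here is a small sketch that finds each document's most frequent words and counts its distinct words, assuming pandas and whitespace tokenization. The documents and the `top_words` helper are hypothetical, not taken from the repository.

```python
from collections import Counter

import pandas as pd

# Hypothetical, already-cleaned documents.
docs = {
    "doc_a": "data science uses data and more data to learn",
    "doc_b": "language models learn language from text",
}

# Small document-term matrix: rows are documents, columns are words.
dtm = (
    pd.DataFrame({name: Counter(text.split()) for name, text in docs.items()})
    .T.fillna(0)
    .astype(int)
)


def top_words(dtm: pd.DataFrame, n: int = 3) -> dict[str, list[tuple[str, int]]]:
    """Return the n most frequent (word, count) pairs for each document."""
    return {
        doc: list(row.sort_values(ascending=False).head(n).items())
        for doc, row in dtm.iterrows()
    }


# Vocabulary richness: number of distinct words used in each document.
unique_words = (dtm > 0).sum(axis=1)

print(top_words(dtm, n=2))
print(unique_words)
```

Statistics like these are natural inputs for plots; for instance, with matplotlib installed, `dtm.sum().plot.bar()` shows corpus-wide word frequencies.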

3. NER:

Named Entity Recognition (NER) is a fundamental task in NLP. This directory uses spaCy to perform basic NER on your text data. The result is a DataFrame of the named entities found in each document, showing which people, organizations, places, and other entities your text mentions.
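
A sketch of the "entities per document" DataFrame, assuming spaCy v3 and pandas. To keep it runnable without downloading a model, it uses a blank English pipeline with a rule-based `entity_ruler` and hypothetical patterns. In practice you would load a pretrained pipeline instead, e.g. `spacy.load("en_core_web_sm")` after installing that model. The repository's own notebooks may structure the output differently.

```python
import pandas as pd
import spacy

# Blank pipeline plus a rule-based entity ruler (stand-in for a pretrained model).
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "OpenAI"},
    {"label": "GPE", "pattern": "Nepal"},
    {"label": "PERSON", "pattern": [{"LOWER": "ada"}, {"LOWER": "lovelace"}]},
])

# Hypothetical documents keyed by name.
documents = {
    "doc_1": "Ada Lovelace never visited Nepal.",
    "doc_2": "OpenAI released a new model.",
}

# One row per entity occurrence, tagged with its source document.
rows = []
for name, doc in zip(documents, nlp.pipe(documents.values())):
    for ent in doc.ents:
        rows.append({"document": name, "entity": ent.text, "label": ent.label_})

entities_df = pd.DataFrame(rows, columns=["document", "entity", "label"])

# Collapse to one row per document with its list of entities.
per_document = entities_df.groupby("document")["entity"].apply(list)
print(entities_df)
print(per_document)
```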

4. Topic Modeling:

Topic Modeling is a technique for discovering latent topics in a collection of documents. The notebooks in this directory use Gensim's Latent Dirichlet Allocation (LDA) implementation to identify and analyze topics within your corpus, giving you a deeper understanding of the themes present in your data.

5. Resume_Parser:

This directory focuses on a practical application of NLP: resume parsing. Learn how to train a custom NER model to extract key information from resumes. The included resume parser assigns extracted text to labels such as skills, courses, experience, tenure, organization, education, involvements, and socials, in addition to the default NER labels.
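
A sketch of training a custom NER component, assuming spaCy v3's `Example`-based training loop. The label names (`SKILLS`, `ORGANIZATION`, `TENURE`, `EDUCATION`) and the training sentences are hypothetical stand-ins for the categories listed above. A real parser needs far more annotated resumes, and this loop is not the repository's actual training code.

```python
import random

import spacy
from spacy.training import Example

# Hypothetical annotations: (text, {"entities": [(start_char, end_char, label)]}).
TRAIN_DATA = [
    ("Skilled in Python and SQL.", {"entities": [(11, 17, "SKILLS"), (22, 25, "SKILLS")]}),
    ("Worked at Acme Corp for 3 years.", {"entities": [(10, 19, "ORGANIZATION"), (24, 31, "TENURE")]}),
    ("Studied Computer Science at State University.", {"entities": [(28, 44, "EDUCATION")]}),
]

spacy.util.fix_random_seed(0)

# Start from a blank English pipeline and add an NER component.
nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
for _, annotations in TRAIN_DATA:
    for _, _, label in annotations["entities"]:
        ner.add_label(label)

# Wrap each annotated text in a spaCy Example.
examples = [Example.from_dict(nlp.make_doc(text), ann) for text, ann in TRAIN_DATA]
optimizer = nlp.initialize(get_examples=lambda: examples)

# Simple training loop: shuffle, batch, update.
for epoch in range(20):
    random.shuffle(examples)
    losses = {}
    for batch in spacy.util.minibatch(examples, size=2):
        nlp.update(batch, sgd=optimizer, drop=0.2, losses=losses)

# Inspect predictions (with this little data, results will be unreliable).
doc = nlp("Skilled in Python and SQL.")
print([(ent.text, ent.label_) for ent in doc.ents])
```

Starting from a pretrained pipeline instead of `spacy.blank("en")` is one way to keep default labels such as `PERSON` or `ORG` alongside the custom ones, though updating a pretrained model on new labels carries a risk of forgetting what it already knew.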

Happy NLP modeling! 🚀

Languages

Jupyter Notebook: 100.0%