classification corpus count-vectoriation count-vectorizer lemmitization ner nltk part-of-speech-tagging regular-expression resume-classification spacy stemming text-mining textract tf-idf-vectorizer word-cloud word-clouds

Project-Resume-Classification

Problem Statement & Business Objectives: The document classification solution should significantly reduce manual effort in human resource management (HRM). It should achieve a higher level of accuracy and automation with minimal human intervention.

Abstract:

A resume is a brief summary of a candidate's skills and experience. Company recruiters and HR teams have a tough time scanning thousands of qualified resumes. Spending many labor hours sorting candidates' resumes manually wastes a company's time, money, and productivity. Recruiters therefore use resume classification to streamline the resume and applicant screening process. NLP technology allows recruiters to electronically gather, store, and organize large quantities of resumes. Once acquired, the resume data can be easily searched and analyzed.

Resumes are an ideal example of unstructured data. Since there is no widely accepted resume layout, each resume may have its own formatting style, different text blocks, and different section titles. Building a resume classifier and extracting text from resumes is no easy task, because the layouts vary so widely.

🔹Performed the basic data analysis process: data collection, text mining, data cleaning, exploratory data analysis, and data visualization.

🔹Built a machine learning model for resume classification using Python and basic natural language processing techniques.

🔹Used Python libraries to implement various NLP techniques such as tokenization, lemmatization, and part-of-speech tagging.

🔹A resume classifier analyzes resume data and extracts the information into machine-readable output. It helps automatically store, organize, and analyze resume data to find the right candidate for a particular job position and its requirements.

🔹The aim of this project is achieved by applying data analysis methods together with machine learning models and natural language processing to classify resumes into categories and build the Resume Classification Model.
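The data cleaning and tokenization steps mentioned above can be sketched with regular expressions alone. This is a minimal illustration, not the notebook's actual code: the function names `clean_resume_text` and `tokenize`, the tiny stop-word list, and the sample string are all assumptions made for the example. A fuller pipeline would likely rely on NLTK or spaCy for stop words, lemmatization, and part-of-speech tagging, which this sketch does not attempt.

```python
import re

# Small illustrative stop-word list (an assumption; a real pipeline would
# likely use the list shipped with NLTK or spaCy).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "with", "for", "on", "is"}


def clean_resume_text(text):
    """Lowercase the text and strip URLs, e-mail addresses, and non-letters."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"\S+@\S+", " ", text)                # remove e-mail addresses
    text = re.sub(r"[^a-z\s]", " ", text)               # remove digits and punctuation
    return re.sub(r"\s+", " ", text).strip()            # collapse whitespace


def tokenize(text):
    """Split cleaned text on whitespace and drop stop words."""
    return [tok for tok in clean_resume_text(text).split() if tok not in STOP_WORDS]


# Hypothetical resume fragment used only for demonstration.
sample = "Experienced in Python & SQL. Contact: jane.doe@example.com, https://example.com/cv"
print(tokenize(sample))
```

The resulting token lists can then be joined back into strings and passed to a count or TF-IDF vectorizer before model training.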

In this work, I compare the following machine learning algorithms:

K-Nearest Neighbors
Decision Tree
Random Forest
Support Vector Machine
Logistic Regression
Bagging Classifier
AdaBoost Classifier
Gradient Boosting
Naive Bayes
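A comparison of this kind might be set up with scikit-learn roughly as follows. This is a hedged sketch rather than the notebook's code: the toy corpus, the two category labels, the `compare_models` helper, and the choice of default hyperparameters, `MultinomialNB`, and `TfidfVectorizer` are assumptions made for illustration. Accuracies on such a tiny synthetic corpus say nothing about results on real resumes.

```python
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy corpus: six short "resumes" per made-up category.
texts = [
    "sql queries stored procedures database tuning",
    "sql server indexing joins reporting",
    "database design sql triggers views",
    "oracle sql plsql data warehouse",
    "sql developer etl pipelines database backup",
    "relational database sql performance optimization",
    "react javascript components hooks frontend",
    "react redux javascript html css",
    "frontend developer react typescript ui",
    "javascript react router single page apps",
    "react native javascript mobile frontend",
    "html css javascript react testing",
]
labels = ["SQL Developer"] * 6 + ["React Developer"] * 6

# The nine model families listed above, with default settings (an assumption).
MODELS = {
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Support Vector Machine": SVC(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Bagging Classifier": BaggingClassifier(random_state=0),
    "AdaBoost Classifier": AdaBoostClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Naive Bayes": MultinomialNB(),
}


def compare_models(texts, labels, models):
    """Fit a TF-IDF + classifier pipeline per model; return (train, test) accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, stratify=labels, random_state=0
    )
    results = {}
    for name, model in models.items():
        pipeline = make_pipeline(TfidfVectorizer(), model)
        pipeline.fit(X_train, y_train)
        results[name] = (pipeline.score(X_train, y_train),
                         pipeline.score(X_test, y_test))
    return results


results = compare_models(texts, labels, MODELS)
for name, (train_acc, test_acc) in results.items():
    print(f"{name:<24} train={train_acc:.2f} test={test_acc:.2f}")
```

Wrapping the vectorizer and the classifier in one pipeline keeps the TF-IDF vocabulary fitted on the training split only, so the test accuracy is not inflated by leakage.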

All models above show the following train and test accuracy:

✍️ Author

Shanu Halli



Languages

Jupyter Notebook 99.9%, Python 0.1%