Brunopaes / modoc

A Machine Learning CV classifier

MODOC - Mental Organism Designed Only for Classifying

This project uses natural language processing to classify the curricula vitae (CVs) of candidates for job vacancies. It was initially built for the Project Management II course, but it ended up as some kind of commercial product for HR consulting companies. The project is optimised for Python 3.6.


Project Structure - Directories

  • Data: datasets directory;
  • Drivers: webdrivers and webcrawlers;
  • Scripts: python scripts directory.

Modules

  • scraper: the web-scraping module, responsible for extracting the CVs from the web;
  • pdf_converter: the PDF-to-image module (the OCR step cannot extract text from PDF files, so they must first be converted into image files);
  • ocr: the image-to-text module (a machine learning model for image-to-text extraction);
  • classifier: the test classifier module (experimental only), to be replaced in the future;
  • val_alg: the machine learning fitting module, to be implemented in the future;
  • main: the machine learning classifiers module, to be implemented in the future.
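
As an illustration of the classification step at the end of this pipeline, here is a minimal sketch using scikit-learn. It is not the repository's code: the TF-IDF vectoriser, the sample texts, the labels and the variable names are all assumptions made for the example. Only the Random Forest model and the class names come from the Results section.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical OCR output: one string per CV (invented for illustration).
cv_texts = [
    "python developer machine learning pandas",
    "data scientist python statistics",
    "sales assistant retail customer service",
    "cashier retail store inventory",
]
# Hypothetical labels, borrowing the class names from the Results section.
labels = ["Finalised", "Finalised", "Not Finalised", "Not Finalised"]

# TF-IDF turns each text into a numeric vector (an assumption, the project
# may use other features); the Random Forest then classifies the vectors.
model = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(cv_texts, labels)

prediction = model.predict(["python machine learning engineer"])
print(prediction[0])
```

In the real project the input strings would come from the ocr module rather than from a hard-coded list.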

Note: due to the low number of CVs, the presentation to investors was made using dividend receipts from the Argentina Stock Exchange (Bolsar).


Requirements

This project requires the following Python libraries as dependencies:

  • scikit-learn;
  • pandas;

To install them, run the following command in your Anaconda environment or virtual environment:

  pip install scikit-learn pandas

Results

Models Accuracy

  1. The Random Forest model's accuracy was 83.33 %.
  2. The dumb algorithm's accuracy was 50.00 %: regardless of the attributes, this model always infers Finalised.
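
A baseline of this kind, which always predicts the same class, can be sketched with scikit-learn's DummyClassifier. The test labels below are invented and balanced on purpose. They are an assumption, not the project's data, chosen to show how a constant prediction lands at 50 % when both classes are equally frequent.

```python
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Hypothetical balanced test labels (invented for illustration).
y_true = ["Finalised", "Not Finalised", "Finalised", "Not Finalised"]
# The baseline ignores the features, so placeholder values are enough.
X = [[0], [0], [0], [0]]

# Always predict "Finalised", whatever the attributes are.
baseline = DummyClassifier(strategy="constant", constant="Finalised")
baseline.fit(X, y_true)
y_pred = baseline.predict(X)

baseline_accuracy = accuracy_score(y_true, y_pred)
print("{:.2f} %".format(100 * baseline_accuracy))  # prints "50.00 %"
```

A trained model is only useful if it beats this kind of baseline on the same test set.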

Confusion Matrix

                   Finalised   Not Finalised
  Finalised            5             0
  Not Finalised        1             0
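
The counts in this matrix can be checked against the reported accuracy with a few lines of pandas. This is a sketch, not the project's evaluation code. It assumes the rows and columns list the classes in the same order. The README does not say which axis holds the actual classes, but the diagonal sum is the same either way.

```python
import pandas as pd

classes = ["Finalised", "Not Finalised"]
# Counts copied from the confusion matrix above.
confusion = pd.DataFrame([[5, 0], [1, 0]], index=classes, columns=classes)

# Correct predictions sit on the diagonal, where row and column class agree.
correct = sum(confusion.loc[c, c] for c in classes)
total = confusion.values.sum()

accuracy = correct / total
print("{:.2f} %".format(100 * accuracy))  # prints "83.33 %"
```

Five of the six samples fall on the diagonal, which matches the 83.33 % reported for the Random Forest model.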

Languages

Language:Python 100.0%