cptanalatriste / copycat-detector

A Naive-Bayes classifier for detecting plagiarism.

copycat-detector

A Naive-Bayes classifier for detecting plagiarism, trained on a dataset of short answers developed by Clough and Stevenson.
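The README does not say which Naive Bayes variant the project uses. The following is a minimal sketch that assumes Gaussian likelihoods over continuous similarity features, meaning one normal distribution per (class, feature) pair combined with class priors. The class name `GaussianNaiveBayes` and the toy feature values are hypothetical and are not taken from the repository. If you are using scikit-learn, its `GaussianNB` is an off-the-shelf equivalent.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Hypothetical minimal Gaussian Naive Bayes (not the repository's code)."""

    def fit(self, X, y):
        grouped = defaultdict(list)
        for row, label in zip(X, y):
            grouped[label].append(row)
        n = len(y)
        self.priors = {}  # P(class)
        self.stats = {}   # per class: list of (mean, variance) per feature
        for label, rows in grouped.items():
            self.priors[label] = len(rows) / n
            columns = list(zip(*rows))
            means = [sum(col) / len(col) for col in columns]
            # A small variance floor avoids division by zero for constant features.
            variances = [
                max(sum((v - m) ** 2 for v in col) / len(col), 1e-9)
                for col, m in zip(columns, means)
            ]
            self.stats[label] = list(zip(means, variances))
        return self

    def _log_posterior(self, row, label):
        # log P(class) + sum of log N(x | mean, variance), assuming feature independence.
        score = math.log(self.priors[label])
        for x, (mean, var) in zip(row, self.stats[label]):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        return score

    def predict(self, X):
        return [max(self.priors, key=lambda c: self._log_posterior(row, c)) for row in X]


# Toy usage with made-up feature values (not from the Clough and Stevenson data).
X = [[0.9, 0.8], [0.85, 0.9], [0.1, 0.2], [0.05, 0.15]]
y = [1, 1, 0, 0]  # 1 = plagiarised, 0 = original
model = GaussianNaiveBayes().fit(X, y)
print(model.predict([[0.8, 0.85], [0.1, 0.1]]))  # → [1, 0]
```

The code works in log space so that multiplying many small likelihoods cannot underflow to zero.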

Getting started

To train the classifier, be sure to do the following first:

  1. Clone this repository.
  2. Download a modified version of the dataset.
  3. Place the dataset files in your cloned copy of the repository.
  4. Install all the Python packages listed in requirements.txt.
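Before training, you can check step 4 from Python. The sketch below is a hypothetical helper and does not ship with the repository. It reads simple requirement lines such as `pandas==1.0.3` and reports any that `importlib.metadata` (Python 3.8+) cannot find. It skips pip options, such as `-r`, and does not handle URL or VCS requirements.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_requirements(lines):
    """Return distribution names from simple requirements.txt lines (hypothetical helper)."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options like -r
            continue
        # The name ends at the first version specifier, extra, marker, or space.
        names.append(re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0])
    return names


def missing_packages(lines):
    """List requirement names that are not installed in the current environment."""
    missing = []
    for name in parse_requirements(lines):
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, `with open("requirements.txt") as f: print(missing_packages(f))` prints an empty list when everything is installed.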

Instructions

The feature engineering steps are defined in the 2_Plagiarism_Feature_Engineering.ipynb Jupyter notebook. Most of the code is contained in the copycat_detector module.
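The README does not list the engineered features. Two similarity measures that are often used for short-answer plagiarism are word n-gram containment and the normalized longest common subsequence (LCS) between an answer and its source text. The sketch below shows both under that assumption. The function names are hypothetical, and they may not match what the copycat_detector module computes.

```python
from collections import Counter


def ngrams(words, n):
    """Return the word n-grams (as tuples) of a token list."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def containment(answer, source, n=1):
    """Fraction of the answer's n-grams that also occur in the source.

    Counts are clipped: an n-gram repeated in the answer only matches as
    many times as it appears in the source.
    """
    answer_counts = Counter(ngrams(answer.lower().split(), n))
    source_counts = Counter(ngrams(source.lower().split(), n))
    total = sum(answer_counts.values())
    if total == 0:
        return 0.0
    shared = sum((answer_counts & source_counts).values())  # & keeps min counts
    return shared / total


def lcs_ratio(answer, source):
    """Longest common word subsequence length divided by the answer's length."""
    a, s = answer.lower().split(), source.lower().split()
    if not a:
        return 0.0
    # Standard dynamic-programming table, keeping only the previous row.
    prev = [0] * (len(s) + 1)
    for word in a:
        curr = [0]
        for j, other in enumerate(s, start=1):
            curr.append(prev[j - 1] + 1 if word == other else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1] / len(a)
```

Each answer yields one row of numbers in [0, 1], which is the kind of continuous input a Gaussian Naive Bayes model can consume.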

The training notebook, 3_Training_a_Model.ipynb, was run on an Amazon SageMaker instance.
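SageMaker training jobs read their input from S3, so the features are usually serialized to files first. The sketch below assumes one common layout: the label in the first column and no header row, which is the CSV convention of SageMaker's built-in algorithms. It is not confirmed that the notebook uses this exact format, and the function name is hypothetical.

```python
import csv
import io


def to_training_csv(features, labels):
    """Serialize rows as 'label,feature_1,feature_2,...' with no header line."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for label, row in zip(labels, features):
        writer.writerow([label, *row])
    return buffer.getvalue()


print(to_training_csv([[0.5, 0.25], [0.0, 1.0]], [1, 0]), end="")
# prints:
# 1,0.5,0.25
# 0,0.0,1.0
```

Write the string to a local file such as train.csv. From a notebook, the SageMaker Python SDK's `Session().upload_data(path, bucket, key_prefix)` can then copy that file to S3 for the training job.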

Languages

Jupyter Notebook 85.8%, Python 14.2%