Vidhi1290 / Essay-Quality-Prediction-Keystroke-Analysis-with-RandomForest

Explore the Essay Quality Prediction project: a machine learning model that predicts essay quality based on typing behaviors. Leveraging a Random Forest Regressor, this tool provides insights into writing processes. Connect with me on LinkedIn and find more projects on GitHub. Happy coding! 📝✨

Essay Quality Prediction with Keystroke Analysis 📝💻

Welcome to the Essay Quality Prediction project, where we analyze typing behavior to predict the quality of essays! 🚀

Introduction

This repository contains a machine learning model that uses keystroke analysis to predict the quality of essays. By extracting features from users' typing behavior during the writing process, the model aims to show how the way an essay is written relates to the score it receives.

Dataset 📊

The dataset comprises approximately 5000 logs of user inputs, including keystrokes and mouse clicks, recorded during the creation of essays. Each essay is scored on a scale of 0 to 6. The challenge is to predict the score an essay received based on its log of user inputs.

Features 🧐

Our model leverages a variety of typing behavior features, including:

  • Total number of activities
  • Total and average action time
  • Maximum word count
  • Number of unique text changes
  • Average cursor position
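Each feature above is a per-essay aggregate over the event log. The following is a minimal pandas sketch of how they could be computed. It assumes the log has one row per event with columns `id`, `activity`, `action_time`, `word_count`, `text_change`, and `cursor_position`; those column names, the helper `build_features`, and the sample data are assumptions for illustration, not code taken from this repository's notebook.

```python
import pandas as pd


def build_features(logs: pd.DataFrame) -> pd.DataFrame:
    """Aggregate an event-level keystroke log into one feature row per essay.

    Column names are assumptions: id, activity, action_time, word_count,
    text_change, cursor_position.
    """
    features = logs.groupby("id").agg(
        total_activities=("activity", "count"),            # number of logged events
        total_action_time=("action_time", "sum"),          # total action time
        mean_action_time=("action_time", "mean"),          # average action time
        max_word_count=("word_count", "max"),              # maximum word count reached
        unique_text_changes=("text_change", "nunique"),    # distinct text changes
        mean_cursor_position=("cursor_position", "mean"),  # average cursor position
    )
    return features.reset_index()


# Tiny synthetic log: essay "a" has three events, essay "b" has two.
logs = pd.DataFrame({
    "id": ["a", "a", "a", "b", "b"],
    "activity": ["Input", "Input", "Remove/Cut", "Input", "Nonproduction"],
    "action_time": [100, 80, 120, 90, 30],
    "word_count": [1, 2, 1, 1, 1],
    "text_change": ["q", "q", "q", " ", "NoChange"],
    "cursor_position": [1, 2, 1, 1, 1],
})
scores = pd.DataFrame({"id": ["a", "b"], "score": [3.5, 4.0]})

# Merge the per-essay features with the essay scores on the shared id.
train = build_features(logs).merge(scores, on="id")
print(train)
```

This sketch aggregates before merging. That keeps exactly one row per essay, so each score is attached once instead of being repeated on every log event.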

Model Architecture 🤖

The model is a Random Forest Regressor with 100 estimators (decision trees). We chose this ensemble method because averaging many trees makes it robust to noise and lets it capture nonlinear relationships and interactions between features.
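Here is a minimal scikit-learn sketch of a regressor configured as described, with 100 trees. The synthetic data and the `random_state` value are illustrative assumptions, not settings taken from the notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# 100 trees, as stated above; random_state is an assumption for reproducibility.
model = RandomForestRegressor(n_estimators=100, random_state=42)

# Synthetic stand-in for five per-essay features and scores on the 0-6 scale.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.clip(3 + X[:, 0] - 0.5 * X[:, 1] ** 2, 0, 6)  # deliberately nonlinear target

model.fit(X, y)
print(model.predict(X[:3]))
```

Each tree's leaf value is a mean of training scores, and the forest averages the trees. As a result, its predictions always stay within the range of scores seen during training.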

Acknowledgments 🙌

We extend our gratitude to Vanderbilt University, the competition host, and The Learning Agency Lab, the independent nonprofit based in Arizona. Their support has been instrumental in fostering cross-disciplinary research with global impact.

Usage 🚀

  1. Data Preprocessing and Feature Engineering: Load the training data, merge the keystroke logs with the essay scores, and engineer typing behavior features.
  2. Model Training: Train a Random Forest Regressor on the prepared dataset.
  3. Evaluation: Assess the model's performance using Mean Squared Error (MSE) on the validation set.
  4. Prediction: Use the trained model to predict scores for the test set and create a submission file.
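The four steps can be sketched end to end as follows. This is a hedged illustration on synthetic data. The feature column names, the 80/20 split, `random_state`, and the helper `run_pipeline` are assumptions, and the notebook's actual code may differ.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical names for the engineered per-essay features (step 1 output).
FEATURES = [
    "total_activities", "total_action_time", "mean_action_time",
    "max_word_count", "unique_text_changes", "mean_cursor_position",
]


def run_pipeline(train: pd.DataFrame, test: pd.DataFrame):
    """Train, validate, and predict; returns (validation MSE, submission frame)."""
    # Step 2: hold out 20% of the labelled essays, then train the forest.
    X_tr, X_val, y_tr, y_val = train_test_split(
        train[FEATURES], train["score"], test_size=0.2, random_state=42
    )
    model = RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(X_tr, y_tr)

    # Step 3: mean squared error on the validation split.
    val_mse = mean_squared_error(y_val, model.predict(X_val))

    # Step 4: predict unlabelled test essays and build a submission table.
    submission = pd.DataFrame(
        {"id": test["id"], "score": model.predict(test[FEATURES])}
    )
    return val_mse, submission


rng = np.random.default_rng(0)


def fake_features(n: int) -> pd.DataFrame:
    """Synthetic per-essay feature table standing in for real engineered data."""
    df = pd.DataFrame(rng.random((n, len(FEATURES))), columns=FEATURES)
    df.insert(0, "id", [f"essay_{i}" for i in range(n)])
    return df


train = fake_features(100)
# Synthetic scores on the 0-6 scale in half-point steps.
train["score"] = np.round(train["max_word_count"] * 12) / 2
test = fake_features(10)

mse, submission = run_pipeline(train, test)
print(f"Validation MSE: {mse:.3f}")
# To write the submission file: submission.to_csv("submission.csv", index=False)
```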

Visualizations 📈

Explore the relationships between typing behavior features and essay scores through captivating visualizations:

  • Scatter plots depicting the influence of activities, action time, word count, text changes, and cursor position on essay scores.
  • A residual plot offering insights into prediction errors.
  • A feature importance plot showcasing the significance of different features.
  • A histogram illustrating the distribution of predicted scores.
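Below is a matplotlib sketch of two of these plots, the residual plot and the feature importance plot. It assumes a fitted scikit-learn forest and a validation split. The synthetic data, the feature names, and the helper `plot_diagnostics` are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def plot_diagnostics(model, X_val, y_val, feature_names):
    """Draw a residual plot and a feature-importance bar chart side by side."""
    preds = model.predict(X_val)
    residuals = y_val - preds
    fig, (ax_res, ax_imp) = plt.subplots(1, 2, figsize=(11, 4))

    # Residual plot: errors vs. predictions; a patternless band around 0 is ideal.
    ax_res.scatter(preds, residuals, alpha=0.6)
    ax_res.axhline(0, color="red", linestyle="--")
    ax_res.set_xlabel("Predicted score")
    ax_res.set_ylabel("Residual (actual - predicted)")
    ax_res.set_title("Residual plot")

    # Impurity-based importances exposed by fitted scikit-learn forests.
    order = np.argsort(model.feature_importances_)
    ax_imp.barh([feature_names[i] for i in order], model.feature_importances_[order])
    ax_imp.set_xlabel("Importance")
    ax_imp.set_title("Feature importance")

    fig.tight_layout()
    return fig


rng = np.random.default_rng(1)
names = ["activities", "action_time", "word_count", "text_changes", "cursor_pos"]
X = rng.random((150, len(names)))
y = 6 * X[:, 2]  # synthetic score driven entirely by the word-count column
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[:120], y[:120])
fig = plot_diagnostics(model, X[120:], y[120:], names)
# fig.savefig("diagnostics.png") would write the figure to disk.
```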

Dependencies 🛠️

  • pandas
  • numpy
  • matplotlib
  • seaborn
  • scikit-learn
  • xgboost

Results 📊

  • Mean Squared Error on Validation Set: 0.42
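For reference, here is how the reported metric is defined. It averages the squared error over the n validation essays, where y_i is the true score and ŷ_i is the predicted score. Taking the square root of the reported value gives the root mean squared error (RMSE), which is in the same units as the 0 to 6 score scale. It corresponds to a typical error of roughly 0.65 score points.

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{0.42} \approx 0.65
```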

Connect with Me 🌐

Let's connect and collaborate! Feel free to reach out to me on:

I'm always open to discussions, collaborations, and learning new things together. Don't hesitate to drop me a message or explore my other projects on GitHub. Happy coding! 🚀

Feel free to dive into the code, experiment with the features, and explore the nuances of writing quality prediction through keystroke analysis! 🕵️‍♂️💬

Languages

Jupyter Notebook: 100.0%