valinsogna / EthicalAI-STACKAnalysis

An ML algorithm capable of conducting an in-depth analysis of students' responses to STACK questions

STACK Student Response Analysis for Sage Foundation Ethical AI Hackathon

An ML algorithm capable of conducting an in-depth analysis of students' responses to STACK questions, developed for the Ethical AI Hackathon promoted by Sage Foundation.

STACK is the world-leading open-source online assessment system for mathematics and STEM. It is available for Moodle, ILIAS and as an integration through LTI.

Two algorithms were used: Correspondence Analysis (CA) to discover lexical similarity between students' incorrect answers, and K-means to cluster those answers.

Table of Contents

  1. Team Introduction & Understanding the Problem
  2. Data Cleaning
  3. Python Scripts & Visualisation
  4. Machine Learning Analysis Summary
  5. Presentation
  6. Future Improvements
  7. Team & Researchers

1. Team Introduction & Understanding the Problem

Review of the sample data

Hackathon Challenge

Our challenge in this hackathon is to develop a machine learning algorithm to analyze students' responses to STACK questions. The aim is to classify correct vs. incorrect responses, further delve into the types of incorrect responses, group similar incorrect responses, and identify any outlier responses.

Aim

To devise an algorithm that effectively provides an in-depth analysis of students' answers to STACK questions.

Specific Objectives

  1. Classification of Correct vs. Incorrect Responses
  2. Multilevel Classification of Incorrect Responses (Predicted vs. unpredicted responses using PRT paths)
  3. Cluster Analysis - Grouping Similar incorrect responses
  4. Anomaly Detection Based on Question Text
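
As a rough illustration of Objectives 1 and 2, the sketch below labels a response as correct, incorrect and predicted, or incorrect and unpredicted. It is not the project's code: the function `classify_response`, the set `ANTICIPATED_PATHS`, the PRT path strings, and the convention that a score of 1.0 means "fully correct" are all assumptions made for the example.

```python
from typing import Set

# Hypothetical set of PRT paths (answer notes) that the question author
# anticipated as typical mistakes. These strings are made up for illustration.
ANTICIPATED_PATHS: Set[str] = {"prt1-1-F | prt1-2-T", "prt1-1-F | prt1-3-T"}


def classify_response(score: float, prt_path: str,
                      anticipated: Set[str] = ANTICIPATED_PATHS) -> str:
    """Label a response as correct, predicted-incorrect, or unpredicted-incorrect.

    Assumes `score` is the fraction of marks awarded (1.0 = fully correct),
    so partially correct responses are treated as incorrect here.
    """
    if score >= 1.0:
        return "correct"
    if prt_path in anticipated:
        return "incorrect (predicted)"
    return "incorrect (unpredicted)"


# Three made-up responses: (score, PRT path).
examples = [
    (1.0, "prt1-1-T"),
    (0.0, "prt1-1-F | prt1-2-T"),
    (0.0, "prt1-1-F | prt1-2-F"),
]
labels = [classify_response(score, path) for score, path in examples]
print(labels)
```

Only the responses labelled as incorrect would then be passed on to the clustering step (Objective 3).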

2. Data Cleaning

For the purposes of our analysis, only the finished attempts are considered.
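
A minimal sketch of this filtering step with pandas is shown below. The column names (`attempt_id`, `state`, `response`) and the state value `"Finished"` are assumptions for illustration; the real export may use a different schema.

```python
import pandas as pd

# Hypothetical export: one row per quiz attempt. The column names and
# state values are assumptions, not the actual schema of the dataset.
attempts = pd.DataFrame(
    {
        "attempt_id": [1, 2, 3, 4],
        "state": ["Finished", "In progress", "Finished", "Never submitted"],
        "response": ["x^2+1", "2*x", "x^2", ""],
    }
)


def keep_finished(df: pd.DataFrame, state_col: str = "state") -> pd.DataFrame:
    """Return only the rows whose attempt state is 'finished' (case-insensitive)."""
    mask = df[state_col].str.strip().str.lower() == "finished"
    return df.loc[mask].reset_index(drop=True)


finished = keep_finished(attempts)
print(finished)
```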

3. Python Scripts & Visualisation

Each objective was approached with a dedicated Python script, followed by visualization to represent the analysis results.

Writing Python Scripts

  1. Script for Objective 1-2: Link to the Code
  2. Script for Objective 3-4: Link to the Code

4. Machine Learning Analysis Summary

  • Contingency tables: for each question type, a contingency table of students' answers was built, using the characters present in each response as the vocabulary.
  • Correspondence Analysis (CA): for each question type, a 2D CA was performed on the predicted and unpredicted incorrect answers to analyze the lexical (dis)similarities between them.
  • K-means: the CA coordinates for each question type were clustered with K-means to identify common errors.
  • Data Saving and Retrieval: the analyzed data were saved for future use or further analysis.
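
The first three steps above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code from the notebooks: the helper names `char_contingency` and `correspondence_analysis`, the sample responses, and the choice of two clusters are all hypothetical. CA is computed here directly from the SVD of the standardized residuals with NumPy, and K-means comes from scikit-learn.

```python
import numpy as np
from sklearn.cluster import KMeans


def char_contingency(responses):
    """Rows = responses, columns = characters, entries = character counts."""
    vocab = sorted(set("".join(responses)))
    index = {ch: j for j, ch in enumerate(vocab)}
    table = np.zeros((len(responses), len(vocab)))
    for i, resp in enumerate(responses):
        for ch in resp:
            table[i, index[ch]] += 1
    return table, vocab


def correspondence_analysis(table, n_components=2):
    """Row principal coordinates from the SVD of the standardized residuals.

    Assumes every row has at least one count (no empty responses).
    """
    P = table / table.sum()          # correspondence matrix
    r = P.sum(axis=1)                # row masses
    c = P.sum(axis=0)                # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)
    U, sing, _ = np.linalg.svd(S, full_matrices=False)
    return (U[:, :n_components] * sing[:n_components]) / np.sqrt(r)[:, None]


# Made-up incorrect answers to a single question type.
responses = ["x^2+1", "x^2-1", "x^2+2", "2*x", "2*x+1", "3*x"]

table, vocab = char_contingency(responses)
coords = correspondence_analysis(table, n_components=2)

# The number of clusters is fixed at 2 purely for this example.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(coords)

for resp, label in zip(responses, cluster_labels):
    print(f"{resp!r} -> cluster {label}")
```

Because the vocabulary is made of single characters, two answers land close together when they use the same symbols in similar proportions, regardless of the order in which the symbols appear.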

5. Presentation

The findings, algorithm, and insights were compiled and documented for presentation to the Hackathon judges.

6. Future Improvements

  • After individual testing, integrate all code blocks into a single program.
  • Increase the number of dimensions for Correspondence Analysis to three (3D).
  • Add mathematical functions and symbols to the vocabulary used to build the contingency tables.
  • Choose an effective number of clusters based on the better view of the data that 3D CA provides.
  • Create an API that serves the clustering data as input to the STACK system.
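
One possible way to approach the cluster-count item is sketched below. It uses the silhouette score, which is a technique swapped in for this example and not something the project states it uses. The 3D points are synthetic stand-ins for 3D CA coordinates, and the helper `pick_n_clusters` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for 3D CA coordinates: three tight, well-separated groups.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 0.0], [0.0, 5.0, 5.0]])
coords = np.vstack([c + 0.1 * rng.standard_normal((20, 3)) for c in centers])


def pick_n_clusters(points, candidates=range(2, 7), seed=0):
    """Fit K-means for each candidate k and return the k with the best silhouette."""
    scores = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(points)
        scores[k] = silhouette_score(points, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


best_k, scores = pick_n_clusters(coords)
print("best k:", best_k)
```

On real response data the silhouette curve is usually much flatter than on this synthetic example, so the score is best read alongside a visual inspection of the CA plot.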

7. Team & Researchers

Below are the contributors to this project:

Team Members

Lead Researchers

License: MIT License


Languages

Language: Jupyter Notebook 100.0%