Data Analytics General Championship - RK Hall

View Problem Statement
This repository collects Jupyter notebooks and supporting resources for data analysis and machine learning. It covers clustering algorithms, language-model fine-tuning, and a web application demo.

🚀 Key Features

Clustering Algorithms: Group the data and surface patterns using K-Means, Gaussian Mixture Models (GMM), DBSCAN, and HDBSCAN.
Model Fine-Tuning: Fine-tune Mistral 7B, Gemma 2B, and BERT on specific tasks to improve their performance.
Web Application Demo: Try the models through an interactive web app.
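
To make the clustering idea concrete, here is a minimal, dependency-free sketch of K-Means (Lloyd's algorithm). It is an illustration only: the notebooks may well rely on a library implementation instead, and the function name `kmeans`, its parameters, and the toy points are all hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points with Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k sampled data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return centroids, labels


# Toy data (made up for illustration): two well-separated groups.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
centroids, labels = kmeans(points, k=2)
```

K-Means needs the number of clusters up front, whereas DBSCAN and HDBSCAN infer it from density, which is one reason to compare several algorithms on the same data.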

📂 Repository Structure

Notebook 1:
- Clustering Algorithms: Applies the clustering techniques to the data (requires Final_hopefully.csv and Unclustered_Demographic.csv).
- Required Files: The input data and the imputing_fxn.ipynb notebook used for preprocessing.
- For164 Zips-20240315T132840Z-001: Additional data resources.
Notebook 2:
- Fine-Tuning: Fine-tunes the models on specific tasks using LLm_on_10k_train.csv and filtered_health_data.csv.
- Required Files: The datasets needed for fine-tuning.
Web-app Screenshots: Screenshots of the web app.
Demo: A video walkthrough of the web application.
Output: A CSV file of sample input-output pairs from the model.

🚀 Getting Started

Hugging Face: Create a read access token in your Hugging Face settings and store it in the HF_READ_TOKEN environment variable.
Model Weights: Download the fine-tuned models:
- model_llama2 (Llama 2 7B, chat-hf)
- model_gemma (Gemma 2B, instruct)
- model_mistral (Mistral 7B, instruct)
Data: Ensure all required CSV files are in the correct directories as specified in the notebooks.
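
Before opening the notebooks, the setup steps above can be sanity-checked with a short script. This is a sketch under assumptions: the helper name `preflight` and the single `data_dir` argument are hypothetical, and the notebooks may expect the CSV files in different directories, so adjust the paths to match them.

```python
import os
from pathlib import Path

# File names taken from this README; their real locations are set in the notebooks.
REQUIRED_CSVS = [
    "Final_hopefully.csv",
    "Unclustered_Demographic.csv",
    "LLm_on_10k_train.csv",
    "filtered_health_data.csv",
]


def preflight(data_dir, env=os.environ):
    """Return a list of setup problems; an empty list means nothing was found missing."""
    problems = []
    # The token is expected in the HF_READ_TOKEN environment variable.
    if not env.get("HF_READ_TOKEN"):
        problems.append("HF_READ_TOKEN is not set")
    # Assumption: all CSV files sit directly inside data_dir.
    for name in REQUIRED_CSVS:
        if not (Path(data_dir) / name).is_file():
            problems.append(f"missing data file: {name}")
    return problems
```

Calling `preflight(".")` from the repository root returns a list of messages that can be printed or logged before any model is loaded.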

🛠️ System Requirements

Recommended: A GPU with at least 50 GB of VRAM.
Alternative: If GPU VRAM is limited, load the models onto the CPU instead (processing will be slower).
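
The GPU-or-CPU choice can be automated along these lines. This is a sketch, assuming the notebooks use PyTorch; the helper `pick_device` and its `min_vram_gb` threshold are hypothetical and simply mirror the 50 GB recommendation above.

```python
def pick_device(min_vram_gb=50):
    """Return "cuda" if the first GPU has enough memory, otherwise "cpu"."""
    try:
        import torch  # assumption: PyTorch is the framework in use
    except ImportError:
        return "cpu"
    if not torch.cuda.is_available():
        return "cpu"
    # total_memory is reported in bytes; convert to GiB before comparing.
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    return "cuda" if total_gb >= min_vram_gb else "cpu"


device = pick_device()
```

The resulting string can be passed to a PyTorch model's `.to(device)` call. Lower `min_vram_gb` if you want to try a smaller GPU before falling back to the CPU.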

⚠️ Note: This repository is designed for experienced users familiar with data analytics, machine learning, and model fine-tuning.

Feel free to read the Project Report for a detailed explanation.

View Project Report

Abhijit89Kumar / -Data-Analytics-General-Championship-RK-Hall-