Repository from GitHub: https://github.com/Abhijit89Kumar/-Data-Analytics-General-Championship-RK-Hall-

Data Analytics General Championship - RK Hall

View Problem Statement
This repository showcases cutting-edge data analysis and machine learning techniques through a collection of Jupyter notebooks and resources. Explore clustering algorithms, fine-tune language models, and interact with a web application demo.

🚀 Key Features

  • Clustering Algorithms: Uncover hidden patterns in your data using K-Means, Gaussian Mixture Model (GMM), DBSCAN, and HDBSCAN.
  • Model Fine-Tuning: Elevate model performance by fine-tuning Mistral 7B, Gemma 2B, and BERT on specific tasks.
  • Web Application Demo: Experience the power of these models firsthand through an interactive web app.
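To make the clustering feature concrete, here is a minimal, dependency-free sketch of K-Means (Lloyd's algorithm). It is not the code from the notebooks, which most likely rely on a library implementation; the function name `kmeans` and the toy points are assumptions made purely for illustration.

```python
import random
from math import dist

def kmeans(points, k, iterations=20, seed=0):
    """Cluster n-dimensional points with Lloyd's algorithm.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels

# Toy data (made up): two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

K-Means needs the number of clusters up front and assumes roughly spherical groups. GMM relaxes the shape assumption, while DBSCAN and HDBSCAN infer the number of clusters from point density and can mark outliers as noise.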

📂 Repository Structure

  • Notebook 1:
    • Clustering Algorithms: Dive into various clustering techniques for in-depth data analysis (requires Final_hopefully.csv and Unclustered_Demographic.csv).
    • Required Files: Essential data and the imputing_fxn.ipynb notebook for preprocessing.
    • For164 Zips-20240315T132840Z-001: Additional data resources.
  • Notebook 2:
    • Fine-Tuning: Enhance model performance on specific tasks using LLm_on_10k_train.csv and filtered_health_data.csv.
    • Required Files: Access the datasets needed for fine-tuning.
  • Web-app Screenshots: Visualize the web app experience through screenshots.
  • Demo: A video walkthrough of the web application.
  • Output: Explore sample input-output pairs from the model in a CSV file.
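Because the notebooks depend on specific data files, a quick pre-flight check can save a failed run. The sketch below uses the file names listed above; the helper name `find_missing` and the idea of searching recursively under one root folder are assumptions, since the exact directory layout is defined in the notebooks themselves.

```python
from pathlib import Path

# File names as listed in this README; where they live on disk is an assumption.
REQUIRED_FILES = {
    "Notebook 1": ["Final_hopefully.csv", "Unclustered_Demographic.csv", "imputing_fxn.ipynb"],
    "Notebook 2": ["LLm_on_10k_train.csv", "filtered_health_data.csv"],
}

def find_missing(root="."):
    """Return {notebook: [missing file names]} for files not found anywhere under root."""
    present = {p.name for p in Path(root).rglob("*") if p.is_file()}
    return {
        notebook: [name for name in names if name not in present]
        for notebook, names in REQUIRED_FILES.items()
    }
```

Calling `find_missing("path/to/clone")` returns an empty list for each notebook whose files are all present. The check matches on file name only, so it does not confirm that a file sits in the folder a notebook expects.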

🚀 Getting Started

  1. Hugging Face: Get a read access token from your Hugging Face settings and store it as HF_READ_TOKEN in your environment.
  2. Model Weights: Download the fine-tuned models:
    • model_llama2 (Llama 2 7B, chat-hf)
    • model_gemma (Gemma 2B, instruct)
    • model_mistral (Mistral 7B, instruct)
  3. Data: Ensure all required CSV files are in the correct directories as specified in the notebooks.
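For step 1, a small helper can read the token and fail early with a clear message when it is missing. This is a sketch: the function name `get_hf_token` is hypothetical, and only the `HF_READ_TOKEN` variable name comes from this README.

```python
import os

def get_hf_token(env=os.environ):
    """Return the Hugging Face read token stored in HF_READ_TOKEN, or raise a clear error."""
    token = env.get("HF_READ_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_READ_TOKEN is not set; create a read token in your "
            "Hugging Face settings and export it before running the notebooks."
        )
    return token
```

The returned string can then be handed to whichever Hugging Face loader the notebook uses, for example `huggingface_hub.login(token=...)`. Reading it from the environment keeps the token out of the notebook and out of version control.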

🛠️ System Requirements

  • Recommended: GPU with at least 50 GB VRAM for smooth performance.
  • Alternative: If GPU VRAM is limited, load models onto CPU (slower processing).
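As a rough guide to these numbers, the sketch below estimates the memory taken by model weights alone and picks a device with a CPU fallback. The helper names are hypothetical, and the README does not say how the 50 GB figure was derived. Fine-tuning also needs memory for gradients, optimizer state, and activations, so real usage sits well above the weight-only estimate.

```python
def weight_memory_gb(num_params, bytes_per_param=2):
    """Approximate memory for model weights alone, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9

def pick_device():
    """Return "cuda" when PyTorch is installed and sees a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

# Parameter counts are the nominal sizes named in this README (7B and 2B).
for name, params in [("7B model", 7e9), ("2B model", 2e9)]:
    half = weight_memory_gb(params)        # 16-bit weights: 2 bytes per parameter
    full = weight_memory_gb(params, 4)     # 32-bit weights: 4 bytes per parameter
    print(f"{name}: {half:.0f} GB in 16-bit, {full:.0f} GB in 32-bit")

print("device:", pick_device())
```

A 7B model therefore needs about 14 GB just to hold 16-bit weights, which is why running on CPU, with system RAM instead of VRAM, is the slower fallback.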

⚠️ Note: This repository is designed for experienced users familiar with data analytics, machine learning, and model fine-tuning.

Feel free to read the Project Report for a detailed explanation.

View Project Report
