View Problem Statement
This repository showcases cutting-edge data analysis and machine learning techniques through a collection of Jupyter notebooks and resources. Explore clustering algorithms, fine-tune language models, and interact with a web application demo.
- Clustering Algorithms: Uncover hidden patterns in your data using K-Means, Gaussian Mixture Model (GMM), DBSCAN, and HDBSCAN.
- Model Fine-Tuning: Elevate model performance by fine-tuning Mistral 7B, Gemma 2B, and BERT on specific tasks.
- Web Application Demo: Experience the power of these models firsthand through an interactive web app.
- Notebook 1:
  - Clustering Algorithms: Dive into various clustering techniques for in-depth data analysis (requires `Final_hopefully.csv` and `Unclustered_Demographic.csv`); see the sketch after this list.
  - Required Files: Essential data and the `imputing_fxn.ipynb` notebook for preprocessing.
  - `164 Zips-20240315T132840Z-001`: Additional data resources.
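The notebook holds the full pipeline; as a quick orientation, a minimal sketch of running the four clustering algorithms on the demographic CSV could look like the following. The column handling, imputation strategy, and hyperparameters here are illustrative assumptions, not the notebook's exact settings (the real preprocessing lives in `imputing_fxn.ipynb`), and `HDBSCAN` is the scikit-learn (>= 1.3) implementation.

```python
# Minimal clustering sketch -- assumed columns and hyperparameters, not the notebook's settings.
import pandas as pd
from sklearn.cluster import DBSCAN, HDBSCAN, KMeans
from sklearn.impute import SimpleImputer
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("Final_hopefully.csv")

# Impute missing values and standardize the numeric features before clustering.
numeric = df.select_dtypes("number")
X = StandardScaler().fit_transform(SimpleImputer(strategy="median").fit_transform(numeric))

labels = {
    "kmeans":  KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X),
    "gmm":     GaussianMixture(n_components=5, random_state=0).fit_predict(X),
    "dbscan":  DBSCAN(eps=0.8, min_samples=10).fit_predict(X),   # label -1 = noise
    "hdbscan": HDBSCAN(min_cluster_size=25).fit_predict(X),      # label -1 = noise
}
for name, lab in labels.items():
    print(name, pd.Series(lab).value_counts().to_dict())
```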
- Notebook 2:
  - Fine-Tuning: Enhance model performance on specific tasks using `LLm_on_10k_train.csv` and `filtered_health_data.csv`; see the sketch after this list.
  - Required Files: Access the datasets needed for fine-tuning.
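For orientation, a minimal LoRA fine-tuning sketch with Hugging Face `transformers`/`peft` is shown below. The base checkpoint, the assumed `text` column in `LLm_on_10k_train.csv`, and all hyperparameters are assumptions for illustration, not the notebook's actual configuration.

```python
# Minimal LoRA fine-tuning sketch; checkpoint, column name, and hyperparameters are assumed.
import os

from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed base checkpoint
token = os.environ["HF_READ_TOKEN"]

tokenizer = AutoTokenizer.from_pretrained(model_id, token=token)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_id, token=token)
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

# Tokenize an assumed "text" column from the training CSV.
train = load_dataset("csv", data_files="LLm_on_10k_train.csv")["train"]
train = train.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    batched=True, remove_columns=train.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="model_mistral",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=8,
                           num_train_epochs=1),
    train_dataset=train,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```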
- Web-app Screenshots: Screenshots of the web application interface.
- Demo: A video walkthrough of the web application.
- Output: Sample input-output pairs from the model, provided as a CSV file.
- Hugging Face: Get a read access token from your Hugging Face settings and store it as `HF_READ_TOKEN` in your environment (used by the loading sketch below).
- Model Weights: Download the fine-tuned models:
  - `model_llama2` (7B Llama 2 Chat-HF)
  - `model_gemma` (2B Gemma Instruct)
  - `model_mistral` (7B Mistral Instruct)
- Data: Ensure all required CSV files are in the correct directories as specified in the notebooks.
- Recommended: A GPU with at least 50 GB of VRAM for smooth performance.
- Alternative: If GPU VRAM is limited, load the models on the CPU instead (much slower); see the loading sketch below.
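As a minimal sketch of authenticating and loading one of the fine-tuned checkpoints: the local directory name (`model_mistral`) and the precision choices are assumptions, and `device_map="auto"` requires the `accelerate` package, which spreads layers across the available GPU(s) and offloads to CPU when VRAM runs out.

```python
# Minimal load sketch -- directory name and dtype choices are assumptions.
import os

import torch
from huggingface_hub import login
from transformers import AutoModelForCausalLM, AutoTokenizer

login(token=os.environ["HF_READ_TOKEN"])  # needed for gated base models (Llama 2, Gemma, Mistral)

model_dir = "model_mistral"  # or "model_gemma" / "model_llama2"
tokenizer = AutoTokenizer.from_pretrained(model_dir)

if torch.cuda.is_available():
    model = AutoModelForCausalLM.from_pretrained(
        model_dir, device_map="auto", torch_dtype=torch.float16)
else:
    # CPU-only fallback: works with limited VRAM budgets but is much slower.
    model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=torch.float32)
```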