oMarquess / clustering-llm

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Clustering System

Overview

This project leverages Large Language Models (LLMs) and other machine learning models to identify clusters in diverse datasets from uploaded CSV and XLSX files. The system clusters similar items within the data without assigning predefined categories to these clusters. FastAPI is used to handle API requests robustly, and LangChain is integrated to enhance data analysis with the capabilities of LLMs.

Requirements

  • Python 3.8+
  • FastAPI
  • Uvicorn (for serving the application)
  • LangChain
  • pandas (for file handling)
  • Unstructured [all-docs]
  • Any necessary machine learning libraries (e.g., TensorFlow, PyTorch)

Installation

# Clone the repository
git clone https://github.com/oMarquess/clustering-llm.git
cd src/clustering-llm

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

# Install dependencies
pip install -r requirements.txt

Usage

To run the application:

# Start the FastAPI server
uvicorn main:app --reload

This will serve the API on http://127.0.0.1:8000/docs.

API Endpoints

  • POST /upload-file: Accepts file uploads in CSV or XLSX format and returns clusters based on identified patterns.

File Uploads

Use the /upload-file endpoint to upload your data files. The endpoint expects files in either CSV or XLSX format. Ensure that your files are formatted correctly with headers that describe each column of data.

Developing with LangChain

LangChain is utilized to leverage LLMs for analyzing and clustering data based on the features extracted from the uploaded files. Ensure your LangChain configurations are set correctly to interface with your chosen LLM provider.

Deployment

For production deployment, use a production-ready server like Gunicorn, and ensure you have proper logging, monitoring, and security configurations in place.

Contributing

Contributions to this project are welcome. Please fork the repository and submit a pull request with your features or fixes.

License

Apache 2.0

About

License:Apache License 2.0


Languages

Language:Python 100.0%