dsabarinathan / clean-ui2

Simple UI for Llama-3.2-11B-Vision & Molmo-7B-D

Repository from Github https://github.comdsabarinathan/clean-ui2Repository from Github https://github.comdsabarinathan/clean-ui2

Clean-UI for Multi-Modal Vision Models

This project offers a user-friendly interface for working with the Llama-3.2-11B-Vision and Molmo-7B-D models.

In this case, both the Llama-3.2-11B-Vision-bnb-4bit and Molmo-7B-D-bnb-4bit models need 12GB of VRAM to run.

The model selection is done via the command line:

Installation

To set up and run this project on your local machine, follow the steps below:

1. Clone the Repository

Copy the repository to a convenient location on your computer:

git clone <repository-url>
cd <repository-directory>

2. Create a Virtual Environment

Inside the cloned repository, create a virtual environment using the following command:

python -m venv venv-ui

3. Activate the Virtual Environment

Activate the virtual environment using:

.\venv-ui\Scripts\activate

4. Install Dependencies

After activating the virtual environment, install the necessary dependencies from requirements.txt:

pip install -r requirements.txt

Install Torch and TorchVision using separate commands:

pip install torch==2.4.1+cu121 --index-url https://download.pytorch.org/whl/cu121

and

pip install torchvision==0.19.1+cu121 --index-url https://download.pytorch.org/whl/cu121

Usage

To start the UI, you can either:

  • Use the run.bat script (Windows only)

    Simply double-click on run.bat or

    1. Activate the virtual environment:

      • Windows:
        .\venv-ui\Scripts\activate
    2. Run the Python script:

      python clean-ui.py

Features

  • Upload an image and enter a prompt to generate an image description.
  • Adjustable parameters such as temperature, top-k, and top-p for more control over the generated text.
  • Chatbot history to display prompt-response interactions.

License

This project is licensed under the MIT License. See the LICENSE file for more details.

About

Simple UI for Llama-3.2-11B-Vision & Molmo-7B-D

License:Apache License 2.0


Languages

Language:Python 97.3%Language:Batchfile 2.7%