Kind-Unes / MultiModal-Model

This project is a multi-modal system that combines several models. It accepts audio, images, and text as inputs and generates corresponding audio, image, and text outputs.

Project Name

Multi-Modal Model Python Project

Overview

This project is a multi-modal model that accepts audio, images, and text as inputs, generating corresponding audio, images, and text outputs.

Features

  • Streamlit Interface: Coming soon
  • Input Modalities: Audio, images, text, videos, emojis, multiple inputs
  • Output Modalities: Audio, images, text, videos, emojis, segmented images, object detection coordinates, multiple outputs
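
One way to picture how several models can sit behind a single entry point is a small dispatcher that routes each input to a handler by modality. The sketch below is illustrative only and is not taken from this repository: `ModalInput`, `HANDLERS`, `route`, and the placeholder handlers are all hypothetical names, and the handlers return labelled strings instead of calling real models.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ModalInput:
    """Hypothetical container for one input and its modality tag."""
    modality: str   # e.g. "text", "image", or "audio"
    payload: object


# Placeholder handlers: a real project would call the relevant model here.
def handle_text(payload: object) -> str:
    return f"text handler received: {payload}"


def handle_image(payload: object) -> str:
    return f"image handler received: {payload}"


def handle_audio(payload: object) -> str:
    return f"audio handler received: {payload}"


# Hypothetical registry mapping a modality name to its handler.
HANDLERS: Dict[str, Callable[[object], str]] = {
    "text": handle_text,
    "image": handle_image,
    "audio": handle_audio,
}


def route(item: ModalInput) -> str:
    """Look up the handler for the item's modality and run it."""
    handler = HANDLERS.get(item.modality)
    if handler is None:
        raise ValueError(f"unsupported modality: {item.modality!r}")
    return handler(item.payload)


if __name__ == "__main__":
    print(route(ModalInput("text", "hello")))
    print(route(ModalInput("image", "cat.png")))
```

Keeping the registry as a plain dictionary means a new modality (video, for instance) can be added by registering one more handler, without touching `route`.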

Getting Started

Prerequisites

  • Python 3.x
  • Dependencies listed in requirements.txt

Installation

git clone https://github.com/Kind-Unes/Multi-Model-V1.git
cd 'MultiMODEL Template'
pip install -r requirements.txt

Usage

python model.py

Credits

TXT2IMG Models

Text Generation Model

IMG2TXT Model

TTS Model

STT Model

Others . . . . .

Websites


Languages

Python 99.7%, PowerShell 0.3%, Batchfile 0.0%