ainisa20

Madison Smith's repositories

MiniGemini

Official implementation for Mini-Gemini

Apache-2.0000

parler-tts

Inference and training library for high-quality TTS models.

Apache-2.0000

mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Apache-2.0000

ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines

Apache-2.0000

self-rag

This includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through self-reflection by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.

MIT000

LLaMA-Factory

Unify Efficient Fine-Tuning of 100+ LLMs

Apache-2.0000

Automatic-Speech-Recognition-from-Scratch

An minimal Seq2Seq example of Automatic Speech Recognition (ASR) based on Transformer

MIT000

MuseV

MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising

MIT000

Digital Avatar Conversational System - Linly-Talker. 😄✨ Linly-Talker is an intelligent AI system that combines large language models (LLMs) with visual models to create a novel human-AI interaction method. 🤝🤖 It integrates various technologies like Whisper, Linly, Microsoft Speech Services, and SadTalker talking head generation system. 🌟🔬

MIT000

OOTDiffusion

Official implementation of OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

NOASSERTION000

Whisper-Finetune

Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deployment, Windows desktop deployment, and Android deployment

Apache-2.0000

GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

MIT000

facefusion

Next generation face swapper and enhancer

NOASSERTION000

ReplaceAnything

000

CogVLM

a state-of-the-art-level open visual language model | 多模态预训练模型

NOASSERTION000

roop

one-click face swap

GPL-3.0000

wvp-GB28181-pro

WEB VIDEO PLATFORM是一个基于GB28181-2016标准实现的网络视频平台，支持NAT穿透，支持海康、大华、宇视等品牌的IPC、NVR、DVR接入。支持国标级联，支持rtsp/rtmp等视频流转发到国标平台，支持rtsp/rtmp等推流转发到国标平台。

Language:JavaMIT000

ZLMediaKit

WebRTC/RTSP/RTMP/HTTP/HLS/HTTP-FLV/WebSocket-FLV/HTTP-TS/HTTP-fMP4/WebSocket-TS/WebSocket-fMP4/GB28181/SRT server and client framework based on C++11

NOASSERTION000

VITS-Pytorch

本项目是基于Pytorch的语音合成项目，使用的是VITS，VITS是一种语音合成方法，这种时端到端的模型使用起来非常简单，不需要文本对齐等太复杂的流程，直接一键训练和生成，大大降低了学习门槛。

Apache-2.0000

PaddlePaddle-DeepSpeech

基于PaddlePaddle实现的语音识别，中文语音识别。项目完善，识别效果好。支持Windows，Linux下训练和预测，支持Nvidia Jetson开发板预测。

Apache-2.0000

Grounded-Segment-Anything

Grounded-SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect , Segment and Generate Anything

Apache-2.0000