Madison Smith's repositories
MiniGemini
Official implementation for Mini-Gemini
parler-tts
Inference and training library for high-quality TTS models.
mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
ragas
Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
self-rag
This includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through self-reflection by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.
LLaMA-Factory
Unify Efficient Fine-Tuning of 100+ LLMs
Automatic-Speech-Recognition-from-Scratch
An minimal Seq2Seq example of Automatic Speech Recognition (ASR) based on Transformer
MuseV
MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising
Linly-Talker
Digital Avatar Conversational System - Linly-Talker. 😄✨ Linly-Talker is an intelligent AI system that combines large language models (LLMs) with visual models to create a novel human-AI interaction method. 🤝🤖 It integrates various technologies like Whisper, Linly, Microsoft Speech Services, and SadTalker talking head generation system. 🌟🔬
OOTDiffusion
Official implementation of OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
Whisper-Finetune
Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deployment, Windows desktop deployment, and Android deployment
GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
facefusion
Next generation face swapper and enhancer
CogVLM
a state-of-the-art-level open visual language model | 多模态预训练模型
roop
one-click face swap
wvp-GB28181-pro
WEB VIDEO PLATFORM是一个基于GB28181-2016标准实现的网络视频平台,支持NAT穿透,支持海康、大华、宇视等品牌的IPC、NVR、DVR接入。支持国标级联,支持rtsp/rtmp等视频流转发到国标平台,支持rtsp/rtmp等推流转发到国标平台。
ZLMediaKit
WebRTC/RTSP/RTMP/HTTP/HLS/HTTP-FLV/WebSocket-FLV/HTTP-TS/HTTP-fMP4/WebSocket-TS/WebSocket-fMP4/GB28181/SRT server and client framework based on C++11
VITS-Pytorch
本项目是基于Pytorch的语音合成项目,使用的是VITS,VITS是一种语音合成方法,这种时端到端的模型使用起来非常简单,不需要文本对齐等太复杂的流程,直接一键训练和生成,大大降低了学习门槛。
PaddlePaddle-DeepSpeech
基于PaddlePaddle实现的语音识别,中文语音识别。项目完善,识别效果好。支持Windows,Linux下训练和预测,支持Nvidia Jetson开发板预测。
Grounded-Segment-Anything
Grounded-SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect , Segment and Generate Anything
recognize-anything
Code for the Recognize Anything Model (RAM) and Tag2Text Model
Detect-and-read-meters
This is the first released system towards complex meters` detection and recognition, which is implemented by computer vision techniques.
FastSAM
Fast Segment Anything
langchain-ChatGLM
langchain-ChatGLM, local knowledge based ChatGLM with langchain | 基于本地知识库的 ChatGLM 问答
CPM-Bee
百亿参数的中英文双语基座大模型
CcClip
使用vue(vue3) + ffmpeg + wasm 实现纯前端音视频编辑,功能包括:视频剪辑、音频剪辑、音频合成裁剪、音波展示、视频抽帧、gif抽帧、帧播放器、字幕、贴图、时间轴、素材轨道
scrcpy
Display and control your Android device
ChatGPT-Next-Web
One-Click to deploy well-designed ChatGPT web UI on Vercel. 一键拥有你自己的 ChatGPT 网页服务。