Madison Smith's repositories
Automatic-Speech-Recognition-from-Scratch
An minimal Seq2Seq example of Automatic Speech Recognition (ASR) based on Transformer
CcClip
使用vue(vue3) + ffmpeg + wasm 实现纯前端音视频编辑,功能包括:视频剪辑、音频剪辑、音频合成裁剪、音波展示、视频抽帧、gif抽帧、帧播放器、字幕、贴图、时间轴、素材轨道
ChatGPT-Next-Web
One-Click to deploy well-designed ChatGPT web UI on Vercel. 一键拥有你自己的 ChatGPT 网页服务。
CogVLM
a state-of-the-art-level open visual language model | 多模态预训练模型
CPM-Bee
百亿参数的中英文双语基座大模型
Detect-and-read-meters
This is the first released system towards complex meters` detection and recognition, which is implemented by computer vision techniques.
facefusion
Next generation face swapper and enhancer
FastSAM
Fast Segment Anything
GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
Grounded-Segment-Anything
Grounded-SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect , Segment and Generate Anything
langchain-ChatGLM
langchain-ChatGLM, local knowledge based ChatGLM with langchain | 基于本地知识库的 ChatGLM 问答
Linly-Talker
Digital Avatar Conversational System - Linly-Talker. 😄✨ Linly-Talker is an intelligent AI system that combines large language models (LLMs) with visual models to create a novel human-AI interaction method. 🤝🤖 It integrates various technologies like Whisper, Linly, Microsoft Speech Services, and SadTalker talking head generation system. 🌟🔬
LLaMA-Factory
Unify Efficient Fine-Tuning of 100+ LLMs
MiniGemini
Official implementation for Mini-Gemini
mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
MuseV
MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising
OOTDiffusion
Official implementation of OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
PaddlePaddle-DeepSpeech
基于PaddlePaddle实现的语音识别,中文语音识别。项目完善,识别效果好。支持Windows,Linux下训练和预测,支持Nvidia Jetson开发板预测。
parler-tts
Inference and training library for high-quality TTS models.
ragas
Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
recognize-anything
Code for the Recognize Anything Model (RAM) and Tag2Text Model
roop
one-click face swap
scrcpy
Display and control your Android device
self-rag
This includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through self-reflection by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.
VITS-Pytorch
本项目是基于Pytorch的语音合成项目,使用的是VITS,VITS是一种语音合成方法,这种时端到端的模型使用起来非常简单,不需要文本对齐等太复杂的流程,直接一键训练和生成,大大降低了学习门槛。
Whisper-Finetune
Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deployment, Windows desktop deployment, and Android deployment
wvp-GB28181-pro
WEB VIDEO PLATFORM是一个基于GB28181-2016标准实现的网络视频平台,支持NAT穿透,支持海康、大华、宇视等品牌的IPC、NVR、DVR接入。支持国标级联,支持rtsp/rtmp等视频流转发到国标平台,支持rtsp/rtmp等推流转发到国标平台。
ZLMediaKit
WebRTC/RTSP/RTMP/HTTP/HLS/HTTP-FLV/WebSocket-FLV/HTTP-TS/HTTP-fMP4/WebSocket-TS/WebSocket-fMP4/GB28181/SRT server and client framework based on C++11