lujiale621's starred repositories

torchchat

Run PyTorch LLMs locally on servers, desktop and mobile

Language: Python | License: BSD-3-Clause | Stargazers: 2510 | Issues: 0

NapCatQQ

A headless bot framework based on NTQQ

Language: TypeScript | License: MPL-2.0 | Stargazers: 1431 | Issues: 0

pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Language: Jupyter Notebook | License: MIT | Stargazers: 5650 | Issues: 0

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python | License: Apache-2.0 | Stargazers: 130205 | Issues: 0

Stable-Hair

Stable-Hair: Real-World Hair Transfer via Diffusion Model

License: Apache-2.0 | Stargazers: 279 | Issues: 0

MovieChat

[CVPR 2024] 🎬💭 chat with over 10K frames of video!

Language: Python | License: BSD-3-Clause | Stargazers: 472 | Issues: 0

CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Language: Python | License: Apache-2.0 | Stargazers: 3428 | Issues: 0

How-to-use-Transformers

A quick-start tutorial for the Transformers library

Language: Python | License: Apache-2.0 | Stargazers: 877 | Issues: 0

IDM-VTON

[ECCV 2024] IDM-VTON: Improving Diffusion Models for Authentic Virtual Try-on in the Wild

Language: Python | Stargazers: 3319 | Issues: 0

Qwen2-Audio

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Stargazers: 520 | Issues: 0

IMAGDressing

👔IMAGDressing👔: Interactive Modular Apparel Generation for Virtual Dressing

Language: Python | License: Apache-2.0 | Stargazers: 839 | Issues: 0

Stirling-PDF

#1 Locally hosted web application that allows you to perform various operations on PDF files

Language: Java | License: GPL-3.0 | Stargazers: 35681 | Issues: 0

mr-Blip

Official Implementation of "The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval"

Language: Python | License: BSD-3-Clause | Stargazers: 26 | Issues: 0

fish-speech

Brand new TTS solution

Language: Python | License: NOASSERTION | Stargazers: 6923 | Issues: 0

SoniTranslate

Synchronized translation and dubbing for videos

Language: Python | License: Apache-2.0 | Stargazers: 399 | Issues: 0

StreamSpeech

StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

Language: Python | License: MIT | Stargazers: 787 | Issues: 0

video-mamba-suite

The suite of modeling video with Mamba

Language: Python | License: MIT | Stargazers: 202 | Issues: 0

R2-Tuning

🌀 R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding (ECCV 2024)

Language: Python | License: BSD-3-Clause | Stargazers: 42 | Issues: 0

sherpa-onnx

Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter

Language: C++ | License: Apache-2.0 | Stargazers: 2675 | Issues: 0

wesubtitle

Extract hard-coded subtitles from videos using OCR

Language: Python | License: Apache-2.0 | Stargazers: 52 | Issues: 0

ShareGPT4Video

An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Language: Python | Stargazers: 1191 | Issues: 0

video-subtitle-extractor

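A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating srt files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework that includes subtitle region detection and subtitle content extraction.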

Language: Python | License: Apache-2.0 | Stargazers: 5447 | Issues: 0

BilibiliSummary

A Chrome extension that helps you summarize videos on Bilibili.

Language: TypeScript | License: BSD-3-Clause | Stargazers: 712 | Issues: 0

Uni-TTS

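This project aims to unify how different speech synthesis engines are used. It supports adapters for multiple speech synthesis engines and can be used directly as a module or launched as a backend service.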

Language: Python | License: MIT | Stargazers: 582 | Issues: 0

GPT-SoVITS-Inference

Inference Specialization

Language: Python | License: MIT | Stargazers: 236 | Issues: 0

MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Language: Python | License: Apache-2.0 | Stargazers: 8198 | Issues: 0

GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Language: Python | License: MIT | Stargazers: 30310 | Issues: 0

RTranslator

Open source real-time translation app for Android that runs locally

Language: C++ | License: Apache-2.0 | Stargazers: 5951 | Issues: 0

whisper

Robust Speech Recognition via Large-Scale Weak Supervision

Language: Python | License: MIT | Stargazers: 65620 | Issues: 0

fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Language: Python | License: MIT | Stargazers: 29958 | Issues: 0