Simon Lee (whatissimondoing)

Company: Fudan University

Location: Shanghai, China

Simon Lee's starred repositories

Language: Python | License: NOASSERTION | Stargazers: 34518 | Issues: 300 | Issues: 352

ChatTTS

A generative speech model for daily dialogue.

Language: Python | License: AGPL-3.0 | Stargazers: 31165 | Issues: 180 | Issues: 518

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python | License: Apache-2.0 | Stargazers: 19570 | Issues: 159 | Issues: 1496

ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

Language: Python | License: Apache-2.0 | Stargazers: 18388 | Issues: 110 | Issues: 1223

fish-speech

Brand new TTS solution

Language: Python | License: NOASSERTION | Stargazers: 12757 | Issues: 90 | Issues: 363

gorilla

Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)

Language: Python | License: Apache-2.0 | Stargazers: 11269 | Issues: 98 | Issues: 225

livekit

End-to-end stack for WebRTC. SFU media server and SDKs.

Language: Go | License: Apache-2.0 | Stargazers: 9783 | Issues: 123 | Issues: 525

BBDown

Bilibili Downloader. A command-line downloader for Bilibili.

sglang

SGLang is a fast serving framework for large language models and vision language models.

Language: Python | License: Apache-2.0 | Stargazers: 5377 | Issues: 55 | Issues: 540

AniPortrait

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

Language: Python | License: Apache-2.0 | Stargazers: 4541 | Issues: 61 | Issues: 183

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python | License: MIT | Stargazers: 4492 | Issues: 58 | Issues: 152

xtuner

An efficient, flexible, and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

Language: Python | License: Apache-2.0 | Stargazers: 3799 | Issues: 33 | Issues: 515

V-Express

V-Express aims to generate a talking head video under the control of a reference image, an audio clip, and a sequence of V-Kps images.

OS-Copilot

A self-improving embodied conversational agent seamlessly integrated into the operating system to automate our daily tasks.

Language: Python | License: MIT | Stargazers: 1471 | Issues: 21 | Issues: 29

distilabel

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

Language: Python | License: Apache-2.0 | Stargazers: 1452 | Issues: 13 | Issues: 413

Bunny

A family of lightweight multimodal models.

Language: Python | License: Apache-2.0 | Stargazers: 892 | Issues: 19 | Issues: 117

bm25s

Fast lexical search library implementing BM25 in Python using NumPy and SciPy

Language: Python | License: MIT | Stargazers: 792 | Issues: 4 | Issues: 24

EmoLLM

Mental health large model, LLM, The Big Model of Mental Health, Finetune, InternLM2, InternLM2.5, Qwen, ChatGLM, Baichuan, DeepSeek, Mixtral, LLama3, GLM4, Qwen2, LLama3.1

Language: Python | License: MIT | Stargazers: 775 | Issues: 4 | Issues: 42

UMOE-Scaling-Unified-Multimodal-LLMs

Code for "Uni-MoE: Scaling Unified Multimodal Models with Mixture of Experts"

MInference

To speed up long-context LLM inference, MInference computes attention approximately using dynamic sparsity, which reduces pre-filling latency by up to 10x on an A100 while maintaining accuracy.

Language: Python | License: MIT | Stargazers: 716 | Issues: 6 | Issues: 53

emotion2vec

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

LLaSM

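The first open-source, commercially usable dialogue model to support bilingual Chinese-English speech-text multimodal conversation. Convenient speech input greatly improves the experience of using large models that take text as input, while avoiding the cumbersome pipeline of ASR-based solutions and the errors they may introduce.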

Language: Python | License: Apache-2.0 | Stargazers: 525 | Issues: 14 | Issues: 8

SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

Language: Python | License: MIT | Stargazers: 512 | Issues: 18 | Issues: 36

Vach

Real-time streaming talking head

EmoBox

[INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark

xRAG

Source code for xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token

Language: Jupyter Notebook | Stargazers: 83 | Issues: 2 | Issues: 12

Flash-VStream

Please refer to our official repo at https://github.com/IVGSZ/Flash-VStream.

Language: Python | License: Apache-2.0 | Stargazers: 48 | Issues: 2 | Issues: 6

EMO-SUPERB-submission

EMO-SUPERB submission

Language: Python | Stargazers: 27 | Issues: 4 | Issues: 0