1003761102's starred repositories
DiffusionInst
This repo is the code of paper "DiffusionInst: Diffusion Model for Instance Segmentation" (ICASSP'24).
DiffusionDet
[ICCV2023 Oral] PyTorch implementation of DiffusionDet (https://arxiv.org/abs/2211.09788)
stable-diffusion
A latent text-to-image diffusion model
Awesome-Multimodal-Large-Language-Models
:sparkles::sparkles:Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.
bert-crf-token_classification_ner
bert、roberta ner命名实体识别
speech-resynthesis
An official reimplementation of the method described in the INTERSPEECH 2021 paper - Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.
TorchKMeans
A torch-based implementation of K-Means and K-Means++
chinese_speech_pretrain
chinese speech pretrained models
audiolm-pytorch
Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch
VoiceStreamAI
Near-Realtime audio transcription using self-hosted Whisper and WebSocket in Python/JS
SpeechTokenizer
This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on
video-retalking
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
SadTalker-Video-Lip-Sync
本项目基于SadTalkers实现视频唇形合成的Wav2lip。通过以视频文件方式进行语音驱动生成唇形,设置面部区域可配置的增强方式进行合成唇形(人脸)区域画面增强,提高生成唇形的清晰度。使用DAIN 插帧的DL算法对生成视频进行补帧,补充帧间合成唇形的动作过渡,使合成的唇形更为流畅、真实以及自然。
Large-Audio-Models
Keep track of big models in audio domain, including speech, singing, music etc.
Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
InstructGLM
ChatGLM-6B 指令学习|指令数据|Instruct
ChatGLM-6B
ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
Luotuo-Chinese-LLM
骆驼(Luotuo): Open Sourced Chinese Language Models. Developed by 陈启源 @ 华中师范大学 & 李鲁鲁 @ 商汤科技 & 冷子昂 @ 商汤科技
ChatGLM-Tuning
基于ChatGLM-6B + LoRA的Fintune方案
Alpaca-CoT
We unified the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs and parameter-efficient methods (e.g., lora, p-tuning) together for easy use. We welcome open-source enthusiasts to initiate any meaningful PR on this repo and integrate as many LLM related technologies as possible. 我们打造了方便研究人员上手和使用大模型等微调平台,我们欢迎开源爱好者发起任何有意义的pr!
webrtc_rnnvad
webrtc_rnnvad