1003761102's starred repositories

DiffusionInst

This repo is the code of paper "DiffusionInst: Diffusion Model for Instance Segmentation" (ICASSP'24).

Language:PythonLicense:Apache-2.0Stargazers:221Issues:0Issues:0

DiffusionDet

[ICCV2023 Oral] PyTorch implementation of DiffusionDet (https://arxiv.org/abs/2211.09788)

Language:PythonLicense:NOASSERTIONStargazers:2008Issues:0Issues:0

stable-diffusion

A latent text-to-image diffusion model

Language:Jupyter NotebookLicense:NOASSERTIONStargazers:65978Issues:0Issues:0

Awesome-Multimodal-Large-Language-Models

:sparkles::sparkles:Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.

Stargazers:9738Issues:0Issues:0

bert-crf-token_classification_ner

bert、roberta ner命名实体识别

Language:PythonStargazers:98Issues:0Issues:0

grpc

The C based gRPC (C++, Python, Ruby, Objective-C, PHP, C#)

Language:C++License:Apache-2.0Stargazers:40958Issues:0Issues:0

speech-resynthesis

An official reimplementation of the method described in the INTERSPEECH 2021 paper - Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.

Language:PythonLicense:NOASSERTIONStargazers:358Issues:0Issues:0

TorchKMeans

A torch-based implementation of K-Means and K-Means++

Language:PythonStargazers:16Issues:0Issues:0

chinese_speech_pretrain

chinese speech pretrained models

Language:ShellStargazers:914Issues:0Issues:0

audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch

Language:PythonLicense:MITStargazers:2292Issues:0Issues:0

VoiceStreamAI

Near-Realtime audio transcription using self-hosted Whisper and WebSocket in Python/JS

Language:PythonLicense:MITStargazers:494Issues:0Issues:0

SpeechTokenizer

This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on

Language:PythonLicense:Apache-2.0Stargazers:312Issues:0Issues:0

video-retalking

[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild

Language:PythonLicense:Apache-2.0Stargazers:5856Issues:0Issues:0

SadTalker-Video-Lip-Sync

本项目基于SadTalkers实现视频唇形合成的Wav2lip。通过以视频文件方式进行语音驱动生成唇形,设置面部区域可配置的增强方式进行合成唇形(人脸)区域画面增强,提高生成唇形的清晰度。使用DAIN 插帧的DL算法对生成视频进行补帧,补充帧间合成唇形的动作过渡,使合成的唇形更为流畅、真实以及自然。

Language:PythonStargazers:1592Issues:0Issues:0

dreamtalk

Official implementations for paper: DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models

Language:PythonLicense:MITStargazers:1405Issues:0Issues:0

Wav2Lip

This repository contains the codes of "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For HD commercial model, please try out Sync Labs

Language:PythonStargazers:9515Issues:0Issues:0

ltu

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

Language:PythonStargazers:306Issues:0Issues:0

Large-Audio-Models

Keep track of big models in audio domain, including speech, singing, music etc.

Stargazers:401Issues:0Issues:0

Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Language:PythonLicense:NOASSERTIONStargazers:1136Issues:0Issues:0

Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Language:PythonLicense:Apache-2.0Stargazers:11723Issues:0Issues:0

ta

Technical Analysis Library using Pandas and Numpy

Language:Jupyter NotebookLicense:MITStargazers:4080Issues:0Issues:0

MOSS

An open-source tool-augmented conversational language model from Fudan University

Language:PythonLicense:Apache-2.0Stargazers:11857Issues:0Issues:0

InstructGLM

ChatGLM-6B 指令学习|指令数据|Instruct

Language:PythonLicense:MITStargazers:652Issues:0Issues:0

ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型

Language:PythonLicense:Apache-2.0Stargazers:39648Issues:0Issues:0

Luotuo-Chinese-LLM

骆驼(Luotuo): Open Sourced Chinese Language Models. Developed by 陈启源 @ 华中师范大学 & 李鲁鲁 @ 商汤科技 & 冷子昂 @ 商汤科技

Language:Jupyter NotebookLicense:Apache-2.0Stargazers:3617Issues:0Issues:0

ChatGLM-Tuning

基于ChatGLM-6B + LoRA的Fintune方案

Language:PythonLicense:MITStargazers:3686Issues:0Issues:0

Alpaca-CoT

We unified the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs and parameter-efficient methods (e.g., lora, p-tuning) together for easy use. We welcome open-source enthusiasts to initiate any meaningful PR on this repo and integrate as many LLM related technologies as possible. 我们打造了方便研究人员上手和使用大模型等微调平台,我们欢迎开源爱好者发起任何有意义的pr!

Language:Jupyter NotebookLicense:Apache-2.0Stargazers:2501Issues:0Issues:0

BELLE

BELLE: Be Everyone's Large Language model Engine(开源中文对话大模型)

Language:HTMLLicense:Apache-2.0Stargazers:7633Issues:0Issues:0

webrtc_rnnvad

webrtc_rnnvad

Language:C++Stargazers:17Issues:0Issues:0

rnnoise

Recurrent neural network for audio noise reduction

Language:CLicense:BSD-3-ClauseStargazers:3772Issues:0Issues:0