Rongjiehuang

Facebook AI Research (FAIR)

rongjiehuang.github.io

Organizations

AIGC-Audio

Rongjiehuang's starred repositories

whisper

Robust Speech Recognition via Large-Scale Weak Supervision

Language: Python | License: MIT | 63806 5310

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language: Python | License: Apache-2.0 | 33605 339 2627

ChatGLM2-6B

ChatGLM2-6B: An Open Bilingual Chat LLM

Language: Python | License: NOASSERTION | 15603 135 615

codellama

Inference code for CodeLlama models

Language: Python | License: NOASSERTION | 13860 159 169

seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Language: Jupyter Notebook | License: NOASSERTION | 10500 139 328

xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.

Language: Python | License: NOASSERTION | 7959 77 492

llama-recipes

Scripts for fine-tuning Llama2 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization & question answering. Supports a number of inference solutions such as HF TGI and VLLM for local or cloud deployment. Demo apps showcase Llama2 for WhatsApp & Messenger.

Language: Jupyter Notebook | License: NOASSERTION | 7850 68 227

streaming-llm

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Language: Python | License: MIT | 6341 61 77

lit-gpt

Hackable implementation of state-of-the-art open-source LLMs based on nanoGPT. Supports flash attention, 4-bit and 8-bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.

Language: Python | License: Apache-2.0 | 5189 63 476

aim

Aim 💫 — An easy-to-use & supercharged open-source experiment tracker.

Language: Python | License: Apache-2.0 | 4928 43 993

latent-consistency-model

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

Language: Python | License: MIT | 4185 61 91

motion-diffusion-model

The official PyTorch implementation of the paper "Human Motion Diffusion Model"

Language: Python | License: MIT | 2934 68 195

LLaMA2-Accessory

An Open-source Toolkit for LLM Development

Language: Python | License: NOASSERTION | 2602 35 132

Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Language: Python | License: BSD-3-Clause | 2568 31 149

stable-audio-tools

Generative models for conditional audio generation

Language: Python | License: MIT | 2218 40 60

consistencydecoder

Consistency Distilled Diff VAE

Language: Python | License: MIT | 2097 23 19

gigagan-pytorch

Implementation of GigaGAN, new SOTA GAN out of Adobe. Culmination of nearly a decade of research into GANs

Language: Python | License: MIT | 1685 73 47

Macaw-LLM

Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration

Language: Python | License: Apache-2.0 | 1460 33 34

LLM-eval-survey

The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".

CLAP

Contrastive Language-Audio Pretraining

Language: Python | License: CC0-1.0 | 1231 29 83

Sophia

The official implementation of “Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training”

Language: Python | License: MIT | 914 16 40

LLaSM

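The first open-source, commercially usable dialogue model to support bilingual (Chinese and English) speech-text multimodal conversation. Convenient speech input greatly improves the experience of using large models that take text as input, while avoiding the cumbersome pipeline of ASR-based solutions and the errors it may introduce.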

Language: Python | License: Apache-2.0 | 493 13 6

UniAudio

The Open Source Code of UniAudio

Language: Python | 471 38 27

Persona-Dialogue-Generation

The code of ACL 2020 paper "You Impress Me: Dialogue Generation via Mutual Persona Perception"

Language: Python | License: MIT | 308 8 34

lp-music-caps

LP-MusicCaps: LLM-Based Pseudo Music Captioning [ISMIR23]

Language: Python | 245 8 7

Whispering-LLaMA

EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction

Language: Jupyter Notebook | License: MIT | 204 4 10

bigvsan

PyTorch implementation of BigVSAN

Language: Python | License: MIT | 188 28 5

cbtm

Code repository for the c-BTM paper

Language: Python | License: Apache-2.0 | 105 5 3

MULTI-AUDIODEC

This is the official implementation of our multi-channel multi-speaker multi-spatial neural audio codec architecture.

Language: Python | 40 20

Look2hear

A toolkit for researchers working on multimodal sound separation.