MarkWuNLP

Yu Wu (吴俣)'s starred repositories

whisper

Robust Speech Recognition via Large-Scale Weak Supervision

Language: Python · License: MIT · 67850 5700

stable-diffusion

A latent text-to-image diffusion model

Language: Jupyter Notebook · License: NOASSERTION · 67608 559 710

lama-cleaner

Image inpainting tool powered by SOTA AI models. Remove any unwanted object, defect, or person from your pictures, or erase and replace (powered by Stable Diffusion) anything in your pictures.

Language: Python · License: Apache-2.0 · 15066 120 336

GLM-130B

GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)

Language: Python · License: Apache-2.0 · 7652 99 198

modelscope

ModelScope: bring the notion of Model-as-a-Service to life.

Language: Python · License: Apache-2.0 · 6827 71 577

Fengshenbang-LM

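Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Center for Cognitive Computing and Natural Language at the IDEA Research Institute, intended to serve as infrastructure for Chinese AIGC and cognitive intelligence.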

Language: Python · License: Apache-2.0 · 4001 57 294

YaLM-100B

Pretrained language model with 100B parameters

Language: Python · License: Apache-2.0 · 3734 48 28

encodec

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Language: Python · License: MIT · 3428 57 70

BIG-bench

Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models

Language: Python · License: Apache-2.0 · 2820 51 150

NUWA

A unified 3D Transformer Pipeline for visual synthesis

2805 136 20

evaluate

🤗 Evaluate: A library for easily evaluating machine learning models and datasets.

Language: Python · License: Apache-2.0 · 1971 46 291

audio-diffusion-pytorch

Audio generation using diffusion models, in PyTorch.

Language: Python · License: MIT · 1922 39 43

Chain-of-ThoughtsPapers

A trend that started with "Chain of Thought Prompting Elicits Reasoning in Large Language Models".

1914 48 3

GiantMIDI-Piano

Language: Python · 1692 24 11

SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

Language: Python · License: MIT · 1161 24 85

coyo-dataset

COYO-700M: Large-scale Image-Text Pair Dataset

Language: Python · 1142 14 14

Diffusion-LM

Language: Python · License: Apache-2.0 · 1034 17 71

wit

The WIT (Wikipedia-based Image Text) Dataset is a large multimodal, multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.

License: NOASSERTION · 994 38 6

roformer

Rotary Transformer

Language: Python · License: Apache-2.0 · 783 8 8

audio-dataset

Audio Dataset for training CLAP and other models

Language: Python · 615 21 57

prize

A prize for finding tasks that cause large language models to show inverse scaling

License: CC-BY-4.0 · 591 27 7

Speech-Backbones

The main repository of open-source speech technology from Huawei Noah's Ark Lab.

Language: Jupyter Notebook · 555 23 29

BeatNet

BeatNet is a state-of-the-art real-time and offline system for joint music beat, downbeat, tempo, and meter tracking, using a CRNN and particle filtering (implementation of the ISMIR 2021 paper).

Language: Python · License: CC-BY-4.0 · 316 9 27