Droliven

Levon Dang's starred repositories

whisper

Robust Speech Recognition via Large-Scale Weak Supervision

Language: Python | License: MIT | 66570 5570

generative-models

Generative Models by Stability AI

Language: Python | License: MIT | 23890 253 294

Swin-Transformer

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".

Language: Python | License: MIT | 13505 127 309

chatgpt-mirai-qq-bot

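🚀 One-click deployment! A real AI chatbot! Supports ChatGPT, ERNIE Bot, iFlytek Spark, Bing, Bard, ChatGLM, and POE, with multiple accounts, persona tuning, a virtual maid, image rendering, and voice messages | Supports QQ, Telegram, Discord, WeChat, and other platforms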

Language: Python | License: AGPL-3.0 | 12873 72 1045

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

Language: Python | License: Apache-2.0 | 10820 184 1900

ASRT_SpeechRecognition

A Deep-Learning-Based Chinese Speech Recognition System

Language: Python | License: GPL-3.0 | 7711 186 289

fast-stable-diffusion

fast-stable-diffusion + DreamBooth

Language: Python | License: MIT | 7459 85 2036

sd-scripts

Language: Python | License: Apache-2.0 | 4766 53 908

latent-consistency-model

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

Language: Python | License: MIT | 4260 63 93

mmpretrain

OpenMMLab Pre-training Toolbox and Benchmark

Language: Python | License: Apache-2.0 | 3350 30 770

Mubert-Text-to-Music

A simple notebook demonstrating prompt-based music generation via Mubert API

Language: Jupyter Notebook | 2731 46 16

deep-motion-editing

An end-to-end library for editing and rendering motion of 3D characters with deep learning [SIGGRAPH 2020]

Language: Python | License: BSD-2-Clause | 1541 65 200

stable-diffusion-webui-wd14-tagger

Labeling extension for Automatic1111's Web UI

Language: Python | 1308 9 91

improved-aesthetic-predictor

CLIP+MLP Aesthetic Score Predictor

Language: Python | License: Apache-2.0 | 845 6 10

Make-An-Audio

PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model

Language: Python | License: MIT | 731 71 14

stable-diffusion-aesthetic-gradients

Personalization for Stable Diffusion via Aesthetic Gradients 🎨

Language: Jupyter Notebook | License: NOASSERTION | 716 18 18

musicnn

Pronounced as "musician", musicnn is a set of pre-trained deep convolutional neural networks for music audio tagging.

Language: Jupyter Notebook | License: ISC | 586 20 21

aesthetic-predictor

A linear estimator on top of CLIP to predict the aesthetic quality of pictures

Language: Jupyter Notebook | License: MIT | 436 13 7

PickScore

Language: Python | License: MIT | 405 3 29

sota-music-tagging-models

Language: Python | License: MIT | 387 8 19

ubisoft-laforge-ZeroEGGS

All about ZeroEggs

Language: Python | License: NOASSERTION | 360 12 41

HPSv2

Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis

Language: Jupyter Notebook | License: Apache-2.0 | 350 10 38

llark

Code for the paper "LLark: A Multimodal Instruction-Following Language Model for Music" by Josh Gardner, Simon Durand, Daniel Stoller, and Rachel Bittner.

Language: Jupyter Notebook | License: NOASSERTION | 287 7 7

Gesture-Generation-from-Trimodal-Context

Speech Gesture Generation from the Trimodal Context of Text, Audio, and Speaker Identity (SIGGRAPH Asia 2020)

Language: Python | License: NOASSERTION | 242 10 58

DiffuseStyleGesture

DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models (IJCAI 2023) | The DiffuseStyleGesture+ entry to the GENEA Challenge 2023 (ICMI 2023, Reproducibility Award)

Language: Python | License: MIT | 146 7 39

Audio2Gestures

Official implementation of Audio2Motion: Generating Diverse Gestures from Speech with Conditional Variational Autoencoders.

Language: Python | 121 2 18

youtube-gesture-dataset

This repository contains scripts to build the YouTube Gesture Dataset.

Language: Python | License: BSD-3-Clause | 115 4 9

AudioEmotion

Recognize Audio Emotion.

Language: Python | License: MIT | 86 3 2

ImageAestheticAssessmentPyTorch

Image Aesthetic Assessment in PyTorch with implemented popular datasets and models (possibly providing the pretrained ones).

Language: Python | License: Apache-2.0 | 36 3 1

beat

Language: Python | 902