pineking

@Unisound @unisound-ail

China

Organizations

kubeflow

Qingsong Liu's repositories

AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Alibaba DAMO Academy.

Language: C++ | License: Apache-2.0 | 0 | 1 | 0

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python | License: MIT | 0 | 1 | 0

AniPortrait

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

Language: Python | License: Apache-2.0 | 0 | 0 | 0

audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Language: Python | License: MIT | 0 | 1 | 0

bark

🔊 Text-Prompted Generative Audio Model

Language: Jupyter Notebook | License: MIT | 0 | 0 | 0

Monkey

Monkey (LMM): the "Little Monkey" large multimodal model from Huazhong University of Science and Technology (HUST)

Language: Python | License: MIT | 0 | 1 | 0

catvision

A multimodal large-scale model, which performs close to the closed-source Qwen-VL-PLUS on many datasets and significantly surpasses the performance of the open-source model Qwen-VL-7B-Chat.

0 | 0 | 0

ChatTTS

ChatTTS is a generative speech model for daily dialogue.

License: NOASSERTION | 0 | 0 | 0

CMMMU

Language: Python | License: Apache-2.0 | 0 | 1 | 0

DALLE-pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

License: MIT | 0 | 0 | 0

DCTC

Language: Python | 0 | 1 | 0

dreamtalk

Official implementation for the paper: DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models

Language: Python | License: MIT | 0 | 1 | 0

E2STR

The official code for the CVPR 2024 paper: Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer

License: Apache-2.0 | 0 | 0 | 0

Emu

Emu: An Open Multimodal Generalist

Language: Python | 0 | 1 | 0

fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

License: MIT | 0 | 0 | 0

FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Language: Python | License: Apache-2.0 | 0 | 1 | 0

FiT

FiT: Flexible Vision Transformer for Diffusion Model

License: Apache-2.0 | 0 | 0 | 0

generative-models

Generative Models by Stability AI

Language: Python | License: MIT | 0 | 1 | 0

genmusic_demo_list

A list of demo websites for automatic music generation research

0 | 1 | 0

LLaMA2-Accessory

An Open-source Toolkit for LLM Development

Language: Python | License: NOASSERTION | 0 | 1 | 0

LLM-groundedDiffusion

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models (LLM-grounded Diffusion: LMD)

Language: Python | 0 | 1 | 0

MakeItTalk

Language: Jupyter Notebook | License: NOASSERTION | 0 | 1 | 0

MultimodalOCR

On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)

Language: Python | 0 | 1 | 0

nougat

Implementation of Nougat Neural Optical Understanding for Academic Documents

Language: Python | License: MIT | 0 | 1 | 0

Open-AnimateAnyone

Unofficial Implementation of Animate Anyone

Language: Python | 0 | 1 | 0

PhotoMaker

Language: Jupyter Notebook | License: NOASSERTION | 0 | 1 | 0

pineking.github.io

Language: HTML | 0 | 3 | 0

Prompt-Engineering-Guide

🐙 Guides, papers, lectures, notebooks, and resources for prompt engineering

Language: MDX | License: MIT | 0 | 1 | 0

unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Language: Python | License: MIT | 0 | 1 | 0

vocos

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

License: MIT | 0 | 0 | 0