Qingsong Liu (pineking)



Company: @Unisound @unisound-ail

Location: China



Organizations
kubeflow

Qingsong Liu's repositories

AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Alibaba DAMO Academy.

Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

AniPortrait

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

Monkey

Monkey (LMM): a large multimodal model, the "Little Monkey" from HUST (Huazhong University of Science and Technology)

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

catvision

A large multimodal model that performs close to the closed-source Qwen-VL-PLUS on many datasets and significantly outperforms the open-source Qwen-VL-7B-Chat.

Stargazers: 0 | Issues: 0 | Issues: 0

ChatTTS

ChatTTS is a generative speech model for daily dialogue.

License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

DALLE-pytorch

Implementation / replication of DALL-E, OpenAI's text-to-image Transformer, in PyTorch

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

descript-audio-codec

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

docker_image_pusher

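Uses GitHub Actions to mirror Docker images hosted outside China into a private Alibaba Cloud registry for use by servers inside China; free and easy to use.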

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

dreamtalk

Official implementation of the paper "DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models"

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

E2STR

The official code for the CVPR 2024 paper: Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

Emu

Emu: An Open Multimodal Generalist

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

FiT

FiT: Flexible Vision Transformer for Diffusion Model

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

FunCodec

FunCodec is a research-oriented toolkit for audio quantization and downstream applications such as text-to-speech synthesis and music generation.

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

generative-models

Generative Models by Stability AI

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

genmusic_demo_list

A list of demo websites for automatic music generation research.

Stargazers: 0 | Issues: 1 | Issues: 0

LLaMA2-Accessory

An Open-source Toolkit for LLM Development

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 1 | Issues: 0

LLM-groundedDiffusion

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models (LLM-grounded Diffusion: LMD)

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

MultimodalOCR

On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

Open-AnimateAnyone

Unofficial implementation of Animate Anyone

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

PhotoMaker

PhotoMaker

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 0 | Issues: 1 | Issues: 0

vocos

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0