Qingsong Liu (pineking)



Company: @Unisound @unisound-ail

Location: China



Organizations
kubeflow

Qingsong Liu's repositories

AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Alibaba DAMO Academy.

Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

AniPortrait

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

bark

🔊 Text-Prompted Generative Audio Model

Language: Jupyter Notebook | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

Monkey

Monkey (LMM): a large multimodal model, the "little monkey" from Huazhong University of Science and Technology (HUST)

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

catvision

A large multimodal model that performs close to the closed-source Qwen-VL-PLUS on many datasets and significantly outperforms the open-source Qwen-VL-7B-Chat.

Stargazers: 0 | Issues: 0 | Issues: 0

ChatTTS

ChatTTS is a generative speech model for daily dialogue.

License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

DALLE-pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

dreamtalk

Official implementations for paper: DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

E2STR

The official code for the CVPR 2024 paper: Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

Emu

Emu: An Open Multimodal Generalist

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

FiT

FiT: Flexible Vision Transformer for Diffusion Model

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

generative-models

Generative Models by Stability AI

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

genmusic_demo_list

a list of demo websites for automatic music generation research

Stargazers: 0 | Issues: 1 | Issues: 0

LLaMA2-Accessory

An Open-source Toolkit for LLM Development

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 1 | Issues: 0

LLM-groundedDiffusion

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models (LLM-grounded Diffusion: LMD)

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0
Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 0 | Issues: 1 | Issues: 0

MultimodalOCR

On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

nougat

Implementation of Nougat Neural Optical Understanding for Academic Documents

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

Open-AnimateAnyone

Unofficial Implementation of Animate Anyone

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

PhotoMaker

PhotoMaker

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 0 | Issues: 1 | Issues: 0
Language: HTML | Stargazers: 0 | Issues: 3 | Issues: 0

Prompt-Engineering-Guide

🐙 Guides, papers, lecture, notebooks and resources for prompt engineering

Language: MDX | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

vocos

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0