Qingsong Liu (pineking)

Company: @Unisound @unisound-ail

Location: China


Organizations
kubeflow

Qingsong Liu's repositories

AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Alibaba DAMO Academy.

Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

AniPortrait

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

Monkey

Monkey (LMM): a large multimodal model, the "little monkey" from Huazhong University of Science and Technology (HUST)

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

catvision

A large multimodal model that performs close to the closed-source Qwen-VL-PLUS on many datasets and significantly outperforms the open-source Qwen-VL-7B-Chat.

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

ChatTTS

ChatTTS is a generative speech model for daily dialogue.

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

CosyVoice

LLM-based TTS model providing full-stack inference, training, and deployment capabilities.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

DALLE-pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in PyTorch

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

descript-audio-codec

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

docker_image_pusher

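Uses GitHub Actions to copy Docker images hosted outside China into a private Alibaba Cloud registry for use by servers in China. Free and easy to use.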

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

dreamtalk

Official implementation of the paper: DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

E2STR

The official code for the CVPR 2024 paper: Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

FiT

FiT: Flexible Vision Transformer for Diffusion Model

License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

FunCodec

FunCodec is a research-oriented toolkit for audio quantization and downstream applications such as text-to-speech synthesis and music generation.

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

Glyph-ByT5

[ECCV2024] Official inference code for the papers "Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering" and "Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering"

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

LLM-groundedDiffusion

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models (LLM-grounded Diffusion: LMD)

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

Medical-SAM2

Medical SAM 2: Segment Medical Images As Video Via Segment Anything Model 2

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

mmaction2

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

MultimodalOCR

On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

Open-AnimateAnyone

Unofficial Implementation of Animate Anyone

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

PhotoMaker

PhotoMaker

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 0 | Issues: 1 | Issues: 0

Qwen2-VL-Finetune

An open-source implementation for fine-tuning Qwen2-VL-2B and Qwen2-VL-7B.

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

RecordRTC

RecordRTC is a WebRTC JavaScript library for recording audio, video, and screen activity. It supports Chrome, Firefox, Opera, Android, and Microsoft Edge. Platforms: Linux, Mac, and Windows.

Language: JavaScript | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

SenseVoice

Multilingual Voice Understanding Model

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

vocos

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

whisper_streaming

Whisper realtime streaming for long speech-to-text transcription and translation

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

xtts-api-server

A simple FastAPI Server to run XTTSv2

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0