Qingsong Liu (pineking)

Company: @Unisound @unisound-ail

Location: China


Organizations
kubeflow

Qingsong Liu's starred repositories

chameleon

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Language: Python, License: NOASSERTION, Stargazers: 1585, Issues: 0

mmaction2

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark

Language: Python, License: Apache-2.0, Stargazers: 4065, Issues: 0

RecordRTC

RecordRTC is a WebRTC JavaScript library for recording audio, video, and screen activity. It supports Chrome, Firefox, Opera, Android, and Microsoft Edge. Platforms: Linux, Mac, and Windows.

Language: JavaScript, License: MIT, Stargazers: 6482, Issues: 0

Recorder

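HTML5 JS audio recording in mp3, wav, ogg, webm, amr, g711a, and g711u formats. Supports PC and some Android and iOS browsers, Hybrid Apps (Android and iOS app source code provided), and WeChat. Provides ASR speech-to-text, an H5 voice call and chat example, and DTMF encoding and decoding.

Language: JavaScript, License: MIT, Stargazers: 4634, Issues: 0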

pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Language: Jupyter Notebook, License: MIT, Stargazers: 5586, Issues: 0

pipecat

Open Source framework for voice and multimodal conversational AI

Language: Python, License: BSD-2-Clause, Stargazers: 2489, Issues: 0

VoiceStreamAI

Near-Realtime audio transcription using self-hosted Whisper and WebSocket in Python/JS

Language: Python, License: MIT, Stargazers: 581, Issues: 0

selfservicekiosk-audio-streaming

A best practice for streaming audio from a browser microphone to Dialogflow or Google Cloud STT by using websockets.

Language: JavaScript, License: Apache-2.0, Stargazers: 138, Issues: 0

Awesome-Speaker-Diarization

Some comprehensive papers about speaker diarization

Stargazers: 165, Issues: 0

Language: Jupyter Notebook, License: Apache-2.0, Stargazers: 7035, Issues: 0

Languagecodec

Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models

Language: Python, License: MIT, Stargazers: 183, Issues: 0

SpeechTokenizer

This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on

Language: Python, License: Apache-2.0, Stargazers: 378, Issues: 0

AcademiCodec

AcademiCodec: An Open Source Audio Codec Model for Academic Research

Language: Python, Stargazers: 537, Issues: 0

vector-quantize-pytorch

Vector (and Scalar) Quantization, in Pytorch

Language: Python, License: MIT, Stargazers: 2208, Issues: 0

audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch

Language: Python, License: MIT, Stargazers: 2331, Issues: 0

descript-audio-codec

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Language: Python, License: MIT, Stargazers: 1044, Issues: 0

DALLE-pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

Language: Python, License: MIT, Stargazers: 5537, Issues: 0

DALL-E

PyTorch package for the discrete VAE used for DALL·E.

Language: Python, License: NOASSERTION, Stargazers: 10766, Issues: 0

ChatTTS

A generative speech model for daily dialogue.

Language: Python, License: AGPL-3.0, Stargazers: 28161, Issues: 0

speech-trident

Awesome speech/audio LLMs, representation learning, and codec models

Stargazers: 512, Issues: 0

USLM

Unified Speech Language Model for the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024)

Language: Python, Stargazers: 124, Issues: 0

FunCodec

FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis and music generation.

Language: Python, License: MIT, Stargazers: 319, Issues: 0

SpeechGPT

SpeechGPT Series: Speech Large Language Models

Language: Python, License: Apache-2.0, Stargazers: 1063, Issues: 0

MuLan

MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual capability to any diffusion model without additional training)

Language: Python, Stargazers: 111, Issues: 0

MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Language: Python, License: Apache-2.0, Stargazers: 8054, Issues: 0

OneChart

[ACM'MM 2024 Oral] Official code for "OneChart: Purify the Chart Structural Extraction via One Auxiliary Token"

Language: Python, License: Apache-2.0, Stargazers: 123, Issues: 0

AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

Language: C++, License: Apache-2.0, Stargazers: 1212, Issues: 0

AniPortrait

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

Language: Python, License: Apache-2.0, Stargazers: 4343, Issues: 0

Awesome-Chart-Understanding

A curated list of recent and past chart understanding work based on our survey paper: From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models.

Stargazers: 118, Issues: 0

YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection

Language: Python, License: GPL-3.0, Stargazers: 4043, Issues: 0