xuanhan863

ChatGPT CLI is an advanced command-line interface for ChatGPT models via OpenAI and Azure, offering streaming, query mode, and history tracking for seamless, context-aware conversations. Ideal for both users and developers, it provides advanced configuration and easy setup options to ensure a tailored conversational experience with the GPT model.

Language: Go | License: MIT | 415 10 32

Glyph-ByT5

[ECCV 2024] Official inference code for the papers "Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering" and "Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering"

Language: Jupyter Notebook | License: Apache-2.0 | 413 17 15

T-GATE

T-GATE: Temporally Gating Attention to Accelerate Diffusion Model for Free!

Language: Python | License: MIT | 318 12 14

gazelle

Joint speech-language model - respond directly to audio!

Language: Python | License: Apache-2.0 | 293 12 1

radient

Radient turns many data types (not just text) into vectors for similarity search, RAG, regression analysis, and more.

Language: Python | License: BSD-2-Clause | 240 4 1

mmdit

Implementation of a single layer of the MMDiT, proposed in Stable Diffusion 3, in PyTorch

Language: Python | License: MIT | 200 3 1

RIVAL

[NeurIPS 2023 Spotlight] Real-World Image Variation by Aligning Diffusion Inversion Chain

Language: Python | License: Apache-2.0 | 142 17 8

VoiceLDM

VoiceLDM: Text-to-Speech with Environmental Context

Language: Python | License: Apache-2.0 | 136 7 4

gezgin

Modern Pathfinding Using OpenStreetMap Data with Raylib

Language: C++ | License: WTFPL | 132 10

FAcodec

Training code for FAcodec presented in NaturalSpeech3

Language: Python | 122 9 12

MuLan

MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual support to any diffusion model without additional training)

Language: Python | 111 3 4

stream-vc

An unofficial PyTorch implementation of StreamVC (Real-Time Low-Latency Voice Conversion)

Language: Python | 87 0 0

whisper-acft

Language: Jupyter Notebook | License: MIT | 62 12 5

ClickDiffusion

ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing

Language: Python | License: MIT | 61 2 2

nvImageCodec

nvImageCodec: a library of GPU- and CPU-accelerated codecs featuring a unified interface

Language: C++ | License: Apache-2.0 | 54 10 7

X-Oscar

Official repository for "X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation"

Language: Python | 46 0 0

TurboT5

Truly flash T5 realization!

Language: Python | 35 2 3

LoopGaussian

Language: Python | License: MIT | 24 0 0

Spatial-AST

🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models)

Language: Python | License: NOASSERTION | 23 0 0

DAC-JAX

A JAX Implementation of the Descript Audio Codec

Language: Python | License: MIT | 17 20

PDM-Pure

PDM-based Purifier

Language: Python | 11 3 3

forcealign

ForceAlign is a Python library for forced alignment of English text to English audio. You can use ForceAlign to get word or phoneme level text alignments of audio, with each word or phoneme's start and end time within the audio. ForceAlign was designed to be easy to install and use, without requiring any third-party, non-Python dependencies.

Language: Python | License: MIT | 8 0 0