XCJinggai

Fengxi ZHANG's starred repositories

visqol

Perceptual Quality Estimator for speech and audio

Language: C++ · License: Apache-2.0 · 65300

trl

Train transformer language models with reinforcement learning.

Language: Python · License: Apache-2.0 · 889300

FunCodec

FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis and music generation.

Language: Python · License: MIT · 32300

USLM

Unified Speech Language Model for the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024)

Language: Python · 12400

1d-tokenizer

This repo contains the code for our paper An Image is Worth 32 Tokens for Reconstruction and Generation

Language: Jupyter Notebook · License: Apache-2.0 · 29800

vector-quantize-pytorch

Vector (and Scalar) Quantization, in Pytorch

Language: Python · License: MIT · 223800
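As a rough illustration of the two operations named above, here is a minimal pure-Python sketch; it is not the library's API, and the function names and the [-1, 1] range are assumptions made for this example. Vector quantization replaces each input vector with its nearest codebook entry (squared Euclidean distance), and scalar quantization snaps each value to one of a fixed number of evenly spaced levels. A trainable quantizer would also need a learned codebook and a gradient workaround such as the straight-through estimator, which this sketch omits.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vector_quantize(
    inputs: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[List[float]]]:
    """Replace each input vector with its nearest codebook entry (hypothetical helper)."""
    indices: List[int] = []
    quantized: List[List[float]] = []
    for v in inputs:
        # argmin over codebook rows; ties resolve to the lowest index
        best = min(range(len(codebook)), key=lambda k: squared_distance(v, codebook[k]))
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized


def scalar_quantize(x: float, levels: int) -> float:
    """Clamp x to [-1, 1] (assumed range) and snap it to one of `levels` (>= 2) even steps."""
    x = max(-1.0, min(1.0, x))
    step = 2.0 / (levels - 1)
    # Python's round() sends exact halves to the even neighbour
    return round((x + 1.0) / step) * step - 1.0


codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]]
indices, quantized = vector_quantize([[0.9, 1.2], [-0.8, 0.4], [0.1, -0.1]], codebook)
print(indices)                   # [1, 2, 0]
print(scalar_quantize(0.3, 5))   # 0.5
```

With this toy codebook, `[0.9, 1.2]` lands on `[1, 1]`, `[-0.8, 0.4]` on `[-1, 0.5]`, and `[0.1, -0.1]` on the origin.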

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python · License: Apache-2.0 · 1841900

zero_nlp

Chinese NLP solutions (large models, data, models, training, inference)

Language: Jupyter Notebook · License: MIT · 272800

MaMMUT-pytorch

Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in Pytorch

Language: Python · License: MIT · 9500

EMPatches

Extract and Merge image patches for easy, fast and self-contained digital image processing and deep learning model training.

Language: Jupyter Notebook · 4500

patchify.py

A library that helps you split an image into small, overlappable patches and merge the patches back into the original image.

Language: Python · License: MIT · 20100
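The split/merge idea behind patchify.py (and EMPatches, listed above) can be sketched in plain Python on a 2-D list of pixel values. This is an assumed, simplified version rather than either library's API, and the function names are hypothetical: a `size` x `size` window slides by `step` pixels (patches overlap whenever `step < size`), and merging averages every patch value that covers a pixel. It also assumes the window grid covers the image exactly, i.e. `height - size` and `width - size` are multiples of `step`; otherwise uncovered edge pixels come back as 0.

```python
from typing import List, Tuple

Image = List[List[float]]
Patch = Tuple[int, int, Image]  # (top, left, pixel values)


def extract_patches(img: Image, size: int, step: int) -> List[Patch]:
    """Slide a size x size window over img; step < size gives overlapping patches."""
    height, width = len(img), len(img[0])
    patches: List[Patch] = []
    for top in range(0, height - size + 1, step):
        for left in range(0, width - size + 1, step):
            patch = [row[left:left + size] for row in img[top:top + size]]
            patches.append((top, left, patch))
    return patches


def merge_patches(patches: List[Patch], height: int, width: int) -> Image:
    """Rebuild the image by averaging all patch values that cover each pixel."""
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for top, left, patch in patches:
        for i, row in enumerate(patch):
            for j, value in enumerate(row):
                total[top + i][left + j] += value
                count[top + i][left + j] += 1
    # pixels that no patch covered stay 0.0
    return [
        [t / c if c else 0.0 for t, c in zip(t_row, c_row)]
        for t_row, c_row in zip(total, count)
    ]


img = [[float(4 * r + c) for c in range(4)] for r in range(4)]
patches = extract_patches(img, size=2, step=1)  # 3 x 3 grid of overlapping 2x2 patches
print(len(patches))                              # 9
print(merge_patches(patches, 4, 4) == img)       # True
```

Averaging is only one possible merge rule; for unmodified patches every overlapping copy holds the same value, so any rule reconstructs the original, while averaging also smooths seams after patches have been processed independently.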

LM4LV

🔥Official PyTorch implementation for "LM4LV: A Frozen Large Language Model for Low-level Vision Tasks".

Language: Python · License: Apache-2.0 · 3000

tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.

Language: Python · License: MIT · 1132900

libjxl

JPEG XL image format reference implementation

Language: C++ · License: BSD-3-Clause · 238800

L3C-PyTorch

PyTorch Implementation of the CVPR'19 Paper "Practical Full Resolution Learned Lossless Image Compression"

Language: Python · License: GPL-3.0 · 39200

libbpg-py

A pure Python binding for BPG (Better Portable Graphics)

Language: Python · 2200

imageio-flif

imageio plugin with FLIF wrapper for Python

License: AGPL-3.0 · 100

pyFLIF

ctypes-based Python wrapper for the FLIF library

Language: Python · License: LGPL-3.0 · 200

minbpe

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

Language: Python · License: MIT · 881600
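A minimal sketch of byte-level BPE training, encoding, and decoding, in the spirit of the description above but not minbpe's actual code; all function names here are hypothetical. It starts from UTF-8 bytes (ids 0 to 255), repeatedly replaces the most frequent adjacent pair with a new id, and records each merge. Production tokenizers such as tiktoken additionally pre-split text with a regex and handle special tokens, which this sketch leaves out.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def pair_counts(ids: List[int]) -> Counter:
    # Count every adjacent pair of token ids.
    return Counter(zip(ids, ids[1:]))


def merge(ids: List[int], pair: Pair, new_id: int) -> List[int]:
    # Replace non-overlapping occurrences of `pair`, scanning left to right.
    out: List[int] = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text: str, num_merges: int) -> Tuple[List[int], Dict[Pair, int]]:
    ids = list(text.encode("utf-8"))  # raw bytes are the initial ids 0..255
    merges: Dict[Pair, int] = {}
    for k in range(num_merges):
        counts = pair_counts(ids)
        if not counts:
            break
        # most frequent pair; on a tie, the pair seen first wins
        pair = max(counts, key=counts.__getitem__)
        new_id = 256 + k
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges


def encode(text: str, merges: Dict[Pair, int]) -> List[int]:
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():  # dicts preserve learning order
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids: List[int], merges: Dict[Pair, int]) -> str:
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")


ids, merges = train_bpe("aaabdaaabac", 3)
print(ids)                   # [258, 100, 258, 97, 99]
print(decode(ids, merges))   # aaabdaaabac
```

On the classic example string `aaabdaaabac`, three merges learn `aa`, `aaa`, and `aaab` (ids 256, 257, 258), shrinking 11 bytes to 5 tokens.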

llama3-from-scratch

llama3 implementation one matrix multiplication at a time

Language: Jupyter Notebook · License: MIT · 1156900

vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

Language: Python · License: MIT · 1906200

CompressAI-Vision

CompressAI-Vision helps you design, test, and compare Video Compression for Machines pipelines. Compression methods can come either from custom AI-based modules in CompressAI or from traditional codecs such as H.266/VVC.

Language: Python · License: BSD-3-Clause-Clear · 8000

CompressAI

A PyTorch library and evaluation platform for end-to-end compression research

Language: Python · License: BSD-3-Clause-Clear · 111500

sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

Language: C++ · License: Apache-2.0 · 989200

vision_transformer

Language: Jupyter Notebook · License: Apache-2.0 · 982200

SpeechTokenizer

This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on

Language: Python · License: Apache-2.0 · 38000

magvit

Official JAX implementation of MAGVIT: Masked Generative Video Transformer

Language: Python · License: Apache-2.0 · 91600

magvit2-pytorch

Implementation of MagViT2 Tokenizer in Pytorch

Language: Python · License: MIT · 50200

GLM-4

GLM-4 series: Open Multilingual Multimodal Chat LMs | Open-source multilingual multimodal chat models

Language: Python · License: Apache-2.0 · 396200