Ye Bai's starred repositories

int_fastdiv

Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels.

Language: Cuda · Stars: 70 · Issues: 0
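
The idea behind libraries like this is to replace a hardware divide with a multiply and a shift, using a "magic" multiplier precomputed once from the divisor (the classic Granlund-Montgomery scheme). A minimal Python sketch of the underlying math, not int_fastdiv's actual API:

    # Division by a divisor known only at run time, via a precomputed
    # multiply-and-shift (Granlund-Montgomery); illustrative sketch only.

    def precompute_magic(d: int, bits: int = 32):
        # Returns (m, s) such that n // d == (n * m) >> (bits + s)
        # for all 0 <= n < 2**bits.
        s = (d - 1).bit_length()        # s = ceil(log2(d))
        m = (1 << (bits + s)) // d + 1  # m = floor(2**(bits+s) / d) + 1
        return m, s

    def fast_div(n: int, m: int, s: int, bits: int = 32) -> int:
        return (n * m) >> (bits + s)    # one multiply, one shift, no divide

    m, s = precompute_magic(7)          # divisor arrives at run time
    assert all(fast_div(n, m, s) == n // 7 for n in range(1_000_000))

In a CUDA setting, the host computes (m, s) once and passes them to the kernel, which then pays only a multiply-high and a shift per division. Note that m can need bits+1 bits, which is why fixed-width implementations add a small add/shift fix-up step.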

CogVLM2

GPT4V-level open-source multi-modal model based on Llama3-8B

Language: Python · License: Apache-2.0 · Stars: 1611 · Issues: 0

ReazonSpeech

Massive open Japanese speech corpus

Language: Python · License: Apache-2.0 · Stars: 206 · Issues: 0

LLMBook-zh.github.io

"Large Language Models" (《大语言模型》), by Wayne Xin Zhao (赵鑫), Junyi Li (李军毅), Kun Zhou (周昆), Tianyi Tang (唐天一), and Ji-Rong Wen (文继荣)

Stars: 1939 · Issues: 0

Emotional-Speech-Data

This is the GitHub page for publicly available emotional speech data.

License: MIT · Stars: 309 · Issues: 0

everyone-can-use-english

Everyone Can Use English (人人都能用英语)

Language: TypeScript · License: MPL-2.0 · Stars: 21964 · Issues: 0

datacomp

DataComp: In search of the next generation of multimodal datasets

Language: Python · License: NOASSERTION · Stars: 612 · Issues: 0

HunyuanDiT

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Language: Python · License: NOASSERTION · Stars: 2820 · Issues: 0

reka-vibe-eval

Multimodal language model benchmark, featuring challenging examples

Language: Python · License: Apache-2.0 · Stars: 139 · Issues: 0

DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

License: MIT · Stars: 3067 · Issues: 0

forcealign

ForceAlign is a Python library for forced alignment of English text to English audio. It produces word- or phoneme-level alignments, giving each word or phoneme's start and end time within the audio. ForceAlign is designed to be easy to install and use, requiring no third-party, non-Python dependencies.

Language: Python · License: MIT · Stars: 8 · Issues: 0
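
Since the description above is essentially a how-to, here is a hypothetical usage sketch. The names ForceAlign, inference(), .word, .time_start, and .time_end are illustrative assumptions about what such an API looks like, not the library's confirmed interface:

    # Hypothetical sketch; class/method/attribute names below are assumed
    # for illustration, not forcealign's documented API.
    from forcealign import ForceAlign  # assumed import path

    aligner = ForceAlign(audio_file="speech.wav", transcript="hello world")
    for w in aligner.inference():      # assumed entry point returning word alignments
        print(f"{w.word}: {w.time_start:.2f}s to {w.time_end:.2f}s")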

MERTools

Toolkits for Multimodal Emotion Recognition

Language: Python · Stars: 136 · Issues: 0

simd

SIMD demo

Language: C · License: Apache-2.0 · Stars: 1 · Issues: 0

VGGSound

VGGSound: A Large-scale Audio-Visual Dataset

Language: Python · License: NOASSERTION · Stars: 277 · Issues: 0

RLHF-V

[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

Language: Python · Stars: 196 · Issues: 0

SecurityInterviewGuide

An interview guide for network and information security professionals (网络信息安全从业者面试指南)

License: GPL-3.0 · Stars: 1358 · Issues: 0

Codec-SUPERB

Audio Codec Speech processing Universal PERformance Benchmark

Language: Python · Stars: 187 · Issues: 0

Awesome_Modern_Hopfield_Networks

Paper list for Modern Hopfield Networks

Stars: 5 · Issues: 0

InfiniTransformer

Unofficial PyTorch/🤗 Transformers (Gemma/Llama3) implementation of "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention"

Language: Python · License: MIT · Stars: 310 · Issues: 0

aac-metrics

Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.

Language: Python · License: MIT · Stars: 33 · Issues: 0

MiniCPM

MiniCPM-2B: An end-side LLM outperforming Llama2-13B.

Language: Python · License: Apache-2.0 · Stars: 4447 · Issues: 0

DBPNet

DBPNet model

Language: Python · Stars: 28 · Issues: 0

llama.cpp

LLM inference in C/C++

Language: C++ · License: MIT · Stars: 2 · Issues: 0

gazelle

Joint speech-language model - respond directly to audio!

Language: Python · License: Apache-2.0 · Stars: 294 · Issues: 0

MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Language: Python · License: Apache-2.0 · Stars: 3102 · Issues: 0

qwen.cpp

C++ implementation of Qwen-LM

Language: C++ · License: NOASSERTION · Stars: 512 · Issues: 0

megalodon

Reference implementation of the Megalodon 7B model

Language: Cuda · License: MIT · Stars: 497 · Issues: 0

LinearAttentionArena

Here we will test various linear attention designs.

Language: Python · License: Apache-2.0 · Stars: 52 · Issues: 0
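
For context on what such designs vary: linear attention replaces softmax(Q K^T) V with a kernel feature map phi so the product can be re-associated as phi(Q) (phi(K)^T V), turning attention's O(T^2) cost into O(T) in sequence length. A minimal non-causal sketch using the elu(x)+1 feature map of Katharopoulos et al., one published choice among the designs an arena like this compares:

    # Minimal non-causal linear attention; the elu(x)+1 feature map is one
    # published choice (Katharopoulos et al., 2020), shown for illustration.
    import numpy as np

    def phi(x):
        # elu(x) + 1: keeps features positive so the normalizer is well-defined
        return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

    def linear_attention(Q, K, V):
        Qp, Kp = phi(Q), phi(K)
        KV = Kp.T @ V              # (d, d_v) summary of all keys/values
        Z = Qp @ Kp.sum(axis=0)    # per-query normalizer (softmax denominator analogue)
        return (Qp @ KV) / Z[:, None]

    T, d = 128, 16
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))
    assert linear_attention(Q, K, V).shape == (T, d)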