Ye Bai's starred repositories

int_fastdiv

Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels.

Language: Cuda · Stars: 70 · Issues: 0
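
The idea behind libraries like this is to replace a hardware divide with a multiply and a shift, using a "magic" multiplier precomputed once from the divisor (the classic Granlund-Montgomery scheme). A minimal Python sketch of the underlying math, not int_fastdiv's actual API:

    # Division by a divisor known only at run time, via a precomputed
    # multiply-and-shift (Granlund-Montgomery); illustrative sketch only.

    def precompute_magic(d: int, bits: int = 32):
        # Returns (m, s) such that n // d == (n * m) >> (bits + s)
        # for all 0 <= n < 2**bits.
        s = (d - 1).bit_length()        # s = ceil(log2(d))
        m = (1 << (bits + s)) // d + 1  # m = floor(2**(bits+s) / d) + 1
        return m, s

    def fast_div(n: int, m: int, s: int, bits: int = 32) -> int:
        return (n * m) >> (bits + s)    # one multiply, one shift, no divide

    m, s = precompute_magic(7)          # divisor arrives at run time
    assert all(fast_div(n, m, s) == n // 7 for n in range(1_000_000))

In a CUDA setting, the host computes (m, s) once and passes them to the kernel, which then pays only a multiply-high and a shift per division. Note that m can need bits+1 bits, which is why fixed-width implementations add a small add/shift fix-up step.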

CogVLM2

GPT4V-level open-source multi-modal model based on Llama3-8B

Language: Python · License: Apache-2.0 · Stars: 1611 · Issues: 0

ReazonSpeech

Massive open Japanese speech corpus

Language: Python · License: Apache-2.0 · Stars: 206 · Issues: 0

LLMBook-zh.github.io

"Large Language Models" (《大语言模型》), by Wayne Xin Zhao (赵鑫), Junyi Li (李军毅), Kun Zhou (周昆), Tianyi Tang (唐天一), and Ji-Rong Wen (文继荣)

Stars: 1939 · Issues: 0

Emotional-Speech-Data

This is the GitHub page for publicly available emotional speech data.

License: MIT · Stars: 309 · Issues: 0

everyone-can-use-english

Everyone Can Use English (人人都能用英语)

Language: TypeScript · License: MPL-2.0 · Stars: 21964 · Issues: 0

datacomp

DataComp: In search of the next generation of multimodal datasets

Language: Python · License: NOASSERTION · Stars: 612 · Issues: 0

HunyuanDiT

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Language: Python · License: NOASSERTION · Stars: 2820 · Issues: 0

reka-vibe-eval

Multimodal language model benchmark, featuring challenging examples

Language: Python · License: Apache-2.0 · Stars: 139 · Issues: 0

DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

License: MIT · Stars: 3067 · Issues: 0

forcealign

ForceAlign is a Python library for forced alignment of English text to English audio. It produces word- or phoneme-level alignments, giving each word or phoneme's start and end time within the audio. ForceAlign is designed to be easy to install and use, requiring no third-party, non-Python dependencies.

Language: Python · License: MIT · Stars: 8 · Issues: 0
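
Since the description above is essentially a how-to, here is a hypothetical usage sketch. The names ForceAlign, inference(), .word, .time_start, and .time_end are illustrative assumptions about what such an API looks like, not the library's confirmed interface:

    # Hypothetical sketch; class/method/attribute names below are assumed
    # for illustration, not forcealign's documented API.
    from forcealign import ForceAlign  # assumed import path

    aligner = ForceAlign(audio_file="speech.wav", transcript="hello world")
    for w in aligner.inference():      # assumed entry point returning word alignments
        print(f"{w.word}: {w.time_start:.2f}s to {w.time_end:.2f}s")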

MERTools

Toolkits for Multimodal Emotion Recognition

Language: Python · Stars: 136 · Issues: 0

simd

SIMD demo

Language: C · License: Apache-2.0 · Stars: 1 · Issues: 0

VGGSound

VGGSound: A Large-scale Audio-Visual Dataset

Language: Python · License: NOASSERTION · Stars: 277 · Issues: 0

RLHF-V

[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

Language: Python · Stars: 196 · Issues: 0

SecurityInterviewGuide

An interview guide for network and information security professionals (网络信息安全从业者面试指南)

License: GPL-3.0 · Stars: 1358 · Issues: 0

Codec-SUPERB

Audio Codec Speech processing Universal PERformance Benchmark

Language: Python · Stars: 187 · Issues: 0

Awesome_Modern_Hopfield_Networks

Paper list for Modern Hopfield Networks

Stars: 5 · Issues: 0

InfiniTransformer

Unofficial PyTorch/🤗 Transformers (Gemma/Llama3) implementation of "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention"

Language: Python · License: MIT · Stars: 310 · Issues: 0

aac-metrics

Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.

Language: Python · License: MIT · Stars: 33 · Issues: 0

MiniCPM

MiniCPM-2B: An end-side LLM outperforming Llama2-13B.

Language: Python · License: Apache-2.0 · Stars: 4447 · Issues: 0

DBPNet

DBPNet model

Language: Python · Stars: 28 · Issues: 0

llama.cpp

LLM inference in C/C++

Language: C++ · License: MIT · Stars: 2 · Issues: 0

gazelle

Joint speech-language model - respond directly to audio!

Language: Python · License: Apache-2.0 · Stars: 294 · Issues: 0

MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Language: Python · License: Apache-2.0 · Stars: 3102 · Issues: 0

qwen.cpp

C++ implementation of Qwen-LM

Language: C++ · License: NOASSERTION · Stars: 512 · Issues: 0

megalodon

Reference implementation of the Megalodon 7B model

Language: Cuda · License: MIT · Stars: 497 · Issues: 0

LinearAttentionArena

Here we will test various linear attention designs.

Language: Python · License: Apache-2.0 · Stars: 52 · Issues: 0
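
For context on what such designs vary: linear attention replaces softmax(Q K^T) V with a kernel feature map phi so the product can be re-associated as phi(Q) (phi(K)^T V), turning attention's O(T^2) cost into O(T) in sequence length. A minimal non-causal sketch using the elu(x)+1 feature map of Katharopoulos et al., one published choice among the designs an arena like this compares:

    # Minimal non-causal linear attention; the elu(x)+1 feature map is one
    # published choice (Katharopoulos et al., 2020), shown for illustration.
    import numpy as np

    def phi(x):
        # elu(x) + 1: keeps features positive so the normalizer is well-defined
        return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

    def linear_attention(Q, K, V):
        Qp, Kp = phi(Q), phi(K)
        KV = Kp.T @ V              # (d, d_v) summary of all keys/values
        Z = Qp @ Kp.sum(axis=0)    # per-query normalizer (softmax denominator analogue)
        return (Qp @ KV) / Z[:, None]

    T, d = 128, 16
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))
    assert linear_attention(Q, K, V).shape == (T, d)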