Beast code in Giters

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Language: Python | License: NOASSERTION | 6026 58 1083

Chinese-CLIP

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

Language: Python | License: MIT | 4334 34 325

ollama-python

Ollama Python library

Language: Python | License: MIT | 3961 29 143

sapiens

High-resolution models for human tasks.

Language: Python | License: NOASSERTION | 3936 41 97

speech-to-speech

Speech To Speech: an effort for an open-sourced and modular GPT4-o

Language: Python | License: Apache-2.0 | 3027 35 64

taffy

A high performance rust-powered UI layout library

Language: Rust | License: NOASSERTION | 2066 24 226

YOLOP

You Only Look Once for Panoptic Driving Perception. (MIR2022)

Language: Python | License: MIT | 1900 31 198

KalmanFilter

This is a Kalman filter used to calculate the angle, rate and bias from the input of an accelerometer/magnetometer and a gyroscope.

Language: C++ | 1754 117 23

InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Language: Python | License: Apache-2.0 | 1310 28 169

SpeechGPT

SpeechGPT Series: Speech Large Language Models

Language: Python | License: Apache-2.0 | 1225 45 43

Show-o

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Language: Python | License: Apache-2.0 | 83000

AgentK

An autoagentic AGI that is self-evolving and modular.

Language: Python | License: MIT | 822 15 14

VITA

✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM

Language: Python | License: NOASSERTION | 769 38 39

AnyGPT

Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"

Language: Python | 746 21 38

humor

Code for ICCV 2021 paper "HuMoR: 3D Human Motion Model for Robust Pose Estimation"

Language: Python | License: MIT | 511 16 50

micrograd

The Autograd Engine

Language: HTML | 485 9 1

DQN_play_sekiro

Language: Python | License: MIT | 436 3 1

NExT-Chat

The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".

Language: Python | License: Apache-2.0 | 204 2 21

LLM101n-CN

LLM101n: Let's build a Storyteller (Chinese version)

Language: C++ | 11300

Flash-VStream

This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams"

Language: Python | License: Apache-2.0 | 105 2 14

COCO-UniHuman

Language: Python | 1200

ollama

Get up and running with Llama 3.1, Mistral, Gemma 2, and other large language models.

Language: Go | License: MIT | 300