墨问's starred repositories

Deep-Live-Cam

real time face swap and one-click video deepfake with only a single image

Language: Python | License: AGPL-3.0 | Stargazers: 35685 | Issues: 202 | Issues: 426

gpt-2

Code for the paper "Language Models are Unsupervised Multitask Learners"

Language: Python | License: NOASSERTION | Stargazers: 22268 | Issues: 635 | Issues: 265

yoga

Yoga is an embeddable layout engine targeting web standards.

PathFinding.js

A comprehensive path-finding library for grid based games

AI-Scientist

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 7541 | Issues: 86 | Issues: 93

FlagEmbedding

Retrieval and Retrieval-augmented LLMs

Language: Python | License: MIT | Stargazers: 6883 | Issues: 43 | Issues: 984

streaming-llm

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Language: Python | License: MIT | Stargazers: 6557 | Issues: 63 | Issues: 80

GroundingDINO

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

Language: Python | License: Apache-2.0 | Stargazers: 6303 | Issues: 41 | Issues: 296

FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing, etc.

Language: Python | License: NOASSERTION | Stargazers: 6026 | Issues: 58 | Issues: 1083

Chinese-CLIP

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

Language: Python | License: MIT | Stargazers: 4334 | Issues: 34 | Issues: 325

ollama-python

Ollama Python library

Language: Python | License: MIT | Stargazers: 3961 | Issues: 29 | Issues: 143

sapiens

High-resolution models for human tasks.

Language: Python | License: NOASSERTION | Stargazers: 3936 | Issues: 41 | Issues: 97

speech-to-speech

Speech To Speech: an effort for an open-sourced and modular GPT4-o

Language: Python | License: Apache-2.0 | Stargazers: 3027 | Issues: 35 | Issues: 64

taffy

A high performance rust-powered UI layout library

Language: Rust | License: NOASSERTION | Stargazers: 2066 | Issues: 24 | Issues: 226

YOLOP

You Only Look Once for Panoptic Driving Perception. (MIR 2022)

Language: Python | License: MIT | Stargazers: 1900 | Issues: 31 | Issues: 198

KalmanFilter

This is a Kalman filter used to calculate the angle, rate and bias from the input of an accelerometer/magnetometer and a gyroscope.

InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Language: Python | License: Apache-2.0 | Stargazers: 1310 | Issues: 28 | Issues: 169

SpeechGPT

SpeechGPT Series: Speech Large Language Models

Language: Python | License: Apache-2.0 | Stargazers: 1225 | Issues: 45 | Issues: 43

Show-o

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Language: Python | License: Apache-2.0 | Stargazers: 830 | Issues: 0 | Issues: 0

AgentK

An autoagentic AGI that is self-evolving and modular.

Language: Python | License: MIT | Stargazers: 822 | Issues: 15 | Issues: 14

VITA

✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM

Language: Python | License: NOASSERTION | Stargazers: 769 | Issues: 38 | Issues: 39

AnyGPT

Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"

humor

Code for ICCV 2021 paper "HuMoR: 3D Human Motion Model for Robust Pose Estimation"

Language: Python | License: MIT | Stargazers: 511 | Issues: 16 | Issues: 50

micrograd

The Autograd Engine

DQN_play_sekiro

DQN_play_sekiro

Language: Python | License: MIT | Stargazers: 436 | Issues: 3 | Issues: 1

NExT-Chat

The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".

Language: Python | License: Apache-2.0 | Stargazers: 204 | Issues: 2 | Issues: 21

LLM101n-CN

LLM101n: Let's build a Storyteller (Chinese edition)

Language: C++ | Stargazers: 113 | Issues: 0 | Issues: 0

Flash-VStream

This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams"

Language: Python | License: Apache-2.0 | Stargazers: 105 | Issues: 2 | Issues: 14
Language: Python | Stargazers: 12 | Issues: 0 | Issues: 0

ollama

Get up and running with Llama 3.1, Mistral, Gemma 2, and other large language models.

Language: Go | License: MIT | Stargazers: 3 | Issues: 0 | Issues: 0