helloheshee's starred repositories

MixTeX-Latex-OCR

MixTeX: multimodal LaTeX, ZhEn (Chinese-English), and table OCR. It performs efficient CPU-based inference locally and offline on Windows.

Language: Python | License: AGPL-3.0 | Stargazers: 453 | Issues: 0

bark

🔊 Text-Prompted Generative Audio Model

Language: Jupyter Notebook | License: MIT | Stargazers: 34972 | Issues: 0

Lumina-mGPT

Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"

Language: Python | Stargazers: 403 | Issues: 0

AutoGGUF

Automatically quantize GGUF models

Language: Python | License: Apache-2.0 | Stargazers: 108 | Issues: 0

BEVFormer

[ECCV 2022] This is the official implementation of BEVFormer, a camera-only framework for autonomous driving perception, e.g., 3D object detection and semantic map segmentation.

Language: Python | License: Apache-2.0 | Stargazers: 3170 | Issues: 0

Deformable-DETR

Deformable DETR: Deformable Transformers for End-to-End Object Detection.

Language: Python | License: Apache-2.0 | Stargazers: 3104 | Issues: 0

TAPTR

[ECCV 2024] Official implementation of the paper "TAPTR: Tracking Any Point with Transformers as Detection"

Language: Python | License: NOASSERTION | Stargazers: 172 | Issues: 0

T-Rex

[ECCV 2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Language: Python | License: NOASSERTION | Stargazers: 2091 | Issues: 0

refiners

A microframework on top of PyTorch with first-class citizen APIs for foundation model adaptation

Language: Python | License: MIT | Stargazers: 384 | Issues: 0

CogVideo

Text-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Language: Python | License: Apache-2.0 | Stargazers: 5765 | Issues: 0

Deep-Live-Cam

Real-time face swap and one-click video deepfake with only a single image

Language: Python | License: AGPL-3.0 | Stargazers: 26722 | Issues: 0

fire-detection-cnn

Real-time fire detection in video imagery using a convolutional neural network (deep learning), from our ICIP 2018 paper (Dunnings / Breckon) and ICMLA 2019 paper (Samarth / Bhowmik / Breckon)

Language: Python | License: MIT | Stargazers: 532 | Issues: 0

motionshop

Project page for replacing the human motion in a video with a virtual 3D human

Stargazers: 377 | Issues: 0

dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.

Language: TypeScript | License: NOASSERTION | Stargazers: 42510 | Issues: 0

SimpleTuner

A general fine-tuning kit geared toward diffusion models.

Language: Python | License: AGPL-3.0 | Stargazers: 1315 | Issues: 0

unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory

Language: Python | License: Apache-2.0 | Stargazers: 14543 | Issues: 0

flux

Official inference repo for FLUX.1 models

Language: Python | License: Apache-2.0 | Stargazers: 10535 | Issues: 0

metahuman-stream

Real-time interactive streaming digital human

Language: Python | License: Apache-2.0 | Stargazers: 3075 | Issues: 0

vq-vae-2-pytorch

Implementation of "Generating Diverse High-Fidelity Images with VQ-VAE-2" in PyTorch

Language: Python | License: NOASSERTION | Stargazers: 1577 | Issues: 0

MindSearch

🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)

Language: Python | License: Apache-2.0 | Stargazers: 3894 | Issues: 0

1d-tokenizer

This repo contains the code for our paper "An Image is Worth 32 Tokens for Reconstruction and Generation"

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 355 | Issues: 0

MinerU

A one-stop, open-source, high-quality data extraction tool that supports extraction from PDFs, webpages, and multi-format e-books.

Language: Python | License: AGPL-3.0 | Stargazers: 9143 | Issues: 0

Monocular-Visual-Odometry

A simple monocular visual odometry (part of vSLAM) using ORB keypoints, with initialization, tracking, local map, and bundle adjustment. (WARNING: this project is tuned for a course demo, not for real-world applications!)

Language: C++ | License: MIT | Stargazers: 386 | Issues: 0

tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.

Language: Python | License: MIT | Stargazers: 11551 | Issues: 0

lmms-finetune

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, qwen-vl, phi3-v, etc.

Language: Python | License: Apache-2.0 | Stargazers: 103 | Issues: 0

LabelLLM

The Open-Source Data Annotation Platform

Language: TypeScript | License: Apache-2.0 | Stargazers: 435 | Issues: 0

Open-MAGVIT2

Open-MAGVIT2: Democratizing Autoregressive Visual Generation

Language: Python | License: Apache-2.0 | Stargazers: 376 | Issues: 0

Stable-Hair

Stable-Hair: Real-World Hair Transfer via Diffusion Model

License: Apache-2.0 | Stargazers: 306 | Issues: 0

lz4

Extremely fast compression algorithm

Language: C | License: NOASSERTION | Stargazers: 10157 | Issues: 0

LongRoPE

LongRoPE is a novel method that extends the context window of pre-trained LLMs to 2048k tokens.

Language: Python | License: MIT | Stargazers: 58 | Issues: 0