Beast code in Giters

helloheshee's starred repositories

MixTeX-Latex-OCR

MixTeX multimodal LaTeX, ZhEn, and, Table OCR. It performs efficient CPU-based inference in a local offline on Windows.

Language:PythonAGPL-3.045300

bark

🔊 Text-Prompted Generative Audio Model

Language:Jupyter NotebookMIT3497200

Lumina-mGPT

Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"

Language:Python40300

AutoGGUF

automatically quant GGUF models

Language:PythonApache-2.010800

BEVFormer

[ECCV 2022] This is the official implementation of BEVFormer, a camera-only framework for autonomous driving perception, e.g., 3D object detection and semantic map segmentation.

Language:PythonApache-2.0317000

Deformable-DETR

Deformable DETR: Deformable Transformers for End-to-End Object Detection.

Language:PythonApache-2.0310400

TAPTR

[ECCV 2024] Official implementation of the paper "TAPTR: Tracking Any Point with Transformers as Detection"

Language:PythonNOASSERTION17200

T-Rex

[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Language:PythonNOASSERTION209100

refiners

A microframework on top of PyTorch with first-class citizen APIs for foundation model adaptation

Language:PythonMIT38400

CogVideo

Text-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Language:PythonApache-2.0576500

Deep-Live-Cam

real time face swap and one-click video deepfake with only a single image

Language:PythonAGPL-3.02672200

fire-detection-cnn

real-time fire detection in video imagery using a convolutional neural network (deep learning) - from our ICIP 2018 paper (Dunnings / Breckon) + ICMLA 2019 paper (Samarth / Bhowmik / Breckon)

Language:PythonMIT53200

motionshop

Project page of replacing the human motion in the video with a virtual 3D human

37700

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.

Language:TypeScriptNOASSERTION4251000

SimpleTuner

A general fine-tuning kit geared toward diffusion models.

Language:PythonAGPL-3.0131500

unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory

Language:PythonApache-2.01454300

flux

Official inference repo for FLUX.1 models

Language:PythonApache-2.01053500

metahuman-stream

Real time interactive streaming digital human

Language:PythonApache-2.0307500

vq-vae-2-pytorch

Implementation of Generating Diverse High-Fidelity Images with VQ-VAE-2 in PyTorch

Language:PythonNOASSERTION157700

MindSearch

🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)

Language:PythonApache-2.0389400

1d-tokenizer

This repo contains the code for our paper An Image is Worth 32 Tokens for Reconstruction and Generation

Language:Jupyter NotebookApache-2.035500

MinerU

A one-stop, open-source, high-quality data extraction tool, supports PDF/webpage/e-book extraction.一站式开源高质量数据提取工具，支持PDF/网页/多格式电子书提取。

Language:PythonAGPL-3.0914300

Monocular-Visual-Odometry

A simple monocular visual odometry (part of vSLAM) by ORB keypoints with initialization, tracking, local map and bundle adjustment. (WARNING: Hi, I'm sorry that this project is tuned for course demo, not for real world applications !!!)

Language:C++MIT38600