蓋瑞王's repositories
segment-anything-2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
AI-Scientist
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬
airllm
AirLLM: 70B inference with a single 4GB GPU
axlearn
An Extensible Deep Learning Library
cvat
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
Deep-Live-Cam
Real-time face swap and one-click video deepfake with only a single image
diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
facefusion
Next generation face swapper and enhancer
FLAME-Universe
Summary of publicly available resources such as code, datasets, and scientific papers for the FLAME 3D head model
FruitNeRF
[IROS24] Official Code for "FruitNeRF: A Unified Neural Radiance Field based Fruit Counting Framework" - Integrated into Nerfstudio
insightface
State-of-the-art 2D and 3D Face Analysis Project
LongWriter
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
Medical-SAM2
Medical SAM 2: Segment Medical Images As Video Via Segment Anything Model 2
MindSearch
An LLM-based multi-agent framework for a web search engine, similar to Perplexity.ai Pro and SearchGPT
mPLUG-Owl
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
notebooks
Examples and tutorials on using SOTA computer vision models and techniques. Learn everything from old-school ResNet, through YOLO and object-detection transformers like DETR, to the latest models like Grounding DINO and SAM.
ovavss
Official Implementation of "Open-Vocabulary Audio-Visual Semantic Segmentation" [ACM MM 2024 Oral].
PeriodWave
The official implementation of PeriodWave and PeriodWave-Turbo
PPOCRLabel
PPOCRLabelv2 is a semi-automatic graphical annotation tool for OCR, with a built-in PP-OCR model that automatically detects and re-recognizes data.
pytorch3d
PyTorch3D is FAIR's library of reusable components for deep learning with 3D data
RAGFoundry
Framework for specializing LLMs for retrieval-augmented-generation tasks using fine-tuning.
sprite-decompose
Fast Sprite Decomposition from Animated Graphics [ECCV2024]
ultralytics
NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite
WavTokenizer
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling