蓋瑞王's repositories

WavTokenizer

SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling

License: MIT · Stargazers: 0 · Issues: 0

doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

License: Apache-2.0 · Stargazers: 1 · Issues: 0

PPOCRLabel

PPOCRLabelv2 is a semi-automatic graphic annotation tool suitable for the OCR field, with a built-in PP-OCR model to automatically detect and re-recognize data.

Stargazers: 0 · Issues: 0

notebooks

Examples and tutorials on using SOTA computer vision models and techniques. Learn everything from old-school ResNet, through YOLO and object-detection transformers like DETR, to the latest models like Grounding DINO and SAM.

Language: Jupyter Notebook · Stargazers: 0 · Issues: 0

cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.

License: MIT · Stargazers: 0 · Issues: 0

ultralytics

NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite

License: AGPL-3.0 · Stargazers: 0 · Issues: 0

Deep-Live-Cam

Real-time face swap and one-click video deepfake with only a single image

License: AGPL-3.0 · Stargazers: 0 · Issues: 0

facefusion

Next generation face swapper and enhancer

License: NOASSERTION · Stargazers: 0 · Issues: 0

PeriodWave

The official implementation of PeriodWave and PeriodWave-Turbo

License: MIT · Stargazers: 0 · Issues: 0

insightface

State-of-the-art 2D and 3D Face Analysis Project

Stargazers: 0 · Issues: 0

segment-anything-2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 0 · Issues: 0

LongWriter

LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs

License: Apache-2.0 · Stargazers: 0 · Issues: 0

CogVideo

Text-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

License: Apache-2.0 · Stargazers: 1 · Issues: 0

mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family

License: MIT · Stargazers: 0 · Issues: 0

AI-Scientist

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬

License: Apache-2.0 · Stargazers: 0 · Issues: 0

FruitNeRF

[IROS24] Official Code for "FruitNeRF: A Unified Neural Radiance Field based Fruit Counting Framework" - Integrated into Nerfstudio

Stargazers: 0 · Issues: 0

openchat

OpenChat: Advancing Open-source Language Models with Imperfect Data

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

sprite-decompose

Fast Sprite Decomposition from Animated Graphics [ECCV2024]

License: MIT · Stargazers: 0 · Issues: 0

RAGFoundry

Framework for specializing LLMs for retrieval-augmented-generation tasks using fine-tuning.

License: Apache-2.0 · Stargazers: 0 · Issues: 0

Medical-SAM2

Medical SAM 2: Segment Medical Images As Video Via Segment Anything Model 2

License: Apache-2.0 · Stargazers: 0 · Issues: 0

airllm

AirLLM 70B inference with single 4GB GPU

License: Apache-2.0 · Stargazers: 0 · Issues: 0

pytorch3d

PyTorch3D is FAIR's library of reusable components for deep learning with 3D data

License: NOASSERTION · Stargazers: 0 · Issues: 0

FLAME-Universe

Summary of publicly available resources such as code, datasets, and scientific papers for the FLAME 3D head model

Stargazers: 0 · Issues: 0

axlearn

An Extensible Deep Learning Library

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

ovavss

Official Implementation of "Open-Vocabulary Audio-Visual Semantic Segmentation" [ACM MM 2024 Oral].

Stargazers: 0 · Issues: 0

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

License: Apache-2.0 · Stargazers: 0 · Issues: 0

MindSearch

An LLM-based multi-agent framework for web search engines, similar to Perplexity.ai Pro and SearchGPT

License: Apache-2.0 · Stargazers: 0 · Issues: 0