Saulo Catharino's repositories
DeepSeek-Coder
DeepSeek Coder: Let the Code Write Itself
DynamiCrafter
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
InternLM-XComposer
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
DE-COP_Method
This repository presents the original implementation of "DE-COP: Detecting Copyrighted Content in Language Models Training Data" by André V. Duarte, Xuandong Zhao, Arlindo L. Oliveira, and Lei Li
InstructIR
InstructIR: High-Quality Image Restoration Following Human Instructions (https://huggingface.co/spaces/marcosv/InstructIR)
YOLO-World
Real-Time Open-Vocabulary Object Detection
AnimateLCM
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning
browserless
Deploy headless browsers in Docker. Run on our cloud or bring your own. Free for non-commercial uses.
Fracture_Detection_Improved_YOLOv8
YOLOv8-AM: YOLOv8 with Attention Mechanisms for Pediatric Wrist Fracture Detection
Groma
Grounded Multimodal Large Language Model with Localized Visual Tokenization
IDM-VTON
IDM-VTON: Improving Diffusion Models for Authentic Virtual Try-on in the Wild
InstantMesh
InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
Mamba-UNet
Mamba-UNet: UNet-like Pure Visual Mamba for Medical Image Segmentation
Metric3D
The repo for "Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image" and "Metric3Dv2: A Versatile Monocular Geometric Foundation Model..."
mickey
[CVPR 2024 - Oral] Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences
MobileAgent
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
NATTEN
Neighborhood Attention Extension. Bringing attention to a neighborhood near you!
OOTDiffusion
Official implementation of OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
SPIN
The official implementation of Self-Play Fine-Tuning (SPIN)
StreamingT2V
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
UAV-Rain1k
UAV-Rain1k: A Benchmark for Raindrop Removal from UAV Aerial Imagery
whisper-asr-webservice
OpenAI Whisper ASR Webservice API