NielsRogge's repositories
Transformers-Tutorials
This repository contains demos I made with the Transformers library by HuggingFace.
transformers
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
DocLayout-YOLO
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
huggingface.js
Utilities to use the Hugging Face Hub API
segment-anything-2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
clip_dinoiser
Official implementation of 'CLIP-DINOiser: Teaching CLIP a few DINO tricks' paper.
ultralytics
NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite
AiM
Official PyTorch Implementation of "Scalable Autoregressive Image Generation with Mamba"
Apollo
Music repair method to convert lossy MP3 compressed music to lossless music.
chat-ui
Open source codebase powering the HuggingChat app
CoMAE
[AAAI 2023 Oral] CoMAE: Single Model Hybrid Pre-training on Small-Scale RGB-D Datasets
CounTR
CounTR: Transformer-based Generalised Visual Counting
doubletake
[ECCV 2024] DoubleTake: Geometry Guided Depth Estimation
EMA-VFI
[CVPR 2023] Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation
FluxMusic
Text-to-Music Generation with Rectified Flow Transformers
GenerateCT
[ECCV 2024] GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes
LightenDiffusion
Official PyTorch implementation for "LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models"
Lotus
Official Implementation of Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
mini-omni
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output conversational capabilities.
PGTFormer
[IJCAI'24] Beyond Alignment: Blind Video Face Restoration via Parsing-Guided Temporal-Coherent Transformer
shic
Official implementation of the 2024 ECCV paper SHIC: Shape-Image Correspondences with no Keypoint Annotation
sos-bench
This codebase stores the complete artifacts and describes how to reproduce or extend the results from the paper "Style over Substance: Failure modes of LLM judges in alignment benchmarking", including the MisMo-Bench meta-benchmark.
StreamingT2V
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
VFIMamba
VFIMamba: Video Frame Interpolation with State Space Models