NielsRogge's repositories
Transformers-Tutorials
This repository contains demos I made with the Transformers library by HuggingFace.
transformers
🤗Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
huggingface.js
Utilities to use the Hugging Face Hub API
MeshAnythingV2
From anything to mesh like human artists. Official implementation of "MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization"
segment-anything-2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
ultralytics
NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite
1d-tokenizer
This repo contains the code for our paper "An Image is Worth 32 Tokens for Reconstruction and Generation".
AiM
Official PyTorch Implementation of "Scalable Autoregressive Image Generation with Mamba"
Apollo
Music repair method for converting lossy MP3-compressed music to lossless music.
co-tracker
CoTracker is a model for tracking any point (pixel) on a video.
CoMAE
[AAAI 2023 Oral] CoMAE: Single Model Hybrid Pre-training on Small-Scale RGB-D Datasets
CounTR
CounTR: Transformer-based Generalised Visual Counting
doubletake
[ECCV 2024] DoubleTake: Geometry Guided Depth Estimation
EMA-VFI
[CVPR 2023] Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation
FluxMusic
Text-to-Music Generation with Rectified Flow Transformers
lerobot
🤗 LeRobot: End-to-end Learning for Real-World Robotics in PyTorch
mini-omni
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
PGTFormer
[IJCAI'24] Beyond Alignment: Blind Video Face Restoration via Parsing-Guided Temporal-Coherent Transformer
silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
StreamingT2V
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
VFIMamba
VFIMamba: Video Frame Interpolation with State Space Models
vggsfm
VGGSfM: Visual Geometry Grounded Deep Structure From Motion