NielsRogge

NielsRogge's repositories

Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.

Language: Jupyter Notebook | License: MIT | 9119 136 443

transformers

🤗Transformers: State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0.

Language: Python | License: Apache-2.0 | 42 40

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch

Language: Python | License: Apache-2.0 | 2 20

huggingface.js

Utilities to use the Hugging Face Hub API

Language: TypeScript | License: MIT | 200

MeshAnythingV2

From anything to mesh like human artists. Official impl. of "MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization"

Language: Python | License: MIT | 200

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language: Jupyter Notebook | License: Apache-2.0 | 200

Show-o

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

License: MIT | 200

GST

Official implementation of "GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers"

License: BSD-3-Clause | 100

ultralytics

NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite

Language: Python | License: AGPL-3.0 | 100

VidGen

100

1d-tokenizer

This repo contains the code for our paper An Image is Worth 32 Tokens for Reconstruction and Generation

License: Apache-2.0 | 000

AiM

Official PyTorch Implementation of "Scalable Autoregressive Image Generation with Mamba"

License: MIT | 000

Apollo

Music repair method to convert lossy MP3 compressed music to lossless music.

000

co-tracker

CoTracker is a model for tracking any point (pixel) on a video.

License: NOASSERTION | 000

CoMAE

[AAAI 2023 Oral] CoMAE: Single Model Hybrid Pre-training on Small-Scale RGB-D Datasets

000

count_token_optimization

License: MIT | 000

CounTR

CounTR: Transformer-based Generalised Visual Counting

License: MIT | 000

CSD

License: MIT | 000

doubletake

[ECCV 2024] DoubleTake: Geometry Guided Depth Estimation

License: NOASSERTION | 000

EMA-VFI

[CVPR 2023] Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation

License: Apache-2.0 | 000

FluxMusic

Text-to-Music Generation with Rectified Flow Transformers

License: NOASSERTION | 000

lerobot

🤗 LeRobot: End-to-end Learning for Real-World Robotics in Pytorch

License: Apache-2.0 | 000

LeYOLO

Language: Python | License: AGPL-3.0 | 000

mini-omni

open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming audio output conversational capabilities.

License: MIT | 000