NielsRogge

NielsRogge's repositories

Transformers-Tutorials

This repository contains demos I made with the Transformers library by Hugging Face.

Language: Jupyter Notebook | License: MIT | 9302 138 450

transformers

🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.

Language: Python | License: Apache-2.0 | 44 40

DocLayout-YOLO

DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception

License: AGPL-3.0 | 200

huggingface.js

Utilities to use the Hugging Face Hub API

Language: TypeScript | License: MIT | 200

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language: Jupyter Notebook | License: Apache-2.0 | 200

Show-o

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Language: Python | License: MIT | 200

clip_dinoiser

Official implementation of 'CLIP-DINOiser: Teaching CLIP a few DINO tricks' paper.

License: Apache-2.0 | 100

GST

Official implementation of "GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers"

Language: Python | License: BSD-3-Clause | 100

Long-CLIP

[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"

License: Apache-2.0 | 100

ml-veclip

The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions"

License: NOASSERTION | 100

ultralytics

NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite

Language: Python | License: AGPL-3.0 | 100

AiM

Official PyTorch Implementation of "Scalable Autoregressive Image Generation with Mamba"

License: MIT | 000

Apollo

Music repair method to convert lossy MP3 compressed music to lossless music.

000

chat-ui

Open source codebase powering the HuggingChat app

License: Apache-2.0 | 000

CoMAE

[AAAI 2023 Oral] CoMAE: Single Model Hybrid Pre-training on Small-Scale RGB-D Datasets

Language: Python | 000

count_token_optimization

Language: Python | License: MIT | 000

CounTR

CounTR: Transformer-based Generalised Visual Counting

License: MIT | 000

CSD

License: MIT | 000

doubletake

[ECCV 2024] DoubleTake: Geometry Guided Depth Estimation

Language: Python | License: NOASSERTION | 000

EMA-VFI

[CVPR 2023] Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation

License: Apache-2.0 | 000

FluxMusic

Text-to-Music Generation with Rectified Flow Transformers

License: NOASSERTION | 000

GenerateCT

[ECCV 2024] GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes

License: MIT | 000

LightenDiffusion

Official PyTorch implementation for "LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models"

000

Lotus

Official Implementation of Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction

License: Apache-2.0 | 000

mini-omni

An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

License: MIT | 000

PGTFormer

[IJCAI'24] Beyond Alignment: Blind Video Face Restoration via Parsing-Guided Temporal-Coherent Transformer

Language: Python | License: NOASSERTION | 000

shic

Official implementation of the 2024 ECCV paper SHIC: Shape-Image Correspondences with no Keypoint Annotation

000

sos-bench

This codebase stores the complete artifacts and describes how to reproduce or extend the results from the paper "Style over Substance: Failure modes of LLM judges in alignment benchmarking", including the MisMo-Bench meta-benchmark.

License: Apache-2.0 | 000

StreamingT2V

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

Language: Python | 000

VFIMamba

VFIMamba: Video Frame Interpolation with State Space Models

Language: Python | License: Apache-2.0 | 000