Roman Gudchenko (roma-goodok)

Company: Align Technology

Location: Moscow

Roman Gudchenko's starred repositories

fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Language: Python | License: MIT | Stargazers: 30737 | Issues: 426 | Issues: 4211

Awesome-LLM

Awesome-LLM: a curated list of Large Language Models

LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

Language: Jupyter Notebook | License: BSD-3-Clause | Stargazers: 10105 | Issues: 97 | Issues: 676

CogVLM

A state-of-the-art open visual language model | multimodal pre-trained model

Language: Python | License: Apache-2.0 | Stargazers: 6223 | Issues: 68 | Issues: 432

MobileSAM

This is the official code for the MobileSAM project, which makes SAM lightweight for mobile applications and beyond!

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 4918 | Issues: 42 | Issues: 125

Semantic-SAM

[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"

VMamba

VMamba: Visual State Space Models; the code is based on Mamba

Language: Python | License: MIT | Stargazers: 2309 | Issues: 17 | Issues: 342

LLaVA-Med

Large Language-and-Vision Assistant for Biomedicine, built towards multimodal GPT-4 level capabilities.

Language: Python | License: NOASSERTION | Stargazers: 1649 | Issues: 25 | Issues: 90

CMC

[arXiv 2019] "Contrastive Multiview Coding", also contains implementations for MoCo and InstDis

Language: Python | License: BSD-2-Clause | Stargazers: 1311 | Issues: 28 | Issues: 70

MaskDINO

[CVPR 2023] Official implementation of the paper "Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation"

Language: Python | License: Apache-2.0 | Stargazers: 1238 | Issues: 35 | Issues: 113

SimMIM

This is an official implementation for "SimMIM: A Simple Framework for Masked Image Modeling".

Language: Python | License: MIT | Stargazers: 936 | Issues: 23 | Issues: 42

PointTransformerV3

[CVPR'24 Oral] Official repository of Point Transformer V3 (PTv3)

Language: Python | License: MIT | Stargazers: 880 | Issues: 14 | Issues: 124

objaverse-xl

🪐 Objaverse-XL is a Universe of 10M+ 3D Objects. Contains API Scripts for Downloading and Processing!

Language: Python | License: Apache-2.0 | Stargazers: 815 | Issues: 10 | Issues: 54

OpenSeeD

[ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection"

Language: Python | License: Apache-2.0 | Stargazers: 675 | Issues: 22 | Issues: 39

UniMatch

[CVPR 2023] Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation

Language: Python | License: MIT | Stargazers: 490 | Issues: 3 | Issues: 118

maxvit

[ECCV 2022] Official repository for "MaxViT: Multi-Axis Vision Transformer". SOTA foundation models for classification, detection, segmentation, image quality, and generative modeling...

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 453 | Issues: 9 | Issues: 20

DSVT

[CVPR2023] Official Implementation of "DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets"

Language: Python | License: Apache-2.0 | Stargazers: 392 | Issues: 8 | Issues: 78

oneformer3d

[CVPR2024] OneFormer3D: One Transformer for Unified Point Cloud Segmentation

Language: Python | License: NOASSERTION | Stargazers: 369 | Issues: 9 | Issues: 99

VOTR

Voxel Transformer for 3D object detection

Swin3D

A shift-window based transformer for 3D sparse tasks

Language: Cuda | License: MIT | Stargazers: 223 | Issues: 9 | Issues: 30

multi_token

Embed arbitrary modalities (images, audio, documents, etc) into large language models.

Language: Python | License: Apache-2.0 | Stargazers: 175 | Issues: 3 | Issues: 25

jetson-intro-to-distillation

A tutorial introducing knowledge distillation as an optimization technique for deployment on NVIDIA Jetson

Language: Python | License: NOASSERTION | Stargazers: 160 | Issues: 4 | Issues: 2

CoolGraph

Make GNN easy to start with

Language: Jupyter Notebook | License: MIT | Stargazers: 126 | Issues: 5 | Issues: 1

semivl

[ECCV'24] Official Implementation of SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance

Language: Python | License: Apache-2.0 | Stargazers: 117 | Issues: 5 | Issues: 11

M3I-Pretraining

[CVPR 2023] Implementation of "Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information".

multi-view-AE

Multi-view-AE: An extensive collection of multi-modal autoencoders implemented in a modular, scikit-learn style framework.

Language: Python | License: MIT | Stargazers: 46 | Issues: 3 | Issues: 16