Srijan Das (srijandas07)


Company: UNC Charlotte

Location: Charlotte, USA

Home Page: https://srijandas07.github.io/

Twitter: @srijandas07


Srijan Das's starred repositories

Awesome-Transformer-Attention

A comprehensive paper list on Vision Transformer/Attention, including papers, code, and related websites

pytorchvideo

A deep learning library for video understanding research.

Language: Python · License: Apache-2.0 · Stargazers: 3250 · Issues: 160 · Issues: 180

ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)

Language: Python · License: Apache-2.0 · Stargazers: 2812 · Issues: 20 · Issues: 276

TimeSformer

The official PyTorch implementation of the paper "Is Space-Time Attention All You Need for Video Understanding?"

Language: Python · License: NOASSERTION · Stargazers: 1496 · Issues: 28 · Issues: 128

VideoMAE

[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

Language: Python · License: NOASSERTION · Stargazers: 1296 · Issues: 16 · Issues: 119

ICCV-2023-Papers

ICCV 2023 Papers: Discover cutting-edge research from ICCV 2023, a leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included. ⭐ Support visual intelligence development!

Language: Python · License: MIT · Stargazers: 908 · Issues: 13 · Issues: 10

MultiMAE

MultiMAE: Multi-modal Multi-task Masked Autoencoders, ECCV 2022

Language: Python · License: NOASSERTION · Stargazers: 535 · Issues: 13 · Issues: 33

Ego4d

Ego4D dataset repository: download the dataset, visualize it, extract features, and see example usage

Language: Jupyter Notebook · License: MIT · Stargazers: 330 · Issues: 23 · Issues: 153

MAE

PyTorch implementation of Masked Autoencoder

Language: Python · License: MIT · Stargazers: 206 · Issues: 2 · Issues: 21

SPT_LSA_ViT

Implementation of "Vision Transformer for Small-Size Datasets"

VidIL

PyTorch code for "Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners"

Language: Python · License: MIT · Stargazers: 110 · Issues: 5 · Issues: 11

imix

[ICLR 2021] i-Mix: A Domain-Agnostic Strategy for Contrastive Representation Learning

Language: Python · License: MIT · Stargazers: 77 · Issues: 3 · Issues: 8

LIV

Official repository for "LIV: Language-Image Representations and Rewards for Robotic Control" (ICML 2023)

Language: Python · License: MIT · Stargazers: 77 · Issues: 3 · Issues: 7

MS-TCT

[CVPR 2022] MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection

PathLDM

Official Code for PathLDM: Text conditioned Latent Diffusion Model for Histopathology (WACV 2024)

Language: Jupyter Notebook · Stargazers: 26 · Issues: 7 · Issues: 24

Limited-data-vits

[WACV 2024] Code for "Limited Data, Unlimited Potential: A Study on ViTs Augmented by Masked Autoencoders"

PoseAwareVT

Code for the paper "Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers"

SI-MIL

SI-MIL

Language: Python · Stargazers: 18 · Issues: 0 · Issues: 0

3DTRL

Code for NeurIPS 2022 paper "Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space"

Language: Python · License: MIT · Stargazers: 18 · Issues: 6 · Issues: 1

2s-AGCN-For-Daily-Living

2s-AGCN on the Toyota Smarthome dataset (daily-living activities)

Fibottention

Inceptive Visual Representation Learning with Diverse Attention Across Heads

Language: Python · License: CC-BY-4.0 · Stargazers: 14 · Issues: 0 · Issues: 0

LLAVIDAL

The official repository of LLAVIDAL

Language: Python · License: CC-BY-4.0 · Stargazers: 12 · Issues: 1 · Issues: 2

pi-vit

[CVPR 2024] Code and models for pi-ViT, a video transformer for understanding activities of daily living

Language: Python · License: NOASSERTION · Stargazers: 10 · Issues: 3 · Issues: 0

Toyota_Smarthome

Tools for Toyota Smarthome datasets

mavrec-code

Code for reproducing the results in the paper "Multiview Aerial Visual Recognition (MAVREC): Can Multi-view Improve Aerial Visual Perception?"

Language: Jupyter Notebook · License: CC-BY-4.0 · Stargazers: 7 · Issues: 1 · Issues: 0

FreqMixFormer

[ACM MM 2024] Frequency Guidance Matters: Skeletal Action Recognition by Frequency-Aware Mixed Transformer

Stargazers: 6 · Issues: 0 · Issues: 0

improved_HAR_on_Toyota

Improved action recognition with separable spatio-temporal attention using alternative skeletal and video pre-processing

Language: Python · Stargazers: 4 · Issues: 3 · Issues: 0

separable_STA

Implementation of the Separable Spatio-Temporal Attention (STA) network

Language: Python · Stargazers: 2 · Issues: 3 · Issues: 0

synchronization-is-all-you-need

Synchronization is All You Need: Exocentric-to-Egocentric Transfer for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs [ECCV 2024]

Language: Python · License: MIT · Stargazers: 1 · Issues: 4 · Issues: 1

Pyvideoresearch_new

This repository contains the master branch of PyVideoResearch with a few changes that enable use of the pre-trained models.

Language: Python · License: GPL-3.0 · Stargazers: 1 · Issues: 2 · Issues: 1