AKASH2907

Akash Kumar's starred repositories

awesome-video-self-supervised-learning

A curated list of awesome self-supervised learning methods in videos

6200

multimodal_dataset_distillation

Language: Python | 3200

SeViLA

[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering

Language: Python | License: BSD-3-Clause | 15900

InternVL

[CVPR 2024 Oral] InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks —— An Open-Source Alternative to ViT-22B

Language: Jupyter Notebook | License: MIT | 69100

CM-Erase-REG

Code for CVPR 19 Paper "Improving Referring Expression Grounding with Cross-modal Attention-guided Erasing"

Language: Python | 3400

visil

Authors' official PyTorch implementation of "ViSiL: Fine-grained Spatio-Temporal Video Similarity Learning" [ICCV 2019]

Language: Python | License: Apache-2.0 | 19800

FriendsDontLetFriends

Friends don't let friends make certain types of data visualization: what they are and why they are bad.

Language: R | License: MIT | 567000

peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Language: Python | License: Apache-2.0 | 1369900

MM2021-CO2-Net

Language: Python | 3700

first-order-model

This repository contains the source code for the paper "First Order Motion Model for Image Animation"

Language: Jupyter Notebook | License: MIT | 1419000

STAN

Official PyTorch implementation of the paper "Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring"

Language: Python | License: Apache-2.0 | 8100

insightface

State-of-the-art 2D and 3D Face Analysis Project

Language: Python | 2119800

S3D_HowTo100M

S3D Text-Video model trained on HowTo100M using MIL-NCE

Language: Python | License: Apache-2.0 | 18400

tabilize

Simple code for generating a color-coded LaTeX table from raw data

Language: Jupyter Notebook | 14600

acgcn

Code for the paper "Spot What Matters: Learning Context Using Graph Convolutional Networks for Weakly-Supervised Action Detection"

Language: Python | 1200

EMA-VFI

[CVPR 2023] Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation

Language: Python | License: Apache-2.0 | 30800

hiera

Hiera: A fast, powerful, and simple hierarchical vision transformer.

Language: Python | License: Apache-2.0 | 69100

Awesome-Referring-Image-Segmentation

📚 A collection of papers about Referring Image Segmentation.

52800

EVAD

[ICCV 2023] Efficient Video Action Detection with Token Dropout and Context Refinement

Language: Python | License: NOASSERTION | 1900

MI-AOD

Code for "Multiple Instance Active Learning for Object Detection", CVPR 2021

Language: Python | License: Apache-2.0 | 32300

CPL

Language: Python | 1500

NExT-QA

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)

Language: Python | License: MIT | 9700

CLIP-Help-SimCLR

Official code for the ICML 2023 paper "On the Generalization of Multi-modal Contrastive Learning"

Language: Python | 1900

VideoX

VideoX: a collection of video cross-modal models

Language: Python | License: NOASSERTION | 92700

frozen-in-time

Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21]

Language: Python | License: MIT | 33000

singularity

[ACL 2023] Official PyTorch code for the Singularity model in "Revealing Single Frame Bias for Video-and-Language Learning"

Language: Python | License: MIT | 12400

bertviz

BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)

Language: Python | License: Apache-2.0 | 636000

mil_pytorch

Multiple instance learning model implemented in PyTorch

Language: Python | 2900

xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.

Language: Python | License: NOASSERTION | 752200

PySceneDetect

🎥 Python and OpenCV-based scene cut/transition detection program & library.

Language: Python | License: NOASSERTION | 277100