kkahatapitiya

Kumara Kahatapitiya's starred repositories

llama

Inference code for Llama models

Language: Python | NOASSERTION | 55581 519 959

FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Language: Python | Apache-2.0 | 36519 348 1768

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | Apache-2.0 | 27234 224 4545

Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Language: Jupyter Notebook | Apache-2.0 | 14811 114 385

magic-animate

[CVPR 2024] MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model

Language: Python | BSD-3-Clause | 10371 104 146

StableCascade

Official Code for Stable Cascade

Language: Jupyter Notebook | MIT | 6516 61 121

LLaMA-Adapter

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

Language: Python | GPL-3.0 | 5698 78 142

VideoCrafter

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Language: Python | NOASSERTION | 4477 71 82

Awesome-Video-Diffusion

A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.

3185 131 18

Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Language: Python | BSD-3-Clause | 2722 32 156

spconv

Spatial Sparse Convolution Library

Language: Python | Apache-2.0 | 1842 24 690

FeatUp

Official code for "FeatUp: A Model-Agnostic Framework for Features at Any Resolution" ICLR 2024

Language: Jupyter Notebook | MIT | 1336 18 63

Awesome-LLMs-for-Video-Understanding

🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.

1292 37 4

MotionCtrl

Official Code for MotionCtrl [SIGGRAPH 2024]

Language: Python | Apache-2.0 | 1273 50 31

Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

Language: Python | CC-BY-4.0 | 1161 14 119

visual_anagrams

Code for the paper "Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models"

Language: Jupyter Notebook | MIT | 835 9 13

sige

[NeurIPS 2022, T-PAMI 2023] Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models

Language: Python | NOASSERTION | 257 5 2

X3D-Multigrid

PyTorch implementation of X3D models with Multigrid training.

Language: Python | MIT | 92 2 12

LLoVi

Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering"

Language: Python | MIT | 81 6 6

EgoSchema

Language: Python | 64 1 20

Coarse-Fine-Networks

Code for our CVPR 2021 paper "Coarse-Fine Networks for Temporal Activity Detection in Videos"

Language: Python | MIT | 55 2 11

crossway_diffusion

The official code of our ICRA'24 paper Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning

Language: Python | MIT | 51 2 7

LangRepo

Language Repository for Long Video Understanding

Language: Python | MIT | 27 2 1

mvu

Multimodal Video Understanding Framework (MVU)

Language: Python | MIT | 22 20

lifelong-memory

Code for LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos

Language: Python | MIT | 13 2 1

LinearConv

Code for our WACV 2021 paper "Exploiting the Redundancy in Convolutional Filters for Parameter Reduction"

Language: Python | MIT | 9 2 3

SSDet

Code for our AAAI 2023 paper "Weakly-guided Self-supervised Pretraining for Temporal Activity Detection"

Language: Python | MIT | 9 10

open_x_pytorch_dataloader

An unofficial pytorch dataloader for Open X-Embodiment Datasets https://github.com/google-deepmind/open_x_embodiment

Language: Python | MIT | 7 20

Grafting-Vision-Transformer

4 10

SWAT

Code for our IJCAI 2023 paper "SWAT: Spatial Structure Within and Among Tokens"

Language: Python | MIT | 3 10