

MC-ViT

Implementation of MC-ViT from the paper "Memory Consolidation Enables Long-Context Video Understanding".

https://discord.gg/GYbXvDGevY

Install

$ pip install mcvit

Usage

import torch
from mcvit.model import MCViT


# Initialize the MC-ViT model
mcvit = MCViT(
    dim=512,              # feature (embedding) dimension
    attn_seq_len=256,     # attention sequence length
    dim_head=64,          # dimension of each attention head
    dropout=0.1,          # dropout rate
    chunks=16,            # number of chunks the sequence is split into
    depth=12,             # number of transformer layers
    cross_attn_heads=8,   # number of cross-attention heads
)

# Create a random tensor to represent a video
x = torch.randn(
    1, 3, 256, 256, 256
)  # (batch, channels, frames, height, width)

# Pass the tensor through the model
output = mcvit(x)

print(output.shape)  # Print the shape of the model's output
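
For inference you will usually want to move the model to a GPU and disable gradient tracking. The snippet below is a minimal sketch of that, assuming MCViT is a standard torch.nn.Module exposed at mcvit.model as in the example above; the device selection, parameter count, and torch.no_grad usage are plain PyTorch and not part of this library's API.

import torch
from mcvit.model import MCViT

# Pick a device (assumes MCViT behaves like a regular torch.nn.Module)
device = "cuda" if torch.cuda.is_available() else "cpu"

mcvit = MCViT(
    dim=512,
    attn_seq_len=256,
    dim_head=64,
    dropout=0.1,
    chunks=16,
    depth=12,
    cross_attn_heads=8,
).to(device)
mcvit.eval()  # disable dropout for inference

# Rough parameter count (works for any nn.Module)
num_params = sum(p.numel() for p in mcvit.parameters())
print(f"Parameters: {num_params / 1e6:.1f}M")

# Run a video clip through the model without tracking gradients
x = torch.randn(1, 3, 256, 256, 256, device=device)  # (batch, channels, frames, height, width)
with torch.no_grad():
    output = mcvit(x)

print(output.shape)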

License

MIT
