
Varun Ganjigunte Prakash's starred repositories

MMA-DFER

This repository provides an official implementation for the paper MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild.

Language: Python | 700

2024-ICLR-Norton

Multi-granularity Correspondence Learning from Long-term Noisy Videos [ICLR 2024, Oral]

Language: Python | License: Apache-2.0 | 10500

TimeChat

[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Language: Python | License: BSD-3-Clause | 25800

Awesome-MLLM-Hallucination

📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).

33500

MyVLM

Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024)

Language: Python | License: NOASSERTION | 13600

MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models

Language: Python | License: Apache-2.0 | 189100

EmotionCLIP

[CVPR 2023] Code for "Learning Emotion Representations from Verbal and Nonverbal Communication"

Language: Python | License: MIT | 3300

PLLaVA

Official repository for the paper PLLaVA

Language: Python | 53300

furuta_pendulum

LQR, MPC and DRL approaches to control the Furuta pendulum.

Language: Jupyter Notebook | License: GPL-3.0 | 3600

roomac_ros

ROS packages for roomac autonomous mobile manipulation robot

Language: Python | License: GPL-3.0 | 3100

T3AL

Official PyTorch implementation of "Test-Time Zero-Shot Temporal Action Localization", CVPR 2024

Language: Python | 3800

MA-LMM

(2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Language: Python | License: MIT | 20800

PySceneDetect

🎥 Python and OpenCV-based scene cut/transition detection program & library.

Language: Python | License: BSD-3-Clause | 309200

LLaMA-Factory

Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)

Language: Python | License: Apache-2.0 | 2969100

Groma

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

Language: Python | License: Apache-2.0 | 52500

laughter

Learning embeddings for laughter categorization

Language: Python | 3400

portaudio

PortAudio is a cross-platform, open-source C language library for real-time audio input and output.

Language: C | License: NOASSERTION | 141800

ONE-PEACE

A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

Language: Python | License: Apache-2.0 | 92000

Real-Time-Sound-Event-Detection

This repository contains a Python implementation of a Sound Event Detection system that works in real time.

Language: Python | 4100

PromptingWhisper

Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation

Language: Python | 13100

llama3

The official Meta Llama 3 GitHub site

Language: Python | License: NOASSERTION | 2577700

Caption-Anything

Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/spaces/TencentARC/Caption-Anything https://huggingface.co/spaces/VIPLab/Caption-Anything

Language: Python | License: BSD-3-Clause | 165500

MemGPT

Create LLM agents with long-term memory and custom tools 📚🦙

Language: Python | License: Apache-2.0 | 1125900

PCA-EVAL

[ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain

Language: Jupyter Notebook | 9800

Awesome_Multimodel_LLM

Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLM). It covers datasets, tuning techniques, in-context learning, visual reasoning, foundational models, and more. Stay updated with the latest advancements.

23900
