Varun Ganjigunte Prakash's starred repositories
2024-ICLR-Norton
Multi-granularity Correspondence Learning from Long-term Noisy Videos [ICLR 2024, Oral]
Awesome-MLLM-Hallucination
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
EmotionCLIP
[CVPR 2023] Code for "Learning Emotion Representations from Verbal and Nonverbal Communication"
furuta_pendulum
LQR, MPC and DRL approaches to control the Furuta pendulum.
roomac_ros
ROS packages for the roomac autonomous mobile manipulation robot
PySceneDetect
🎥 Python and OpenCV-based scene cut/transition detection program & library.
LLaMA-Factory
Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
Real-Time-Sound-Event-Detection
This repository contains the Python implementation of a Sound Event Detection system working in real time.
PromptingWhisper
Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation
Caption-Anything
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/spaces/TencentARC/Caption-Anything https://huggingface.co/spaces/VIPLab/Caption-Anything
Awesome_Multimodel_LLM
Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLM). It covers datasets, tuning techniques, in-context learning, visual reasoning, foundational models, and more. Stay updated with the latest advancements.