nahidalam's repositories
MobiLlama
MobiLlama: Small Language Model tailored for edge devices
latent-scope
A scientific instrument for investigating latent spaces
jepa
PyTorch code and models for V-JEPA self-supervised learning from video.
Awesome-LLMs-for-Video-Understanding
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
MiniGPT-4
Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
torchdistill
A coding-free framework built on PyTorch for reproducible deep learning studies. 🏆22 knowledge distillation methods presented at CVPR, ICLR, ECCV, NeurIPS, ICCV, etc. are implemented so far. 🎁 Trained models, training logs and configurations are available for ensuring reproducibility and benchmarking.
LURE
[ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
vstar
PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"
awesome-ml
Curated list of useful LLM / Analytics / Datascience resources
llm-course
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
CogVLM
a state-of-the-art-level open visual language model | multimodal pretrained model
generative-ai-for-beginners
12 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/
Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
gpt4-vision-plugin
Chat with your images using GPT-4 Vision!
Awesome-Foundation-Models
A curated list of foundation models for vision and language tasks
Otter
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
InstructDiffusion
PyTorch implementation of InstructDiffusion, a unifying and generic framework for aligning computer vision tasks with human instructions.
Awesome-Optical-Flow
This is a list of awesome papers about optical flow and related work.
llm-finetune
LLM Finetune
WoodScape
The repository containing tools and information about the WoodScape dataset.
meru
Code for the paper "Hyperbolic Image-Text Representations", Desai et al, ICML 2023
DeepCamera
Open-Source AI Camera. Empower any camera/CCTV with state-of-the-art AI, including facial recognition, person recognition (RE-ID), car detection, fall detection, and more
heim
Holistic Evaluation of Text-to-Image Models (HEIM), a fork of HELM to evaluate text-to-image models (paper coming soon).
GIST-image-text-fine-grained
Generating Image-Specific Text for Fine-grained Object Classification
lightly
A Python library for self-supervised learning on images.
awesome-self-supervised-multimodal-learning
A curated list of self-supervised multimodal learning resources.