The following repositories are listed under the blip2 topic.
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
[ICRA 2024] Chat with NeRF enables users to interact with a NeRF model by typing in natural language.
(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
Official code base for NeuroClips
Automate fashion image captioning using BLIP-2. Automatically generating descriptions of clothes on shopping websites can help customers without fashion knowledge better understand an item's features (attributes, style, functionality, etc.) and increase online sales by enticing more customers.
A true multimodal LLaMA derivative -- on Discord!
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives
The Multimodal Model for Vietnamese Visual Question Answering (ViVQA)
CLIP Interrogator, fully in Hugging Face Transformers 🤗, with LongCLIP & CLIP's own words and/or *your* own words!
Modifying LAVIS' BLIP2 Q-former with models pretrained on Japanese datasets.
This repository is for profiling, extracting, visualizing, and reusing generative AI weights, with the aim of building more accurate AI models and auditing/scanning weights at rest to identify knowledge domains for risk.
Caption images across your datasets with state of the art models from Hugging Face and Replicate!
Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost
Finetuning Large Visual Models on Visual Question Answering
Uses AI to scare people...more.
Caption generator using LAVIS and Argos Translate.
Explores visual question answering using the Gemini LLM, with images supplied as URLs or local files of any extension.
Too lazy to organize my desktop, make gpt + BLIP-2 do it /s
An offline AI-powered video analysis tool with object detection (YOLO), image captioning (BLIP), speech transcription (Whisper), audio event detection (PANNs), and AI-generated summaries (LLMs via Ollama). It ensures privacy and offline use with a user-friendly GUI.
This project performs multimodal document analysis and query retrieval by downloading PDFs, converting pages to images, indexing them for semantic search, and analyzing retrieved images using visual-language models like Qwen2VL and Blip2.
Creating stylish social media captions for an image using multimodal models and reinforcement learning.
Implementation of Q-Former pre-training.
A web-based application that leverages the BLIP-2 model to generate detailed descriptions of uploaded images (see the minimal captioning sketch after this list).
AltTextGenerator is a Python-based tool leveraging the BLIP-2 model to generate meaningful alt text for images. It processes images from a folder, generates accurate alt text, and renames the images, aiding developers, content creators, and accessibility professionals.
An end-to-end, deep-learning-based tool for image caption generation.
Winning solution for image captioning challenge at 2024 InThon Datathon
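Many of the captioning repositories above build on the same core BLIP-2 call in Hugging Face Transformers. As a point of reference, here is a minimal, hedged sketch of unconditional BLIP-2 captioning; the checkpoint name (Salesforce/blip2-opt-2.7b) and the image path are assumptions for illustration, not details taken from any repository in this list.

```python
# Minimal BLIP-2 captioning sketch using Hugging Face Transformers.
# Checkpoint and image path are illustrative assumptions; any BLIP-2 checkpoint works similarly.
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=dtype
).to(device)

# Load any RGB image; replace the path with your own file.
image = Image.open("example.jpg").convert("RGB")

# Unconditional captioning: no text prompt, the model describes the image.
inputs = processor(images=image, return_tensors="pt").to(device, dtype)
generated_ids = model.generate(**inputs, max_new_tokens=30)
caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(caption)
```

Passing a text prompt to the processor alongside the image (e.g., a question) turns the same call into visual question answering, which is the pattern several of the VQA repositories above rely on.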