There are 4 repositories under the visual-reasoning topic.
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
FiLM: Visual Reasoning with a General Conditioning Layer
Recent Papers including Neural Symbolic Reasoning, Logical Reasoning, Visual Reasoning, planning and any other topics connecting deep learning and reasoning
🚀ReVisual-R1 is a 7B open-source multimodal language model that follows a three-stage curriculum—cold-start pre-training, multimodal reinforcement learning, and text-only reinforcement learning—to achieve faithful, concise, and self-reflective state-of-the-art performance in visual and textual reasoning.
✨✨Latest Advances on Neuro-Symbolic Learning in the era of Large Language Models
RAVEN: A Dataset for Relational and Analogical Visual rEasoNing
Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give it a star 🌟 if you find it useful.
Official code for NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images"
[NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding
PyTorch implementation of "Explainable and Explicit Visual Reasoning over Scene Graphs"
[CVPR 2022 (oral)] Bongard-HOI for benchmarking few-shot visual reasoning
[ICLR 2022] RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning
Image captioning using Python and BLIP
WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning
[arXiv 2025] RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning
Visual Question Reasoning on General Dependency Tree
📄 A curated list of visual reasoning papers.
Learning Perceptual Inference by Contrasting
An alternative EQA paradigm and informative benchmark + models (BMVC 2019, ViGIL 2019 spotlight)
Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution
NeuSyRE: A Neuro-Symbolic Visual Understanding and Reasoning Framework based on Scene Graph Enrichment
ACRE: Abstract Causal REasoning Beyond Covariation
[NeurIPS 2025] ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models
Learning Algebraic Representation for Systematic Generalization in Abstract Reasoning
PyTorch implementation of the paper "A simple neural network module for relational reasoning" (a.k.a. Relation Networks) for visual reasoning.
[COLM'25] The official implementation of "LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception"
Official implementation of the paper "Understanding Language Prior of LVLMs by Contrasting Chain-of-Embedding"
A list of research papers on knowledge-enhanced multimodal learning
Convert RGB images from the Visual Genome dataset to depth maps.
[AAAI 2023] Hierarchical ConViT with Attention-based Relational Reasoner for Visual Analogical Reasoning
Evaluating ‘Graphical Perception’ with Multimodal Large Language Models
LaTeX files for my honours thesis: "Graph Attention Networks for Compositional Visual Question Answering"