mqjyl / awesome-scene-graph

A curated list of scene graph generation and related area resources. :-)

Awesome Scene Graph

A curated list of scene graph generation and related tasks, inspired by awesome-computer-vision and awesome-action-recognition. :-)

For a list of 2-D scene graph papers grouped by method, please visit Methods.

Introduction

Please feel free to send me pull requests or email (mqjyl2012@163.com) to add links.

Markdown format of paper list items:

- [Paper Name](link) - Author 1 _et al_, `Conference Year`. [[code]](link)
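As a minimal sketch (not part of this repository), an entry in the format above could be generated with a small helper. The function name `format_entry` and its parameters are hypothetical, chosen only for illustration; the placeholder values mirror the template line.

```python
def format_entry(title, link, first_author, venue, code_link=None):
    """Render one paper entry in the list's Markdown item format.

    All parameter names are illustrative assumptions; `venue` is the
    "Conference Year" string, e.g. "CVPR 2020".
    """
    entry = f"- [{title}]({link}) - {first_author} _et al_, `{venue}`."
    if code_link:
        # The code link is optional; many entries in the list have none.
        entry += f" [[code]]({code_link})"
    return entry


# Reproduce the template line using its own placeholder values.
print(format_entry("Paper Name", "link", "Author 1", "Conference Year", "link"))
```

Keeping the code link optional matches the list itself, where only some entries carry a `[code]` link.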

Table of Contents

Scene Graph Generation
Human-centric Relation
Object Recognition
Related High-level Scene Understanding Tasks
Workshops
Challenges

Scene Graph Generation

2-D Scene Graph

2020

Contextual Translation Embedding for Visual Relationship Detection and Scene Graph Generation - Zih-Siou Hung et al, T-PAMI 2020.
Leveraging Auxiliary Text for Deep Recognition of Unseen Visual Relationships - Gal Sadeh Kenigsfield et al, ICLR 2020.
Unbiased Scene Graph Generation from Biased Training - Kaihua Tang et al, CVPR 2020. [code]
Weakly Supervised Visual Semantic Parsing - Alireza Zareian et al, CVPR 2020.
GPS-Net: Graph Property Sensing Network for Scene Graph Generation - Xin Lin et al, CVPR 2020. [code]
Deep Generative Probabilistic Graph Neural Networks for Scene Graph Generation - Mahmoud Khademi et al, AAAI 2020.
PCPL: Predicate-Correlation Perception Learning for Unbiased Scene Graph Generation - Shaotian Yan et al, MM 2020.
Memory-Based Network for Scene Graph with Unbalanced Relations - Weitao Wang et al, MM 2020.
Part-Aware Interactive Learning for Scene Graph Generation - Hongshuo Tian et al, MM 2020.
One-shot Scene Graph Generation - Yuyu Gao et al, MM 2020.
HOSE-Net: Higher Order Structure Embedded Network for Scene Graph Generation - Meng Wei et al, MM 2020.
Unbiased Scene Graph Generation via Rich and Fair Semantic Extraction - Bin Wen et al, ARXIV 2020.
Long-tail Visual Relationship Recognition with a Visiolinguistic Hubless Loss - Sherif Abdelkarim et al, ARXIV 2020.
Bridging Knowledge Graphs to Generate Scene Graphs - Alireza Zareian et al, ARXIV 2020.
NODIS: Neural Ordinary Differential Scene Understanding - Cong Yuren et al, ARXIV 2020.
AVR: Attention based Salient Visual Relationship Detection - Jianming Lv et al, ARXIV 2020.

2019

Large-Scale Visual Relationship Understanding - Ji Zhang et al, AAAI 2019. [code]
Learning to Compose Dynamic Tree Structures for Visual Contexts - Kaihua Tang et al, CVPR 2019 Oral. [code]
Counterfactual Critic Multi-Agent Training for Scene Graph Generation - Long Chen et al, ICCV 2019 Oral.
On Exploring Undetermined Relationships for Visual Relationship Detection - Yibing Zhan et al, CVPR 2019. [code]
Exploring Context and Visual Pattern of Relationship for Scene Graph Generation - Wenbin Wang et al, CVPR 2019.
Relationship-Aware Spatial Perception Fusion for Realistic Scene Layout Generation - Hongdong Zheng et al, arXiv 2019.
The Limited Multi-Label Projection Layer - Brandon Amos et al, arXiv 2019. [code]
Detecting Visual Relationships Using Box Attention - Alexander Kolesnikov et al, ICCVW 2019.
Visual Relationships as Functions: Enabling Few-Shot Scene Graph Prediction - Apoorva Dornadula et al, ICCVW 2019.
Attention-Translation-Relation Network for Scalable Scene Graph Generation - Nikolaos Gkanatsios et al, ICCVW 2019. [code]
Attentive Relational Networks for Mapping Images to Scene Graphs - Mengshi Qi et al, CVPR 2019.
Visual Spatial Attention Network for Relationship Detection - Chaojun Han et al, ACM MM 2019.
Visual Relation Detection with Multi-Level Attention - Sipeng Zheng et al, ACM MM 2019.
Visual Relationship Recognition via Language and Position Guided Attention - Hao Zhou et al, ICASSP 2019.
Relationship Detection Based on Object Semantic Inference and Attention Mechanisms - Liang Zhang et al, ICMR 2019.
Natural Language Guided Visual Relationship Detection - Wentong Liao et al, CVPR 2019.
Knowledge-Embedded Routing Network for Scene Graph Generation - Tianshui Chen et al, CVPR 2019. [code]
Soft Transfer Learning via Gradient Diagnosis for Visual Relationship Detection - Diqi Chen et al, WACV 2019.
Compensating Supervision Incompleteness with Prior Knowledge in Semantic Image Interpretation - Ivan Donadello et al, IJCNN 2019. [code]
Hierarchical Visual Relationship Detection - Xu Sun et al, ACM MM 2019.
Visual Relationship Detection with Low Rank Non-Negative Tensor Decomposition - Mohammed Haroon Dupty et al, arXiv 2019.
Relational Reasoning using Prior Knowledge for Visual Captioning - Jingyi Hou et al, arXiv 2019.
Scene Graph Generation with External Knowledge and Image Reconstruction - Jiuxiang Gu et al, CVPR 2019. [code]
Detecting Unseen Visual Relations Using Analogies - Julia Peyre et al, ICCV 2019.
VrR-VG: Refocusing Visually-Relevant Relationships - Yuanzhi Liang et al, ICCV 2019.
SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition - Kaiyu Yang et al, ICCV 2019. [code]
BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection - Hedi Ben-Younes et al, AAAI 2019. [code]
On Class Imbalance and Background Filtering in Visual Relationship Detection - Alessio Sarullo et al, arXiv 2019.
Support Relation Analysis for Objects in Multiple View RGB-D Images - Peng Zhang et al, IJCAIW QR 2019.
Improving Visual Relation Detection using Depth Maps - Sahand Sharifzadeh et al, arXiv 2019. [code]
MR-NET: Exploiting Mutual Relation for Visual Relationship Detection - Yi Bin et al, AAAI 2019.
Scene Graph Prediction with Limited Labels - Vincent S. Chen et al, ICCV 2019. [code]
Differentiable Scene Graphs - Moshiko Raboh et al, ICCVW 2019.
Graphical Contrastive Losses for Scene Graph Parsing - Ji Zhang et al, CVPR 2019. [code]
Generating Expensive Relationship Features from Cheap Objects - Xiaogang Wang et al, BMVC 2019.
Neural Message Passing for Visual Relationship Detection - Yue Hu et al, ICML LRG Workshop 2019. [code]
PANet: A Context Based Predicate Association Network for Scene Graph Generation - Yunian Chen et al, ICME 2019.
Visual Relationship Detection with Relative Location Mining - Hao Zhou et al, ACM MM 2019.
Visual relationship detection based on bidirectional recurrent neural network - Yibo Dai et al, Multimedia Tools and Applications 2019.
Exploring the Semantics for Visual Relationship Detection - Wentong Liao et al, arXiv 2019.
Optimising the Input Image to Improve Visual Relationship Detection - Noel Mizzi et al, arXiv 2019.
Deeply Supervised Multimodal Attentional Translation Embeddings for Visual Relationship Detection - Nikolaos Gkanatsios et al, arXiv 2019.
Learning Effective Visual Relationship Detector on 1 GPU - Yichao Lu et al, arXiv 2019.

2018

Graph R-CNN for Scene Graph Generation - Jianwei Yang et al, ECCV 2018. [code]
LinkNet: Relational Embedding for Scene Graph - Sanghyun Woo et al, NIPS 2018. [code]
Generating Triples with Adversarial Networks for Scene Graph Construction - Matthew Klawonn et al, AAAI 2018.
Scene Graph Generation Based on Node-Relation Context Module - Xin Lin et al, ICONIP 2018.
Factorizable Net: An Efficient Subgraph-based Framework for Scene Graph Generation - Yikang Li et al, ECCV 2018. [code]
Neural Motifs: Scene Graph Parsing with Global Context - Rowan Zellers et al, CVPR 2018. [code]
Zoom-Net: Mining Deep Feature Interactions for Visual Relationship Recognition - Guojun Yin et al, ECCV 2018. [code]
Deep Structured Learning for Visual Relationship Detection - Yaohui Zhu et al, AAAI 2018.
Tensorize, Factorize and Regularize: Robust Visual Relationship Learning - Seong Jae Hwang et al, CVPR 2018. [code]
Representation Learning for Scene Graph Completion via Jointly Structural and Visual Embedding - Hai Wan et al, IJCAI 2018. [code]
Visual Relationship Detection Using Joint Visual-Semantic Embedding - Binglin Li et al, ICPR 2018.
Object Relation Detection Based on One-shot Learning - Li Zhou et al, arXiv 2018.
A Problem Reduction Approach for Visual Relationships Detection - Toshiyuki Fukuzawa et al, ECCVW 2018.
An Interpretable Model for Scene Graph Generation - Ji Zhang et al, arXiv 2018.
Shuffle-Then-Assemble: Learning Object-Agnostic Visual Relationship Features - Xu Yang et al, ECCV 2018. [code]
Visual Relationship Detection with Deep Structural Ranking - Kongming Liang et al, AAAI 2018. [code]
Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction - Roei Herzig et al, NIPS 2018. [code]
Learning Prototypes for Visual Relationship Detection - François Plesse et al, CBMI 2018.
Visual Relationship Detection with Language prior and Softmax - Jaewon Jung et al, IPAS 2018. [code]
Visual Relationship Detection Based on Guided Proposals and Semantic Knowledge Distillation - François Plesse et al, ICME 2018.
Context-Dependent Diffusion Network for Visual Relationship Detection - Zhen Cui et al, ACM MM 2018. [code]
Region-Object Relevance-Guided Visual Relationship Detection - Yusuke Goutsu et al, BMVC 2018.
Recurrent Visual Relationship Recognition with Triplet Unit for Diversity - Kento Masui et al, IJSC 2018.
Deep Image Understanding Using Multilayered Contexts - Donghyeop Shin et al, MPE 2018.
Scene Graph Generation via Conditional Random Fields - Weilin Cong et al, arXiv 2018.

2017

Scene Graph Generation by Iterative Message Passing - Danfei Xu et al, CVPR 2017. [code]
Scene Graph Generation from Objects, Phrases and Region Captions - Yikang Li et al, ICCV 2017. [code]
ViP-CNN: Visual Phrase Guided Convolutional Neural Network - Yikang Li et al, CVPR 2017.
Detecting Visual Relationships with Deep Relational Networks - Bo Dai et al, CVPR 2017. [code]
Towards Context-Aware Interaction Recognition for Visual Relationship Detection - Bohan Zhuang et al, ICCV 2017.
Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues - Bryan A. Plummer et al, ICCV 2017. [code]
Deep Variation-Structured Reinforcement Learning for Visual Relationship and Attribute Detection - Xiaodan Liang et al, CVPR 2017. [code]
Visual Relationship Detection with Internal and External Linguistic Knowledge Distillation - Ruichi Yu et al, ICCV 2017.
Visual Translation Embedding Network for Visual Relation Detection - Hanwang Zhang et al, CVPR 2017. [code]
Pixels to Graphs by Associative Embedding - Alejandro Newell et al, NIPS 2017. [code]
Relationship Proposal Networks - Ji Zhang et al, CVPR 2017.
Weakly-Supervised Learning of Visual Relations - Julia Peyre et al, ICCV 2017. [code]
PPR-FCN: Weakly Supervised Visual Relation Detection via Parallel Pairwise R-FCN - Hanwang Zhang et al, ICCV 2017. [code]
Visual relationship detection with object spatial distribution - Yaohui Zhu et al, ICME 2017.
On Support Relations and Semantic Scene Graphs - Michael Ying Yang et al, ISPRS 2017.
Improving Visual Relationship Detection using Semantic Modeling of Scene Descriptions - Stephan Baier et al, ISWC 2017.
Recurrent Visual Relationship Recognition with Triplet Unit - Kento Masui et al, ISM 2017.

2016 and before

Visual Relationship Detection with Language Priors - Cewu Lu et al, ECCV 2016 Oral. [code]
Recognition using visual phrases - Mohammad Amin Sadeghi et al, CVPR 2011.

Spatio-Temporal Scene Graph

Video Relationship Reasoning using Gated Spatio-Temporal Energy Graph - Yao-Hung Hubert Tsai et al, CVPR 2019. [code]
Video Relation Detection with Spatio-Temporal Graph - Xufeng Qian et al, ACM MM 2019.
Video Visual Relation Detection via Multi-modal Feature Fusion - Xu Sun et al, ACM MM 2019.
Relation Understanding in Videos - Sipeng Zheng et al, ACM MM 2019.
Annotating Objects and Relations in User-Generated Videos - Xindi Shang et al, ICMR 2019.
Relation Understanding in Videos: A Grand Challenge Overview - Xindi Shang et al, ACM MM 2019.
Action Genome: Actions as Composition of Spatio-temporal Scene Graphs - Jingwei Ji et al, arXiv 2019.
Video Visual Relation Detection - Xindi Shang et al, ACM MM 2017. [code]

3-D Scene Graph

Learning 3D Semantic Scene Graphs from 3D Indoor Reconstructions - Johanna Wald et al, CVPR 2020.
3-D Scene Graph: A Sparse and Semantic Representation of Physical Environments for Intelligent Agents - Ue-Hwan Kim et al, IEEE transactions on cybernetics 2019. [code]
3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera - Iro Armeni et al, ICCV 2019. [code]
Visual Graphs from Motion (VGfM): Scene understanding with object geometry reasoning - Paul Gay et al, ACCV 2018. [code]

Generate Scene Graph from Textual Description

Scene Graph Parsing as Dependency Parsing - Yu-Siang Wang et al, NAACL 2018. [code]
Scene Graph Parsing by Attention Graph - Martin Andrews et al, NIPS 2018.

Other Works

Relationship Prediction for Scene Graph Generation - Uzair Navid Iftikhar et al, 2019.
Joint Learning of Scene Graph Generation and Reasoning for Visual Question Answering Mid-term report - Arka Sadhu et al, 2019.
Scene-Graph-Generation
Joint Embeddings of Scene Graphs and Images - Eugene Belilovsky et al, 2020.

Datasets

Image

Visual Genome : Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations - Ranjay Krishna et al, IJCV 2016. [download]
VRD : Visual Relationship Detection with Language Priors - Cewu Lu et al, ECCV 2016 Oral. [download]
Open Images : The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale - Alina Kuznetsova et al, IJCV 2018. [download]
Scene Graph : Image Retrieval using Scene Graphs - Justin Johnson et al, CVPR 2015. [download]
Visual Phrases : Recognition Using Visual Phrases - Mohammad Amin Sadeghi et al, CVPR 2011. [download]
VrR-VG : VrR-VG: Refocusing Visually-Relevant Relationships - Yuanzhi Liang et al, ICCV 2019. [download]
UnRel : Weakly-Supervised Learning of Visual Relations - Julia Peyre et al, ICCV 2017. [download]
SpatialVOC2K : SpatialVOC2K: A Multilingual Dataset of Images with Annotations and Features for Spatial Relations between Objects - Anja Belz et al, INLG 2018. [download]
SpatialSense : SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition - Kaiyu Yang et al, ICCV 2019. [download]
Visual and Linguistic Treebank : Image Description using Visual Dependency Representations - Desmond Elliott et al, EMNLP 2013. [download]
ViSen : Combining geometric, textual and visual features for predicting prepositions in image descriptions - Arnau Ramisa et al, EMNLP 2015. [download]
SynthRel0 : SynthRel0: Towards a Diagnostic Dataset for Relational Representation Learning - Daniel Dorda et al, ICCVW 2019. [download]

RGBD

NYU Depth Dataset V2 : Indoor Segmentation and Support Inference from RGBD Images - Nathan Silberman et al, ECCV 2012. [download]

Video

VidVRD : Video Visual Relation Detection - Xindi Shang et al, ACM MM 2017. [download]
VidOR dataset : Annotating Objects and Relations in User-Generated Videos - Xindi Shang et al, ACM MM 2019. [download]

3-D

3D Scene Graph Dataset : 3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera - Iro Armeni et al, ICCV 2019. [download]
Learning 3D Semantic Scene Graphs from 3D Indoor Reconstructions - Johanna Wald et al, CVPR 2020. [download]

Evaluation Metrics

Human-centric Relation

Person in Context (PIC)

Visual Relationship Prediction via Label Clustering and Incorporation of Depth Information - Hsuan-Kung Yang et al, ECCVW 2018.

Human-Object Interaction (HOI)

HOI Image

2020

VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions - Oytun Ulutan et al, CVPR 2020. [code]
Learning Human-Object Interaction Detection using Interaction Points - Tiancai Wang et al, CVPR 2020. [code]
Detailed 2D-3D Joint Representation for Human-Object Interaction - Yong-Lu Li et al, CVPR 2020. [code]
Cascaded Human-Object Interaction Recognition - Tianfei Zhou et al, CVPR 2020. [code]
PPDM: Parallel Point Detection and Matching for Real-time Human-Object Interaction Detection - Yue Liao et al, CVPR 2020. [code]
Detecting Human-Object Interactions via Functional Generalization - Ankan Bansal et al, AAAI 2020.
Classifying All Interacting Pairs in a Single Shot - Sanaa Chafik et al, WACV 2020.
Visual-Semantic Graph Attention Network for Human-Object Interaction Detection - Zhijun Liang et al, ARXIV 2020.
Spatial Priming for Detecting Human-Object Interactions - Ankan Bansal et al, ARXIV 2020.
GID-Net: Detecting Human-Object Interaction with Global and Instance Dependency - Dongming Yang et al, ARXIV 2020.

2019

Reasoning About Human-Object Interactions Through Dual Attention Networks - Tete Xiao et al, ICCV 2019.
Relation Parsing Neural Network for Human-Object Interaction Detection - Penghao Zhou et al, ICCV 2019.
No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques - Tanmay Gupta et al, ICCV 2019. [code]
Pose-aware Multi-level Feature Network for Human Object Interaction Detection - Bo Wan et al, ICCV 2019. [code]
Deep Contextual Attention for Human-Object Interaction Detection - Tiancai Wang et al, ICCV 2019.
Understanding Human Gaze Communication by Spatio-Temporal Graph Reasoning - Lifeng Fan et al, ICCV 2019. [code]
Holistic++ Scene Understanding: Single-view 3D Holistic Scene Parsing and Human Pose Estimation with Human-Object Interaction and Physical Commonsense - Yixin Chen et al, ICCV 2019. [code]
Transferable Interactiveness Knowledge for Human-Object Interaction Detection - Yong-Lu Li et al, CVPR 2019. [code]
Learning to Detect Human-Object Interactions with Knowledge - Bingjie Xu et al, CVPR 2019.
Do Deep Neural Networks Model Nonlinear Compositionality in the Neural Representation of Human-Object Interactions? - Aditi Jha et al, CCN 2019.

2018

Detecting and Recognizing Human-Object Interactions - Georgia Gkioxari et al, CVPR 2018.
Learning Human-Object Interactions by Graph Parsing Neural Networks - Siyuan Qi et al, ECCV 2018. [code]
Pairwise Body-Part Attention for Recognizing Human-Object Interactions - Hao-Shu Fang et al, ECCV 2018.
Compositional Learning for Human Object Interaction - Keizo Kato et al, ECCV 2018. [code]
iCAN: Instance-Centric Attention Network for Human-Object Interaction Detection - Chen Gao et al, BMVC 2018. [code]
Interact as You Intend: Intention-Driven Human-Object Interaction Detection - Bingjie Xu et al, TMM 2018.
Scaling Human-Object Interaction Recognition through Zero-Shot Learning - Liyue Shen et al, WACV 2018.
Learning to Detect Human-Object Interactions - Yu-Wei Chao et al, WACV 2018. [code]

2017 and before

Fine-grained Event Learning of Human-Object Interaction with LSTM-CRF - Tuan Do et al, ESANN 2017.
Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering - Arun Mallya et al, ECCV 2016.
Human Centred Object Co-Segmentation - Chenxia Wu et al, ARXIV 2016.
HICO: A Benchmark for Recognizing Human-Object Interactions in Images - Yu-Wei Chao et al, ICCV 2015.
Recognising Human-Object Interaction via Exemplar based Modelling - Jian-Fang Hu et al, ICCV 2013.
Learning person-object interactions for action recognition in still images - Vincent Delaitre et al, NIPS 2011.
Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities - Bangpeng Yao et al, CVPR 2010.
Discriminative models for static human-object interactions - Chaitanya Desai et al, CVPRW 2010.

HOI Video

Grounded Human-Object Interaction Hotspots from Video - Tushar Nagarajan et al, ICCV 2019. [code]
iMapper: Interaction-guided Joint Scene and Human Motion Mapping from Monocular Videos - Aron Monszpart et al, Siggraph 2019.
Causality Inspired Retrieval of Human-object Interactions from Video - Liting Zhou et al, CBMI 2019.
Zero-Shot Generation of Human-Object Interaction Videos - Megha Nawhal et al, ARXIV 2019.
Forecasting Human Object Interaction: Joint Prediction of Motor Attention and Egocentric Activity - Miao Liu et al, ARXIV 2019.
Attend and Interact: Higher-Order Object Interactions for Video Understanding - Chih-Yao Ma et al, CVPR 2018.
The "something something" video database for learning and evaluating visual common sense - Raghav Goyal et al, ICCV 2017. [code] [code_v2]

Other Works

Detecting Human-Object Interactions in Real-Time

HOI Evaluation Metrics

HCR Datasets

PIC 1.0 / 2.0 : [download]
HOI-W : [download]
HCVRD : HCVRD: A Benchmark for Large-Scale Human-Centered Visual Relationship Detection - Bohan Zhuang et al, AAAI 2018. [download]
Verbs in COCO (V-COCO) : Visual Semantic Role Labeling - Saurabh Gupta et al, ARXIV 2015. [download]
HICO : A Benchmark for Recognizing Human-Object Interactions in Images - Yu-Wei Chao et al, ICCV 2015. [download]
TUHOI : TUHOI: Trento Universal Human Object Interaction Dataset - Dieu-Thu Le et al, ACL 2014. [download]
20BN-SOMETHING-SOMETHING : The "something something" video database for learning and evaluating visual common sense - Raghav Goyal et al, ICCV 2017. [download]

Improve Object Recognition

Related High-level Vision-and-Language Tasks

Image Caption

Using Scene Graph

Learning visual relationship and context-aware attention for image captioning - Junbo Wang et al, Pattern Recognition 2020.
Object Relational Graph with Teacher-Recommended Learning for Video Captioning - Ziqi Zhang et al, CVPR 2020.
Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs - Shizhe Chen et al, CVPR 2020. [code]
Joint Commonsense and Relation Reasoning for Image and Video Captioning - Jingyi Hou et al, AAAI 2020.
Auto-Encoding Scene Graphs for Image Captioning - Xu Yang et al, CVPR 2019. [code]
Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning - Dong-Jin Kim et al, CVPR 2019. [code]
Visual Semantic Reasoning for Image-Text Matching - Kunpeng Li et al, ICCV 2019. [code]
Unpaired Image Captioning via Scene Graph Alignments - Jiuxiang Gu et al, ICCV 2019. [code]
Expressing Visual Relationships via Language - Hao Tan et al, ACL 2019. [code]
On the Role of Scene Graphs in Image Captioning - Dalin Wang et al, ACL 2019.
Adversarial Adaptation of Scene Graph Models for Understanding Civic Issues - Shanu Kumar et al, WWW 2019. [code]
Aligning Linguistic Words and Visual Semantic Units for Image Captioning - Longteng Guo et al, ACM MM 2019. [code]
Better Understanding Hierarchical Visual Relationship for Image Caption - Zheng-cong Fei et al, NeurIPS 2019 workshop on New In ML.
Visual Relationship Embedding Network for Image Paragraph Generation - Wenbin Che et al, TMM 2019.
Know More Say Less: Image Captioning Based on Scene Graphs - Xiangyang Li et al, TMM 2019.
Visual Relationship Attention for Image Captioning - Zongjian Zhang et al, IJCNN 2019.
Scene graph captioner: Image captioning based on structural visual representation - Ning Xu et al, VCIR 2019.
Exploring Semantic Relationships for Image Captioning without Parallel Data - Fenglin Liu et al, ICDM 2019.
TPsgtR: Neural-Symbolic Tensor Product Scene-Graph-Triplet Representation for Image Captioning - Chiranjib Sur et al, ARXIV 2019.
Learning Visual Relation Priors for Image-Text Matching and Image Captioning with Neural Scene Graph Generators - Kuang-Huei Lee et al, ARXIV 2019.
Relational Reasoning using Prior Knowledge for Visual Captioning - Jingyi Hou et al, ARXIV 2019.
Exploring Visual Relationship for Image Captioning - Ting Yao et al, ECCV 2018.
Paragraph Generation Network with Visual Relationship Detection - Wenbin Che et al, ACM MM 2018.
Image Captioning with Scene-graph Based Semantic Concepts - Lizhao Gao et al, ICMLC 2018.
Improved Image Description Via Embedded Object Structure Graph and Semantic Feature Matching - Li Ren et al, ISM 2018.
Sports Video Captioning by Attentive Motion Representation based Hierarchical Recurrent Neural Networks - Mengshi Qi et al, 2018.

Classic Papers

Image Captioning and Visual Question Answering Based on Attributes and External Knowledge - Qi Wu et al, TPAMI 2017.
SPICE: Semantic Propositional Image Caption Evaluation - Peter Anderson et al, ECCV 2016. [code]

Image Caption Datasets

MS COCO : Microsoft COCO: Common Objects in Context - Tsung-Yi Lin et al, ECCV 2014. [download]
Flickr30K : Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models - Bryan A. Plummer et al, IJCV 2017. [download]
Flickr8K : Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics - Micah Hodosh et al, IJCAI 2013. [download]
Visual Genome : Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations - Ranjay Krishna et al, IJCV 2016. [download]
IAPR TC-12 : The IAPR TC-12 Benchmark: A New Evaluation Resource for Visual Information Systems - Michael Grubinger et al, International Workshop OntoImage 2006. [download]

Referring Expression Comprehension - Visual Grounding

Using Scene Graph

Cross-Modal Relationship Inference for Grounding Referring Expressions - Sibei Yang et al, CVPR 2019. [code]
Relationship-Embedded Representation Learning for Grounding Referring Expressions - Sibei Yang et al, TPAMI 2020. [code]
Referring Expression Comprehension with Semantic Visual Relationship and Word Mapping - Chao Zhang et al, ACM MM 2019.
Learning to Relate from Captions and Bounding Boxes - Sarthak Garg et al, ACL 2019.
Joint Visual Grounding with Language Scene Graphs - Daqing Liu et al, ARXIV 2019.
Modeling Relationships in Referential Expressions With Compositional Modular Networks - Ronghang Hu et al, CVPR 2017. [code]
Phrase Localization and Visual Relationship Detection With Comprehensive Image-Language Cues - Bryan A. Plummer et al, ICCV 2017. [code]

Classic Papers

Graph-Structured Referring Expression Reasoning in The Wild - Sibei Yang et al, CVPR 2020. [code]
Dynamic Graph Attention for Referring Expression Comprehension - Sibei Yang et al, ICCV 2019. [code]
Grounding Referring Expressions in Images by Variational Context - Hanwang Zhang et al, CVPR 2018. [code]

Visual Grounding Datasets

RefCOCO and RefCOCO+ : Modeling Context in Referring Expressions - Licheng Yu et al, ECCV 2016. [download]

Visual Question Answering

Using Scene Graph

DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue - Xiaoze Jiang et al, AAAI 2020. [code]
Visual Query Answering by Entity-Attribute Graph Matching and Reasoning - Peixi Xiong et al, CVPR 2019.
Relation-Aware Graph Attention Network for Visual Question Answering - Linjie Li et al, ICCV 2019. [code]
Multi-interaction Network with Object Relation for Video Question Answering - Weike Jin et al, ACM MM 2019.
CRA-Net: Composed Relation Attention Network for Visual Question Answering - Liang Peng et al, ACM MM 2019.
An Empirical Study on Leveraging Scene Graphs for Visual Question Answering - Cheng Zhang et al, BMVC 2019. [code]
Generating Natural Language Explanations for Visual Question Answering using Scene Graphs and Visual Attention - Shalini Ghosh et al, ARXIV 2019.
R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering - Pan Lu et al, SIGKDD 2018. [code]
Scene Graph Reasoning with Prior Visual Relationship for Visual Question Answering - Zhuoqian Yang et al, ARXIV 2018.
VisKE: Visual Knowledge Extraction and Question Answering by Visual Verification of Relation Phrases - Fereshteh Sadeghi et al, CVPR 2015. [code]

Classic Papers

Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction - Hyeonwoo Noh et al, CVPR 2016. [code]
Ask Your Neurons: A Neural-based Approach to Answering Questions about Images - Mateusz Malinowski et al, ICCV 2015.

VQA Datasets

VQAv1 : VQA: Visual question answering - Aishwarya Agrawal et al, ICCV 2015. [download]
VQAv2 : Making the V in VQA matter: Elevating the role of image understanding in Visual Question Answering - Yash Goyal et al, CVPR 2017. [download]
COCO-QA : Image Question Answering: A Visual Semantic Embedding Model and a New Dataset - Mengye Ren et al, ICML 2015. or Exploring Models and Data for Image Question Answering - Mengye Ren et al, NIPS 2015. [download]

Visual Reasoning

Using Scene Graph

Differentiable Scene Graphs - Moshiko Raboh et al, WACV 2020.
Language-Conditioned Graph Networks for Relational Reasoning - Ronghang Hu et al, ICCV 2019. [code]
Explainable and Explicit Visual Reasoning over Scene Graphs - Jiaxin Shi et al, CVPR 2019. [code]
Referring Relationships - Ranjay Krishna et al, CVPR 2018. [code]
Broadcasting Convolutional Network for Visual Relational Reasoning - Simyung Chang et al, ECCV 2018.
A Simple Neural Network Module for Relational Reasoning - Adam Santoro et al, ARXIV 2017. [code]

Classic Papers

Object level Visual Reasoning in Videos - Fabien Baradel et al, ECCV 2018. [code]
A simple neural network module for relational reasoning - Adam Santoro et al, NIPS 2017. [code]

Visual Reasoning Datasets

GQA : GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering - Drew A. Hudson et al, CVPR 2019. [download]
CLEVR : CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning - Justin Johnson et al, CVPR 2017. [download] [code]

Image Generation - Content-based Image Retrieval(CBIR)

Using Scene Graph

PasteGAN: A Semi-Parametric Method to Generate Image from Scene Graph - Yikang Li et al, NIPS 2019. [code]
Scene Graph Generation with External Knowledge and Image Reconstruction - Jiuxiang Gu et al, CVPR 2019. [code]
Specifying Object Attributes and Relations in Interactive Scene Generation - Oron Ashual et al, ICCV 2019. [code]
Triplet-Aware Scene Graph Embeddings - Brigit Schroeder et al, ICCVW 2019.
Heuristics for Image Generation from Scene Graphs - Subarna Tripathi et al, ICLR 2019.
Interactive Image Generation Using Scene Graphs - Gaurav Mittal et al, ICLR 2019.
Visual-Relation Conscious Image Generation from Structured-Text - Duc Minh Vo et al, ARXIV 2019.
Using Scene Graph Context to Improve Image Generation - Subarna Tripathi et al, ARXIV 2019.
Learning Canonical Representations for Scene Graph to Image Generation - Roei Herzig et al, ARXIV 2019.
Relationship-Aware Spatial Perception Fusion for Realistic Scene Layout Generation - Hongdong Zheng et al, ARXIV 2019.
Image Generation from Scene Graphs - Justin Johnson et al, CVPR 2018. [code]

Classic Papers

Image Generation From Small Datasets via Batch Statistics Adaptation - Atsuhiro Noguchi et al, ICCV 2019. [code]
Text2Scene: Generating Compositional Scenes from Textual Descriptions - Fuwen Tan et al, CVPR 2019. [code]
Unsupervised Cross-Domain Image Generation - Yaniv Taigman et al, ICLR 2017 conference submission. [code]
Generative Visual Manipulation on the Natural Image Manifold - Jun-Yan Zhu et al, ECCV 2016. [code]
Attribute2Image: Conditional Image Generation from Visual Attributes - Xinchen Yan et al, ECCV 2016. [code]

Image Generation Datasets

COCO : Microsoft COCO: Common objects in context - Tsung-Yi Lin et al, ECCV 2014. [download]

Image Retrieval

Using Scene Graph

Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval - Sijin Wang et al, WACV 2020.
Scene Graph based Image Retrieval -- A case study on the CLEVR Dataset - Sahana Ramnath et al, ICCVW 2019.
Compact Scene Graphs for Layout Composition and Patch Retrieval - Subarna Tripathi et al, CVPRW 2019.
Revisiting Visual Grounding - Erik Conser et al, ACL 2019.
Learning visual features for relational CBIR - Nicola Messina et al, MIR 2019.
Learning Relationship-aware Visual Features - Nicola Messina et al, ECCVW 2018. [code]
Image retrieval by dense caption reasoning - Xinru Wei et al, VCIP 2017.
Representation Learning for Visual-Relational Knowledge Graphs - Daniel Oñoro-Rubio et al, ARXIV 2017.
Image retrieval using scene graphs - Justin Johnson et al, CVPR 2015.
Generating Semantically Precise Scene Graphs from Textual Descriptions for Improved Image Retrieval - Sebastian Schuster et al, EMNLP 2015.

Classic Papers

Deep Learning of Binary Hash Codes for Fast Image Retrieval - Kevin Lin et al, CVPRW 2015. [code]
Beyond instance-level image retrieval: Leveraging captions to learn a global visual representation for semantic retrieval - Albert Gordo et al, CVPR 2017.

Image Retrieval Datasets

PatternNet : PatternNet: A Benchmark Dataset for Performance Evaluation of Remote Sensing Image Retrieval - Weixun Zhou et al, ISPRS 2018. [download]
Google landmark dataset (GLD) v1 : Large-Scale Image Retrieval with Attentive Deep Local Features - Hyeonwoo Noh et al, ICCV 2017. [download]
Google landmark dataset (GLD) v2 : Google Landmarks Dataset v2 -- A Large-Scale Benchmark for Instance-Level Recognition and Retrieval - Tobias Weyand et al, CVPR 2020. [download]

Other Applications

Semantic Image Manipulation Using Scene Graphs - Helisa Dhamo et al, CVPR 2020.
SOGNet: Scene Overlap Graph Network for Panoptic Segmentation - Yibo Yang et al, AAAI 2020. [code]
ReLaText: Exploiting Visual Relationships for Arbitrary-Shaped Scene Text Detection with Graph Convolutional Networks - Chixiang Ma et al, ARXIV 2020.
Event Detection with Relation-Aware Graph Convolutional Neural Networks - Shiyao Cui et al, ARXIV 2020.
SceneGraphNet: Neural Message Passing for 3D Indoor Scene Augmentation - Yang Zhou et al, ICCV 2019. [code]
Seq-SG2SL: Inferring Semantic Layout from Scene Graph Through Sequence to Sequence Learning - Boren Li et al, ICCV 2019.
PlanIT: planning and instantiating indoor scenes with relation graph and spatial prior networks - Kai Wang et al, TOGS 2019.
Hierarchical Relational Networks for Group Activity Recognition and Retrieval - Mostafa S. Ibrahim et al, ECCV 2018. [code]
Scene Graphs for Interpretable Video Anomaly Classification - Nicholas F. Y. Chen et al, NIPS 2018 ViGIL Workshop.
Learning object interactions and descriptions for semantic image segmentation - Guangrun Wang et al, CVPR 2017.
Multi-Modal Knowledge Representation Learning via Webly-Supervised Relationships Mining - Fudong Nian et al, ACM MM 2017.
Towards a Domain Specific Language for a Scene Graph based Robotic World Model - Sebastian Blumenthal et al, DSLRob 2013.

Workshops

ECCV PIC 2018 Workshop : Person in Context Challenge
ICCV SGRL 2019 Workshop : Scene Graph Representation and Learning
ICCV PIC 2019 Workshop : Person in Context Challenge
ICML LRG 2019 Workshop : Learning and Reasoning with Graph-Structured Representations

Challenges

VRU : ACM MM 2019 Video Relation Understanding (VRU) Challenge - Dataset
PIC : Person in Context Challenge - Dataset - Baseline

Licenses

To the extent possible under law, Youliang Jiang has waived all copyright and related or neighboring rights to this work.

Contact Us

For questions of any kind, please open an issue or e-mail me at mqjyl2012@163.com.
