leaderj1001 / Vision-Language

A vision-language model for solving the GQA (Visual Reasoning in the Real World) dataset.

GQA: Visual Reasoning in the Real World

Data structure

├── Question Number
    ├── annotations
    │   ├── answer
    │   ├── fullAnswer
    │   └── question
    │
    ├── answer
    ├── entailed
    ├── equivalent
    ├── fullAnswer
    ├── groups
    ├── imageId
    ├── isBalanced
    ├── question
    ├── semantic
    ├── semanticStr
    └── types
        ├── detailed
        ├── semantic
        └── structural
  • answer
  • imageId
  • question
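
As a minimal sketch of how the structure above might be consumed, the snippet below reads a question file keyed by question number and keeps only the `answer`, `imageId`, and `question` fields listed above. It assumes these three fields are the ones the model uses. The sample entry, its values, and the helper name `extract_fields` are invented for illustration and are not taken from the repository.

```python
import json

# Made-up sample that mirrors the tree above (most keys omitted for brevity).
# A real GQA question file would be read with json.load(open(path)) instead.
sample_json = """
{
  "0001": {
    "question": "Is the sky dark?",
    "answer": "yes",
    "fullAnswer": "Yes, the sky is dark.",
    "imageId": "12345",
    "isBalanced": true
  }
}
"""


def extract_fields(questions):
    """Keep only the question id, imageId, question and answer of each entry.

    `questions` is a dict mapping question number -> question entry.
    """
    return [
        {
            "questionId": qid,
            "imageId": entry["imageId"],
            "question": entry["question"],
            "answer": entry["answer"],
        }
        for qid, entry in questions.items()
    ]


questions = json.loads(sample_json)
records = extract_fields(questions)
for record in records:
    print(record)
```

Flattening the nested dictionary into a list of small records keeps the later pairing of each question with its image features straightforward, since only `imageId` is needed to look up the image.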

Network Architecture

[Figure: network architecture]

Image-Question Aggregator

[Figure: Image-Question Aggregator]

Requirements

  • tensorflow-gpu==1.13.1
  • numpy==1.16.2
  • tensorflow-hub==0.4.0
  • python==3.7.3
  • cv2==4.0.0 (OpenCV; installed via the opencv-python package)
  • tqdm==4.31.1

Languages

  • Python 63.5%
  • Jupyter Notebook 36.5%