chenghuige / vilio

🥶Vilio: State-of-the-art VL models in PyTorch & PaddlePaddle



🥶VILIO🥶



State-of-the-art Visio-Linguistic Models 🥶

Updates

06/2021 - Hateful Memes CSV Files

  • The CSV files that were used for the scores in the Vilio paper are now available here
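As a rough illustration of how such prediction files can be scored, here is a minimal, standard-library-only sketch. It assumes the CSVs follow the Hateful Memes challenge submission layout (`id`, `proba`, `label` columns) and that gold labels are available as a dict keyed by the id string. The helper names `auroc` and `score_predictions` are hypothetical and not part of Vilio. AUROC, the challenge's primary metric, is computed with the Mann-Whitney formulation.

```python
import csv
import io


def auroc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def score_predictions(pred_file, gold_labels):
    """Return (AUROC, accuracy) for a CSV with id,proba,label columns.

    Assumption: `label` in the CSV is the predicted class and `gold_labels`
    maps each id string to its true 0/1 label (hypothetical layout).
    """
    rows = list(csv.DictReader(pred_file))
    y_true = [gold_labels[row["id"]] for row in rows]
    probas = [float(row["proba"]) for row in rows]
    preds = [int(row["label"]) for row in rows]
    accuracy = sum(int(p == y) for p, y in zip(preds, y_true)) / len(rows)
    return auroc(y_true, probas), accuracy


if __name__ == "__main__":
    example = io.StringIO("id,proba,label\n1,0.9,1\n2,0.4,0\n3,0.6,1\n4,0.2,0\n")
    gold = {"1": 1, "2": 0, "3": 0, "4": 0}
    print(score_predictions(example, gold))  # (1.0, 0.75)
```

The pairwise loop is quadratic in the number of memes. That is fine for a dev set of a few hundred examples, but a sort-based implementation scales better for larger files.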

06/2021 - Inference on any meme

Ordering

Vilio aims to replicate the organization of Hugging Face's transformers repository: https://github.com/huggingface/transformers

  • /bash: Shell files to reproduce the Hateful Memes results

  • /data: Default directory for loading data and saving checkpoints

  • /ernie-vil: ERNIE-ViL sub-repository written in PaddlePaddle

  • /fts_lmdb: Scripts for handling .lmdb extracted features

  • /fts_tsv: Scripts for handling .tsv extracted features

  • /notebooks: Jupyter notebooks for demonstration & reproducibility

  • /py-bottom-up-attention: Sub-repository for .tsv feature extraction, forked & adapted from here

  • src/vilio: All implemented models (see below for a quick overview of the models)

  • /utils: Pandas & ensembling scripts for data handling

  • entry.py files: Scripts used to access the models and apply model-specific data preparation

  • pretrain.py files: Same purpose as the entry files, but for pre-training; the point of entry for pre-training

  • hm.py: Training code for the Hateful Memes challenge; the main point of entry

  • param.py: Arguments for running hm.py
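The /utils folder is described as holding ensembling scripts. One common way to combine several models on an AUROC-scored task is rank averaging. The sketch below is a hypothetical, standard-library-only illustration of that technique, not a copy of Vilio's scripts. It assumes each model's probabilities are given as equally long lists aligned on the same meme ids. The names `to_ranks` and `rank_average` are illustrative.

```python
def to_ranks(scores):
    """Map scores to ranks normalised to [0, 1]; tied scores share their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        # Extend `end` over a run of tied scores.
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        mean_rank = (start + end) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    denom = max(len(scores) - 1, 1)
    return [r / denom for r in ranks]


def rank_average(model_probas, weights=None):
    """Ensemble per-model probability lists by a weighted mean of their normalised ranks."""
    if not model_probas:
        raise ValueError("need at least one model")
    if weights is None:
        weights = [1.0] * len(model_probas)
    if len(weights) != len(model_probas):
        raise ValueError("one weight per model is required")
    n = len(model_probas[0])
    if any(len(p) != n for p in model_probas):
        raise ValueError("all models must score the same ids")
    total = sum(weights)
    ranked = [to_ranks(p) for p in model_probas]
    return [sum(w * r[i] for w, r in zip(weights, ranked)) / total for i in range(n)]


if __name__ == "__main__":
    model_a = [0.9, 0.1, 0.5]
    model_b = [0.6, 0.2, 0.7]
    print(rank_average([model_a, model_b]))  # [0.75, 0.0, 0.75]
```

Ranks are used instead of raw probabilities because AUROC depends only on the ordering of scores. Models whose probabilities are calibrated differently therefore contribute on an equal footing.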

Usage

Follow SCORE_REPRO.md to reproduce the results on the Hateful Memes task.
Follow GETTING_STARTED.md to use the framework for your own task.
See the paper at: https://arxiv.org/abs/2012.07788

Architectures

🥶 Vilio currently provides the following architectures:

  1. E - ERNIE-VIL (ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph)
  2. D - DeVLBERT (DeVLBert: Learning Deconfounded Visio-Linguistic Representations)
  3. O - OSCAR (Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks)
  4. U - UNITER (UNITER: UNiversal Image-TExt Representation Learning)
  5. V - VisualBERT (VisualBERT: A Simple and Performant Baseline for Vision and Language)
  6. X - LXMERT (LXMERT: Learning Cross-Modality Encoder Representations from Transformers)

To-dos

  • Clean up import statements and Python paths, and find a better way to integrate transformers (currently, import statements only work when run from the main folder)
  • Enable loading and running models via import statements alone (without having to clone the repo)
  • Find a better way to include ERNIE-VIL in this repo (convert from PaddlePaddle to PyTorch?)
  • Move tokenization in the entry files to model-specific tokenization, similar to transformers

Attributions

The code heavily borrows from the following repositories, thanks for their great work:


License

MIT License

