shiv6891 / MOMENTA

MOMENTA

This is the repo for "MOMENTA: A Multimodal Framework for Detecting Harmful Memes and Their Targets" accepted at Findings of EMNLP '21.

Setting up dependencies

# CUDA_version is expected to hold the installed CUDA release as a string, e.g. "10.1".
if CUDA_version == "10.0":
    torch_version_suffix = "+cu100"
elif CUDA_version == "10.1":
    torch_version_suffix = "+cu101"
elif CUDA_version == "10.2":
    torch_version_suffix = ""
else:
    torch_version_suffix = "+cu110"
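
The snippet above assumes that CUDA_version is already defined. A minimal sketch of one way to fill it in is shown here; the helper names (detect_cuda_version, torch_suffix_for) and the "11.0" fallback are illustrative assumptions, not part of the repo. It queries nvcc if available and otherwise falls back to the default, which maps to the same "+cu110" suffix as the else branch.

```python
import re
import subprocess

# Mapping copied from the if/elif chain above; any other release falls back to "+cu110".
_SUFFIX_BY_CUDA = {"10.0": "+cu100", "10.1": "+cu101", "10.2": ""}


def torch_suffix_for(cuda_version: str) -> str:
    """Return the torch wheel suffix for a CUDA release string such as "10.1"."""
    return _SUFFIX_BY_CUDA.get(cuda_version, "+cu110")


def detect_cuda_version(default: str = "11.0") -> str:
    """Best-effort detection via `nvcc --version` (hypothetical helper).

    Returns `default` when nvcc is missing, fails, or prints no release number.
    """
    try:
        out = subprocess.check_output(["nvcc", "--version"]).decode("utf-8")
    except (OSError, subprocess.CalledProcessError):
        return default
    match = re.search(r"release (\d+\.\d+)", out)
    return match.group(1) if match else default


CUDA_version = detect_cuda_version()
torch_version_suffix = torch_suffix_for(CUDA_version)
```

If the detected release is wrong for the target machine, CUDA_version can simply be set by hand before computing the suffix.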

For installing CLIP

! pip3 install torch==1.7.1{torch_version_suffix} torchvision==0.8.2{torch_version_suffix} -f https://download.pytorch.org/whl/torch_stable.html ftfy regex --user
! wget https://openaipublic.azureedge.net/clip/bpe_simple_vocab_16e6.txt.gz -O bpe_simple_vocab_16e6.txt.gz

For Sentence-Transformers: follow the installation steps from https://github.com/UKPLab/sentence-transformers

Instructions

The .py file contains the complete set of steps, which must be run in sequence.

  1. It contains code for loading pre-saved ROI and entity features, which can be used if the feature files are available.
  2. Otherwise, code for extracting the features on demand is also included.
  3. Initializing the dataset and data loader for PyTorch: load the dataset for training and testing as required by the run.
  4. Experimental settings:
    Configurations for the binary/multi-class setting (training/testing/evaluation) have to be chosen as required by the run; the corresponding code blocks are provided and suitably commented out.
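
Steps 1 and 2 amount to a "load if available, otherwise extract" pattern. The sketch below illustrates that pattern only; load_or_extract, the pickle file format, the file name and the fake_extract stand-in are all assumptions made for illustration and may not match how the repo stores or computes its ROI and entity features.

```python
import os
import pickle
import tempfile
from typing import Callable, Dict, Iterable, List


def load_or_extract(
    cache_path: str,
    meme_ids: Iterable[str],
    extract_fn: Callable[[str], List[float]],
) -> Dict[str, List[float]]:
    """Load pre-saved features from cache_path, or extract and save them."""
    if os.path.exists(cache_path):
        # Pre-saved features are available: reuse them.
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    # No saved features: extract on demand, then save for later runs.
    features = {meme_id: extract_fn(meme_id) for meme_id in meme_ids}
    with open(cache_path, "wb") as f:
        pickle.dump(features, f)
    return features


# Usage with a stand-in extractor (a real one would run the feature model).
calls = []


def fake_extract(meme_id: str) -> List[float]:
    calls.append(meme_id)
    return [float(len(meme_id))]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "roi_features.pkl")  # hypothetical file name
    first = load_or_extract(path, ["a", "bb"], fake_extract)   # extracts
    second = load_or_extract(path, ["a", "bb"], fake_extract)  # loads from disk
```

On the second call the extractor is not invoked again, which is the point of keeping pre-saved feature files around.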

Dataset, Features and Meta-info:

Please note: TWO versions of the Harm-P data for "Harmfulness" are provided, as part of HarMeme-V0 and HarMeme-V1 respectively. We recommend using HarMeme-V1, which is the updated and corrected version of the "Harmfulness" data.

  1. HarMeme-V0: CAUTION! OBSOLETE FOR HARM-P "Harmfulness" - Contains duplicates in Harm-P. Thanks to mingshanhee and uprihtness for pointing out the discrepancies. See the upgraded version (V1) below for the deduplicated version of Harm-P/C (Harmfulness) data. HarMeme-V0 content (including Target data for HarMeme-V0) can be accessed via the following links:
  2. HarMeme-V1: Updated + Complete Version. Check the folder named "HarMeme_V1" in this repo for the data files. Please refer to the Harm-P (US Politics) and Harm-C (Covid-19) links for the meme images. For additional details about HarMeme-V1, refer to the README in its repo folder. The repo folder contains the following:
    • Annotations (same format as V0: [id, image, labels, text]), but the complete set.
    • Meta-info (Collected using GCV API): Meme id, OCR Text, Web Entities, Best labels, Titles, Objects, ROI Info.
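
As an illustration of working with the annotation fields [id, image, labels, text], the sketch below reads records and derives a binary or multi-class harmfulness target. Several things here are assumptions and should be checked against the actual files: that annotations are stored one JSON object per line, that the first entry of labels is the harmfulness level, the exact label strings in HARM_LEVELS, and the rule that the binary setting merges every level other than "not harmful" into one class. The sample record is made up.

```python
import json
from typing import Dict, Iterable, List

# Assumed label strings and class indices; verify against the annotation files.
HARM_LEVELS = {"not harmful": 0, "somewhat harmful": 1, "very harmful": 2}


def parse_annotations(lines: Iterable[str]) -> List[Dict]:
    """Parse JSON Lines text into a list of annotation records."""
    records = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            records.append(json.loads(line))
    return records


def harm_target(record: Dict, binary: bool = True) -> int:
    """Return the class index for a record in the binary or multi-class setting."""
    level = HARM_LEVELS[record["labels"][0]]
    # Binary setting (assumed): anything above "not harmful" counts as harmful.
    return int(level > 0) if binary else level


# Made-up record, only to show the expected shape of the fields.
sample_lines = [
    '{"id": "meme_0001", "image": "meme_0001.png", '
    '"labels": ["very harmful"], "text": "example text"}',
]
sample_records = parse_annotations(sample_lines)
binary_target = harm_target(sample_records[0], binary=True)
multiclass_target = harm_target(sample_records[0], binary=False)
```

Switching between the two experimental settings then comes down to the single binary flag, which mirrors the commented-out configuration blocks mentioned in the instructions.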

About

License:MIT License

