This is the repo for "MOMENTA: A Multimodal Framework for Detecting Harmful Memes and Their Targets" accepted at Findings of EMNLP '21.
Setting up dependencies
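```python
import subprocess

# Read the CUDA release (e.g. "10.1") from `nvcc --version`; assumes nvcc is on PATH
nvcc_out = subprocess.check_output(["nvcc", "--version"]).decode("utf-8")
CUDA_version = [s for s in nvcc_out.split(", ") if s.startswith("release")][0].split(" ")[-1]
# Pick the PyTorch wheel suffix matching the installed CUDA version
if CUDA_version == "10.0":
    torch_version_suffix = "+cu100"
elif CUDA_version == "10.1":
    torch_version_suffix = "+cu101"
elif CUDA_version == "10.2":
    torch_version_suffix = ""  # default torch 1.7.1 wheels are built for CUDA 10.2
else:
    torch_version_suffix = "+cu110"
```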
To install CLIP, run the following notebook commands (`{torch_version_suffix}` is expanded from the variable set above):
```
! pip3 install torch==1.7.1{torch_version_suffix} torchvision==0.8.2{torch_version_suffix} -f https://download.pytorch.org/whl/torch_stable.html ftfy regex --user
! wget https://openaipublic.azureedge.net/clip/bpe_simple_vocab_16e6.txt.gz -O bpe_simple_vocab_16e6.txt.gz
```
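Outside a notebook, the `!` interpolation is not available, so the same version selection and install command can be built explicitly. This is a minimal sketch: the helper names `torch_version_suffix` and `clip_install_command` are hypothetical, and the helper only builds the argument list; it does not run pip.

```python
from typing import List

# CUDA release string -> PyTorch wheel suffix, as chosen in the if/elif chain above
TORCH_SUFFIXES = {"10.0": "+cu100", "10.1": "+cu101", "10.2": ""}
DEFAULT_SUFFIX = "+cu110"  # fallback for any other CUDA version


def torch_version_suffix(cuda_version: str) -> str:
    """Return the wheel suffix for a CUDA release string such as "10.1"."""
    return TORCH_SUFFIXES.get(cuda_version, DEFAULT_SUFFIX)


def clip_install_command(cuda_version: str) -> List[str]:
    """Build the pip3 argument list for the CLIP dependencies (hypothetical helper)."""
    suffix = torch_version_suffix(cuda_version)
    return [
        "pip3", "install",
        f"torch==1.7.1{suffix}",
        f"torchvision==0.8.2{suffix}",
        "-f", "https://download.pytorch.org/whl/torch_stable.html",
        "ftfy", "regex", "--user",
    ]


if __name__ == "__main__":
    print(" ".join(clip_install_command("10.1")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to perform the install from a plain Python script; a dictionary lookup keeps the version table in one place instead of a growing if/elif chain.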
To install Sentence Transformers, follow the steps at https://github.com/UKPLab/sentence-transformers
The .py file contains the complete set of steps, which must be run in sequence:
- Loading pre-saved ROI and entity features, if they are available.
- Otherwise, extracting these features on demand.
- Initializing the PyTorch dataset and data loader: load the training and test sets as required for the run.
- Experimental settings: choose the configuration for the binary or multi-class setting (training/testing/evaluation) as required. Code blocks for each setting are provided and commented out as appropriate.
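The "load pre-saved features if available, otherwise extract on demand" pattern can be sketched as a small caching helper. This is a minimal sketch, not the repository's code: it assumes features are stored as a pickled dict keyed by meme id, and the helper name `load_or_extract_features`, the cache path, and the `extract_fn` callback are hypothetical stand-ins for the actual ROI/entity extraction.

```python
import os
import pickle
import tempfile
from typing import Callable, Dict, Iterable


def load_or_extract_features(
    cache_path: str,
    meme_ids: Iterable[str],
    extract_fn: Callable[[str], object],
) -> Dict[str, object]:
    """Load pre-saved features from cache_path if it exists; otherwise extract
    them on demand with extract_fn and save them for the next run.

    Hypothetical helper: the dict-of-features layout is an assumption.
    """
    if os.path.exists(cache_path):
        # Pre-saved features are available: reuse them without re-extracting
        with open(cache_path, "rb") as f:
            return pickle.load(f)

    # No cache yet: extract features for every meme id, then persist them
    features = {meme_id: extract_fn(meme_id) for meme_id in meme_ids}
    with open(cache_path, "wb") as f:
        pickle.dump(features, f)
    return features


if __name__ == "__main__":
    # Toy run: the "extractor" returns the id length; a real one would run the ROI/entity pipeline
    path = os.path.join(tempfile.mkdtemp(), "features.pkl")
    first = load_or_extract_features(path, ["m1", "m22"], lambda m: [len(m)])
    second = load_or_extract_features(path, ["m1", "m22"], lambda m: None)  # served from cache
    print(first == second)
```

Note that this sketch never invalidates the cache: if the set of meme ids changes, the stale file must be deleted so the features are re-extracted.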
Please note: TWO versions of the Harm-P "Harmfulness" data are provided, as part of HarMeme-V0 and HarMeme-V1 respectively. We recommend using HarMeme-V1, which contains the updated and corrected "Harmfulness" data.
- HarMeme-V0: CAUTION! OBSOLETE FOR HARM-P "Harmfulness": it contains duplicates in Harm-P. Thanks to mingshanhee and uprihtness for pointing out the discrepancies. See the upgraded version (V1) below for the deduplicated Harm-P/C (Harmfulness) data. HarMeme-V0 content (including the Target data for HarMeme-V0) can be accessed via the following links:
- HarMeme-V1: Updated and complete version. See the folder named "HarMeme_V1" in this repo for the data files. Please refer to the Harm-P (US Politics) and Harm-C (Covid-19) links for the meme images. For additional details about HarMeme-V1, refer to the README in its repo folder. The folder contains the following:
  - Annotations: the complete set, in the same format as V0 ([id, image, labels, text]).
  - Meta-info (collected using the Google Cloud Vision (GCV) API): meme id, OCR text, web entities, best labels, titles, objects, and ROI info.
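The annotation format above ([id, image, labels, text]) can be read into a map-style dataset that supports both the binary and the multi-class setting. This is a minimal sketch under several assumptions: one JSON object per line with those four keys, the first entry of `labels` being the harmfulness label, the label strings "not harmful", "somewhat harmful" and "very harmful", and the binary setting merging the two harmful levels. The class name `HarMemeRecords` is hypothetical. It follows the `__len__`/`__getitem__` protocol of a PyTorch map-style `Dataset`, so it could subclass `torch.utils.data.Dataset` and be wrapped in a `DataLoader`; it is kept dependency-free here.

```python
import json
from typing import Dict, List, Tuple

# Assumed harmfulness label strings; check them against the actual annotation files
MULTI_CLASS = {"not harmful": 0, "somewhat harmful": 1, "very harmful": 2}
# Assumed binary setting: both harmful levels are merged into one "harmful" class
BINARY = {"not harmful": 0, "somewhat harmful": 1, "very harmful": 1}


class HarMemeRecords:
    """Map-style dataset over annotation lines such as
    {"id": ..., "image": ..., "labels": [...], "text": ...} (hypothetical class)."""

    def __init__(self, lines: List[str], binary: bool = True):
        label_map = BINARY if binary else MULTI_CLASS
        self.samples: List[Tuple[str, str, str, int]] = []
        for line in lines:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # Assumption: the first entry of "labels" is the harmfulness label
            harm_label = label_map[record["labels"][0]]
            self.samples.append((record["id"], record["image"], record["text"], harm_label))

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int) -> Dict[str, object]:
        meme_id, image, text, label = self.samples[idx]
        return {"id": meme_id, "image": image, "text": text, "label": label}
```

Usage, assuming a local annotation file: `with open(path) as f: ds = HarMemeRecords(f.readlines(), binary=False)`. Keeping the label mapping in two dictionaries makes switching between the binary and multi-class settings a single flag rather than separate commented-out code blocks.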