Bag-of-Words vs. Graph vs. Sequence in Text Classification: Questioning the Necessity of Text-Graphs and the Surprising Strength of a Wide MLP

Code for the experiments

Update: Paper accepted for ACL 2022.

If you use this code for your research, please consider citing:

@inproceedings{galke-scherp-2022-widemlp,
    title = "Bag-of-Words vs. Graph vs. Sequence in Text Classification: Questioning the Necessity of Text-Graphs and the Surprising Strength of a Wide MLP",
    author = "Galke, Lukas  and Scherp, Ansgar",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    year = "2022",
    address = "Online",
    publisher = "Association for Computational Linguistics",
}

Get up and running

Download the data folder from the TextGCN repository (forked for archival purposes) and make sure that the data is placed into a subfolder ./data in the exact same directory structure.
Check for static paths such as CACHE_DIRin run_text_classification.py and the path to GloVe vectors in the /experiments dir
Install dependencies via pip install -r requirements.txt, preferably in a virtual environment.

Code overview

In models.py, you find our implementation for the SimpleMLP.
In data.py, you find the load_data() function which, does the data loading. Valid datasets are: `[ '20ng', 'R8', 'R52', 'ohsumed', 'mr']
In tokenization.py you find our tokenizer implementation for the GloVe model. For other models, use BERT's tokenizer and vocab
The code contains some artefacts from creating a textgraph ourselves, but in the end we did not run own experiments that needed it.

Running experiments

The script run_text_classification.py is the main entry point for running an experiment. Train test split is used as-is from the datasets of TextGCN. Within the experiments folder, you find the bash scripts that we used for the experiments from the paper.

anttttti / text-clf-baselines

Bag-of-Words vs. Graph vs. Sequence in Text Classification: Questioning the Necessity of Text-Graphs and the Surprising Strength of a Wide MLP

Get up and running

Code overview

Running experiments

About

Languages