malteos / semantic-document-relations

Implementation, trained models and result data for the paper "Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles"

Home Page: https://arxiv.org/abs/2003.09881

Semantic Relations between Wikipedia Articles

Implementation, trained models and result data for the paper "Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles" (PDF on arXiv). The supplemental material is available for download under GitHub Releases or Zenodo.

Getting started

Requirements:

  • Python >= 3.7 (Conda)
  • Jupyter notebook (for evaluation)
  • GPU with CUDA-support (for training Transformer models)

First, we recommend creating a new virtual environment for Python 3.7 with Conda:

conda create -n docrel python=3.7
conda activate docrel

Install all Python dependencies:

pip install -r requirements.txt
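
After installing the dependencies, a quick sanity check of the requirements listed above can save a failed training run. The following is a minimal sketch, not part of the repository: `check_environment` is a hypothetical helper, and it assumes that PyTorch is among the installed dependencies (if it is not, the CUDA status is simply reported as unknown).

```python
import sys

MIN_PYTHON = (3, 7)  # minimum version from the requirements list


def check_environment():
    """Return a small report on the Python version and CUDA availability."""
    report = {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "cuda_available": None,  # None means "could not be determined"
    }
    try:
        # Assumption: PyTorch is one of the installed dependencies.
        import torch

        report["cuda_available"] = bool(torch.cuda.is_available())
    except ImportError:
        # PyTorch is not installed, so the CUDA status stays unknown.
        pass
    return report


if __name__ == "__main__":
    print(check_environment())
```

A GPU is only needed for training Transformer models, so a missing CUDA device does not prevent running the evaluation notebooks.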

Download dataset (and pretrained models):

# Navigate to data directory
cd data

# Wikipedia corpus
# - download
wget https://github.com/malteos/semantic-document-relations/releases/download/1.0/enwiki-20191101-pages-articles.weighted.10k.jsonl.bz2

# - decompress 
bzip2 -d enwiki-20191101-pages-articles.weighted.10k.jsonl.bz2

# Train and test data
# - download
wget https://github.com/malteos/semantic-document-relations/releases/download/1.0/train_testdata__4folds.tar.gz

# - decompress
tar -xzf train_testdata__4folds.tar.gz

# Models
# - download
wget https://github.com/malteos/semantic-document-relations/releases/download/1.0/model_wiki.bert_base__joint__seq512.tar.gz

# - decompress
tar -xzf model_wiki.bert_base__joint__seq512.tar.gz
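
To verify that the corpus download worked, it can help to look at the first few records. The sketch below is an illustration under stated assumptions: `open_text` and `peek_jsonl` are hypothetical helpers, not part of the repository, and the only assumption about the file is that it is in JSON Lines format (one JSON value per line), as the `.jsonl` extension suggests. No field names are assumed.

```python
import bz2
import json
from itertools import islice
from pathlib import Path


def open_text(path):
    """Open a plain or bz2-compressed text file for reading."""
    if str(path).endswith(".bz2"):
        return bz2.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def peek_jsonl(path, n=3):
    """Return the first n non-empty records of a JSON Lines file."""
    with open_text(path) as f:
        lines = (line for line in f if line.strip())
        return [json.loads(line) for line in islice(lines, n)]


if __name__ == "__main__":
    # File name after `bzip2 -d`, relative to the data directory.
    corpus = Path("enwiki-20191101-pages-articles.weighted.10k.jsonl")
    if corpus.exists():
        for record in peek_jsonl(corpus):
            # Print only the top-level keys; the schema is not assumed here.
            print(sorted(record) if isinstance(record, dict) else type(record))
```

Because `open_text` also handles `.bz2` files, the same call works on the compressed download before running `bzip2 -d`.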

Experiments

Run a predefined experiment (settings can be found in experiments/predefined/wiki):

# Config: wiki.bert_base__joint__seq512
# GPU ID: 1 (set via CUDA_VISIBLE_DEVICES=1)
# Output dir: ./output
python cli.py run ./output 1 wiki.bert_base__joint__seq512
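
The command takes three positional arguments: output directory, GPU ID, and config name. To queue several predefined configs one after another, the call can be wrapped in a small script. This is only a sketch: `build_run_command` and `run_experiments` are hypothetical helpers, not part of the repository, and they rely solely on the argument order shown in the command above.

```python
import subprocess
import sys


def build_run_command(output_dir, gpu_id, config):
    """Build the argument list for `python cli.py run <output_dir> <gpu_id> <config>`."""
    return [sys.executable, "cli.py", "run", str(output_dir), str(gpu_id), config]


def run_experiments(output_dir, gpu_id, configs, dry_run=True):
    """Run one cli.py call per config, or only list the calls if dry_run is set."""
    commands = [build_run_command(output_dir, gpu_id, c) for c in configs]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing run
    return commands


if __name__ == "__main__":
    # Dry run: print the commands without starting any training.
    for cmd in run_experiments("./output", 1, ["wiki.bert_base__joint__seq512"]):
        print(" ".join(cmd))
```

The default `dry_run=True` only lists the commands, which makes it easy to check config names before committing GPU time; pass `dry_run=False` from the repository root to actually start the runs.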

Demo

You can run a demo Jupyter notebook on Google Colab via the "Open In Colab" link in the repository.

How to cite

If you are using our code, please cite our paper:

@InProceedings{Ostendorff2020,
  title = {Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles},
  booktitle = {Proceedings of the {ACM}/{IEEE} {Joint} {Conference} on {Digital} {Libraries} ({JCDL})},
  author = {Ostendorff, Malte and Ruas, Terry and Schubotz, Moritz and Gipp, Bela},
  year = {2020},
  month = {Aug.},
}

License

MIT
