pvl / wikihow_pairs_dataset

Dataset extracted from WikiHow with pairs of similar sentences

WikiHow dataset for sentence similarity

This dataset is extracted from WikiHow with the goal of creating matching pairs of similar sentences.

For more information on the original WikiHow dataset, see its repository and the paper at https://arxiv.org/abs/1810.09305. The source file needed to run the extraction script is wikihowSep.csv, which can be downloaded from the link below:

https://ucsb.box.com/s/7yq601ijl1lzvlfu4rjdbbxforzd2oag
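
Before running the script, it can help to check which columns the downloaded file actually contains. The following is a minimal sketch using only the standard library `csv` module (pandas would work equally well). `peek_csv` is a hypothetical helper, not part of this repository, and it assumes the file is UTF-8 encoded.

```python
import csv
import itertools


def peek_csv(path, n_rows=3):
    """Return the header and the first n_rows data rows of a CSV file.

    Long article text can exceed the csv module's default field size
    limit (131072 characters), so the limit is raised first.
    """
    csv.field_size_limit(10**9)  # stays within a 32-bit C long on every platform
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # first line holds the column names
        rows = list(itertools.islice(reader, n_rows))  # read only what we need
    return header, rows
```

For example, `header, rows = peek_csv("wikihowSep.csv")` returns the column names and the first three rows without loading the whole file into memory.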

To generate the JSONL file of text pairs with the Python script in this repository, first install pandas and spaCy, then run:

$ python extract.py <path to wikihowSep.csv> --output wikihow.jsonl
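
The output uses the JSON Lines format: each line is a standalone JSON object. The key names written by extract.py are not documented in this README, so the sketch below uses hypothetical keys `sentence1` and `sentence2` for illustration. `write_pairs` and `read_jsonl` are likewise hypothetical helpers that show the format, not the script's actual code.

```python
import json


def write_pairs(pairs, path):
    """Write (text_a, text_b) tuples as JSON Lines, one object per line.

    Assumption: the keys "sentence1"/"sentence2" are hypothetical; check
    wikihow.jsonl for the keys extract.py actually emits.
    """
    with open(path, "w", encoding="utf-8") as f:
        for a, b in pairs:
            record = {"sentence1": a, "sentence2": b}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Yield one dict per non-empty line, without assuming any key names."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Because `read_jsonl` does not assume key names, it can also be pointed at the real output to discover which keys each record contains.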

The compressed output file is also included in the repository.
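
The README does not state which compression format the included output file uses. Assuming it is one of gzip, bzip2 or xz (a .zip archive would need the `zipfile` module instead), a sketch like the following streams the records without decompressing to disk first. `open_text` and `iter_records` are hypothetical helpers.

```python
import bz2
import gzip
import json
import lzma

# Map file suffixes to the stdlib opener for that compression format.
# Assumption: the shipped file uses one of these formats (check its extension).
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_text(path):
    """Open a plain or compressed file in text mode, choosing the opener by suffix."""
    for suffix, opener in _OPENERS.items():
        if str(path).endswith(suffix):
            return opener(path, "rt", encoding="utf-8")
    return open(path, encoding="utf-8")  # fall back to an uncompressed file


def iter_records(path):
    """Yield one parsed JSON object per non-empty line."""
    with open_text(path) as f:
        for line in f:
            if line.strip():
                yield json.loads(line)
```

Reading line by line keeps memory use flat, which matters if the full set of pairs is large.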

License: Apache License 2.0