kambehmw / wikilinks-jp

Article link-based Wikification for SHINRA2021-LinkJP

Wikilinks JP

Wikilinks is a Wikification linker for the SHINRA2021-LinkJP task.

Wikilinks uses a simple, article link-based approach.

usami

Usage

Clone this repository first.

$ git clone https://github.com/usami/wikilinks-jp.git

Then run ./do link-sample, which downloads the sample data, builds the linker, and runs it against the sample data.

$ cd wikilinks-jp
$ ./do link-sample
...
2021/05/07 10:47:02 linker[airport]: load annotaions
2021/05/07 10:47:02 linker[airport]: load pages
2021/05/07 10:47:02 linker[airport]: load title to pageid mappings
2021/05/07 10:47:08 linker[airport]: check links
2021/05/07 10:47:08 linker[airport]: output analyzed results
...
2021/05/07 10:47:23 linker[person]: load annotaions
2021/05/07 10:47:23 linker[person]: load pages
2021/05/07 10:47:23 linker[person]: load title to pageid mappings
2021/05/07 10:47:28 linker[person]: check links
2021/05/07 10:47:28 linker[person]: output analyzed results

Example outputs can be found under output/sample.

$ ls output/sample/
airport.json  city.json  company.json  compound.json  person.json

The linker can be used as a CLI tool.

Usage: ./bin/linker [category] [annotation-file] [html-dir] [title-pageid-file] [output-file]
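
As a rough illustration of the CLI form above, the following shell sketch runs the linker once per category. Only the argument order comes from the usage line; the directory layout (`<data-dir>/<category>.json` for annotations, `<data-dir>/html/<category>` for pages), the `LINKER` variable, and the function name `run_all_categories` are assumptions made for this example.

```shell
#!/bin/sh
# Path to the linker binary; override LINKER to point somewhere else.
LINKER="${LINKER:-./bin/linker}"

# run_all_categories DATA_DIR MAPPING_FILE OUT_DIR CATEGORY...
# Runs the linker once per category.
# The input layout under DATA_DIR is hypothetical; adjust it to your data.
run_all_categories() {
    data_dir=$1
    mapping=$2
    out_dir=$3
    shift 3

    mkdir -p "$out_dir"
    for category in "$@"; do
        # Argument order: category, annotation file, HTML dir,
        # title-to-pageid file, output file.
        "$LINKER" "$category" \
            "$data_dir/$category.json" \
            "$data_dir/html/$category" \
            "$mapping" \
            "$out_dir/$category.json"
    done
}
```

With that function defined, a call such as `run_all_categories path/to/data path/to/title2pageid.json output/custom airport person` would invoke the linker twice and write `output/custom/airport.json` and `output/custom/person.json`. The paths here are placeholders.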

Requirements

Software

The linker and the do script assume the following commands are installed:

  • go (1.16)
  • curl
  • unzip
  • gunzip
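
A quick way to confirm these commands are on your PATH before running the do script is a small check like the one below. This is a minimal sketch, not part of the repository; the function name `check_commands` is made up for the example, and it checks only for presence, not for the Go version.

```shell
#!/bin/sh
# check_commands CMD...
# Prints "missing: CMD" for every command not found on PATH and
# returns non-zero if at least one is missing.
check_commands() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Example: check_commands go curl unzip gunzip
```

If nothing is printed and the exit status is 0, all listed commands were found.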

Files

The linker requires a Wikipedia title-to-pageid mapping file. A mapping file is bundled with this repo (data/jawiki-20190120-title2pageid.json.gz). You can download the latest version here.
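
To look at the bundled mapping file without unpacking it on disk, a sketch like the following may help. It makes no assumption about the internal structure of the JSON and simply prints the first bytes of the decompressed stream; the helper name `peek_gz` and the default of 300 bytes are choices made for this example.

```shell
#!/bin/sh
# peek_gz FILE [BYTES]
# Prints the first BYTES bytes (default 300) of a gzip-compressed file.
# Reading by bytes rather than lines keeps the output short even if the
# whole file is a single long line.
peek_gz() {
    file=$1
    bytes=${2:-300}
    gunzip -c "$file" | head -c "$bytes"
}

# Example: peek_gz data/jawiki-20190120-title2pageid.json.gz
```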

About

License: MIT License

Languages: Python 45.9%, Go 41.0%, Shell 13.1%