# UD_Japanese-KTC

This treebank is produced by automatically converting the Kaede treebank,
which is built on the source material of the Kyoto Corpus.
http://nlp.ist.i.kyoto-u.ac.jp/EN/index.php?Kyoto%20University%20Text%20Corpus


## Recovering data

The data is provided in the CoNLL format, but the original texts have
been stripped out because of licensing restrictions.  To recover the data
with the original texts, you need the Mainichi Shinbun 1995 corpus (the
same data set used for the Kyoto Corpus).  The Mainichi Shinbun data can be
obtained from Nichigai Associates:

http://www.nichigai.co.jp/sales/mainichi/mainichi-data.html

The full corpus is recovered by running the following command:

    ./autoconv -d MAINICHI_DIR

where `MAINICHI_DIR` denotes the directory containing the Mainichi
Shinbun 1995 files.  Running this command requires Perl and
Python.
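
As a minimal sketch of the step above, the following Python wrapper checks that the two required interpreters are on the `PATH` and then invokes `./autoconv -d MAINICHI_DIR`. The function names are hypothetical and not part of this repository, and the executable names `perl` and `python` are an assumption (some systems only provide `python3`).

```python
import shutil
import subprocess
from pathlib import Path


def missing_tools(tools=("perl", "python")):
    """Return the names of the given executables that are not found on PATH.

    The default names are an assumption; adjust them for your system.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


def build_autoconv_command(mainichi_dir, script="./autoconv"):
    """Build the argument list for the command documented in this README."""
    return [script, "-d", str(mainichi_dir)]


def run_autoconv(mainichi_dir):
    """Check prerequisites, then run autoconv from the repository root."""
    missing = missing_tools()
    if missing:
        raise RuntimeError("missing required tools: " + ", ".join(missing))
    if not Path(mainichi_dir).is_dir():
        raise NotADirectoryError(str(mainichi_dir))
    # check=True raises CalledProcessError if autoconv exits with an error.
    subprocess.run(build_autoconv_command(mainichi_dir), check=True)
```

Calling `run_autoconv("/path/to/mainichi95")` from the repository root would then be equivalent to typing the command by hand, with an earlier and clearer failure if a prerequisite is missing.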

The program for extracting texts from Mainichi Shinbun 1995 is
borrowed from the Kyoto Corpus project.

## Splitting

Each data set contains UD annotations for the following sections of the Kyoto Corpus.

* training: articles from January 1st to 4th and editorials from January to February
* development: articles on January 6th and editorials in April
* test: articles on January 5th and editorials in March
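
The split rule above can be written down as a small lookup. This is only an illustrative sketch: the function name and the `"article"` / `"editorial"` labels are hypothetical, and it does not reflect how the conversion scripts or the corpus files actually encode dates and document types.

```python
def assign_split(kind, month, day=None):
    """Map a 1995 document to its data set, following the rule listed above.

    kind  -- "article" or "editorial" (hypothetical labels)
    month -- month number, 1-12
    day   -- day of month; only needed for articles
    Returns "training", "development", "test", or None if not covered.
    """
    if kind == "article":
        if month != 1 or day is None:
            return None
        if 1 <= day <= 4:
            return "training"
        if day == 5:
            return "test"
        if day == 6:
            return "development"
        return None
    if kind == "editorial":
        if month in (1, 2):
            return "training"
        if month == 3:
            return "test"
        if month == 4:
            return "development"
    return None
```

For example, `assign_split("article", 1, 5)` falls in the test set, while an editorial from May is not part of any of the three sets and yields `None`.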

## Citation

You are encouraged to cite the following paper when you refer to the
Universal Dependencies Japanese Treebank.

Hiroshi Kanayama, Yusuke Miyao, Takaaki Tanaka, Shinsuke Mori,
Masayuki Asahara, and Sumire Uematsu.  A Draft Proposal for Universal
Dependencies Japanese Treebank.  In the Proceedings of the 21st annual
meeting for Gengo Shori Gakkai (The Association for Natural Language
Processing).  2015.  (In Japanese)

## Changelog

* 2015-11-15 v1.2
  * First release in UD

=== Machine-readable metadata =================================================
Data available since: UD v1.2
License: CC BY-SA 4.0
Includes text: no
Genre: news
Lemmas: converted from manual
UPOS: converted from manual
XPOS: not available
Features: converted from manual
Relations: converted from manual
Contributors: Asahara, Masayuki; Kanayama, Hiroshi; Matsumoto, Yuji; Miyao, Yusuke; Mori, Shinsuke; Tanaka, Takaaki; Uematsu, Sumire
Contributing: elsewhere
Contact: hkana@jp.ibm.com
===============================================================================
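
The metadata block above is a plain list of `Key: value` lines between two delimiter lines that start with `===`. As a rough sketch of how it could be read programmatically, the following Python helpers are hypothetical and are not the official Universal Dependencies tooling; they assume the layout is exactly as shown in this file.

```python
def parse_metadata(readme_text):
    """Collect 'Key: value' pairs found between the two '===' delimiter lines."""
    meta = {}
    inside = False
    for line in readme_text.splitlines():
        if line.startswith("==="):
            if inside:
                break  # closing delimiter reached
            inside = True  # opening delimiter found
            continue
        if inside and ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta


def contributors(meta):
    """Split the semicolon-separated 'Contributors' field into a list of names."""
    field = meta.get("Contributors", "")
    return [name.strip() for name in field.split(";") if name.strip()]
```

Applied to the text of this README, `parse_metadata` would return a dictionary with keys such as `License` and `Includes text`, and `contributors` would return one `Family, Given` string per person.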
