anjbapat / D2T

Text generation from structured data

D2T

This project automatically generates a text summary of an NBA game from its box score.

Requirements

  1. Python 3.6
  2. PyTorch 0.2

Data set

We use the Rotowire dataset for training, as in Challenges in Data-to-Document Generation (Wiseman, Shieber, Rush; EMNLP 2017). This dataset consists of human-written NBA game summaries aligned with their corresponding box scores and line scores.
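In the Wiseman et al. setup, each table is treated as a set of records, each with a type (e.g. PTS), an entity (e.g. a player), and a value. The sketch below shows that flattening step. It assumes a column-oriented box score in which every column maps a player index (as a string) to a value; the toy game, player names, and the function name box_score_to_records are hypothetical and not taken from this repository.

```python
# Hypothetical miniature game in an ASSUMED Rotowire-style layout:
# each box_score column maps a player index (as a string) to a value.
game = {
    "home_name": "Home",
    "vis_name": "Away",
    "box_score": {
        "PLAYER_NAME": {"0": "Player A", "1": "Player B"},
        "PTS": {"0": "24", "1": "31"},
        "AST": {"0": "7", "1": "5"},
    },
    "summary": ["Player", "A", "scored", "24", "points", "."],
}


def box_score_to_records(box_score, name_key="PLAYER_NAME"):
    """Flatten a column-oriented box score into (type, entity, value) triples."""
    names = box_score[name_key]
    records = []
    # Visit players in numeric index order, then every non-name column.
    for idx in sorted(names, key=int):
        entity = names[idx]
        for rtype, column in box_score.items():
            if rtype == name_key:
                continue
            records.append((rtype, entity, column[idx]))
    return records


for record in box_score_to_records(game["box_score"]):
    print(record)
```

Line-score (team-level) statistics could be flattened the same way, with the team name as the entity.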

Basic Usage

Data: the Rotowire data can be found in Data/rotowire.tar.bz2.

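To extract (run from the Data/ directory, where the archive lives):

import tarfile

filename = 'rotowire.tar.bz2'

# The context manager closes the archive even if extraction fails
with tarfile.open(filename, mode='r:bz2') as tar:
    tar.extractall()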
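Once extracted, each split can be read with the standard json module. This is a minimal sketch that assumes, without confirmation from this repository, that the archive unpacks to a directory holding train.json, valid.json and test.json, each a JSON list of games; the function name load_split is hypothetical.

```python
import json
import os


def load_split(data_dir, split):
    """Load one data split as a list of game dicts.

    ASSUMPTION: the extracted archive contains <data_dir>/<split>.json
    for split in {"train", "valid", "test"}.
    """
    if split not in ("train", "valid", "test"):
        raise ValueError("unknown split: %r" % split)
    path = os.path.join(data_dir, split + ".json")
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, `train = load_split("rotowire", "train")` would return the training games if the assumed layout holds.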

Train

The train/ directory contains the data-to-text generation code. Its files are:

  • dataprepare.py -- word2index map class that stores the vocabulary and relations
  • model.py -- implementations of several encoder, decoder and embedding model classes
  • preprocessing.py -- mainly for reading and parsing the data
  • train.py -- defines the training process
  • util.py -- utility functions for timing, display, etc.
  • setting.py -- stores the hyperparameters, file locations, etc.
  • train1.py -- contains model_initialization
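As a rough illustration of the kind of word-to-index map dataprepare.py is described as holding, here is a minimal sketch. The class name Vocab, its methods, and the special tokens are hypothetical and not taken from the repository.

```python
class Vocab:
    """Hypothetical word <-> index map with word counts."""

    PAD, SOS, EOS, UNK = "<pad>", "<s>", "</s>", "<unk>"

    def __init__(self):
        self.word2index = {}
        self.index2word = []
        self.word2count = {}
        # Reserve the lowest indices for special tokens.
        for tok in (self.PAD, self.SOS, self.EOS, self.UNK):
            self._add(tok)

    def _add(self, word):
        if word not in self.word2index:
            self.word2index[word] = len(self.index2word)
            self.index2word.append(word)

    def add_sentence(self, tokens):
        """Register every token and update its frequency count."""
        for tok in tokens:
            self._add(tok)
            self.word2count[tok] = self.word2count.get(tok, 0) + 1

    def encode(self, tokens):
        """Map tokens to ids, wrapped in start/end markers; unseen words become <unk>."""
        unk = self.word2index[self.UNK]
        ids = [self.word2index.get(t, unk) for t in tokens]
        return [self.word2index[self.SOS]] + ids + [self.word2index[self.EOS]]

    def decode(self, ids):
        return [self.index2word[i] for i in ids]


vocab = Vocab()
vocab.add_sentence(["the", "Raptors", "won"])
print(vocab.encode(["the", "Celtics"]))  # "Celtics" is unseen, so it maps to <unk>
```

A separate instance of the same map could hold the record types (e.g. PTS, AST) alongside the word vocabulary.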

Some pre-trained model files can be found here.

To Evaluate:

  • small_evaluate.py -- generates sample text
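The README does not say which decoding strategy small_evaluate.py uses; greedy decoding is one common choice for generating text from a trained decoder. The sketch below shows that loop in plain Python. The step callable and the toy scorer are hypothetical stand-ins for the trained model.

```python
def greedy_decode(step, start_id, eos_id, max_len=50):
    """Greedy decoding: repeatedly append the highest-scoring next token.

    `step` is a HYPOTHETICAL callable mapping the current prefix (list of
    token ids) to a list of scores over the vocabulary.
    """
    output = [start_id]
    for _ in range(max_len):
        scores = step(output)
        next_id = max(range(len(scores)), key=scores.__getitem__)
        output.append(next_id)
        if next_id == eos_id:
            break
    return output[1:]  # drop the start token; EOS is kept if it was produced


def toy_step(prefix):
    # Toy scorer over 4 ids: prefer id 3 until the prefix has 3 tokens, then EOS (id 2).
    return [0.0, 0.1, 1.0 if len(prefix) >= 3 else 0.0, 0.5]


print(greedy_decode(toy_step, start_id=1, eos_id=2))
```

Beam search keeps several prefixes per step instead of one and usually trades speed for output quality.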

Some output files can be found here.

  • Jupyter notebook -- Result_run.ipynb

References: thanks to Wiseman et al. for the dataset and code.

Data sources:

  • https://github.com/harvardnlp/boxscore-data
  • https://github.com/harvardnlp/data2text

Papers referred to:
