MaheshJadhav1985 / generate_boolean_questions_using_T5_transformer

Generating boolean (yes/no) questions from any content using T5 text-to-text transformer model and BoolQ dataset



Using this program you can generate boolean (yes/no) questions from any content.

A detailed Medium blog post explaining the necessary steps can be found here.

Input

The input to the program is any passage of text, for example:

Months earlier, Coca-Cola had begun “Project Kansas.” It sounds like a nuclear experiment but it was just a testing project for the new flavor. In individual surveys, they’d found that more than 75% of respondents loved the taste, 15% were indifferent, and 10% had a strong aversion to the taste to the point that they were angry.

Output

The output will be boolean (yes/no) questions generated from the above input.

Boolean (yes/no) questions generated from the T5 Model :

1: Does coca cola have a kansas flavor?
2: Is project kansas a new coca cola flavor?
3: Is project kansas the same as coca cola?

Inference code

The t5_inference.py file has all the code to run the model on any given paragraph.
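For orientation, here is a minimal sketch of what inference with a fine-tuned T5 checkpoint can look like using the Hugging Face transformers library. It is not the code in t5_inference.py. The prompt format in `build_prompt`, the helper names, the sampling settings, and the `model_dir` path are all assumptions made for illustration, so check t5_inference.py for the format the model was actually trained with.

```python
def build_prompt(passage, answer="yes"):
    """Build the model input string.

    ASSUMPTION: this "answer: ... passage: ..." layout is hypothetical;
    the real format is defined in t5_inference.py.
    """
    return f"answer: {answer} passage: {passage.strip()}"


def unique_questions(questions):
    """Drop case-insensitive duplicates while keeping the original order."""
    seen = set()
    result = []
    for question in questions:
        key = question.strip().lower()
        if key and key not in seen:
            seen.add(key)
            result.append(question.strip())
    return result


def generate_questions(passage, model_dir, num_questions=3):
    """Sample several candidate questions for one passage.

    model_dir is a hypothetical path to a fine-tuned T5 checkpoint.
    """
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_dir)
    model = T5ForConditionalGeneration.from_pretrained(model_dir)
    model.eval()

    inputs = tokenizer(
        build_prompt(passage),
        return_tensors="pt",
        truncation=True,
        max_length=512,
    )
    # Sampling (rather than greedy decoding) yields varied questions.
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        do_sample=True,
        top_k=40,
        top_p=0.95,
        max_length=64,
        num_return_sequences=num_questions,
    )
    decoded = [tokenizer.decode(ids, skip_special_tokens=True) for ids in outputs]
    return unique_questions(decoded)


# Example call (requires a trained checkpoint at the hypothetical path):
# for i, q in enumerate(generate_questions(paragraph, "path/to/t5_boolean_model"), 1):
#     print(f"{i}: {q}")
```

Because sampling can return the same question more than once, the sketch removes duplicates before returning, so fewer than `num_questions` results may come back.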

Training the model

The training and validation datasets are present in the boolq_data folder.
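To illustrate how BoolQ-style data maps onto a text-to-text task, the sketch below turns one record into a (source, target) pair: the passage and the yes/no answer form the source, and the question is the target the model learns to generate. The public BoolQ release uses the fields `question`, `passage`, and a boolean `answer`. The file layout inside boolq_data, the prompt format, and the helper name `to_text_pair` are assumptions, and the sample record is made up for illustration.

```python
def to_text_pair(record):
    """Convert one BoolQ-style record into a (source, target) text pair.

    ASSUMPTION: the "answer: ... passage: ..." source layout is hypothetical
    and may differ from what train.py uses.
    """
    label = "yes" if record["answer"] else "no"
    source = f"answer: {label} passage: {record['passage'].strip()}"
    target = record["question"].strip()
    if not target.endswith("?"):
        target += "?"
    return source, target


# A made-up record in the BoolQ field layout.
sample = {
    "question": "is project kansas a testing project",
    "passage": "Project Kansas was a testing project for the new flavor.",
    "answer": True,
}

source, target = to_text_pair(sample)
print(source)
print(target)
```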

Install the necessary libraries from requirements.txt.

Run train.py on any machine with a GPU.
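Before starting a multi-hour run, it can help to confirm that a GPU is actually visible. The following is a minimal sketch that assumes train.py is built on PyTorch, which this README does not state. `describe_device` is a hypothetical helper name.

```python
import importlib.util


def describe_device():
    """Return a short description of the device PyTorch would use."""
    # ASSUMPTION: training runs on PyTorch; skip the check if it is not installed.
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch

    if torch.cuda.is_available():
        return "cuda: " + torch.cuda.get_device_name(0)
    return "cpu"


print(describe_device())
```

If this prints `cpu`, training would likely still run but could take far longer than on a GPU.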

Training for 4 epochs (the default) took about 5-6 hours on a p2.xlarge (AWS EC2) instance.

Note that since the dataset is small, I barely used the validation set.

Also, not all of the generated questions are of high quality, because the model was trained on a small dataset.


