aisyahrzk / question-generation

Fine-tune a T5-small model for the Malay question generation task.

T5 Finetuning for Bahasa Melayu Question Generator

This repository contains code and resources for fine-tuning a T5 model, starting from the Malaya t5-small-bahasa model, to create an answer-agnostic question generator for Bahasa Melayu. The goal is a model that generates meaningful, contextually relevant questions for a given Bahasa Melayu passage without being given specific answers.

Dataset

The model is fine-tuned on a SQuAD dataset translated into Bahasa Melayu, taken from this repo.

The model generates questions without being given the answers. It is trained to produce multiple questions at once from the context alone, with the questions separated by the <sep> token. Here is an example of the data the model is trained on:

input text:
generate questions: Isaac Newton (1643-1727) mewarisi konsepsi mekanikal Descartes tentang jirim. Dalam ketiga "Rules of Reasoning in Philosophy" beliau, Newton menyenaraikan sifat-sifat sejagat jirim sebagai "sambungan, kekerasan, kebolehpercayaan, mobiliti, dan inersia". Begitu juga dalam Optik dia menyangkal bahawa Tuhan mencipta jirim sebagai "zarah pepejal, besar, keras, tidak dapat ditembusi, boleh bergerak", yang "... walaupun begitu keras sehingga tidak pernah memakai atau memecahkan kepingan". Sifat-sifat "primer" jirim telah dipinda pada keterangan matematik, tidak seperti sifat-sifat "sekunder" seperti warna atau rasa. Seperti Descartes, Newton menolak sifat penting sifat sekunder. </s>

target text:
Bilakah Descartes dilahirkan? <sep> Apa yang ditulis oleh Descartes? <sep> Apa yang ditolak oleh Newton yang Descartes tidak? <sep> Apa yang dikatakan Descartes adalah sifat-sifat universal jirim? <sep> Kedua-dua sifat primer dan sekunder sesuai dengan bentuk keterangan apa? <sep>
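
The input/target layout above can be sketched in a few lines of Python. This is only an illustration of the format, not the repository's preprocessing code: the helper name `build_example` and the constants are hypothetical, and appending `</s>` by hand is an assumption taken from the example (many T5 tokenizers add the end-of-sequence token themselves).

```python
# Hypothetical constants mirroring the example above.
PREFIX = "generate questions: "
SEP_TOKEN = "<sep>"
EOS_TOKEN = "</s>"


def build_example(context, questions):
    """Return (input_text, target_text) in the layout shown above."""
    # Input: task prefix, then the passage, then the end-of-sequence marker.
    input_text = f"{PREFIX}{context.strip()} {EOS_TOKEN}"
    # Target: every question is followed by the separator token.
    target_text = " ".join(f"{q.strip()} {SEP_TOKEN}" for q in questions)
    return input_text, target_text


source, target = build_example(
    "Isaac Newton (1643-1727) mewarisi konsepsi mekanikal Descartes tentang jirim.",
    ["Bilakah Descartes dilahirkan?", "Apa yang ditulis oleh Descartes?"],
)
print(source)
print(target)
```

Because each question, including the last, is followed by `<sep>`, the separator acts as a terminator, which matches the trailing `<sep>` in the target text above.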

Inference

To generate questions with the model, follow the example in the question_generator notebook.
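
Because the model emits all questions as one string, its output has to be split back into individual questions. The notebook is the reference for the actual inference code. The sketch below covers only that post-processing step, and it assumes the decoded string still contains the `<sep>` tokens (that is, they were not stripped during decoding). The function name `split_questions` is hypothetical.

```python
def split_questions(decoded, sep_token="<sep>"):
    """Split one decoded model output into a list of questions."""
    # Drop padding and end-of-sequence markers if the decoder left them in.
    cleaned = decoded.replace("<pad>", "").replace("</s>", "")
    # Split on the separator and discard empty fragments,
    # such as the one after the trailing separator.
    parts = (part.strip() for part in cleaned.split(sep_token))
    return [part for part in parts if part]


decoded = "Bilakah Descartes dilahirkan? <sep> Apa yang ditulis oleh Descartes? <sep>"
questions = split_questions(decoded)
for question in questions:
    print(question)
```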

Model

Access the model here: Link
