prp-e / ultron-old

Project Ultron

What's project Ultron?

Project Ultron is a Question Answering Network (QANet for short) that uses BERT as its backend. For this particular project, I used ktrain, which is a great interface for Keras.

How to run this project?

Just clone the repository then run:

pip install -r requirements.txt

Then wait for the requirements to finish installing. Remember that, for this to work, TensorFlow must also be installed on your system. This has been tested on Linux; if you try it on macOS or Windows, please send me feedback.
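If you are not sure whether everything was installed, a quick check like the one below can help. This is only a minimal sketch: the helper name `missing_packages` is made up for illustration, and it only checks that the packages can be found by Python, not that their versions are compatible with each other.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names from `names` that the import system cannot find."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # The two top-level packages this project imports or depends on.
    missing = missing_packages(["tensorflow", "ktrain"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If the list is not empty, install the missing packages before running the project.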

NOTE: On the first run, it needs to download a few gigabytes of data for BERT, so be patient.

Code explanation

import ktrain
from ktrain import text

These lines are self-explanatory: I just import what I need.

INDEXDIR = '/tmp/index_file'  # the index will be created here
with open('input_data.txt') as input_file:  # the file is closed automatically
    input_data = input_file.readlines()  # one list entry per line

In these lines, you can see the index directory, which the code creates when it runs. Next comes the input file. It is small, but large enough to show the purpose of the code.

The third line simply builds a list from that input file, one entry per line. If you have more complex data (such as religious scriptures, really long books, or data from Google Groups), you will obviously need much more careful preparation.
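As a rough idea of what "better preparation" could look like, here is a minimal sketch that splits a long text into paragraph-sized passages instead of single lines. The function name `prepare_passages` and the `min_words` threshold are my own assumptions for illustration and are not part of this project or of ktrain.

```python
import re


def prepare_passages(raw_text, min_words=5):
    """Split raw text into paragraph-sized passages.

    Paragraphs are separated by blank lines. Line breaks inside a
    paragraph are collapsed into single spaces, and passages with
    fewer than `min_words` words are dropped.
    """
    passages = []
    for block in re.split(r"\n\s*\n", raw_text):
        passage = " ".join(block.split())  # collapse all whitespace runs
        if len(passage.split()) >= min_words:
            passages.append(passage)
    return passages
```

The returned list could then be used in place of `input_data`, for example `input_data = prepare_passages(open_text)` where `open_text` is the full content of your file.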

text.SimpleQA.initialize_index(INDEXDIR)
text.SimpleQA.index_from_list(input_data, INDEXDIR, commit_every=len(input_data), multisegment=True, procs=4, breakup_docs=True)

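In these lines, we just make the brain: the first line creates an empty search index in INDEXDIR, and the second adds every entry of input_data to it.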
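Building the index takes time, so on later runs you may want to skip this step when the index is already there. The sketch below is only an assumption about how one might do that: `index_exists` is a hypothetical helper, and it merely checks for a non-empty directory, not for a valid index. How ktrain itself reacts to an existing index directory may depend on the version you have.

```python
import os


def index_exists(index_dir):
    """Return True if `index_dir` is a directory that already contains files."""
    return os.path.isdir(index_dir) and len(os.listdir(index_dir)) > 0


# Intended use (not executed here, since it needs ktrain and the input data):
#
#   if not index_exists(INDEXDIR):
#       text.SimpleQA.initialize_index(INDEXDIR)
#       text.SimpleQA.index_from_list(input_data, INDEXDIR, ...)
```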

qa = text.SimpleQA(INDEXDIR)
answers = qa.ask("Who are you?")
for answer in answers[:5]:
    print(answer)

Here, Ultron will answer you nicely: we ask a question and print the top five candidate answers.
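Printing the raw answers can be noisy. Below is a small sketch for printing them more readably. It assumes that each answer is a dictionary with an `'answer'` key and a `'confidence'` key; check this against your ktrain version. The helper `format_answer` and the sample data are made up for illustration and are not real output from Ultron.

```python
def format_answer(answer):
    """Render one answer dict as a single line of text.

    Assumes the dict has 'answer' and 'confidence' keys; missing keys
    fall back to a placeholder or are left out instead of raising KeyError.
    """
    answer_text = answer.get("answer", "<no answer>")
    confidence = answer.get("confidence")
    if confidence is None:
        return answer_text
    return f"{answer_text} (confidence: {confidence:.2f})"


# Made-up sample data standing in for the result of qa.ask(...).
sample_answers = [
    {"answer": "I am Ultron", "confidence": 0.91},
    {"answer": "a question answering bot"},
]
for answer in sample_answers:
    print(format_answer(answer))
```

With real results, you would loop over `answers[:5]` instead of `sample_answers`.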

If you need a better explanation, just read this notebook.
