thuan00 / drqa-transformers

Some work on my NLP course at school

Home Page: https://github.com/hoangvuduyanh33/QA


About this work

This is coursework from an NLP course at our school, focusing in particular on open-domain question answering.

Our approach

  • The baseline is DrQA, as suggested by the instructor. For more information about DrQA (paper, intro, citation, license, ...), please refer to DrQA's official repo.
  • Improve the retrieval stage with better schemes (ongoing research).
  • Leverage the Huggingface transformers framework, with stronger models such as BERT.
  • Apply the methods to Vietnamese.
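The retriever-reader design behind DrQA can be illustrated with a toy, dependency-free sketch. Word overlap stands in for DrQA's TF-IDF retriever and for a neural reader; all names here are illustrative, not the project's actual code:

```python
def retrieve(question, docs, k=2):
    """Rank documents by word overlap with the question (a stand-in
    for DrQA's TF-IDF retriever) and return the top-k."""
    q_words = set(question.lower().split())
    scored = sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))
    return scored[:k]

def read(question, doc):
    """Toy reader: return the sentence sharing the most words with the
    question (a real reader predicts an answer span instead)."""
    q_words = set(question.lower().split())
    sentences = [s.strip() for s in doc.split(".") if s.strip()]
    return max(sentences, key=lambda s: len(q_words & set(s.lower().split())))

def answer(question, docs):
    """Two-stage open-domain QA: retrieve candidates, then read."""
    candidates = retrieve(question, docs)
    return read(question, candidates[0])

docs = [
    "Hanoi is the capital of Vietnam. It lies on the Red River.",
    "Python is a programming language. It was created by Guido van Rossum.",
]
print(answer("What is the capital of Vietnam?", docs))
# → Hanoi is the capital of Vietnam
```

The real pipeline replaces both stages with stronger components (BM25 or dense retrieval; a transformer reader), but the control flow is the same.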

Report summary

Pipeline: en

Pipeline                 Open SQuAD-dev (EM/F1)
DrQA-biLSTM              29.5 / -
DrQA-transformers        31.9 / 36.9
pyserini-transformers    37.3 / 43.9

The transformers reader model used above is distilbert-base-cased-distilled-squad.
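The pyserini-bm25 retriever ranks passages with Okapi BM25. A self-contained sketch of the scoring function (illustrative only, not Pyserini's actual Lucene implementation; k1=0.9 and b=0.4 mirror what we understand to be Anserini's defaults):

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=0.9, b=0.4):
    """Score every document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    N = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / N
    # Document frequency of each term (how many docs contain it).
    df = Counter()
    for d in tokenized:
        df.update(set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        s = 0.0
        for t in query.lower().split():
            n = df[t]
            idf = math.log((N - n + 0.5) / (n + 0.5) + 1)
            # Term-frequency saturation (k1) and length normalization (b).
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores
```

Terms absent from a document contribute zero, so documents sharing no query terms score 0.0.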

Vi readers

Data                           Model         Params  Throughput  vi-wiki-test (EM/F1)  MLQA-dev (EM/F1)
SQuAD-translate (~100k pairs)  PhoBERT-base  135M    17.6/s      45.0 / 63.6           37.6 / 57.2
                               XLM-R-base    270M    15.1/s      45.9 / 65.5           40.9 / 59.8
MLQA + XQuAD (~7000 pairs)     XLM-R-base    270M    15.1/s      52.3 / 67.0           44.4 / 64.5
                               XLM-R-large   550M    4.9/s       60.4 / 73.9           51.1 / 70.4
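The EM/F1 columns follow the standard SQuAD evaluation: exact match after answer normalization, and token-level F1 between prediction and gold answer. A minimal sketch (the official script additionally takes the max over multiple gold answers and handles no-answer cases):

```python
import re
import string
from collections import Counter

def normalize(s):
    """SQuAD-style normalization: lowercase, strip punctuation and articles."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(pred, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(pred) == normalize(gold))

def f1(pred, gold):
    """Harmonic mean of token precision and recall after normalization."""
    p, g = normalize(pred).split(), normalize(gold).split()
    common = Counter(p) & Counter(g)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)
```

For example, `f1("on the Red River", "Red River")` yields 0.8: the extra token "on" survives normalization and costs precision, while the article "the" is dropped.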

Installation

Usage

Interactive

pyserini-transformers: vietnamese

python scripts/pipeline_transformers/interactive.py \
  --reader-model <path to model folder or Huggingface model name> \
  --retriever pyserini-bm25 \
  --index-path <path to index folder> \
  --index-lan vi \
  --num-workers 4

Web UI

See the drqa-webui submodule.

Showcase

Sample question

Future work

There is still a lot to improve, and many novel methods and ideas left to implement.


License: Other


Languages

Language: Python 94.1% · Jupyter Notebook 4.4% · Shell 1.5%