trhgquan / DemoViHealthBERT-NER

Demo for NER task of ViHealthBERT - Text Mining (MTH089)

Demo ViHealthBERT for NER task

This is a demo repository for Text Mining (MTH089) @ VNUHCM - University of Science, Winter 2022.

Shout out to Pham Anh Viet, Nguyen Thien Duong and Nguyen Duc Thuan for their great contributions to this project.

Abstract

Based on the paper ViHealthBERT: Pre-trained Language Models for Vietnamese in Health Text Mining [1], we fine-tuned the ViHealthBERT model for the NER task using the PhoNER_COVID19 dataset [2].

Setup

For the server

1. Install Java

Follow the guidelines in https://gist.github.com/wavezhang/ba8425f24a968ec9b2a8619d7c2d86a6 to download Java without an Oracle account.
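
Once Java is installed, it can be worth confirming that the java executable is visible on the PATH before moving on, since VnCoreNLP is distributed as a .jar file. The snippet below is a minimal sketch of such a check; the function names find_java and java_version are hypothetical helpers, not part of this repository.

```python
import shutil
import subprocess


def find_java():
    """Return the full path of the `java` executable on PATH, or None if it is absent."""
    return shutil.which("java")


def java_version(java_path):
    """Return the text printed by `java -version` (most JVMs write it to stderr)."""
    result = subprocess.run(
        [java_path, "-version"],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    path = find_java()
    if path is None:
        print("java was not found on PATH; revisit step 1.")
    else:
        print("Found java at:", path)
        print(java_version(path))
```

If the script reports that java was not found, the installation directory probably still needs to be added to the PATH environment variable.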

2. Install VnCoreNLP

Run vncorenlp.sh to download the pretrained VnCoreNLP models.

3. Install other packages

pip install -r requirements.txt

4. Download the fine-tuned ViHealthBERT-NER model

Follow this link. The directory tree after this step should look something like this:

DemoViHealthBERT-NER
|   .gitignore
|   main.py
|   data_loader.py
|   readme.md
|   requirements.txt
|   vncorenlp.sh
+---model
|       module.py
|       vihnbert.py
+---model-save
|       config.json
|       pytorch_model.bin
|       training_args.bin
|
+---web-demo-ner
|
\---vncorenlp
    |   VnCoreNLP-1.1.1.jar
    |
    \---models
        \---wordsegmenter
                vi-vocab
                wordsegmenter.rdr
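
The layout above can be verified with a short script before starting the server. This is only a sketch: the EXPECTED_FILES list is copied from the tree shown above, the helper name missing_files is hypothetical, and the script assumes it is run from the DemoViHealthBERT-NER directory.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED_FILES = [
    "main.py",
    "data_loader.py",
    "requirements.txt",
    "model/module.py",
    "model/vihnbert.py",
    "model-save/config.json",
    "model-save/pytorch_model.bin",
    "model-save/training_args.bin",
    "vncorenlp/VnCoreNLP-1.1.1.jar",
    "vncorenlp/models/wordsegmenter/vi-vocab",
    "vncorenlp/models/wordsegmenter/wordsegmenter.rdr",
]


def missing_files(root="."):
    """Return the expected files that do not exist under `root`, in list order."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected files are in place.")
```

An empty result means steps 2 and 4 completed as expected. Anything listed under model-save points back to step 4, and anything under vncorenlp points back to step 2.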

Everything should be ready after this step. Now start the server and note its URL:

python main.py

The server's URL is printed in the terminal:

Running on http://ipaddress:port (Press Ctrl+C to quit)
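
Before configuring the web application, you may want to confirm that the printed address accepts connections from the machine that will run the web app. The sketch below only opens a TCP connection to the host and port; it makes no assumption about the routes that main.py exposes. The function name server_is_reachable is a hypothetical helper, not part of this repository.

```python
import socket
from urllib.parse import urlsplit


def server_is_reachable(server_url, timeout=3.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    try:
        parts = urlsplit(server_url)
        host = parts.hostname
        # Fall back to the scheme's default port when none is given.
        port = parts.port or (443 if parts.scheme == "https" else 80)
    except ValueError:
        # Raised for malformed URLs, e.g. a non-numeric port.
        return False
    if host is None:
        return False
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Call it with the address printed in the terminal, for example server_is_reachable("http://ipaddress:port") with the real values substituted. A False result usually means the server is not running, the address is mistyped, or a firewall is blocking the port.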

For the web application

1. Install Flutter SDK

https://docs.flutter.dev/get-started/install

2. Configure API server address

  • Open web-demo-ner/lib/data/predict_ner_remote_data_source.dart
  • Set serverUrl to the server address noted above (http://ipaddress:port).

3. Run the web application

  • List out all available devices:
    flutter devices
    
  • Run the app on one of the devices listed above (e.g. edge):
    cd web-demo-ner
    flutter run -d edge
    

The web application should start after a short while.

Footnotes

  1. ViHealthBERT: Pre-trained Language Models for Vietnamese in Health Text Mining (Minh et al., LREC 2022)

  2. COVID-19 Named Entity Recognition for Vietnamese (Truong et al., NAACL 2021)
