This is a demo repository for Text Mining (MTH089) @ VNUHCM - University of Science, Winter 2022.
Shout-out to Pham Anh Viet, Nguyen Thien Duong and Nguyen Duc Thuan for their great contributions to this project.
Based on the paper ViHealthBERT: Pre-trained Language Models for Vietnamese in Health Text Mining [1], we fine-tuned the ViHealthBERT model for the NER task using the PhoNER_COVID19 dataset [2].
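To make the NER task concrete, here is a minimal sketch of how per-token BIO tags can be grouped into entity spans. It is not taken from this repository: the function name `bio_to_entities`, the example sentence, and the tags are hypothetical, and the label names assume the BIO-style entity types used by PhoNER_COVID19 (such as PATIENT_ID and LOCATION).

```python
from typing import List, Tuple


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) entity spans."""
    entities: List[Tuple[str, str]] = []
    current_label = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Store the entity collected so far (if any) and reset the buffer.
        nonlocal current_label, current_tokens
        if current_label is not None:
            entities.append((current_label, " ".join(current_tokens)))
        current_label, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity.
            flush()
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag with the same label continues the open entity.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not match the open entity.
            flush()
    flush()
    return entities


# Hypothetical word-segmented sentence with hypothetical model output.
tokens = ["Bệnh_nhân", "523", "bị", "sốt", "cao", "ở", "Hà_Nội"]
tags = ["O", "B-PATIENT_ID", "O", "B-SYMPTOM_AND_DISEASE",
        "I-SYMPTOM_AND_DISEASE", "O", "B-LOCATION"]
print(bio_to_entities(tokens, tags))
```

An I- tag that does not continue an open entity of the same label is dropped here; other decoders treat it as the start of a new entity, so this is a design choice rather than a rule.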
Follow the guidelines in https://gist.github.com/wavezhang/ba8425f24a968ec9b2a8619d7c2d86a6 to download Java without an Oracle account.
- JDK / JRE version required: at least 1.8
- Windows only: remember to add the Java runtime to your system environment variables (e.g. PATH).
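The Java requirement above can be checked from Python before going further. The following is a minimal sketch, not part of this repository; the helper names `java_major_version` and `installed_java_major` are hypothetical. It assumes `java -version` prints a line containing `version "..."`, and treats the legacy `1.8` numbering as major version 8.

```python
import re
import shutil
import subprocess
from typing import Optional


def java_major_version(version_output: str) -> Optional[int]:
    """Parse the major version from `java -version` output, or return None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    # Legacy scheme: "1.8.0_292" means Java 8.
    if first == 1 and second is not None:
        return int(second)
    return first


def installed_java_major() -> Optional[int]:
    """Return the major version of the `java` found on PATH, or None."""
    java = shutil.which("java")
    if java is None:
        return None  # Java is not on PATH
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    # `java -version` writes to stderr; stdout is included as a fallback.
    return java_major_version(result.stderr + result.stdout)
```

Calling `installed_java_major()` and comparing the result against 8 tells you whether the runtime on PATH meets the minimum; a `None` result means Java was not found or its output could not be parsed.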
- Run vncorenlp.sh to download the pretrained VnCoreNLP.
- Or, follow the guidelines in https://github.com/vncorenlp/VnCoreNLP
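If a shell is not available (for example on Windows), the same files can be fetched from Python. This is a sketch, not the contents of vncorenlp.sh: it only reproduces the vncorenlp folder layout shown in the directory tree of this README, and it assumes the files are hosted at the raw GitHub URLs of the VnCoreNLP repository. The names `BASE_URL`, `FILES`, `download_plan`, and `download_all` are hypothetical.

```python
import urllib.request
from pathlib import Path
from typing import List, Tuple

# Assumption: the jar and word-segmenter models live on the master branch.
BASE_URL = "https://raw.githubusercontent.com/vncorenlp/VnCoreNLP/master"
FILES = [
    "VnCoreNLP-1.1.1.jar",
    "models/wordsegmenter/vi-vocab",
    "models/wordsegmenter/wordsegmenter.rdr",
]


def download_plan(target_dir: str = "vncorenlp") -> List[Tuple[str, Path]]:
    """Return (url, destination) pairs without touching the network."""
    return [(f"{BASE_URL}/{name}", Path(target_dir) / name) for name in FILES]


def download_all(target_dir: str = "vncorenlp") -> None:
    """Download every file in the plan that is not already present."""
    for url, dest in download_plan(target_dir):
        dest.parent.mkdir(parents=True, exist_ok=True)
        if not dest.exists():
            urllib.request.urlretrieve(url, str(dest))
```

Run `download_all()` from the repository root to create the vncorenlp folder. Separating the plan from the download makes it easy to inspect the URLs first.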
pip install -r requirements.txt
Follow this link. The directory tree after this step should look something like this:
DemoViHealthBERT-NER
|   .gitignore
|   main.py
|   data_loader.py
|   readme.md
|   requirements.txt
|   vncorenlp.sh
|
+---model
|       module.py
|       vihnbert.py
|
+---model-save
|       config.json
|       pytorch_model.bin
|       training_args.bin
|
+---web-demo-ner
|
\---vncorenlp
    |   VnCoreNLP-1.1.1.jar
    |
    \---models
        \---wordsegmenter
                vi-vocab
                wordsegmenter.rdr
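A quick way to confirm the layout is to check for the downloaded files programmatically. The sketch below is not part of the repository; `EXPECTED` and `missing_files` are hypothetical names, and the paths are copied from the tree above.

```python
from pathlib import Path
from typing import List

# Files that the setup steps are expected to add, relative to the repo root.
EXPECTED = [
    "model-save/config.json",
    "model-save/pytorch_model.bin",
    "model-save/training_args.bin",
    "vncorenlp/VnCoreNLP-1.1.1.jar",
    "vncorenlp/models/wordsegmenter/vi-vocab",
    "vncorenlp/models/wordsegmenter/wordsegmenter.rdr",
]


def missing_files(root: str = ".") -> List[str]:
    """Return the expected files that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected files are in place.")
```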
Setup should be complete after this step. Now start the server and note its URL.
python main.py
The server's URL is printed in the terminal:
Running on http://ipaddress:port (Press Ctrl+C to quit)
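If you capture the server output in a script, the URL can be pulled out of that log line and checked for reachability before it is used by the web client. This is a minimal sketch with hypothetical helper names (`extract_server_url`, `is_reachable`); it assumes the log line has the "Running on http://..." shape shown above, and the address in the usage comment is only an example.

```python
import re
import socket
from typing import Optional
from urllib.parse import urlsplit


def extract_server_url(log_line: str) -> Optional[str]:
    """Return the URL following 'Running on' in a log line, or None."""
    match = re.search(r"Running on (https?://\S+)", log_line)
    return match.group(1) if match else None


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    # Fall back to the scheme's default port when none is given.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example with a hypothetical address:
# url = extract_server_url(" * Running on http://127.0.0.1:5000 (Press CTRL+C to quit)")
# print(url, is_reachable(url))
```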
Install Flutter by following https://docs.flutter.dev/get-started/install
- Open web-demo-ner/lib/data/predict_ner_remote_data_source.dart
- Update serverUrl to the http://ipaddress:port above.
- List all available devices:
flutter devices
- Run the app on one of the devices above (e.g. edge):
cd web-demo-ner
flutter run -d edge
The web app should start after a short while.
Footnotes
[1] ViHealthBERT: Pre-trained Language Models for Vietnamese in Health Text Mining (Minh et al., LREC 2022)
[2] COVID-19 Named Entity Recognition for Vietnamese (Truong et al., NAACL 2021)