yyimingucl / NER-MEDICAL-QUERY

Extract entities from medical queries.


NER-MEDICAL-QUERY

BiLSTM+CRF named entity recognition (NER) implemented in PyTorch. The main structure is adapted from https://github.com/DengYangyong/medical_entity_recognize
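To illustrate how a CRF layer usually turns per-character scores into a tag sequence, here is a minimal pure-Python sketch of Viterbi decoding. It is not the code in the model folder: the function name `viterbi_decode`, the score layout, and the toy numbers are assumptions made for illustration. In a BiLSTM+CRF model the emission scores would typically come from the BiLSTM and the transition scores would be learned by the CRF.

```python
from typing import List


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring sequence of tag indices.

    emissions[t][j]   : score of tag j at position t (hypothetical BiLSTM output)
    transitions[i][j] : score of moving from tag i to tag j (hypothetical CRF weights)
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])
    # Best score of any path that ends in each tag at the current position.
    scores = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_scores, pointers = [], []
        for j in range(num_tags):
            # Pick the previous tag that maximizes the path score into tag j.
            best_prev = max(range(num_tags),
                            key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_prev] + transitions[best_prev][j]
                              + emissions[t][j])
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path backwards from the best final tag.
    path = [max(range(num_tags), key=lambda j: scores[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy example with tags 0 = O, 1 = B, 2 = I; the O -> I move is penalized.
emissions = [[1.0, 0.5, 0.0], [0.0, 0.2, 1.0], [1.0, 0.0, 0.0]]
transitions = [[0.0, 0.0, -10.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
best_path = viterbi_decode(emissions, transitions)  # [1, 2, 0], i.e. B I O
```

Note that picking the best tag at each position independently would give O I O here; the transition penalty is what steers the decoder to the valid sequence B I O.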

1. Environment

python==3.7

torch==1.6.0

jieba==0.42.1

2. Mark the Sample Data

Around 10,000 queries from the donghuayiwei-jiankangle online medical query platform, annotated with Doccano (https://github.com/doccano/doccano)

3. Model Evaluation

The annotated data has severe labeling problems, so the resulting performance on the dev set is quite poor (F1 = 0.565).
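The README does not say how the F1 score is computed. As a point of reference, the sketch below shows one common convention, exact-match entity-level precision, recall, and F1. The function name `entity_f1`, the `(start, end, label)` tuple layout, and the label names are assumptions for illustration, not taken from the repo.

```python
from typing import Set, Tuple

# Hypothetical entity representation: (start offset, end offset, label).
Entity = Tuple[int, int, str]


def entity_f1(gold: Set[Entity], pred: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over sets of entities."""
    true_positives = len(gold & pred)  # same span and same label
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# One of two predictions matches exactly; the other has a wrong boundary.
gold = {(0, 2, "DISEASE"), (5, 7, "SYMPTOM")}
pred = {(0, 2, "DISEASE"), (4, 7, "SYMPTOM")}
precision, recall, f1 = entity_f1(gold, pred)  # 0.5, 0.5, 0.5
```

Under this convention a boundary that is off by one character counts as a full miss, so inconsistent annotation boundaries can lower the score sharply.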

4. Layout

model folder: stores the main model structure and the CRF layer

sql_file: stores the original SQL file (read with pymysql in NER_data)

Batch.py: batches training samples of similar length

build_vocab.py: collects the characters of the training samples into a vocabulary

predict.py: uses the trained model to predict entities for new inputs

NER_data: prepares and cleans the data

NER_functions: helper functions

NER_parameters: parameters used by the model and training

main: trains the model

mark_txt_process: converts the annotated queries exported by doccano into standard training samples
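The vocabulary and batching steps listed in the layout can be sketched roughly as follows. This is an illustrative sketch, not the contents of build_vocab.py or Batch.py: the function names, the `<PAD>` and `<UNK>` tokens, and the `min_count` parameter are all assumptions.

```python
from collections import Counter
from typing import Dict, List

PAD, UNK = "<PAD>", "<UNK>"  # hypothetical special tokens


def build_vocab(sentences: List[str], min_count: int = 1) -> Dict[str, int]:
    """Map each character seen at least min_count times to an integer id."""
    counts = Counter(ch for sentence in sentences for ch in sentence)
    vocab = {PAD: 0, UNK: 1}
    for ch, count in counts.most_common():
        if count >= min_count:
            vocab[ch] = len(vocab)
    return vocab


def batch_by_length(sentences: List[str], batch_size: int) -> List[List[str]]:
    """Sort by length so each batch holds sentences of similar length."""
    ordered = sorted(sentences, key=len)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


def encode_batch(batch: List[str], vocab: Dict[str, int]) -> List[List[int]]:
    """Convert characters to ids and pad every row to the batch maximum."""
    max_len = max(len(sentence) for sentence in batch)
    return [
        [vocab.get(ch, vocab[UNK]) for ch in sentence]
        + [vocab[PAD]] * (max_len - len(sentence))
        for sentence in batch
    ]


queries = ["头疼", "感冒发烧怎么办", "咳嗽", "高血压吃什么药"]
vocab = build_vocab(queries)
batches = batch_by_length(queries, batch_size=2)
encoded = [encode_batch(batch, vocab) for batch in batches]
```

Grouping sentences of similar length keeps the amount of padding per batch small, which is the usual motivation for this kind of batching.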

[Image: command-line performance]

[Image: data from doccano]

[Image: standard training sample]
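The conversion from doccano output to training samples might look roughly like the sketch below. It rests on several assumptions that the README does not confirm: that the doccano export is JSONL with a "text" field and a "labels" list of `[start, end, label]` triples (the key name varies between doccano versions), that the end offset is exclusive, and that the standard training sample is one character and one BIO tag per line with a blank line between queries. The function names are hypothetical and this is not the code in mark_txt_process.

```python
import json
from typing import Iterable, List, Tuple


def doccano_to_bio(record: dict, label_key: str = "labels") -> List[Tuple[str, str]]:
    """Turn one doccano record into (character, BIO tag) pairs."""
    text = record["text"]
    tags = ["O"] * len(text)
    for start, end, label in record.get(label_key, []):
        tags[start] = f"B-{label}"          # first character of the entity
        for i in range(start + 1, end):     # remaining characters, end exclusive
            tags[i] = f"I-{label}"
    return list(zip(text, tags))


def convert_lines(jsonl_lines: Iterable[str]) -> str:
    """Convert JSONL lines into 'char tag' lines, one blank line per query."""
    blocks = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        pairs = doccano_to_bio(json.loads(line))
        blocks.append("\n".join(f"{ch} {tag}" for ch, tag in pairs))
    return "\n\n".join(blocks) + "\n"


# "X" is a placeholder label; the repo's real label set is not listed here.
sample = '{"text": "abcde", "labels": [[1, 3, "X"]]}'
print(convert_lines([sample]))
```

Overlapping annotations and whitespace characters inside a query are not handled here; a real converter would need a policy for both.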

5. Prediction

Run the prediction script (listed as predict.py in the layout) from the command line.
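The output format of the prediction script is not documented here. Assuming it yields one BIO tag per input character, the tags can be grouped back into entity strings along the lines of this sketch. The function name `tags_to_entities` and the label names are hypothetical.

```python
from typing import List, Tuple


def tags_to_entities(text: str, tags: List[str]) -> List[Tuple[str, str, int, int]]:
    """Group BIO tags into (entity text, label, start, end) tuples, end exclusive."""
    entities = []
    start, label = None, None
    # The trailing "O" is a sentinel that closes an entity ending at the last character.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            entities.append((text[start:i], label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return entities


# "X" and "Y" are placeholder labels.
result = tags_to_entities("abcd", ["B-X", "I-X", "O", "B-Y"])
# [("ab", "X", 0, 2), ("d", "Y", 3, 4)]
```

An I- tag that does not follow a matching B- tag is ignored in this sketch; other conventions treat it as the start of a new entity.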


