yobcmst / Bilingual_Medical_and_Cancer_Vocabulary

TODO

  • [ChineseBLUE]: Basic preprocessor for cMedQANER & cEHRNER: python src/chineseBlue_ner.py
  • [ChineseBLUE]: cMedQANER (zh-tw) + (en), save to a JSON file
  • [CHIP2021-Task1]: Added CHIP2021-Task1-Top1 data; see data_df['label_b'].unique() to list the distinct label_b values

Data Source (1): ChineseBLUE

├── cMedQANER
│    ├── cMedQANER.tar.gz
│    ├── dev.json
│    ├── dev.txt
│    ├── test.json
│    ├── test.txt
│    ├── train.json
│    └── train.txt
└── cEHRNER
     ├── cEHRNER.tar.gz
     ├── dev.json
     ├── dev.txt
     ├── test.json
     ├── test.txt
     ├── train.json
     └── train.txt
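
Each split above ships as both a .txt and a .json file. As a rough illustration of what a txt-to-json step might look like, here is a minimal sketch. It is not the code in src/chineseBlue_ner.py. It assumes the .txt files hold one "character label" pair per line with BIO tags and a blank line between sentences. The function names, the output field names, and the sample labels (disease, symptom) are hypothetical.

```python
import json
from typing import Dict, List


def read_bio_sentences(text: str) -> List[Dict[str, List[str]]]:
    """Parse 'token label' lines into sentences (assumed format, blank line = sentence break)."""
    sentences: List[Dict[str, List[str]]] = []
    tokens: List[str] = []
    labels: List[str] = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            # A blank line closes the current sentence, if any.
            if tokens:
                sentences.append({"tokens": tokens, "labels": labels})
                tokens, labels = [], []
            continue
        if len(parts) < 2:
            # Skip malformed lines that lack a label.
            continue
        tokens.append(parts[0])
        labels.append(parts[-1])
    if tokens:
        sentences.append({"tokens": tokens, "labels": labels})
    return sentences


def extract_entities(tokens: List[str], labels: List[str]) -> List[Dict[str, object]]:
    """Merge B-/I- tagged tokens into entity spans."""
    entities: List[Dict[str, object]] = []
    current = None
    for index, (token, label) in enumerate(zip(tokens, labels)):
        if label.startswith("B-"):
            if current is not None:
                entities.append(current)
            current = {"text": token, "type": label[2:], "start": index}
        elif label.startswith("I-") and current is not None and label[2:] == current["type"]:
            current["text"] += token
        else:
            # An "O" tag or an inconsistent "I-" tag ends the open entity.
            if current is not None:
                entities.append(current)
                current = None
    if current is not None:
        entities.append(current)
    return entities


# Hypothetical sample; the real label set may differ.
SAMPLE = "肺 B-disease\n癌 I-disease\n患 O\n者 O\n\n头 B-symptom\n痛 I-symptom\n"

if __name__ == "__main__":
    records = []
    for sentence in read_bio_sentences(SAMPLE):
        sentence["entities"] = extract_entities(sentence["tokens"], sentence["labels"])
        records.append(sentence)
    # ensure_ascii=False keeps the Chinese characters readable in the output.
    print(json.dumps(records, ensure_ascii=False, indent=2))
```

If the real files use a different separator or tagging scheme (for example BIOES), the parsing and span-merging rules would need to change accordingly.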

Data Source (2): CHIP2021

    git clone https://github.com/DataArk/CHIP2021-Task1-Top1
    python src/ckip2021_task1_top1.py

Data Source (3): Medical_Word

    See processed_data/med_words.json (to be filtered by the NER module).

    Or run:

    python src/med_word.py

Languages

Language:Python 100.0%