tongchangD / bert_ner_for_corrector

NER-based text correction

How do I run the test to check the results?

hexianbin1994 opened this issue · comments

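I tried python test.py, but the file paths and directory structure under dataset don't match what the code expects. Could you update the repo to the latest version?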

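@hexianbin1994, I'll try setting the program up again later. The old code may be what no longer works; I'll confirm as soon as I can and then push another update.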

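@hexianbin1994 I took a quick look: test.py is only missing a vocabulary file. Build the dataset following the dataset-creation steps in the readme, train the model, and then you can run the test.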

When using someone else's code, please read the readme before running it. Thanks.

The build_corpus function in data.py can be modified as follows:

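from os.path import join


def build_corpus(split, make_vocab=True, data_dir="./dataset/old"):
    """Read the character-level data for one split."""
    assert split in ['train', 'dev', 'test']
    word_lists = []
    tag_lists = []
    with open(join(data_dir, split + ".char.txt"), 'r', encoding='utf-8') as f:
        word_list = []
        tag_list = []
        for line in f:
            # Collapse a double space between the character and its tag
            line = line.replace("  ", " ")
            if line.strip() != '':
                parts = line.strip('\n').split(" ")
                word, tag = parts[0], parts[1]
                word_list.append(word)
                tag_list.append(tag)
            else:
                # A blank line marks the end of a sentence
                word_lists.append(word_list)
                tag_lists.append(tag_list)
                word_list = []
                tag_list = []
        # Keep the last sentence when the file does not end with a blank line
        if word_list:
            word_lists.append(word_list)
            tag_lists.append(tag_list)
    # If make_vocab is True, also return word2id and tag2id
    # (build_map is expected to be defined elsewhere in data.py)
    if make_vocab:
        word2id = build_map(word_lists)
        tag2id = build_map(tag_lists)
        return word_lists, tag_lists, word2id, tag2id
    else:
        return word_lists, tag_lists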
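
For a quick sanity check of the file format this function expects, the sketch below writes a tiny test.char.txt into a temporary directory and parses it in the same one-character-per-line, blank-line-between-sentences layout. The sample sentences, the tag names (B-LOC, I-LOC, O), the read_char_file helper, and the body of build_map are all assumptions made for illustration; the repository's own build_map and tag set may well differ.

```python
import os
import tempfile


def build_map(lists):
    """Hypothetical stand-in for the repo's build_map: item -> integer id."""
    mapping = {}
    for items in lists:
        for item in items:
            if item not in mapping:
                mapping[item] = len(mapping)
    return mapping


def read_char_file(path):
    """Parse 'char tag' lines; a blank line ends a sentence."""
    word_lists, tag_lists = [], []
    word_list, tag_list = [], []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.split()  # tolerates one or more spaces
            if not parts:
                # Blank line: close the current sentence, if any
                if word_list:
                    word_lists.append(word_list)
                    tag_lists.append(tag_list)
                    word_list, tag_list = [], []
                continue
            if len(parts) < 2:
                continue  # skip malformed lines that have no tag
            word_list.append(parts[0])
            tag_list.append(parts[1])
    # Flush the last sentence when the file has no trailing blank line
    if word_list:
        word_lists.append(word_list)
        tag_lists.append(tag_list)
    return word_lists, tag_lists


# Illustrative sample only: two sentences, one line with a double space
SAMPLE = "我 O\n爱 O\n北 B-LOC\n京 I-LOC\n\n你  O\n好 O\n"


def demo():
    with tempfile.TemporaryDirectory() as data_dir:
        path = os.path.join(data_dir, "test.char.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(SAMPLE)
        word_lists, tag_lists = read_char_file(path)
    return word_lists, tag_lists, build_map(word_lists), build_map(tag_lists)


if __name__ == "__main__":
    words, tags, word2id, tag2id = demo()
    print(words)
    print(tags)
    print(tag2id)
```

If the parsed lists look wrong on your own files, the usual suspects are a missing blank line between sentences or lines that carry a character without a tag.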