monpa-team / monpa

MONPA 罔拍 is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition.

Segmentation error: AssertionError: lengths ['名嘴'] (words) and ['Na', 'VI'] (pos tags) mismatch

tingjhenjiang opened this issue

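I ran into this error during word segmentation and don't know what caused it.
I'm reporting it here; please test it and take a look.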

Traceback (most recent call last):
  File "monpaseg_from_sql.py", line 45, in <module>
    stmts = stmts.compute()
  File "C:\ProgramData\Miniconda3\envs\tensorflow\lib\site-packages\dask\base.py", line 175, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "C:\ProgramData\Miniconda3\envs\tensorflow\lib\site-packages\dask\base.py", line 446, in compute
    results = schedule(dsk, keys, **kwargs)
  File "C:\ProgramData\Miniconda3\envs\tensorflow\lib\site-packages\dask\threaded.py", line 82, in get
    **kwargs
  File "C:\ProgramData\Miniconda3\envs\tensorflow\lib\site-packages\dask\local.py", line 491, in get_async
    raise_exception(exc, tb)
  File "C:\ProgramData\Miniconda3\envs\tensorflow\lib\site-packages\dask\compatibility.py", line 130, in reraise
    raise exc
  File "C:\ProgramData\Miniconda3\envs\tensorflow\lib\site-packages\dask\local.py", line 233, in execute_task
    result = execute_task(task, data)
  File "C:\ProgramData\Miniconda3\envs\tensorflow\lib\site-packages\dask\core.py", line 119, in execute_task
    return func(*args2)
  File "monpaseg_from_sql.py", line 27, in generate_update_monpa_stmt
    seg_newscontent = monpa_seg(row_contains_title_and_content[2], config_filepath=commonvar_file_path)
  File "E:\Software\scripts\python\ML_DL_final\segmentation_functions.py", line 39, in monpa_seg
    cuttedwords = [monpa.cut(sent) for sent in sents]
  File "E:\Software\scripts\python\ML_DL_final\segmentation_functions.py", line 39, in <listcomp>
    cuttedwords = [monpa.cut(sent) for sent in sents]
  File "C:\ProgramData\Miniconda3\envs\tensorflow\lib\site-packages\monpa\__init__.py", line 410, in cut
    return cut_w_userdict(text)
  File "C:\ProgramData\Miniconda3\envs\tensorflow\lib\site-packages\monpa\__init__.py", line 327, in cut_w_userdict
    conll_formatted, segmented_words, pos_tags = to_CoNLL_format(sentence, model_out)
  File "C:\ProgramData\Miniconda3\envs\tensorflow\lib\site-packages\monpa\__init__.py", line 210, in to_CoNLL_format
    (segmented_words), (pos_tags))
AssertionError: lengths ['名嘴'] (words) and ['Na', 'VI'] (pos tags) mismatch

The word '名嘴' segments without error on its own. Could you share the full text you were segmenting so we can debug this?
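One way to find the exact input that triggers the assertion is to wrap the per-sentence call so a single failure is recorded instead of aborting the whole batch. The following is only a minimal sketch: `cut_and_collect_failures` and `_stub_cut` are hypothetical names introduced here for illustration, and the stub merely stands in for `monpa.cut` so the example runs without the library installed.

```python
from typing import Callable, Iterable, List, Tuple


def cut_and_collect_failures(
    sentences: Iterable[str],
    cut: Callable[[str], List[str]],
) -> Tuple[List[List[str]], List[Tuple[str, str]]]:
    """Segment each sentence; record inputs that raise AssertionError.

    Hypothetical helper: `cut` is any callable mapping a sentence to a
    list of words (for example monpa.cut, as seen in the traceback).
    """
    results: List[List[str]] = []
    failures: List[Tuple[str, str]] = []
    for sent in sentences:
        try:
            results.append(cut(sent))
        except AssertionError as exc:
            # Keep the offending sentence and the message for the bug report.
            failures.append((sent, str(exc)))
            # Placeholder so results stay aligned with the input order.
            results.append([])
    return results, failures


def _stub_cut(sent: str) -> List[str]:
    """Stand-in for monpa.cut (assumption): fails on one specific word."""
    if "名嘴" in sent:
        raise AssertionError("lengths mismatch")
    return sent.split()


results, failures = cut_and_collect_failures(["a b", "名嘴 c", "d"], _stub_cut)
```

In a real run, `monpa.cut` would be passed as the `cut` argument, and the contents of `failures` could be attached to the issue as the reproducing text.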

Please feel free to try the new release (v0.2.5.1).

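I'm not sure exactly how the original problem came about. I was running monpa segmentation over all the text in my database. Some of that text contains the word '名嘴', and my custom dictionary does not include '名嘴'.
Over the past few days I tested the texts that contain '名嘴', and all of them have now been segmented without any problem.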

Thanks