ewrfcas / LightHanlp2_pytorch

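A lightweight PyTorch-based hanlp2.0 tool that supports Chinese word segmentation, part-of-speech tagging, entity extraction, syntactic parsing, and semantic parsing.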

LightHanlp2 pytorch

Thanks to the author of the original project for their contribution: https://github.com/hankcs/HanLP

Requirements

pytorch >= 1.2.0
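
As a minimal sketch of how the version requirement could be checked before loading any model, the helper below compares a version string against 1.2.0. The function names are illustrative and not part of this project.

```python
import re

# Minimum version taken from the requirement "pytorch >= 1.2.0".
MINIMUM_TORCH = (1, 2, 0)


def parse_version(version):
    """Extract (major, minor, patch) from strings such as '1.13.1+cu117'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    # A missing patch component (e.g. '1.2') is treated as 0.
    return int(major), int(minor), int(patch or 0)


def meets_minimum(version, minimum=MINIMUM_TORCH):
    """Return True if the given version string is at least `minimum`."""
    return parse_version(version) >= minimum
```

With PyTorch installed, the check would be called as `meets_minimum(torch.__version__)`.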

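Notes

This project aims to call hanlp2 models in a lightweight way, without depending on tensorflow2.0, and to help beginners understand the basic mechanism behind each tool. Another motivation is that I ran into out-of-memory problems when using hanlp2, so I wanted to call the models through a structure I am familiar with. Complex features such as training are not provided (there is no optimizer configuration, and the models contain no dropout layers); for the full feature set, use the original hanlp2 (https://github.com/hankcs/HanLP).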

Model download

Link: https://pan.baidu.com/s/1KFElopnwpEYbO6PpoOUscQ Password: khgs

Teach a man to fish

See light_hanlp/utils/convert_keras_to_pytorch.py for how to convert a Keras model into a PyTorch model.
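
To give a rough idea of what such a conversion involves, here is a minimal sketch for a single fully connected layer. It is not the content of convert_keras_to_pytorch.py. It relies on one known difference: a Keras `Dense` kernel has shape (input_dim, units), while a `torch.nn.Linear` weight has shape (out_features, in_features), so the kernel must be transposed. The layer names `dense_1` and `classifier` are hypothetical, and plain nested lists stand in for real arrays to keep the example dependency-free.

```python
def transpose(matrix):
    """Transpose a 2-D nested list: (rows, cols) -> (cols, rows)."""
    return [list(column) for column in zip(*matrix)]


def convert_dense_layers(keras_weights, name_map):
    """Build a PyTorch-style state dict from Keras Dense weights.

    keras_weights: {keras_layer_name: (kernel, bias)}, kernel shaped (in, out)
    name_map:      {keras_layer_name: pytorch_module_name} (hypothetical names)
    """
    state_dict = {}
    for keras_name, (kernel, bias) in keras_weights.items():
        torch_name = name_map[keras_name]
        # Keras stores (in, out); torch.nn.Linear expects (out, in).
        state_dict[torch_name + ".weight"] = transpose(kernel)
        state_dict[torch_name + ".bias"] = list(bias)
    return state_dict


# Toy layer with 2 inputs and 3 outputs.
keras_weights = {"dense_1": ([[1, 2, 3], [4, 5, 6]], [0.1, 0.2, 0.3])}
state_dict = convert_dense_layers(keras_weights, {"dense_1": "classifier"})
```

In a real conversion the values would be arrays wrapped in tensors, and other layer types (embeddings, recurrent layers, convolutions) would each need their own layout handling.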

Example

python examples.py
inputs:
HanLP是一系列模型与算法组成的自然语言处理工具包,目标是普及自然语言处理在生产环境中的应用。
HanLP具备功能完善、性能高效、架构清晰、语料时新、可自定义的特点。
内部算法经过工业界和学术界考验,配套书籍《自然语言处理入门》已经出版。
上海华安工业(集团)公司董事长谭旭光和秘书张晚霞来到美国纽约现代艺术博物馆参观。
萨哈夫说,伊拉克将同联合国销毁伊拉克大规模杀伤性武器特别委员会继续保持合作。
HanLP支援臺灣正體、香港繁體,具有新詞辨識能力的中文斷詞系統
蜡烛两头烧
####################################################################################################


加载CWS模型...
missing keys:[]
unexpected keys:[]
error msgs:[]
['HanLP', '是', '一', '系列', '模型', '与', '算法', '组成', '的', '自然', '语言', '处理', '工具包', ',', '目标', '是', '普及', '自然', '语言', '处理', '在', '生产', '环境', '中', '的', '应用', '。']
['HanLP', '具备', '功能', '完善', '、', '性能', '高效', '、', '架构', '清晰', '、', '语料', '时', '新', '、', '可自', '定义', '的', '特点', '。']
['内部', '算法', '经过', '工业界', '和', '学术界', '考验', ',', '配套', '书籍', '《', '自然', '语言', '处理', '入门', '》', '已经', '出版', '。']
['上海', '华安', '工业', '(', '集团', ')', '公司', '董事长', '谭旭光', '和', '秘书', '张晚霞', '来到', '美国', '纽约', '现代', '艺术', '博物馆', '参观', '。']
['萨哈夫', '说', ',', '伊拉克', '将', '同', '联合国', '销毁', '伊拉克', '大', '规模', '杀伤性', '武器', '特别', '委员会', '继续', '保持', '合作', '。']
['HanLP', '支援', '臺灣', '正體', '、', '香港', '繁體', ',', '具有', '新詞', '辨識', '能力', '的', '中文', '斷詞', '系統']
['蜡烛', '两', '头', '烧']
####################################################################################################
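
Each model load in this log prints three lines: `missing keys`, `unexpected keys`, and `error msgs`. Assuming they follow the convention of `torch.nn.Module.load_state_dict(strict=False)`, missing keys are parameters the model defines but the checkpoint lacks, and unexpected keys are the reverse. The sketch below reproduces that comparison on plain key lists; `classifier.weight` is a hypothetical parameter name added for illustration.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, preserving input order."""
    model_set = set(model_keys)
    checkpoint_set = set(checkpoint_keys)
    # Defined by the model but absent from the checkpoint.
    missing = [key for key in model_keys if key not in checkpoint_set]
    # Present in the checkpoint but not defined by the model.
    unexpected = [key for key in checkpoint_keys if key not in model_set]
    return missing, unexpected


model_keys = [
    "bert.pooler.dense.weight",
    "bert.pooler.dense.bias",
    "classifier.weight",  # hypothetical name
]
checkpoint_keys = ["classifier.weight"]
missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
```

Under that assumption, the two `bert.pooler.dense.*` keys reported in the NER section are parameters the checkpoint does not contain. A pooler layer is typically not used for per-token tagging, so this is likely harmless, but the log itself does not confirm that.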


加载POS模型...

missing keys:[]
unexpected keys:[]
error msgs:[]
['NN', 'VC', 'CD', 'M', 'NN', 'CC', 'NN', 'VV', 'DEC', 'NN', 'NN', 'VV', 'NN', 'PU', 'NN', 'VC', 'VV', 'NN', 'NN', 'VV', 'P', 'NN', 'NN', 'LC', 'DEG', 'NN', 'PU']
['NN', 'VV', 'NN', 'VV', 'PU', 'NN', 'JJ', 'PU', 'NN', 'VA', 'PU', 'NN', 'LC', 'JJ', 'NN', 'VV', 'VV', 'DEC', 'NN', 'PU']
['NN', 'NN', 'P', 'NN', 'CC', 'NN', 'NN', 'PU', 'NN', 'NN', 'PU', 'NN', 'NN', 'NN', 'NN', 'PU', 'AD', 'VV', 'PU']
['NR', 'NR', 'NN', 'PU', 'NN', 'PU', 'NN', 'NN', 'NR', 'CC', 'NN', 'NR', 'VV', 'NR', 'NR', 'JJ', 'NN', 'NN', 'VV', 'PU']
['NR', 'VV', 'PU', 'NR', 'AD', 'P', 'NR', 'VV', 'NR', 'JJ', 'NN', 'JJ', 'NN', 'JJ', 'NN', 'VV', 'VV', 'NN', 'PU']
['NR', 'VV', 'NR', 'NN', 'PU', 'NR', 'NN', 'NN', 'VV', 'NN', 'NN', 'NN', 'DEC', 'NN', 'NN', 'NN']
['NN', 'CD', 'M', 'VV']
####################################################################################################
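
The POS output is one tag list per sentence, aligned position by position with the CWS token lists above. The tags appear to follow the Chinese Treebank tag set (for example `NN` noun, `VV` verb, `CD` cardinal number, `M` measure word); that reading is an assumption based on the labels shown. A minimal sketch of pairing the two outputs for the last sentence:

```python
def pair_tokens_with_tags(tokens, tags):
    """Zip a token list with its tag list, refusing mismatched lengths."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return list(zip(tokens, tags))


# Last sentence of the log: CWS tokens and the matching POS tags.
tokens = ['蜡烛', '两', '头', '烧']
tags = ['NN', 'CD', 'M', 'VV']
tagged = pair_tokens_with_tags(tokens, tags)
```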


加载NER模型...
missing keys:['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
unexpected keys:[]
error msgs:[]
[]
[]
[]
[('上海华安工业(集团)公司', 'NT', 0, 12), ('谭旭光', 'NR', 15, 18), ('张晚霞', 'NR', 21, 24), ('美国', 'NS', 26, 28), ('纽约现代艺术博物馆', 'NS', 28, 37)]
[('萨哈夫', 'NR', 0, 3), ('伊拉克', 'NS', 5, 8), ('联合国销毁伊拉克大规模杀伤性武器特别委员会', 'NT', 10, 31)]
[('hanlp', 'NT', 0, 2), ('**', 'NS', 4, 6), ('香港', 'NS', 9, 11)]
[]
####################################################################################################
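
Each NER result is a tuple of (text, label, start, end). In the fourth sentence the offsets behave like character positions with an exclusive end, which the sketch below checks by slicing the input. Two caveats, both inferred from the log rather than documented: the first entity is left out because its text shows ASCII parentheses where the input has full-width ones, and in the sixth sentence `('hanlp', 'NT', 0, 2)` suggests the offsets are counted in model tokens, which only coincide with characters for all-Chinese text.

```python
sentence = "上海华安工业(集团)公司董事长谭旭光和秘书张晚霞来到美国纽约现代艺术博物馆参观。"
entities = [
    ('谭旭光', 'NR', 15, 18),
    ('张晚霞', 'NR', 21, 24),
    ('美国', 'NS', 26, 28),
    ('纽约现代艺术博物馆', 'NS', 28, 37),
]


def spans_match(sentence, entities):
    """Check that sentence[start:end] reproduces each entity's text."""
    return all(sentence[start:end] == text for text, _, start, end in entities)


def group_by_label(entities):
    """Collect entity texts under their label, e.g. NR or NS."""
    grouped = {}
    for text, label, _start, _end in entities:
        grouped.setdefault(label, []).append(text)
    return grouped


all_match = spans_match(sentence, entities)
grouped = group_by_label(entities)
```

Judging by the examples in the log, `NR` marks person names, `NS` locations, and `NT` organizations.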


加载DEP模型...
missing keys:[]
unexpected keys:[]
error msgs:[]
[(2, 'top'), (0, 'root'), (4, 'nummod'), (11, 'clf'), (7, 'conj'), (7, 'cc'), (8, 'nsubj'), (11, 'rcmod'), (8, 'cpm'), (11, 'nn'), (12, 'nsubj'), (2, 'ccomp'), (12, 'dobj'), (2, 'punct'), (16, 'top'), (2, 'conj'), (16, 'ccomp'), (19, 'nn'), (20, 'nsubj'), (17, 'conj'), (26, 'assmod'), (23, 'nn'), (24, 'lobj'), (21, 'plmod'), (21, 'assm'), (20, 'dobj'), (2, 'punct')]
[(2, 'nsubj'), (0, 'root'), (4, 'nsubj'), (19, 'rcmod'), (4, 'punct'), (7, 'nsubj'), (4, 'conj'), (4, 'punct'), (10, 'nsubj'), (4, 'conj'), (4, 'punct'), (13, 'lobj'), (17, 'loc'), (15, 'amod'), (16, 'nsubj'), (4, 'conj'), (4, 'conj'), (4, 'cpm'), (2, 'dobj'), (2, 'punct'), (2, 'conj')]
[(2, 'nn'), (18, 'nsubj'), (18, 'prep'), (6, 'conj'), (6, 'cc'), (7, 'nn'), (3, 'pobj'), (18, 'punct'), (10, 'nn'), (15, 'nn'), (15, 'punct'), (15, 'nn'), (15, 'nn'), (15, 'nn'), (18, 'nsubj'), (15, 'punct'), (18, 'advmod'), (0, 'root'), (18, 'punct'), (18, 'nsubj')]
[(7, 'nn'), (7, 'nn'), (7, 'nn'), (5, 'punct'), (7, 'nn'), (7, 'punct'), (8, 'nn'), (9, 'nn'), (12, 'conj'), (12, 'cc'), (12, 'nn'), (13, 'nsubj'), (0, 'root'), (18, 'nn'), (18, 'nn'), (18, 'amod'), (18, 'nn'), (13, 'dobj'), (13, 'conj'), (13, 'punct'), (13, 'conj')]
[(2, 'nsubj'), (0, 'root'), (2, 'punct'), (8, 'nsubj'), (8, 'advmod'), (8, 'prep'), (6, 'pobj'), (2, 'ccomp'), (15, 'nn'), (11, 'amod'), (13, 'nn'), (13, 'amod'), (15, 'nn'), (15, 'amod'), (8, 'dobj'), (17, 'mmod'), (8, 'ccomp'), (17, 'dobj'), (2, 'punct'), (2, 'ccomp')]
[(2, 'nsubj'), (0, 'root'), (4, 'nn'), (2, 'dobj'), (16, 'punct'), (7, 'nn'), (2, 'dobj'), (16, 'nn'), (16, 'rcmod'), (12, 'nn'), (12, 'nn'), (9, 'dobj'), (9, 'cpm'), (16, 'nn'), (16, 'nn'), (2, 'dobj'), (2, 'conj')]
[(4, 'nsubj'), (3, 'nummod'), (4, 'dep'), (0, 'root'), (4, 'conj')]
####################################################################################################
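
The DEP output gives one (head, relation) pair per token, where the head is a 1-based token index and 0 stands for the root. The sketch below decodes the last sentence. Note that several lists in the log contain one more pair than the sentence has tokens (here 5 pairs for 4 tokens); treating the trailing pair as padding is an assumption, and `zip` simply drops it.

```python
def decode_arcs(tokens, arcs):
    """Turn (head_index, relation) pairs into (token, relation, head_word)."""
    decoded = []
    # zip stops at the shorter input, so a surplus trailing arc is ignored.
    for token, (head, relation) in zip(tokens, arcs):
        head_word = 'ROOT' if head == 0 else tokens[head - 1]
        decoded.append((token, relation, head_word))
    return decoded


tokens = ['蜡烛', '两', '头', '烧']
arcs = [(4, 'nsubj'), (3, 'nummod'), (4, 'dep'), (0, 'root'), (4, 'conj')]
decoded = decode_arcs(tokens, arcs)
```

Read this way, 烧 is the root, 蜡烛 is its subject, and 两 modifies 头 as a numeral.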


加载SDP模型...
missing keys:[]
unexpected keys:[]
error msgs:[]
[[(2, 'Exp'), (16, 'Exp')], [(0, 'ROOT')], [(4, 'Quan')], [(0, 'ROOT')], [(7, 'eCoo')], [(7, 'mConj')], [(8, 'Belg')], [(13, 'rPoss')], [(8, 'mAux')], [(11, 'Desc')], [(12, 'Pat')], [(13, 'Desc')], [(2, 'Clas')], [(2, 'mPunc')], [(16, 'Exp')], [(2, 'eCoo')], [(0, 'ROOT')], [(19, 'Desc')], [(20, 'Pat')], [(17, 'dCont')], [(23, 'mPrep')], [(23, 'Desc')], [(26, 'Sco')], [(23, 'mRang')], [(20, 'mAux'), (23, 'mAux')], [(16, 'Clas')], [(16, 'mPunc')]]
[[(2, 'Poss')], [(0, 'ROOT')], [(4, 'Exp')], [(0, 'ROOT')], [(4, 'mPunc')], [(7, 'Exp'), (10, 'Exp')], [(0, 'ROOT')], [(7, 'mPunc'), (10, 'mPunc')], [(10, 'Exp')], [(0, 'ROOT')], [(7, 'mPunc'), (10, 'mPunc')], [(17, 'dTime')], [(12, 'mTime')], [(0, 'ROOT')], [(0, 'ROOT')], [(17, 'Mann')], [(19, 'dDesc')], [(17, 'mAux')], [(2, 'Belg')], [(2, 'mPunc')]]
[[(2, 'Loc')], [(18, 'Agt')], [(0, 'ROOT')], [(0, 'ROOT')], [(0, 'ROOT')], [(7, 'Cont')], [(0, 'ROOT')], [(7, 'mPunc')], [(10, 'Desc')], [(0, 'ROOT')], [(15, 'mPunc')], [(13, 'Desc')], [(14, 'Pat')], [(15, 'Desc')], [(10, 'Nmod'), (18, 'Prod')], [(15, 'mPunc')], [(18, 'mTime')], [(0, 'ROOT')], [(18, 'mPunc')]]
[[(5, 'Loc'), (7, 'Loc')], [(5, 'Nmod'), (7, 'Nmod')], [(7, 'Desc')], [(5, 'Nmod')], [(0, 'ROOT')], [(5, 'mPunc')], [(8, 'Poss')], [(11, 'eCoo')], [(8, 'Nmod')], [(11, 'mConj')], [(13, 'Agt')], [(11, 'Nmod')], [(0, 'ROOT')], [(15, 'Nmod'), (18, 'Loc')], [(18, 'Nmod')], [(17, 'Desc')], [(18, 'Desc')], [(13, 'Lfin')], [(13, 'ePurp')], [(19, 'mPunc')]]
[[(2, 'Agt')], [(0, 'ROOT')], [(2, 'mPunc')], [(8, 'Agt')], [(8, 'mTime')], [(7, 'mPrep')], [(8, 'Datv')], [(15, 'rAgt')], [(15, 'Nmod')], [(11, 'Desc'), (13, 'Desc')], [(15, 'Desc')], [(13, 'Desc'), (15, 'Desc')], [(15, 'Desc')], [(15, 'Desc')], [(0, 'ROOT')], [(0, 'ROOT')], [(16, 'dCont')], [(17, 'Cont')], [(2, 'mPunc')]]
[[(2, 'Agt')], [(0, 'ROOT')], [(4, 'Nmod')], [(2, 'Datv')], [(4, 'mPunc')], [(7, 'Nmod')], [(0, 'ROOT')], [(0, 'ROOT')], [(16, 'rPoss')], [(12, 'Desc')], [(12, 'Desc')], [(9, 'Belg')], [(9, 'mAux')], [(15, 'Desc')], [(16, 'Desc')], [(0, 'ROOT')]]
[[(3, 'Poss'), (4, 'Pat')], [(3, 'Quan')], [(4, 'Loc')], [(0, 'ROOT')]]
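
Unlike the DEP output, each token in the SDP output carries a list of (head, relation) pairs, so a token can have several heads and the result is a graph rather than a tree. A minimal sketch that flattens the last sentence into labeled edges, again assuming 1-based head indices with 0 as the root:

```python
def graph_to_edges(tokens, graph):
    """Flatten per-token head lists into (dependent, relation, head_word)."""
    edges = []
    for dependent, heads in zip(tokens, graph):
        for head, relation in heads:
            head_word = 'ROOT' if head == 0 else tokens[head - 1]
            edges.append((dependent, relation, head_word))
    return edges


tokens = ['蜡烛', '两', '头', '烧']
graph = [[(3, 'Poss'), (4, 'Pat')], [(3, 'Quan')], [(4, 'Loc')], [(0, 'ROOT')]]
edges = graph_to_edges(tokens, graph)
```

Here 蜡烛 has two heads: it is attached to 头 with `Poss` and to 烧 with `Pat`.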
