ku-nlp / pyknp

A Python Module for JUMAN++/KNP

Maximum byte size of input string

aneeshp1994 opened this issue

What is the maximum byte size of the input string for the Morpheme class? I am getting the following error:

Traceback (most recent call last):
  File "generate_vectors.py", line 207, in <module>
    tokenize_text(JA_WIKI_TEXT_FILENAME, JA_WIKI_TEXT_TOKENS_FILENAME)
  File "generate_vectors.py", line 139, in tokenize_text
    tokenized_text = ' '.join(get_words(text, juman_pp=True))
  File "generate_vectors.py", line 114, in get_words
    result = jumanpp.analysis(text)
  File "/media/nicoindia/Ubuntu/miniconda3/envs/gfoot/lib/python3.7/site-packages/pyknp/juman/juman.py", line 91, in analysis
    return self.juman(input_str, juman_format)
  File "/media/nicoindia/Ubuntu/miniconda3/envs/gfoot/lib/python3.7/site-packages/pyknp/juman/juman.py", line 78, in juman
    result = MList(self.juman_lines(input_str), juman_format)
  File "/media/nicoindia/Ubuntu/miniconda3/envs/gfoot/lib/python3.7/site-packages/pyknp/juman/mlist.py", line 29, in __init__
    mrph = Morpheme(line, mid, juman_format)
  File "/media/nicoindia/Ubuntu/miniconda3/envs/gfoot/lib/python3.7/site-packages/pyknp/juman/morpheme.py", line 80, in __init__
    self._parse_spec(spec.strip("\n"))
  File "/media/nicoindia/Ubuntu/miniconda3/envs/gfoot/lib/python3.7/site-packages/pyknp/juman/morpheme.py", line 143, in _parse_spec
    self.hinsi_id = int(parts[4])
ValueError: invalid literal for int() with base 10: 'input'

I have found that this error occurs because the input string is longer than the maximum allowed length. If I add print(spec) to _parse_spec in morpheme.py, I get the following string:

'InvalidParameter byte size of input string (12797) is greater than maximum allowed (4096)'

Is there a way to change the maximum length allowed?
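
In case the limit cannot be changed, one possible workaround is to split the text into pieces that stay under the 4096 bytes reported in the message above and analyze each piece separately. The following is only a minimal sketch, not something confirmed by the pyknp maintainers: the helper name split_by_bytes, the choice of sentence delimiters, and the assumption that the limit counts UTF-8 bytes are all my own.

```python
def split_by_bytes(text, max_bytes=4096, delimiters="。！？"):
    """Split text into chunks whose UTF-8 encoding is at most max_bytes.

    Hypothetical helper: it prefers to cut after sentence delimiters and
    falls back to cutting between characters when one sentence is too long.
    """
    # Cut the text into sentence-like pieces, keeping each delimiter
    # attached to the sentence it ends.
    pieces, buf = [], []
    for ch in text:
        buf.append(ch)
        if ch in delimiters:
            pieces.append("".join(buf))
            buf = []
    if buf:
        pieces.append("".join(buf))

    chunks = []
    current, current_size = "", 0
    for piece in pieces:
        size = len(piece.encode("utf-8"))
        if size > max_bytes:
            # A single sentence exceeds the limit: flush what we have,
            # then split this sentence character by character.
            if current:
                chunks.append(current)
            current, current_size = "", 0
            for ch in piece:
                ch_size = len(ch.encode("utf-8"))
                if current and current_size + ch_size > max_bytes:
                    chunks.append(current)
                    current, current_size = "", 0
                current += ch
                current_size += ch_size
        elif current_size + size > max_bytes:
            # Adding this sentence would overflow: start a new chunk.
            chunks.append(current)
            current, current_size = piece, size
        else:
            current += piece
            current_size += size
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed to jumanpp.analysis(chunk) in place of the full text, and the results joined afterwards. Note that when a single sentence has to be cut between characters, the analysis near that cut may differ from what the uncut sentence would give.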