Lynten / stanford-corenlp

Python wrapper for Stanford CoreNLP.

Setting properties to the tokenizer

CatarinaPC opened this issue.

Hello

I am using this wrapper to tokenize French sentences.

I set the properties following the CoreNLP documentation, the Stanford Tokenizer page, and the README in this repository. However, the options set in 'tokenize.options' have no effect on the output. Is this the correct way to pass properties to the tokenizer?
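For reference, the CoreNLP server documentation describes passing per-request properties as a JSON object in a `properties` URL parameter, with the text itself sent as the POST body. The sketch below shows only that encoding step, using the standard library; it is not the wrapper's own code, and the host `http://localhost:9000` (CoreNLP's default server port) and the helper name `build_request_url` are assumptions for illustration.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse


def build_request_url(props, host="http://localhost:9000"):
    """Return a CoreNLP server URL carrying `props` as a JSON `properties` parameter.

    Hypothetical helper; assumes a server listening at `host`.
    """
    # json.dumps produces strict JSON (double-quoted keys and values)
    query = urlencode({"properties": json.dumps(props)})
    return f"{host}/?{query}"


props = {
    "annotators": "tokenize",
    "pipelineLanguage": "fr",
    "outputFormat": "text",
    "tokenize.options": "invertible=true,asciiQuotes=false",
}
url = build_request_url(props)

# Round-trip check: decoding the query string recovers the original dict
decoded = json.loads(parse_qs(urlparse(url).query)["properties"][0])
print(decoded == props)  # True
```

Seeing the decoded dict makes it easy to confirm that `tokenize.options` actually reaches the request as a single string value.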

The code:

from stanfordcorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP(r'../libraries/stanford-corenlp-full-2018-10-05', lang='fr')

props = {'annotators': 'tokenize', 
         'pipelineLanguage': 'fr', 
         'outputFormat': 'text', 
         'tokenize.options':
           'strictTreebank3=false, '
           'untokenizable=allKeep, '
           'escapeForwardSlashAsterisk=false, '
           'normalizeFractions=false, '
           'normalizeAmpersandEntity=false, '
           'invertible=true, '
           'asciiQuotes=false, '
           'latexQuotes=false, '
           'unicodeQuotes=false, '
           'normalizeOtherBrackets=false, '
           'ptb3Dashes=false, '
           'americanize=false, '
           'normalizeParentheses=false, '
           'ptb3Ellipsis=false, '
           'unicodeEllipsis=false'}

example_sentence = "Maria grandit au sein d'une famille de l'ancienne «bourgeoisie»."

print(nlp.annotate(example_sentence, properties=props))
nlp.close()  # shut down the CoreNLP server started by the wrapper
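Writing the option string by hand makes it easy to repeat a flag or leave stray spaces around `=`. A minimal sketch, assuming a hypothetical helper `build_tokenize_options` (not part of the wrapper or of CoreNLP), that builds the comma-separated `key=value` value from a Python dict; the resulting `props` could then be passed to `nlp.annotate(example_sentence, properties=props)` as before. The flag names are taken from the list above; whether a given tokenizer honors them is not something this sketch checks.

```python
def build_tokenize_options(options):
    """Join tokenizer flags into a comma-separated 'key=value' string.

    Hypothetical helper: since `options` is a dict, each flag appears once,
    and no spaces are inserted around '='.
    """
    parts = []
    for key, value in options.items():
        # Render Python booleans as lowercase 'true'/'false' strings
        if isinstance(value, bool):
            value = "true" if value else "false"
        parts.append(f"{key}={value}")
    return ",".join(parts)


tokenize_options = build_tokenize_options({
    "untokenizable": "allKeep",
    "invertible": True,
    "asciiQuotes": False,
    "normalizeParentheses": False,
})
print(tokenize_options)
# untokenizable=allKeep,invertible=true,asciiQuotes=false,normalizeParentheses=false

props = {
    "annotators": "tokenize",
    "pipelineLanguage": "fr",
    "outputFormat": "text",
    "tokenize.options": tokenize_options,
}
```

Using a dict also means that repeating a key in the literal silently keeps only the last value, instead of sending the same flag twice.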