Setting properties to the tokenizer

CatarinaPC opened this issue 5 years ago

Catarina Conceição commented 5 years ago

Hello

I am using this wrapper to tokenize French sentences.

I also set tokenizer properties following the CoreNLP page, the Stanford Tokenizer page, and the README in this repository. However, the options I pass in 'tokenize.options' have no effect on the output. Is this the right way to set properties on the tokenizer?
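
For reference, the `tokenize.options` property takes one string of comma-separated `key=value` pairs. The sketch below uses a hypothetical helper, `build_tokenize_options` (not part of CoreNLP or this wrapper), that assembles that string from a Python dict. Because dict keys are unique, each option can appear only once, and the helper adds no spaces around `=` or `,`:

```python
def build_tokenize_options(options):
    """Join a dict of tokenizer flags into a 'key=value,key=value' string.

    Hypothetical helper for illustration. Booleans are written as lowercase
    'true'/'false' (the form used in the CoreNLP documentation); other values
    go through str().
    """
    parts = []
    for key, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        parts.append(f"{key}={value}")
    return ",".join(parts)


# A few of the flags from the question, each listed exactly once.
flags = {
    "strictTreebank3": False,
    "untokenizable": "allKeep",
    "invertible": True,
    "normalizeFractions": False,
}
print(build_tokenize_options(flags))
# strictTreebank3=false,untokenizable=allKeep,invertible=true,normalizeFractions=false
```

The resulting string can be used directly as the value of `'tokenize.options'` in the properties dict.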

The code:

```python
from stanfordcorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP(r'../libraries/stanford-corenlp-full-2018-10-05', lang='fr')

props = {'annotators': 'tokenize',
         'pipelineLanguage': 'fr',
         'outputFormat': 'text',
         # Comma-separated key=value pairs, each option listed once.
         # 'allKeep' is the documented spelling of this untokenizable value.
         'tokenize.options':
           'strictTreebank3=false,'
           'untokenizable=allKeep,'
           'escapeForwardSlashAsterisk=false,'
           'normalizeFractions=false,'
           'normalizeAmpersandEntity=false,'
           'invertible=true,'
           'asciiQuotes=false,'
           'latexQuotes=false,'
           'unicodeQuotes=false,'
           'normalizeOtherBrackets=false,'
           'ptb3Dashes=false,'
           'americanize=false,'
           'normalizeParentheses=false,'
           'ptb3Ellipsis=false,'
           'unicodeEllipsis=false'}

example_sentence = "Maria grandit au sein d'une famille de l'ancienne «bourgeoisie»."

print(nlp.annotate(example_sentence, properties=props))
nlp.close()  # stop the server process the wrapper started
```
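One way to check whether the server honors `tokenize.options`, independently of the wrapper, is to send the same properties to the CoreNLP server over HTTP and inspect the JSON tokens. The sketch below makes several assumptions: a CoreNLP server is already running at `http://localhost:9000` (the default port) with the French models available, and it accepts the properties as a JSON object in the `properties` query parameter with the raw text as the POST body. The function names `make_tokenize_request` and `tokenize` are hypothetical, and only the standard library is used.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a CoreNLP server is already running locally on the default port.
SERVER_URL = "http://localhost:9000"


def make_tokenize_request(text, options, url=SERVER_URL):
    """Build (but do not send) a POST request that tokenizes `text` with `options`."""
    props = {
        "annotators": "tokenize",
        "pipelineLanguage": "fr",      # mirrors the properties in the question
        "outputFormat": "json",        # JSON makes individual tokens easy to inspect
        "tokenize.options": options,
    }
    # Properties travel as a JSON string in the query string; the text is the body.
    query = urllib.parse.urlencode({"properties": json.dumps(props)})
    return urllib.request.Request(
        f"{url}/?{query}",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def tokenize(text, options, url=SERVER_URL):
    """Send the request to a live server and return the token strings."""
    with urllib.request.urlopen(make_tokenize_request(text, options, url)) as resp:
        doc = json.loads(resp.read().decode("utf-8"))
    return [tok["word"] for sent in doc["sentences"] for tok in sent["tokens"]]
```

With a server running, `tokenize(example_sentence, "invertible=true,untokenizable=allKeep")` returns the token list. Comparing the output with and without a given option shows whether that option takes effect.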