mesolitica / malaya

Natural Language Toolkit for Malaysian language, https://malaya.readthedocs.io/

Dealing with '@' in dependency parsing module

Kensvin28 opened this issue

I am parsing the dependencies of a sentence that contains an @ symbol, but the @ does not appear in the resulting parse tree. Is this expected? The parser does detect other punctuation marks.

import malaya
dep_model = malaya.dependency.transformer(model='albert')
d_object, tagging, indexing = dep_model.predict("Baru guna 1 @ 2 kali.")
d_object.to_graphvis()

This happens because of this line: https://github.com/huseinzol05/malaya/blob/master/malaya/text/function.py#L236
Hmm, the model never saw @ during training.

Can you try malaya.dependency.huggingface instead?

model = malaya.dependency.huggingface()
model.predict('Baru guna 1 @ 2 kali .')

Output:

([('Baru', 'amod'),
  ('guna', 'root'),
  ('1', 'nummod'),
  ('@', 'punct'),
  ('2', 'nummod'),
  ('kali', 'nmod'),
  ('.', 'punct')],
 [('Baru', 2),
  ('guna', 0),
  ('1', 6),
  ('@', 6),
  ('2', 6),
  ('kali', 2),
  ('.', 2)])
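
To make the output above easier to read, here is a minimal sketch that pairs each token with the word it attaches to. It assumes the head indices are 1-based positions in the sentence, with 0 marking the root, which matches the values shown ('guna' has head 0 and the relation 'root'). The helper name attach_heads is hypothetical and is not part of malaya.

```python
# Hypothetical helper, not part of malaya: resolve each head index to its token.
# Assumption: head indices are 1-based, and 0 means the token is the root.

def attach_heads(tagging, indexing):
    tokens = [token for token, _ in tagging]
    rows = []
    for (token, relation), (_, head) in zip(tagging, indexing):
        head_token = "ROOT" if head == 0 else tokens[head - 1]
        rows.append((token, relation, head_token))
    return rows


# The two lists copied from the output shown in this thread.
tagging = [('Baru', 'amod'), ('guna', 'root'), ('1', 'nummod'), ('@', 'punct'),
           ('2', 'nummod'), ('kali', 'nmod'), ('.', 'punct')]
indexing = [('Baru', 2), ('guna', 0), ('1', 6), ('@', 6),
            ('2', 6), ('kali', 2), ('.', 2)]

for token, relation, head_token in attach_heads(tagging, indexing):
    print(f"{token:<5} --{relation}--> {head_token}")
```

Read this way, the @ is kept as a 'punct' token attached to 'kali'.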

Again, the huggingface model does not perform any preprocessing, because we want to give users the freedom to choose their own tokenization technique.
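
Since tokenization is left to the user, one possible approach is to separate punctuation from words before calling predict. The sketch below uses a plain regular expression. The function name pre_tokenize is hypothetical, and the rule (word characters stay together, every other non-space character becomes its own token) is only an assumption about what a reasonable tokenizer might do, not what malaya uses internally.

```python
import re

# Hypothetical helper, not part of malaya: split punctuation into separate
# tokens and join everything back with single spaces.
def pre_tokenize(text):
    # \w+ matches runs of letters/digits; [^\w\s] matches one punctuation mark.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return " ".join(tokens)


sentence = pre_tokenize("Baru guna 1 @ 2 kali.")
print(sentence)

# The pre-tokenized string could then be passed to the model from this thread:
# model = malaya.dependency.huggingface()
# model.predict(sentence)
```

This keeps the @ as its own token and separates the final full stop from 'kali', which is the form of the input used in the example above.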