Dealing with '@' in dependency parsing module
Kensvin28 opened this issue
I am parsing the dependencies of a sentence that contains an @ symbol, but the symbol does not appear in the resulting parse tree. Is this expected? The parser does pick up other punctuation marks.
import malaya
dep_model = malaya.dependency.transformer(model='albert')
d_object, tagging, indexing = dep_model.predict("Baru guna 1 @ 2 kali.")
d_object.to_graphvis()
It is because of this: https://github.com/huseinzol05/malaya/blob/master/malaya/text/function.py#L236
Hmm, the model never saw @ during training. Can you try malaya.dependency.huggingface instead?
model = malaya.dependency.huggingface()
model.predict('Baru guna 1 @ 2 kali .')
Output:
([('Baru', 'amod'),
('guna', 'root'),
('1', 'nummod'),
('@', 'punct'),
('2', 'nummod'),
('kali', 'nmod'),
('.', 'punct')],
[('Baru', 2),
('guna', 0),
('1', 6),
('@', 6),
('2', 6),
('kali', 2),
('.', 2)])
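To make the two lists easier to read together, here is a small sketch that pairs each token with its relation and its head token. It assumes the head indices are 1-based with 0 marking the root, which is what the output above suggests ('guna' has head 0, and 'Baru' has head 2, the position of 'guna'). The helper name `resolve_heads` is hypothetical and not part of malaya.

```python
# Output copied from the prediction above.
tagging = [('Baru', 'amod'), ('guna', 'root'), ('1', 'nummod'), ('@', 'punct'),
           ('2', 'nummod'), ('kali', 'nmod'), ('.', 'punct')]
indexing = [('Baru', 2), ('guna', 0), ('1', 6), ('@', 6),
            ('2', 6), ('kali', 2), ('.', 2)]


def resolve_heads(tagging, indexing):
    """Return (token, relation, head_token) triples.

    Assumption: head indices are 1-based and 0 means the root.
    """
    tokens = [token for token, _ in tagging]
    rows = []
    for (token, relation), (_, head) in zip(tagging, indexing):
        head_token = 'ROOT' if head == 0 else tokens[head - 1]
        rows.append((token, relation, head_token))
    return rows


rows = resolve_heads(tagging, indexing)
for row in rows:
    print(row)
```

Under that assumption, '@' is tagged as punct and attaches to 'kali'.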
Again, the huggingface model does not perform any preprocessing, because we want to give users the freedom to choose their own tokenization technique.
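Since tokenization is left to the caller, one option is to split punctuation from words before calling predict. The sketch below is only an illustration: `simple_tokenize` is a hypothetical helper, not part of malaya, and the regex is a deliberately naive choice that would also split hyphenated words and decimals.

```python
import re

# Hypothetical helper, not part of malaya.
# A token is either a run of word characters or a single non-space symbol,
# so '@' and '.' each become their own token.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def simple_tokenize(text):
    """Return the text with tokens separated by single spaces."""
    return " ".join(TOKEN_PATTERN.findall(text))


prepared = simple_tokenize("Baru guna 1 @ 2 kali.")
print(prepared)

# The prepared string can then be passed to the model as in the thread:
# model = malaya.dependency.huggingface()
# model.predict(prepared)
```

This produces the same space-separated input used in the example above ('Baru guna 1 @ 2 kali .'). Any other tokenizer can be swapped in, as long as the tokens end up separated by spaces.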