nature tagging with tokenized words
yipcma opened this issue
Hi @qinwf ,
First off, thanks a lot for the wonderful package :)
I'd love to know if there's a way to nature tag words that are already tokenized (say, in a vector).
Currently, when I run the tagger, it breaks down my already tokenized vector of words. My use case is to tag the words in my user dictionary so that they get the right nature instead of the ones I assigned when creating the dictionary.
Thanks again and look forward to your insights.
Cheers,
Andrew
Hi, Andrew.
I just added a vector_tag function. You can install the package from GitHub to use it.
> library(jiebaR)
> cc = worker()
> vector_tag(c("这", "是", "北京"), cc)
r    v    ns
"这" "是" "北京"
For now, the tagging process in this package is very simple: it just reads the dictionary and finds the one and only tag for each word. As a result, the tags are not very accurate.
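To illustrate the one-tag-per-word lookup described above, here is a minimal base-R sketch. It is not the package's actual implementation: `toy_dict`, `lookup_tag`, and the fallback tag `"x"` for unknown words are all assumptions made up for this example.

```r
# Toy dictionary mapping each word to exactly one tag.
# These entries are hypothetical, not the package's real dictionary file.
toy_dict <- c("这" = "r", "是" = "v", "北京" = "ns")

# Tag a vector of already tokenized words by plain dictionary lookup.
# Words missing from the dictionary get the fallback tag (assumed here to be "x").
lookup_tag <- function(words, dict, unknown = "x") {
  tags <- unname(dict[words])
  tags[is.na(tags)] <- unknown
  # Same shape as the vector_tag output: values are words, names are tags.
  stats::setNames(words, tags)
}

res <- lookup_tag(c("这", "是", "北京", "R"), toy_dict)
print(res)
```

Because each word maps to a single tag regardless of context, a word that can act as either a noun or a verb always gets the same tag, which is probably why the results are not very accurate.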
There is also a THULACR package, which has not yet been published to a public repo. If you want to use it, I can add you to the private repo. Note that THULACR cannot tag a vector of words, but it can tag a sentence.
@qinwf This is very exciting. I'd love to try THULACR out. Please add me to the repo :) And thanks again for the wonderful work on NLP support in R.