AI4Bharat / Indic-BERT-v1

Indic-BERT-v1: BERT-based Multilingual Model for 11 Indic Languages and Indian English. For the latest IndicBERT v2, see: https://github.com/AI4Bharat/IndicBERT

Home Page: https://indicnlp.ai4bharat.org

Documentation to implement NER

koushikram3420 opened this issue

Hey,
I tried using Indic-BERT for NER on news articles (for clustering them) with the `transformers` library. During tokenization, some of the tokens get split into subword pieces; I wanted to know if there is any way to avoid that.
Also, when I ran the same example you show in your documentation, I got different results:
brisbane (2)
chanakya (2)
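
Regarding the splitting mentioned above: with a SentencePiece subword vocabulary the splitting itself cannot be avoided, so word-level NER tags are usually aligned to the subword pieces instead. Below is a minimal sketch of that alignment, assuming the Hugging Face `transformers` library, the `ai4bharat/indic-bert` checkpoint, and a made-up word list and tag set.

```python
# Sketch: how word-level NER labels are typically aligned to subword tokens.
# Assumes `transformers` with a fast tokenizer; the words and labels are illustrative.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ai4bharat/indic-bert")

words = ["brisbane", "hosted", "the", "match"]   # hypothetical word-level input
word_labels = ["B-LOC", "O", "O", "O"]           # hypothetical word-level tags

enc = tokenizer(words, is_split_into_words=True, add_special_tokens=True)
print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))  # shows the subword splits

aligned = []
for word_id in enc.word_ids():       # word_ids() needs a fast tokenizer
    if word_id is None:              # special tokens such as [CLS] / [SEP]
        aligned.append(-100)         # -100 is ignored by the token-classification loss
    else:
        aligned.append(word_labels[word_id])  # every piece inherits its word's tag
print(aligned)
```

If word-level predictions are needed afterwards, the usual convention is to keep the prediction on the first piece of each word and mask the remaining pieces with -100.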
Kindly help me understand why the tokens are not being recognized properly. When I give custom inputs to the tokenizer in the same format, the tokens are not recognized and the encoding comes out as 1, even with `add_special_tokens`.
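
(One way to check what "encoding as 1" means: for ALBERT-style SentencePiece vocabularies, id 1 is typically the unknown-token id, so everything collapsing to 1 suggests the text is falling outside the vocabulary. A sketch of how that could be checked; the `keep_accents` part is a commonly suggested workaround for Indic scripts rather than something confirmed in this thread, and the Hindi sentence is made up.)

```python
# Sketch: check whether id 1 is the tokenizer's <unk> id and whether the
# input text is being mapped to <unk>. Assumes the `ai4bharat/indic-bert`
# checkpoint; the example sentence is illustrative.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ai4bharat/indic-bert")
print(tokenizer.unk_token, tokenizer.unk_token_id)  # if this prints 1, "encoding as 1" means <unk>

text = "ब्रिस्बेन में मैच हुआ"   # hypothetical Hindi input
print(tokenizer.convert_ids_to_tokens(tokenizer(text)["input_ids"]))

# The ALBERT tokenizer strips combining marks unless keep_accents=True, which
# can mangle Indic scripts and push tokens out of the vocabulary.
slow = AutoTokenizer.from_pretrained("ai4bharat/indic-bert", keep_accents=True, use_fast=False)
print(slow.convert_ids_to_tokens(slow(text)["input_ids"]))
```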
It would be helpful if you could share an example NER implementation.
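
In the absence of an official notebook, a single token-classification training step might look like the sketch below (assumptions: PyTorch, the `ai4bharat/indic-bert` checkpoint, and a made-up tag set and sentence; a real setup would fine-tune on a labelled NER dataset with `Trainer` or an optimizer loop).

```python
# Sketch: one training step of token classification (NER) with Indic-BERT.
# Assumes `transformers` with PyTorch; the tag set and input are illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

label_list = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]   # illustrative tags
tokenizer = AutoTokenizer.from_pretrained("ai4bharat/indic-bert")
model = AutoModelForTokenClassification.from_pretrained(
    "ai4bharat/indic-bert", num_labels=len(label_list)   # head is randomly initialised
)

words = ["chanakya", "lived", "in", "pataliputra"]       # hypothetical input
word_labels = ["B-PER", "O", "O", "B-LOC"]
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")

# One label id per subword position; -100 marks positions the loss ignores.
labels = torch.full(enc["input_ids"].shape, -100, dtype=torch.long)
for i, wid in enumerate(enc.word_ids()):
    if wid is not None:
        labels[0, i] = label_list.index(word_labels[wid])

out = model(**enc, labels=labels)
print(out.loss, out.logits.shape)   # logits: (1, seq_len, num_labels)
```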

Can you please share your notebook?
Thanks in advance.

Has anybody been able to create an example of NER for an Indian language using Indic-BERT? That would be very helpful. @koushikram3420, which model have you used? I think if you have used Indic-BERT, then according to your process the label size should be 768, whereas in your case the label size is 9.
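
One way to check the two sizes being compared here is to inspect the model config: `hidden_size` is the width of the encoder output (the figure 768 refers to this), while the token-classification head is sized by `num_labels`, so 9 is what a BIO tag set would give. A sketch, assuming `transformers` and the `ai4bharat/indic-bert` checkpoint:

```python
# Sketch: hidden_size is the encoder width; num_labels sizes the NER head.
# Assumes `transformers` and the `ai4bharat/indic-bert` checkpoint.
from transformers import AutoConfig, AutoModelForTokenClassification

config = AutoConfig.from_pretrained("ai4bharat/indic-bert", num_labels=9)
print(config.hidden_size)   # encoder width (what the 768 refers to)
print(config.num_labels)    # 9, i.e. the number of NER tags the head predicts

model = AutoModelForTokenClassification.from_pretrained("ai4bharat/indic-bert", num_labels=9)
print(model.classifier)     # Linear(hidden_size -> num_labels)
```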