ashim95 / IndicLM

Experiments on language modeling for indic languages

  1. generate_sentences_Yoav_Goldberg.ipynb: Use this notebook to generate nonce sentences for Yoav Goldberg-style experiments.

  2. The sentence pairs for Yoav Goldberg-style experiments in Malayalam can be found at this link. The counts for each dependency type are as follows:

{
'k7': 29304,
'k4': 19469,
'vmod': 61257,
'r6': 29717,
'k5': 3724,
'nmod': 60813,
'k1': 21190,
'k2': 16209,
'k3': 715
}

The format of the file (tab separated) is as follows:

Column 0 : Dependency type to which the example belongs
Column 1 : Sentence ID of the original (correct) sentence
Column 2 : Index of the word in the sentence that is being replaced (to generate the incorrect example)
Column 3 : Correct Sentence
Column 4 : Incorrect Sentence with one word replaced
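
As a rough illustration of the format above, the following Python sketch reads such a tab-separated file into records and tallies examples per dependency type. It is not part of the repository: the names `SentencePair`, `read_pairs`, `count_by_type`, and `differing_positions` are hypothetical, the sample rows are made-up placeholders rather than real data, and it assumes the file has no header row and that words are separated by whitespace.

```python
import csv
import io
from collections import Counter
from typing import Iterable, Iterator, List, NamedTuple


class SentencePair(NamedTuple):
    """One row of the tab-separated file (hypothetical record type)."""
    dep_type: str     # Column 0: dependency type
    sentence_id: str  # Column 1: ID of the original (correct) sentence
    word_index: int   # Column 2: index of the replaced word
    correct: str      # Column 3: correct sentence
    incorrect: str    # Column 4: sentence with one word replaced


def read_pairs(lines: Iterable[str]) -> Iterator[SentencePair]:
    """Yield one SentencePair per well-formed row.

    Assumes no header row; rows without exactly five columns are skipped.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 5:
            continue
        dep_type, sentence_id, word_index, correct, incorrect = row
        yield SentencePair(dep_type, sentence_id, int(word_index),
                           correct, incorrect)


def count_by_type(pairs: Iterable[SentencePair]) -> Counter:
    """Count examples per dependency type (Column 0)."""
    return Counter(pair.dep_type for pair in pairs)


def differing_positions(pair: SentencePair) -> List[int]:
    """Return 0-based positions where the whitespace-split sentences differ.

    Whitespace tokenization is an assumption; the notebook may tokenize
    differently, so compare the result with Column 2 before relying on it.
    """
    correct_words = pair.correct.split()
    incorrect_words = pair.incorrect.split()
    return [i for i, (a, b) in enumerate(zip(correct_words, incorrect_words))
            if a != b]


if __name__ == "__main__":
    # Made-up placeholder rows; for a real file, pass an open file handle
    # (opened with encoding="utf-8" and newline="") instead of StringIO.
    sample = "k1\t12\t0\ta b c\tx b c\nk2\t13\t2\td e f\td e g\n"
    pairs = list(read_pairs(io.StringIO(sample)))
    print(count_by_type(pairs))
    print([differing_positions(p) for p in pairs])
```

Running `count_by_type` over the full file should give per-type totals that can be compared against the counts listed above.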

License: GNU General Public License v3.0


Languages

Jupyter Notebook 97.2%, Python 2.8%