The dataset will be hosted on 🤗 Datasets here.
Dataset | Processing | Type | Language | Owner | Citation |
---|---|---|---|---|---|
AI4Bharat IndicCorp | In Process | Original Scraped | en-in, hi, as, bn, gu, kn, ml, mr, or, pa, ta, te | HC | citation |
CC-100 Corpus | In Process | Original, Romanized | as, bn, bn_rom, gu, hi, hi_rom, kn, ml, mr, ne, or, pa, sa, si, sd, ta, ta_rom, te, te_rom, ur, ur_rom | HC | citation |
WMT News Crawl | Available to pick up | Original Scraped | bn, gu, hi, kn, ml, mr, or, pa, ta, te | | citation |
Charles University Hindi Monolingual Corpus | Available to pick up | Parallel Corpora | hi, en | | |
IIT Bombay Hindi Monolingual Corpus | Available to pick up | Parallel Corpora, Monolingual | hi, en | | citation |