This project aims to create a comprehensive, high-quality collection of Tamil text data for natural language processing (NLP), especially for LLMs, and for linguistic research.
Repository on GitHub: https://github.com/velkadamban/Tamil-Corpus