johnolafenwa / Ling10

A dataset of 190,000 sentences, each labeled with one of 10 languages, intended primarily for language detection tasks. This repository contains the dataset and the code for processing it.