There are 0 repositories under the wili-2018-dataset topic.
Language identification using XGBoost
:globe_with_meridians: Language identification for Scandinavian languages
:mortar_board: 4th year Advanced Object Oriented Programming project. A web-based service that identifies the language of a submitted body of text. The out-of-place metric (`OutOfPlaceMetric`) measures the distance, i.e. the dissimilarity, between the submitted text and each candidate language. A database is built from the subject file, which is split into k-mers that are ranked by frequency.
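As a rough illustration of the ranking and distance steps described above, here is a minimal Python sketch of a rank-based out-of-place distance. It is not the project's own code: the function names (`ranked_kmers`, `out_of_place`, `identify`), the default `k=3`, the `limit` of 300 k-mers, and the penalty used for k-mers missing from a language profile are all assumptions made for the example.

```python
from collections import Counter


def ranked_kmers(text, k=3, limit=300):
    """Map each of the `limit` most frequent k-mers to its rank (0 = most frequent)."""
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]
    return {kmer: rank for rank, (kmer, _) in enumerate(ordered)}


def out_of_place(query_ranks, lang_ranks):
    """Sum of rank differences; k-mers absent from the language get a fixed penalty."""
    max_penalty = len(lang_ranks)  # assumed penalty for an unseen k-mer
    total = 0
    for kmer, rank in query_ranks.items():
        if kmer in lang_ranks:
            total += abs(rank - lang_ranks[kmer])
        else:
            total += max_penalty
    return total


def identify(text, profiles, k=3):
    """Return the language whose profile has the smallest out-of-place distance."""
    query = ranked_kmers(text, k)
    return min(profiles, key=lambda lang: out_of_place(query, profiles[lang]))


# Tiny hypothetical profiles; a real database would be built from a large corpus.
profiles = {
    "en": ranked_kmers("the quick brown fox jumps over the lazy dog and the cat"),
    "es": ranked_kmers("el rapido zorro marron salta sobre el perro perezoso y el gato"),
}
guess = identify("the dog and the fox", profiles)
```

A smaller distance means the two rankings agree more closely, so the best match is the language with the minimum score.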
:mortar_board: 4th year Artificial Intelligence project. Uses the Encog library with vector hashing and k-fold cross-validation to train a neural network on the WiLI language dataset. The trained network can then predict the language of an input text.
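The two preprocessing ideas named above can be sketched in plain Python. This is a stand-in for illustration only and does not use or mirror the Encog API (Encog is a Java library). It assumes "vector hashing" means hashing character n-grams into a fixed-length feature vector; the names `hash_ngrams` and `k_fold_indices`, the vector size of 64, and the use of CRC32 as the hash are choices made for the example.

```python
import zlib


def hash_ngrams(text, n=2, size=64):
    """Hash character n-grams into a fixed-length, frequency-normalised vector."""
    vec = [0.0] * size
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    for gram in grams:
        # CRC32 is stable across runs, unlike Python's built-in hash() for str.
        vec[zlib.crc32(gram.encode("utf-8")) % size] += 1.0
    total = len(grams)
    return [v / total for v in vec] if total else vec


def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists; each sample is held out exactly once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, test


# Each vector would be one input row for the network; labels are not shown here.
features = hash_ngrams("language identification")
splits = list(k_fold_indices(10, k=5))
```

In each of the k rounds a model would be trained on the `train` indices and scored on the `test` indices, and the scores averaged to estimate accuracy.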