ZhangXinyiCindy / DS2000-Project-1-Bonjour-and-Buenos-Dias

Coursework for "DS2000 Programming with Data" that applies a supervised machine learning method to build a tool that predicts what language a document is written in, given a sample set of known documents.

DS2000-Project-1-Bonjour-and-Buenos-Dias

Overview

The goal of this project is to create a tool that predicts what language a document is written in, given a sample set of known documents. The tool can use several techniques to make its guesses, all of which are based on the frequencies of trigrams. A trigram is a three-character subsequence of the document; for example, the word "hola" contains the trigrams "hol" and "ola". The program is limited to a short list of languages, and each language has a small set of training data (documents for which the language is known).
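
The trigram approach described above can be illustrated with a minimal sketch. This is not the project's actual code: the function names, the tiny training strings, and the choice of cosine similarity as the comparison technique are all assumptions made for illustration, since the overview does not say which techniques the tool uses.

```python
from collections import Counter
import math


def trigram_counts(text):
    """Count every three-character subsequence (trigram) in text."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def normalize(counts):
    """Turn raw trigram counts into relative frequencies."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {trigram: count / total for trigram, count in counts.items()}


def cosine_similarity(a, b):
    """Cosine similarity between two trigram frequency profiles.

    Assumption: the source only says "various techniques"; cosine
    similarity is one common choice, used here for illustration.
    """
    dot = sum(value * b.get(trigram, 0.0) for trigram, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def guess_language(document, training):
    """Return the language whose training profile best matches document.

    training maps a language name to a list of documents known to be
    written in that language.
    """
    doc_profile = normalize(trigram_counts(document))
    profiles = {
        language: normalize(trigram_counts(" ".join(texts)))
        for language, texts in training.items()
    }
    return max(
        profiles,
        key=lambda language: cosine_similarity(doc_profile, profiles[language]),
    )


# Hypothetical, made-up training data; the real project uses its own files.
sample_training = {
    "French": ["bonjour le monde, comment allez-vous"],
    "Spanish": ["buenos dias, como estas amigo"],
}

if __name__ == "__main__":
    print(guess_language("bonjour mes amis", sample_training))
```

With training sets this small the guesses are fragile; a real run would need longer known documents per language so that the trigram frequencies are representative.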

Languages

Python: 100.0%