connorbode / Automatic-Language-Identification

Parses text and determines the language of the text based on bigrams.

Automatic-Language-Identification

Parses text and determines the language of the text based on character bigrams.

Project Description

This Java project was written for my Artificial Intelligence course at Concordia, taught by Dr. Leila Kosseim. The requirements were:

  • the system must be able to read training corpora and extract character bigrams from them
  • the system must be able to use those bigrams to identify the language of a given sentence
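
The two requirements above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the class name `BigramLanguageIdentifier`, the methods `train` and `identify`, the add-one smoothing, and the choice to count bigrams only within words are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: names and scoring scheme are illustrative, not taken from the repository.
public class BigramLanguageIdentifier {
    // language -> (bigram -> count) gathered from that language's training corpus
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    // language -> total number of bigrams seen during training
    private final Map<String, Integer> totals = new HashMap<>();

    // Assumption: a 26-letter alphabet, so 26 * 26 possible bigrams for add-one smoothing.
    private static final int POSSIBLE_BIGRAMS = 26 * 26;

    // Reads a training corpus and accumulates bigram counts for the given language.
    public void train(String language, String corpus) {
        Map<String, Integer> model = counts.computeIfAbsent(language, k -> new HashMap<>());
        for (String word : words(corpus)) {
            for (int i = 0; i + 2 <= word.length(); i++) {
                model.merge(word.substring(i, i + 2), 1, Integer::sum);
                totals.merge(language, 1, Integer::sum);
            }
        }
    }

    // Returns the trained language with the highest smoothed log-probability,
    // or null if nothing has been trained yet.
    public String identify(String sentence) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Map<String, Integer>> entry : counts.entrySet()) {
            int total = totals.getOrDefault(entry.getKey(), 0);
            double score = 0.0;
            for (String word : words(sentence)) {
                for (int i = 0; i + 2 <= word.length(); i++) {
                    int c = entry.getValue().getOrDefault(word.substring(i, i + 2), 0);
                    score += Math.log((c + 1.0) / (total + POSSIBLE_BIGRAMS));
                }
            }
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }

    // Simplified cleanup: lowercase and split on anything that is not a-z.
    private static String[] words(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT).replaceAll("[^a-z]+", " ").trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split(" ");
    }
}
```

A caller would invoke `train` once per language with that language's corpus text, then call `identify` on the sentence to classify. Summing log-probabilities avoids numeric underflow on long sentences, and the add-one smoothing keeps a single unseen bigram from ruling a language out entirely.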

Specifics

The system:

  • ignores punctuation
  • treats all letters as lowercase
  • removes diacritics
  • is based on 2-character sequences
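
The preprocessing rules listed above might look something like the sketch below. The class `BigramPreprocessor` and its methods are hypothetical names chosen for illustration. The example also assumes that bigrams do not span word boundaries and that only the letters a-z remain after diacritics are removed; the source does not state either detail.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; names are illustrative and not taken from the repository.
public class BigramPreprocessor {

    // Strips diacritics, lowercases, and replaces every run of non-letters
    // (punctuation, digits, whitespace) with a single space.
    public static String normalize(String text) {
        // NFD splits an accented letter into a base letter plus combining marks.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // \p{M} matches the combining marks, so removing them drops the diacritics.
        String noMarks = decomposed.replaceAll("\\p{M}+", "");
        return noMarks.toLowerCase(Locale.ROOT).replaceAll("[^a-z]+", " ").trim();
    }

    // Returns every 2-character sequence inside each word of the normalized text.
    // Assumption: bigrams are taken within words and never across a space.
    public static List<String> bigrams(String text) {
        List<String> result = new ArrayList<>();
        String normalized = normalize(text);
        if (normalized.isEmpty()) {
            return result;
        }
        for (String word : normalized.split(" ")) {
            for (int i = 0; i + 2 <= word.length(); i++) {
                result.add(word.substring(i, i + 2));
            }
        }
        return result;
    }
}
```

With this sketch, an accented and punctuated word is reduced to plain lowercase letters before its bigrams are extracted. Letters that have no canonical decomposition (for example "ø" or "ß") are not base-letter-plus-mark combinations, so this approach treats them as non-letters; a fuller implementation would need an explicit mapping for them.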

Languages

Java 100.0%