golecalicja / language-recognition-neural-network

A single-layer neural network written from scratch that predicts the language of the text.

Language recognition neural network

Introduction

A single-layer neural network, written from scratch, that predicts the language of a text from the relative frequencies of its letters.
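
As a rough illustration of the letter-proportion idea, the sketch below turns a text into a 26-element vector of letter shares. The function name `letter_proportions`, the lowercasing step, and the restriction to the letters a-z are assumptions made for this example, not details taken from the repository.

```python
import string
from collections import Counter

# Assumed feature set: the 26 lowercase ASCII letters.
ALPHABET = string.ascii_lowercase


def letter_proportions(text: str) -> list[float]:
    """Return the share of each ASCII letter among all ASCII letters in text."""
    # Lowercase first, then ignore everything that is not a-z
    # (digits, punctuation, whitespace, accented letters).
    counts = Counter(ch for ch in text.lower() if ch in ALPHABET)
    total = sum(counts.values())
    if total == 0:
        # No ASCII letters at all: return a zero vector instead of dividing by zero.
        return [0.0] * len(ALPHABET)
    return [counts[ch] / total for ch in ALPHABET]


vector = letter_proportions("Abba!")
print(vector[0], vector[1])  # shares of 'a' and 'b'
```

Because the vector holds proportions rather than raw counts, texts of different lengths become directly comparable.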

Data description

The data consists of articles scraped from Wikipedia. The neural network was trained on 5 languages (English, German, Polish, Spanish and French), with 10 articles per language. The code is generic, however, and can be applied to any number of languages.
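
One way such a dataset could be loaded is sketched below. It assumes a hypothetical layout with one directory per language, each holding `.txt` articles (for example `data/english/article1.txt`); the actual file layout and the name `load_corpus` are not confirmed by the repository.

```python
from pathlib import Path


def load_corpus(root: str) -> list[tuple[str, str]]:
    """Return (language, text) pairs, labelling each file by its parent directory."""
    samples = []
    # Every subdirectory of root is treated as one language, so adding
    # a new language only requires adding a new directory.
    for lang_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for file in sorted(lang_dir.glob("*.txt")):
            samples.append((lang_dir.name, file.read_text(encoding="utf-8")))
    return samples
```

Since the labels come from the directory names, the number of languages never has to be hard-coded.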

Examples of performance

English sample:

image

German sample:

image

Polish sample:

image

Methods used

  • Dynamic reading of multiple data files from a variable number of directories
  • Calculating vectors of letter proportions in each language (ASCII characters only)
  • Implementing a single-layer neural network of k perceptrons (where k = number of languages) from scratch
  • Libraries used for data loading and preprocessing: pandas, numpy
  • OOP and clean code
  • Unit testing with pytest
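
A single layer of k perceptrons, as described in the list above, might look roughly like the following minimal sketch. The class name `SingleLayerNetwork`, the step activation, the perceptron learning rule, and the choice of the perceptron with the largest net input as the prediction are all assumptions; the repository may differ on each point. Plain Python lists are used instead of numpy only to keep the example self-contained.

```python
import random


class SingleLayerNetwork:
    """k perceptrons, one per language (hypothetical sketch, not the repository's class)."""

    def __init__(self, languages, n_inputs, learning_rate=0.1, seed=0):
        rng = random.Random(seed)
        self.languages = list(languages)
        self.lr = learning_rate
        # One weight vector and one bias per language.
        self.weights = [
            [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
            for _ in self.languages
        ]
        self.biases = [0.0 for _ in self.languages]

    def _net(self, k, x):
        """Weighted sum of inputs plus bias for perceptron k."""
        return sum(w * xi for w, xi in zip(self.weights[k], x)) + self.biases[k]

    def train(self, samples, epochs=100):
        """samples: iterable of (vector, language) pairs."""
        for _ in range(epochs):
            for x, label in samples:
                for k, lang in enumerate(self.languages):
                    # Each perceptron learns "my language" (1) vs. "any other" (0).
                    target = 1 if lang == label else 0
                    output = 1 if self._net(k, x) >= 0 else 0
                    error = target - output
                    if error:
                        self.weights[k] = [
                            w + self.lr * error * xi
                            for w, xi in zip(self.weights[k], x)
                        ]
                        self.biases[k] += self.lr * error

    def predict(self, x):
        """Return the language whose perceptron has the largest net input."""
        nets = [self._net(k, x) for k in range(len(self.languages))]
        return self.languages[nets.index(max(nets))]


# Toy usage with 2-element vectors; real inputs would be letter-proportion vectors.
net = SingleLayerNetwork(["a", "b"], n_inputs=2)
net.train([([1.0, 0.0], "a"), ([0.0, 1.0], "b")])
print(net.predict([1.0, 0.0]))
```

Picking the largest net input rather than the thresholded output gives a single answer even when several perceptrons, or none, fire for the same text.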