machine-learning character-recognition image-classification chinese-character-recognition tensorflow keras

Chinese-Character-Recognition

A repo for machine-learning powered image classification of handwritten Chinese characters

Website

Direct link

Visitor map for website

Full release date: July 4, 2021

November 10, 2021 Update: Enabled writing the characters on mobile devices.

November 30, 2021 Update: Implemented visitor map for website.

Convert Keras Model to TensorFlow.js for Web App

!pip install tensorflowjs

!tensorflowjs_converter --input_format keras "model.h5" "docs/model"

Packages Used

TensorFlow, NumPy, pandas, Matplotlib, OpenCV (cv2), os

Data

Found here

For this dataset, 100 volunteers each wrote all 15 characters 10 times, giving 15,000 images in total.

`suite_id`

suite_id identifies the volunteer (100 total).

`sample_id`

sample_id identifies the sample from a given volunteer (10 total), since each volunteer writes each character 10 times.

`code`

code identifies each character by its position in the sequence, i.e. code i is the ith character in order.

For example, 零 is 1, 一 is 2, 二 is 3, ... 九 is 10.

`value`

value is the numerical value of the character, e.g. 5.

`character`

character is the actual symbol, e.g. 五.
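The relationship between `code`, `value`, and `character` can be sketched as a small lookup table. This is only an illustration: `code_table` and `values` are hypothetical names, the numeric values are the standard meanings of the characters rather than rows copied from the dataset, and codes 11 to 15 are assumed to continue in the same order as the character list used elsewhere in this README.

```python
# The 15 characters in sequence order.
characters = ('零', '一', '二', '三', '四', '五', '六', '七', '八', '九', '十', '百', '千', '万', '亿')

# Standard numeric meaning of each character (assumption: matches the `value` column).
values = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100, 1000, 10_000, 100_000_000)

# code starts at 1, so code = position in the tuple + 1.
code_table = {
    code: {"value": value, "character": character}
    for code, (value, character) in enumerate(zip(values, characters), start=1)
}

print(code_table[6])  # {'value': 5, 'character': '五'}
```

Because `code` starts at 1 and the character list is indexed from 0, position `code - 1` in the list gives the character for a given `code`.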

Model Performance

Optimizer: Adam

Loss function: Sparse Categorical Cross Entropy

Training Data (80%)

Epochs: 15

Loss: 0.0844

Accuracy: 0.9890 (98.90%)

Testing Data (20%)

Loss: 0.8035

Accuracy: 0.7753 (77.53%)
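To make the reported metrics concrete, here is a minimal pure-Python sketch of how sparse categorical cross-entropy and accuracy are computed from predicted probabilities and integer labels. It is not the Keras implementation (Keras also clips probabilities to avoid taking the log of zero), and the toy batch below is made up for illustration, not taken from the training or testing data.

```python
import math

def sparse_categorical_cross_entropy(probabilities, labels):
    """Mean of -log(probability assigned to the true class) over the batch."""
    losses = [-math.log(row[label]) for row, label in zip(probabilities, labels)]
    return sum(losses) / len(losses)

def accuracy(probabilities, labels):
    """Fraction of rows whose largest probability sits at the true class index."""
    hits = sum(
        max(range(len(row)), key=row.__getitem__) == label
        for row, label in zip(probabilities, labels)
    )
    return hits / len(labels)

# Toy batch (hypothetical): two samples, three classes.
toy_probabilities = [[0.1, 0.8, 0.1], [0.6, 0.3, 0.1]]
toy_labels = [1, 2]  # the second sample is misclassified

print(sparse_categorical_cross_entropy(toy_probabilities, toy_labels))
print(accuracy(toy_probabilities, toy_labels))  # 0.5
```

A confident wrong prediction contributes a large loss term, which is one way a test loss can be much higher than the training loss even when test accuracy is still fairly high.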

A Closer Look

Note: For each image, the model outputs 15 probabilities, one per class, indexed from 0 to 14. The index of the largest probability is the predicted class. To convert that index to a character, you can use this convenient list.

characters = ('零', '一', '二', '三', '四', '五', '六', '七', '八', '九', '十', '百', '千', '万', '亿')

50th Test Image

Prediction data:

In [1]: PredictionData[50]

Out [1]: array([3.6610535e-07, 1.8792988e-28, 2.2422669e-12, 1.7063133e-10,
       2.6500999e-09, 5.2112355e-06, 3.2691182e-07, 8.3664847e-05,
       3.6446830e-13, 9.9990106e-01, 1.1595256e-14, 5.7882625e-07,
       9.6461701e-09, 3.3319463e-07, 8.4753528e-06], dtype=float32)

500th Test Image

Prediction data:

In [2]: PredictionData[500]

Out [2]: array([3.83269071e-04, 4.26350415e-21, 5.74980230e-10, 5.73274974e-07,
      4.58842493e-04, 8.47115181e-03, 6.79080131e-07, 1.44536525e-05,
      8.73805250e-10, 1.13052677e-03, 2.78494355e-07, 9.84949350e-01,
      2.79740023e-04, 4.30813758e-03, 2.93952530e-06], dtype=float32)
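The two prediction arrays above can be decoded by taking the index of the largest probability and looking it up in the character list. The sketch below uses plain Python lists so it runs without any packages; `decode_prediction` is a hypothetical helper name, and on a NumPy array `np.argmax` would give the same index.

```python
characters = ('零', '一', '二', '三', '四', '五', '六', '七', '八', '九', '十', '百', '千', '万', '亿')

def decode_prediction(probabilities):
    """Return (class index, character, probability) for the most likely class."""
    index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return index, characters[index], float(probabilities[index])

# Values copied from PredictionData[50] above.
prediction_50 = [3.6610535e-07, 1.8792988e-28, 2.2422669e-12, 1.7063133e-10,
                 2.6500999e-09, 5.2112355e-06, 3.2691182e-07, 8.3664847e-05,
                 3.6446830e-13, 9.9990106e-01, 1.1595256e-14, 5.7882625e-07,
                 9.6461701e-09, 3.3319463e-07, 8.4753528e-06]

# Values copied from PredictionData[500] above.
prediction_500 = [3.83269071e-04, 4.26350415e-21, 5.74980230e-10, 5.73274974e-07,
                  4.58842493e-04, 8.47115181e-03, 6.79080131e-07, 1.44536525e-05,
                  8.73805250e-10, 1.13052677e-03, 2.78494355e-07, 9.84949350e-01,
                  2.79740023e-04, 4.30813758e-03, 2.93952530e-06]

print(decode_prediction(prediction_50))   # class index 9, character '九'
print(decode_prediction(prediction_500))  # class index 11, character '百'
```

Whether these predictions are correct depends on the true labels of the two test images, which are not shown here.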

To Use

Option A

1) Load the model into your file

In your Python code, add the following to load the model.

import tensorflow as tf

model = tf.keras.models.load_model("model", compile=True)

The model predicts one of 15 classes, indexed from 0 to 14. To convert a class index to a character, you can use this convenient list.

characters = ('零', '一', '二', '三', '四', '五', '六', '七', '八', '九', '十', '百', '千', '万', '亿')

You can then use this model for your own purposes. Enjoy!

Option B

1) Download `archive.zip` and `ChinCharRecog.py`

2) Unpack `archive.zip` and ensure that its contents are in the same directory as `ChinCharRecog.py`

This should include the data folder, the chinese_mnist.csv file, and chinese_mnist.tfrecords. For future reference, let's call this directory folder1.

3) Inside of `ChinCharRecog.py` change the variable `directory` on line 81

Change directory to the /folder1/data/data/ directory. On macOS, this might look something like /Users/user_name_here/Desktop/folder1/data/data/.

4) Run the file `ChinCharRecog.py` and enjoy!
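If you would rather not hard-code the path from step 3, one possible way to build it is sketched below. The Desktop location is only an assumption carried over from the macOS example, and keeping the trailing separator is a guess based on that example path ending in a slash; check how `ChinCharRecog.py` actually uses `directory` before relying on this.

```python
import os

# Hypothetical location of folder1; replace with wherever you unpacked archive.zip.
folder1 = os.path.join(os.path.expanduser("~"), "Desktop", "folder1")

# Joining with an empty last component keeps a trailing separator,
# mirroring the example path above (assumption: the script expects one).
directory = os.path.join(folder1, "data", "data", "")

# Warn early if the folder is not where we expect it.
if not os.path.isdir(directory):
    print(f"Not found: {directory} (check where archive.zip was unpacked)")
```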

About


https://tyler-pruitt.github.io/Chinese-Character-Recognition/


MIT License
