prateekagarwal3/ProjectX

unsupervised-machine-learning plsa expectation-maximization

(group1) Consonants image size: 40 x 40 (approximately the average size of all the original images)
(group2) Vowels image size: 40 x 40 (approximately the average size of all the original images)
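
The README does not say how images were brought to 40 x 40. As a minimal sketch, assuming grayscale or binary images stored as 2-D NumPy arrays and nearest-neighbour sampling (both assumptions, as is the helper name `resize_nearest`), the normalization could look like this:

```python
import numpy as np

def resize_nearest(img, out_h=40, out_w=40):
    """Resize a 2-D image to (out_h, out_w) by nearest-neighbour sampling.

    Hypothetical helper: the repository's actual resizing method is not
    documented in this README.
    """
    img = np.asarray(img)
    in_h, in_w = img.shape
    # Map each output row/column back to a source row/column index.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return img[rows[:, None], cols[None, :]]
```

Nearest-neighbour sampling keeps binary images binary, which may matter if the strokes are later learned from pixel on/off counts.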

The TermDoc folder contains the normalized dataset and the code for TermDoc generation.
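
For orientation, here is a rough sketch of what a term-document matrix could look like for this kind of data. It assumes each pixel position is a "term" and each 40 x 40 character image is a "document"; that mapping and the function name `build_term_doc` are illustrative assumptions, not the code in the TermDoc folder.

```python
import numpy as np

def build_term_doc(images):
    """Stack flattened images as columns of a term-document matrix.

    Hypothetical helper. Assumes every image has the same shape, so the
    result has shape (n_pixels, n_images).
    """
    columns = [np.asarray(im, dtype=float).reshape(-1) for im in images]
    return np.stack(columns, axis=1)
```

With 40 x 40 images this gives a matrix with 1600 rows and one column per image.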

1.1 contains the current implementation, which seems correct, and a few graphs. I ran it for F004 - F064; the generated strokes are in the "Stroke Folder".
1.2 contains the final state of the strokes and code before the Mid-Sem Project evaluation.
1.3 contains working code with the threading done. Steps to run it:
  1) Generate the termDoc, provide it to EMnTimes, and then run RemoveNoise to clean the strokes.
1.3 contains Tibetan output.
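
The EM stage of the steps above can be illustrated with a small pLSA sketch. It assumes a term-document matrix of non-negative counts and treats strokes as the pLSA topics; the function `plsa_em` and its parameters are hypothetical and are not the EMnTimes code from this repository.

```python
import numpy as np

def plsa_em(term_doc, n_topics, n_iter=40, seed=0):
    """Fit pLSA by EM on a (n_terms, n_docs) non-negative count matrix.

    Returns P(w|z) with shape (n_terms, n_topics) and
    P(z|d) with shape (n_topics, n_docs). Illustrative sketch only.
    """
    term_doc = np.asarray(term_doc, dtype=float)
    rng = np.random.default_rng(seed)
    n_terms, n_docs = term_doc.shape
    eps = 1e-12  # guards against division by zero

    # Random initialisation; each column is a probability distribution.
    p_w_z = rng.random((n_terms, n_topics))
    p_w_z /= p_w_z.sum(axis=0, keepdims=True)
    p_z_d = rng.random((n_topics, n_docs))
    p_z_d /= p_z_d.sum(axis=0, keepdims=True)

    for _ in range(n_iter):
        # E-step: P(z|w,d) is proportional to P(w|z) * P(z|d).
        joint = p_w_z[:, :, None] * p_z_d[None, :, :]
        post = joint / (joint.sum(axis=1, keepdims=True) + eps)

        # M-step: re-estimate both factors from n(w,d) * P(z|w,d).
        weighted = term_doc[:, None, :] * post
        p_w_z = weighted.sum(axis=2)
        p_w_z /= p_w_z.sum(axis=0, keepdims=True) + eps
        p_z_d = weighted.sum(axis=0)
        p_z_d /= p_z_d.sum(axis=0, keepdims=True) + eps

    return p_w_z, p_z_d
```

Under the stroke-as-topic assumption, each column of `p_w_z` can be reshaped back to the image size to view one learned stroke. The dense `(n_terms, n_topics, n_docs)` posterior is fine for small experiments but may need batching over documents on larger data sets.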

Project Report contains the PDF and LaTeX files that were used to create the report.

Strokes and Plot contains the strokes generated on the 2517 data set, with the number of strokes set to 10 and 40 iterations, along with plots of the same.

Original Data Set contains the data set given to us by Sir. Please don't modify that directory.

English contains original and thinned versions of Arial Size 72 English characters, along with the code for thinning.
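
The README does not name the thinning algorithm used. As one possible illustration, the sketch below implements Zhang-Suen thinning, a common choice for binary character images; it is a stand-in under that assumption, not the thinning code shipped in the English folder, and `zhang_suen_thin` is a hypothetical name.

```python
import numpy as np

def zhang_suen_thin(img):
    """Thin a binary image (non-zero = foreground) with Zhang-Suen.

    Illustrative sketch; returns a uint8 array of the same shape.
    """
    img = (np.asarray(img) > 0).astype(np.uint8)
    img = np.pad(img, 1)  # zero border so every pixel has 8 neighbours
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r, c in zip(*np.nonzero(img)):
                # Neighbours p2..p9, clockwise starting from north.
                ring = [int(v) for v in (
                    img[r - 1, c], img[r - 1, c + 1], img[r, c + 1],
                    img[r + 1, c + 1], img[r + 1, c], img[r + 1, c - 1],
                    img[r, c - 1], img[r - 1, c - 1])]
                p2, _, p4, _, p6, _, p8, _ = ring
                b = sum(ring)  # number of foreground neighbours
                # Number of 0 -> 1 transitions around the ring.
                a = sum(ring[i] == 0 and ring[(i + 1) % 8] == 1
                        for i in range(8))
                if step == 0:
                    cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                else:
                    cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                if 2 <= b <= 6 and a == 1 and cond:
                    to_clear.append((r, c))
            # Delete in parallel, after the whole sub-iteration is scanned.
            for r, c in to_clear:
                img[r, c] = 0
            changed = changed or bool(to_clear)
    return img[1:-1, 1:-1]
```

The pure-Python pixel loop is slow on large images but is adequate for small glyphs such as 40 x 40 characters.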

About

Language Invariant Optical Character Recognition

Languages

Python 67.5%, TeX 32.5%