Jap_Speak_Recog

Project for the Machine Learning course at the University of Groningen (RUG)
Dataset: https://archive.ics.uci.edu/ml/datasets/Japanese+Vowels
Google Doc documenting the workflow (not cleaned): https://docs.google.com/document/d/1O5-qIBvy6kEe87fou5AGQ7VA5wKAOFOWSFszcSkFB5I/edit?usp=sharing

This project used: Python 3.8.5, Keras 2.4.3, scipy 1.3.1, scikit-learn 0.24.1

Abstract

We present the results of our Machine Learning project on the Japanese Vowels dataset, a real-life dataset containing spectral recordings of vocal utterances of the Japanese vowels /ae/, recorded from nine male speakers. The task is to match each multidimensional time series with the correct speaker. We compared various preprocessing methods in conjunction with state-of-the-art classifiers and found that resampling the recordings to an equal length using cubic spline interpolation significantly improves classification performance across all classification models. The best performance was obtained by an ensemble of 11 separately trained Long Short-Term Memory (LSTM) networks, applied to recordings that were interpolated with cubic splines and resampled to an equal length of 26 time steps, yielding a training accuracy of 99.82% and a testing accuracy of 98.86%.
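
The resampling step described in the abstract can be sketched as follows. This is a minimal illustration, not the code from the notebooks: the function name resample_recording, the normalized [0, 1] time axis, and the use of scipy.interpolate.CubicSpline with its default boundary conditions are assumptions. The 12 channels per time step in the usage example are also an assumption about the data layout.

```python
import numpy as np
from scipy.interpolate import CubicSpline

TARGET_LEN = 26  # equal length reported in the abstract


def resample_recording(recording, target_len=TARGET_LEN):
    """Resample a (time_steps, channels) array to target_len time steps.

    Hypothetical helper: fits one cubic spline per channel and evaluates it
    on an evenly spaced grid of target_len points.
    """
    recording = np.asarray(recording, dtype=float)
    n_steps = recording.shape[0]
    # Map every recording onto [0, 1] so different lengths share one time axis.
    t_old = np.linspace(0.0, 1.0, n_steps)
    t_new = np.linspace(0.0, 1.0, target_len)
    spline = CubicSpline(t_old, recording, axis=0)
    return spline(t_new)


# Usage: synthetic recordings of different lengths become one fixed-size batch.
rng = np.random.default_rng(0)
recordings = [rng.normal(size=(n, 12)) for n in (7, 15, 29)]
batch = np.stack([resample_recording(r) for r in recordings])
print(batch.shape)  # (3, 26, 12)
```

Because the first and last grid points coincide with the original endpoints, the spline reproduces the first and last time step of each recording; only the interior samples are interpolated.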

About

Student project for the Machine Learning course by Herbert Jaeger at the University of Groningen. We identify speakers in the Japanese Vowels dataset using bidirectional LSTMs in combination with cubic spline resampling.
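
The abstract reports an ensemble of separately trained networks but does not say how their predictions are combined. One common option is soft voting, sketched below under that assumption: each member outputs class probabilities per recording, the probabilities are averaged, and the speaker with the highest mean probability is chosen. The function name soft_vote and the toy probabilities are illustrative only.

```python
import numpy as np


def soft_vote(member_probs):
    """Combine ensemble members by averaging their class probabilities.

    member_probs: array-like of shape (n_members, n_samples, n_classes).
    Returns the index of the most probable class for each sample.
    """
    member_probs = np.asarray(member_probs, dtype=float)
    mean_probs = member_probs.mean(axis=0)  # average over ensemble members
    return mean_probs.argmax(axis=1)        # most probable class per sample


# Toy example: 3 members, 2 samples, 3 classes (made-up numbers).
toy_probs = [
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
    [[0.4, 0.5, 0.1], [0.1, 0.2, 0.7]],
    [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2]],
]
predicted = soft_vote(toy_probs)
print(predicted)  # [0 1]
```

For this project the input would have shape (11, n_recordings, 9), one slice per trained network and one column per speaker.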
