Myschow / Optimizing-Wav2Vec-2.0-for-Bengali-Speech-Recognition-A-Comprehensive-Study

Optimizing-Wav2Vec-2.0-for-Bengali-Speech-Recognition-A-Comprehensive-Study

This research aims to improve Bengali speech recognition by optimizing the Wav2Vec 2.0 framework on the Bengali Common Voices Dataset. The goal is to raise the transcription accuracy of continuous speech, a longstanding challenge in speech recognition. Using techniques such as dynamic padding, hyperparameter tuning, and convergence analysis, the study lowered the Word Error Rate (WER) to 0.5, which it reports as a 50% gain in accuracy. Evaluation metrics include accuracy, precision, and recall. Training and validation losses converge across epochs, indicating stable training and steady learning progress. The result is a model tailored to the characteristics of Bengali, and the methods used lay the groundwork for future improvements in speech recognition for underrepresented languages.
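For readers unfamiliar with the headline metric, the following is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the model's hypothesis, divided by the number of reference words. The function name `word_error_rate` and the placeholder tokens are illustrative assumptions and are not taken from the repository; the study's own evaluation code may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name), not the repository's code.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (b -> x) and one deletion (d) over 4 reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Under this definition, a WER of 0.5 means that the number of word-level edits needed to turn the hypothesis into the reference equals half the number of reference words.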

View the paper on Overleaf:

https://www.overleaf.com/read/kqqggbgwwsxh#82c5f8
