In the STT-engine project, a speech-to-text model is built for the Amharic language. Features are extracted by generating Mel spectrogram images from the audio. A CNN learns feature maps from the spectrogram images, and a bidirectional RNN (Bi-RNN) predicts the transcription from the resulting time series of feature maps.
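To make the feature-extraction step more concrete, the sketch below shows one common way to build a Mel filterbank and apply it to a single power-spectrum frame. It is only an illustration, not the project's implementation: the function names (`hz_to_mel`, `mel_to_hz`, `mel_filterbank`, `mel_frame`) are hypothetical, and the HTK-style Mel formula, 40 Mel bands, 512-point FFT, and 16 kHz sample rate are all assumed values chosen for the example.

```python
import math


def hz_to_mel(hz):
    """Convert a frequency in Hz to the mel scale (HTK-style formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Return n_mels triangular filters, each covering n_fft // 2 + 1 frequency bins."""
    n_bins = n_fft // 2 + 1
    max_mel = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 points equally spaced on the mel scale: filter edges and centres.
    edges_hz = [mel_to_hz(max_mel * i / (n_mels + 1)) for i in range(n_mels + 2)]
    # Centre frequency of each FFT bin in Hz.
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]

    filters = []
    for m in range(1, n_mels + 1):
        lo, centre, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (centre - lo)    # slope up to the centre
            falling = (hi - f) / (hi - centre)   # slope down from the centre
            row.append(max(0.0, min(rising, falling)))
        filters.append(row)
    return filters


def mel_frame(power_spectrum, filters):
    """Project one power-spectrum frame onto the mel filters."""
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in filters]


# Example usage with assumed parameters: 40 mel bands, 512-point FFT, 16 kHz audio.
filters = mel_filterbank(n_mels=40, n_fft=512, sample_rate=16000)
flat_spectrum = [1.0] * (512 // 2 + 1)  # placeholder frame, not real audio
mel_values = mel_frame(flat_spectrum, filters)
```

Stacking one such Mel vector per audio frame (usually after taking the logarithm) gives the two-dimensional spectrogram image that a CNN can consume. Its time axis is preserved in the CNN's output feature maps, which is what allows a Bi-RNN to read them as a sequence.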