Tass0sm / espnet

End-to-End Speech Processing Toolkit

Home Page: https://espnet.github.io/espnet/

Our fork of espnet for CSE 5539 experiments.

We explored applying ideas from image processing to the spectrogram:

  1. Using axial attention blocks
  2. Using SWIN transformer blocks
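For orientation, SWIN-style blocks restrict self-attention to local windows and alternate between a regular and a cyclically shifted window grid. The sketch below shows only that partitioning step on a 2D spectrogram. It is a minimal illustration, not the code used in this fork: the function names `window_partition` and `shift_windows`, the window sizes, and the assumption that the spectrogram is a plain `(time, frequency)` NumPy array are all hypothetical.

```python
import numpy as np

def window_partition(spec, win_t, win_f):
    """Split a (T, F) spectrogram into non-overlapping (win_t, win_f) windows.

    Assumes T and F are divisible by the window sizes (hypothetical constraint
    for this sketch; real inputs would need padding).
    """
    T, F = spec.shape
    assert T % win_t == 0 and F % win_f == 0
    x = spec.reshape(T // win_t, win_t, F // win_f, win_f)
    x = x.transpose(0, 2, 1, 3)          # (nT, nF, win_t, win_f)
    return x.reshape(-1, win_t, win_f)   # (num_windows, win_t, win_f)

def shift_windows(spec, win_t, win_f):
    """Cyclically shift by half a window so the next block's windows
    straddle the previous block's window boundaries."""
    return np.roll(spec, shift=(-(win_t // 2), -(win_f // 2)), axis=(0, 1))

# Usage: a toy 4x4 "spectrogram" split into 2x2 windows.
spec = np.arange(16).reshape(4, 4)
windows = window_partition(spec, 2, 2)                            # regular grid
shifted_windows = window_partition(shift_windows(spec, 2, 2), 2, 2)  # shifted grid
```

Attention would then be computed independently inside each window, so the cost grows with the window size rather than with the full spectrogram size.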

ASR

Baseline

espnet_model with a transformer encoder and decoder.

Variant 1

Processes 2D frames of the spectrogram directly, without prior convolutions, applying axial attention to the frames.
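As a rough illustration of the idea, axial attention replaces one attention pass over all time-frequency positions with two cheaper passes: one along the frequency axis and one along the time axis. The sketch below is a simplified, assumption-laden version and not the implementation in this fork: it is single-head, uses identity query/key/value projections (no learned weights), and assumes each time-frequency bin has already been embedded into a `D`-dimensional vector, giving an input of shape `(T, F, D)`. The names `attend` and `axial_attention` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(x):
    """Scaled dot-product self-attention over the second-to-last axis.

    x: (batch, L, D). Queries, keys and values are x itself
    (identity projections, an assumption made to keep the sketch short).
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)   # (batch, L, L)
    return softmax(scores, axis=-1) @ x                # (batch, L, D)

def axial_attention(spec):
    """spec: (T, F, D) = time frames, frequency bins, channels."""
    # Pass 1: attend along frequency; each time frame is one batch item.
    out = spec + attend(spec)                          # residual connection
    # Pass 2: attend along time; swap axes so time is the sequence axis.
    out_t = np.swapaxes(out, 0, 1)                     # (F, T, D)
    out = out + np.swapaxes(attend(out_t), 0, 1)       # back to (T, F, D)
    return out

# Usage on a random toy input.
rng = np.random.default_rng(0)
x = rng.standard_normal((6, 5, 8))   # T=6 frames, F=5 bins, D=8 channels
y = axial_attention(x)
```

With `T` frames and `F` bins, the two passes compare on the order of `T*F*(T+F)` position pairs, versus `(T*F)^2` for full 2D attention over every bin.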

Result: less effective than the baseline.

Variant 2

About

License: Apache License 2.0


Languages

Python 55.4%, Shell 42.3%, Perl 1.5%, MATLAB 0.5%, CMake 0.1%, M 0.1%, Makefile 0.1%, Dockerfile 0.1%, Cython 0.0%