FCGCL: Fine- and Coarse-Granularity Contrastive Learning for Speech Translation

This is the PyTorch implementation of the paper "FCGCL: Fine- and Coarse-Granularity Contrastive Learning for Speech Translation".

Environment Configuration

Our code is built on ESPnet and uses PyTorch-Lightning to organize the code. Please install ESPnet and PyTorch-Lightning following their official installation guides.

Data Preparation

Download the wav2vec 2.0 model published on Hugging Face.
We extract features with wav2vec 2.0 before training. The extraction scripts are in ./scripts/.
Save the result to a JSON file, in the format used by ESPnet. We provide dev.json and the corresponding features as a reference for quickly debugging the code.
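
As a rough illustration of the last step, the sketch below writes a manifest in the ESPnet1-style data.json layout (a top-level "utts" dictionary with "input" and "output" lists per utterance). The helper name build_manifest, the utterance ID, the feature path, and the 768-dimensional shape are all hypothetical placeholders; the fields this repository actually expects may differ, so the uploaded dev.json remains the authoritative reference.

```python
import json


def build_manifest(entries):
    """Build an ESPnet1-style manifest from (utt_id, feat_path, shape, text) tuples.

    NOTE: the field set is an assumption based on ESPnet1's data.json;
    compare against the repository's dev.json before relying on it.
    """
    utts = {}
    for utt_id, feat_path, shape, text in entries:
        utts[utt_id] = {
            # One input stream: the pre-extracted wav2vec 2.0 features.
            "input": [
                {"name": "input1", "feat": str(feat_path), "shape": list(shape)}
            ],
            # One output stream: the target translation text.
            "output": [{"name": "target1", "text": text}],
        }
    return {"utts": utts}


def dump_manifest(manifest):
    """Serialize the manifest to a JSON string, keeping non-ASCII text readable."""
    return json.dumps(manifest, indent=4, ensure_ascii=False)


if __name__ == "__main__":
    # Hypothetical entry: 120 frames of 768-dimensional features.
    example = build_manifest(
        [("utt_0001", "feats/utt_0001.npy", (120, 768), "hallo welt")]
    )
    print(dump_manifest(example))
```

The string returned by dump_manifest can be written to a file such as train.json and checked against dev.json field by field.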

Model Training

. ./run.sh

The training process is defined in ./src/bins/plModule.py. The contrastive module is defined in ./src/bins/cl_loss.py.
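
For readers unfamiliar with contrastive objectives, the following is a minimal sketch of a sentence-level (coarse-granularity) InfoNCE loss between pooled speech and text representations. It is not the code in cl_loss.py: the function names masked_mean and info_nce, the temperature of 0.1, and the use of mean pooling are assumptions made for illustration only.

```python
import torch
import torch.nn.functional as F


def masked_mean(hidden, lengths):
    """Average a (batch, time, dim) tensor over the valid time steps only."""
    steps = torch.arange(hidden.size(1), device=hidden.device)
    mask = (steps[None, :] < lengths[:, None]).unsqueeze(-1).to(hidden.dtype)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)


def info_nce(speech, text, temperature=0.1):
    """InfoNCE over a batch: row i of `speech` is the positive for row i of
    `text`; every other row in the batch serves as a negative."""
    speech = F.normalize(speech, dim=-1)
    text = F.normalize(text, dim=-1)
    # Cosine similarities scaled by the (assumed) temperature.
    logits = speech @ text.t() / temperature
    targets = torch.arange(speech.size(0), device=speech.device)
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    torch.manual_seed(0)
    speech_hidden = torch.randn(4, 50, 16)  # (batch, frames, dim)
    text_hidden = torch.randn(4, 12, 16)    # (batch, tokens, dim)
    speech_len = torch.tensor([50, 40, 30, 20])
    text_len = torch.tensor([12, 10, 8, 6])
    loss = info_nce(
        masked_mean(speech_hidden, speech_len),
        masked_mean(text_hidden, text_len),
    )
    print(loss.item())
```

The same info_nce function could in principle be applied to finer-grained units instead of pooled sentences; how the paper pairs such units is defined in cl_loss.py and is not reproduced here.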
