edit-distance reinforcement-learning asr levenshtein

Edit-distance as objective function

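There are several research fields in which the edit-distance is chosen as the objective function. For example, in Automatic Speech Recognition (ASR) the main metric of model quality is the Word Error Rate (WER), which is the word-level edit-distance between the hypothesis and the reference, divided by the number of words in the reference.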
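As an illustration of the metric, the following is a minimal sketch of the Levenshtein distance and the WER built on top of it. The function names `edit_distance` and `word_error_rate` are chosen here for illustration, and whitespace tokenization is an assumption; real ASR evaluation usually normalizes text first.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of tokens)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance between ref[:i] and the empty prefix of hyp
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref_words = reference.split()  # assumption: whitespace tokenization
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words. Note that the WER can exceed 1 when the hypothesis contains many insertions.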

Problems

Unfortunately, directly optimizing the edit-distance is difficult because it is a discrete, non-differentiable function of the model output. Therefore, in most cases, models are trained with a proxy objective such as cross-entropy. However, in the context of sequence learning this leads to several problems [1]:

Exposure Bias: the model is never exposed to its own errors during training, and so the inferred histories at test-time do not resemble the gold training histories.
Loss Evaluation Mismatch: training uses a word-level loss, while at test-time we target improving sequence-level evaluation metrics.
Label Bias: word probabilities at each time-step are locally normalized, which guarantees that successors of incorrect histories receive the same mass as do the successors of the true history.

Solutions

The following table summarizes works that attempt to solve the problems mentioned above. More detailed overviews exist, for example [2], but this list includes only works that use the edit-distance explicitly or implicitly. Moreover, most of these works formalize the sequence prediction task as an action-taking problem in Reinforcement Learning.

Year	Task	Reward level	Algorithms, Models	Affiliation	Authors, Link
2020	ASR	Sentence	MWER, RNN-T	Amazon	Guo et al.
2020	MT	Sentence	MGS, parameter search	NYU	Welleck, Cho
2020	ASR	Sentence	Proper Noun, Phonetic Fuzzing, MWER, RNN-T, LAS	Google	Peyser, Sainath, Pundak
2019	NLP	Sentence	GPT-2, PPO, Human labeling	OpenAI	Ziegler, Stiennon et al.
2019	ASR	Sentence	Neural Architecture Search, REINFORCE, CTC	KPMG Nigeria, OAU	Baruwa et al.
2019	ASR	Sentence	Normalized MWER	Amazon	Gandhe, Rastrow
2019	ASR	Token	MBR, RNN-T	Tencent, USA	Weng et al.
2019	ASR	Token	ECTC-DOCD	China	Yi, Wang, Xu
2019	ASR	Sentence	MWER, RNN-T, LAS	Google	Sainath, Pang et al.
2019	MT	Token	Reinforce-NAT, Non-Autoregressive Transformer	China, Tencent	Shao, Feng et al.
2019	MT, TS, APE	Token	Levenshtein Transformer, imitation learning	Facebook, New York	Gu, Wang, Zhao
2018	ASR	Token	MBR, softmax margin, PAPB, S2S	Brno, JHU, MERL	Baskar et al.
2018	ASR	Token	OCD, S2S	Google Brain	Sabour, Chan, Norouzi
2018	ASR	Token	REINFORCE, S2S	Nara, RIKEN	Tjandra et al.
2018	TS	Sentence	Alternating Actor-Critic	Hong Kong, Tencent	Li, Bing, Lam
2018	ASR	Sentence	REINFORCE, PPO, Reward shaping	Tokyo	Peng, Shibata, Shinozaki
2017	ASR	Sentence	REINFORCE, Self-critic	Salesforce	Zhou, Xiong, Socher
2017	ASR	Sentence	MWER, LAS, Sampling, N-best	Google	Prabhavalkar et al.
2017	ASR	Sentence	Expected Loss, RNA	Google	Sak et al.
2017	MT	Sentence	Actor-Critic, Critic-aware	Hong Kong, New York	Gu, Cho, Li
2016	ASR	Sentence	Reward Augmented ML	Google Brain	Norouzi et al.
2016	MT	Token	Actor-Critic	Montreal, McGill	Bahdanau et al.
2015	MT	Sentence	MIXER	Facebook	Ranzato et al.
2015	ASR	Token	Task Loss Estimation	Montreal, Wrocław	Bahdanau et al.
2014	ASR	Sentence	Expected Loss, CTC	DeepMind, Toronto	Graves, Jaitly
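
Several of the sentence-level entries above (the MWER and Expected Loss rows) minimize an expected number of errors rather than a token-level cross-entropy. The following is a minimal sketch of that idea under an N-best approximation: hypothesis probabilities are renormalized over the list and each hypothesis is weighted by its number of word errors relative to the list average. The function name `expected_word_errors` and its inputs are hypothetical, and individual papers differ in how hypotheses are sampled and how the loss is combined with cross-entropy.

```python
import math


def expected_word_errors(log_probs, word_errors):
    """Expected relative word errors over an N-best list.

    log_probs:   model log-probability of each hypothesis (unnormalized over the list)
    word_errors: edit distance of each hypothesis to the reference, in words
    """
    # Renormalize the hypothesis probabilities over the N-best list (stable softmax).
    max_lp = max(log_probs)
    weights = [math.exp(lp - max_lp) for lp in log_probs]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Subtracting the mean error acts as a baseline and reduces variance.
    mean_errors = sum(word_errors) / len(word_errors)
    return sum(p * (e - mean_errors) for p, e in zip(probs, word_errors))
```

The value decreases as probability mass moves toward hypotheses with fewer errors than the list average, which is what a gradient-based optimizer would exploit if `log_probs` came from a differentiable model. This pure-Python version only evaluates the quantity; it does not compute gradients.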

Reference

About

A curated list of papers dedicated to edit-distance as an objective function