Paper reading: TRAINING ASR MODELS BY GENERATION OF CONTEXTUAL INFORMATION (ICASSP 2020)

Download link: https://arxiv.org/abs/1910.12367

Main idea:

The paper trains an end-to-end (E2E) ASR model on massive amounts of weakly supervised data together with a smaller amount of conventionally labeled data. Here, "weakly supervised data" means audio paired only with contextually related text (English social media videos along with their respective titles and post text), rather than verbatim transcripts.

Model architecture:

The architecture itself is not novel: it is a standard encoder-decoder model built around multi-head attention.
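To make the multi-head attention building block concrete, here is a minimal NumPy sketch of scaled dot-product attention split across heads. This is a generic illustration, not the paper's exact implementation; the per-head projection matrices that a real model learns are omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(q, k, v, num_heads):
    """Scaled dot-product attention computed independently per head.

    q: (T_q, d_model); k, v: (T_k, d_model); num_heads must divide d_model.
    Real models apply learned linear projections per head; here we simply
    slice the feature dimension to keep the sketch self-contained.
    """
    t_q, d_model = q.shape
    d_head = d_model // num_heads
    outputs = []
    for h in range(num_heads):
        sl = slice(h * d_head, (h + 1) * d_head)
        qh, kh, vh = q[:, sl], k[:, sl], v[:, sl]
        scores = qh @ kh.T / np.sqrt(d_head)   # (T_q, T_k) attention logits
        outputs.append(softmax(scores) @ vh)   # (T_q, d_head) per-head output
    return np.concatenate(outputs, axis=-1)    # (T_q, d_model)
```

In the decoder, `q` comes from the text side and `k`, `v` from the encoder's acoustic features (cross-attention), which is the pathway the burn-in phase below is designed to train.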


Model training:

Unlike conventional training, this paper trains the model in three phases.

(1) Burn-in phase: an initial supervised phase whose main purpose is to let the decoder cross-attention learn to properly communicate gradient information back to the encoder, adjusting its acoustic features.

(2) Train-main phase: a phase driven by a mixture of the supervised and the weakly supervised loss functions, in which the model expands its inventory of audio features and its mappings between acoustic and linguistic cues.

(3) Fine-tune phase: a final supervised-only phase that uses either the full encoder-decoder model from the train-main step, or just its encoder component, refined with the CTC loss.
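The three-phase schedule can be sketched as a function mapping a training step to the active phase and its loss weights. The step counts and the 50/50 mixing ratio below are hypothetical placeholders for illustration; the paper does not prescribe these specific values here.

```python
def training_phase(step, burn_in_steps, main_steps, mix_ratio=0.5):
    """Return (phase_name, supervised_weight, weak_weight) for a step.

    Phase 1 (burn-in):    supervised loss only.
    Phase 2 (train-main): weighted mix of supervised and weakly
                          supervised losses (mix_ratio is hypothetical).
    Phase 3 (fine-tune):  supervised loss only again.
    """
    if step < burn_in_steps:
        return ("burn-in", 1.0, 0.0)
    if step < burn_in_steps + main_steps:
        return ("train-main", mix_ratio, 1.0 - mix_ratio)
    return ("fine-tune", 1.0, 0.0)
```

A training loop would then compute `supervised_weight * L_sup + weak_weight * L_weak` at each step, so the weakly supervised objective contributes gradients only during the train-main phase.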

