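These notes give a high-level overview of the classic applications of deep learning. The material is large, so it is split into three parts.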



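1. Review: classic applications of deep learning to images

1.1 Autoencoder

1.2 MLP

1.3 CNN <see the previous post on CNNs for details>

2. Deep learning for speech and other sequential signals

2.1 Which sequential signals, and which problems

2.2 Background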

2.2.1 Hidden Markov Model(HMM)

2.2.2 GMM-HMM for Speech Recognition

2.2.3 Restricted Boltzmann Machine(RBM)

3. DBN and RNN applied to speech

3.1 DBN

3.1.1 DBN架构

3.1.2 DBN-DNN for Speech Recognition

3.2 RNN

3.2.1 Types of RNN

3.2.2 RNN-RBM for Sequential signal Prediction


1. Review: deep learning for images and other non-sequential signals <see the previous post on CNNs for details>


1.1 AutoEncoder(unsupervised)

Extension: Stacked AutoEncoder (can be made supervised); see Andrew Ng's UFLDL tutorial. I won't paste the figures here.


1.2 MLP

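An MLP (ANN) is the most naive neural-network classifier: a single hidden layer, connected on both sides through nonlinear functions, produces the output f(x), and softmax turns that output into class probabilities.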
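The forward pass described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the tanh hidden nonlinearity, the helper names (`affine`, `mlp_forward`) and the toy weights are not from the original post.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def affine(W, b, x):
    """Compute W x + b for a row-major weight matrix W."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def mlp_forward(x, W1, b1, W2, b2):
    """One hidden tanh layer followed by a softmax output layer."""
    hidden = [math.tanh(a) for a in affine(W1, b1, x)]
    return softmax(affine(W2, b2, hidden))

# Toy weights (illustrative values only): 2 inputs, 3 hidden units, 2 classes.
W1 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
b1 = [0.0, 0.1, -0.1]
W2 = [[1.0, -1.0, 0.5], [-0.5, 0.3, 0.2]]
b2 = [0.0, 0.0]
probs = mlp_forward([1.0, 2.0], W1, b1, W2, b2)  # class probabilities
```

The predicted class is the index of the largest entry in `probs`.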

Deep learning From Image to Sequence


1.3 Convolutional Neural Network

Characteristics: 1. not fully connected; 2. shared weights

Procedure: 1. convolution; 2. downsampling (pooling)
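The two steps can be sketched as follows. This is a minimal plain-Python illustration, assuming a single channel, a "valid" convolution (implemented as cross-correlation, as most deep learning libraries do) and non-overlapping max pooling; the function names and the toy image are hypothetical.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation: one shared kernel slides over the image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling (downsampling) with a size x size window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [[max(feature_map[i * size + di][j * size + dj]
                 for di in range(size) for dj in range(size))
             for j in range(out_w)]
            for i in range(out_h)]

image = [[1, 2, 0, 1, 3],
         [0, 1, 2, 3, 1],
         [1, 0, 1, 2, 0],
         [2, 1, 0, 1, 1],
         [0, 3, 1, 0, 2]]
kernel = [[1, 0],
          [0, -1]]
feature_map = conv2d_valid(image, kernel)  # 4 x 4 feature map
pooled = max_pool(feature_map, size=2)     # 2 x 2 after pooling
```

The same four kernel weights are reused at every position (shared weights), and each output value depends only on a 2 x 2 patch of the input (not fully connected).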




2. Deep learning for speech and other sequential signals

2.1 Which sequential signals, and which problems:

handwriting recognition
speech recognition
music composition
protein analysis
stock market prediction

2.2 Background:


2.2.1 Hidden Markov Model (HMM) - a stochastic process with unobserved (hence "hidden") states. It models the relationship between the input speech signal and the hidden states (factors).


Training an HMM: given a sequence y1...yT, estimate the parameters by MLE (typically implemented with EM; see part 3, training, of the referenced post for details).
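The quantity that MLE maximizes is the likelihood P(y1...yT) of the observed sequence, which the forward algorithm computes efficiently. The sketch below assumes a discrete-emission HMM; the names `pi`, `A`, `B` and the toy numbers are illustrative, not from the original post, and the EM (Baum-Welch) update itself is not shown.

```python
def forward_likelihood(obs, pi, A, B):
    """Return P(y_1..y_T) for a discrete HMM via the forward algorithm.

    pi[i]   : initial probability of hidden state i
    A[i][j] : transition probability from state i to state j
    B[i][k] : probability of emitting symbol k in state i
    """
    n = len(pi)
    # alpha[i] = P(y_1..y_t, state_t = i)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for y in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][y]
                 for j in range(n)]
    return sum(alpha)

# Toy two-state, two-symbol model (illustrative values only).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
likelihood = forward_likelihood([0, 1, 0], pi, A, B)
```

EM alternates between computing state posteriors with these forward quantities (plus the matching backward pass) and re-estimating `pi`, `A` and `B` from them.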


2.2.2 GMM-HMM for Speech Recognition (a large topic, covered in a separate blog post)


2.2.3 Restricted Boltzmann Machine

Before discussing RBMs, a few words on generative models... <How to build a single layer of feature detector>

They fall roughly into two classes, directed models and undirected models:

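1. directed model (e.g. GMM, where the latent state is drawn from a discrete distribution)

Choose the states of the latent variables according to the prior distribution

Given the latent states, choose the states of the observable variables according to the conditional distribution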

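2. undirected model

Using only the parameters W, define the joint probability of v (visible) and h (hidden latent variables) through an energy function

Because of "explaining away", if the latent and visible variables are related nonlinearly, it is very hard to infer the states of the latent variables in a directed model; in an undirected model, inference is easy as long as there are no edges between the latent variables.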

PS: what is explaining away?
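A small numeric illustration may help answer that question. Assume two independent binary causes A and B, each active with prior probability 0.1, and an effect E = A OR B; this toy model and the function name are assumptions made for illustration. Observing E raises the posterior of A, but additionally observing B "explains away" E and drops the posterior of A back to its prior, so the causes become dependent once the effect is observed.

```python
def posterior_of_a(p_a, p_b, observe_b=None):
    """P(A=1 | E=1 [, B=observe_b]) where E = A OR B and A, B are independent causes."""
    num = den = 0.0
    for a in (0, 1):
        for b in (0, 1):
            if observe_b is not None and b != observe_b:
                continue
            if not (a or b):  # E = 1 requires at least one active cause
                continue
            p = (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)
            den += p
            num += p if a else 0.0
    return num / den

p_a_given_e = posterior_of_a(0.1, 0.1)                 # 0.1 / 0.19, about 0.526
p_a_given_e_b = posterior_of_a(0.1, 0.1, observe_b=1)  # back to the prior, 0.1
```

This coupling between causes is what makes exact inference over latent variables hard in directed models with many hidden units.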



An RBM is a kind of Markov random field (MRF). What sets it apart:

1. An RBM has a bipartite connectivity graph

2. An RBM does not share weights between different units

3. Some of the variables are unobserved




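RBM parameters: W (weight), bias_h, bias_v

Given the joint distribution P(v,h), h and v can be sampled in turn from their conditional distributions by Gibbs sampling, and the parameters are learned by gradient descent on the negative log-likelihood (NLL).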
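For reference, the standard binary RBM can be written as below. The notation is an assumption: a_i stands for bias_v, b_j for bias_h, w_{ij} for the entries of W, and Z is the partition function; \sigma is the logistic sigmoid.

```latex
\begin{aligned}
E(v,h) &= -\sum_i a_i v_i \;-\; \sum_j b_j h_j \;-\; \sum_{i,j} v_i\, w_{ij}\, h_j \\
P(v,h) &= \frac{e^{-E(v,h)}}{Z}, \qquad Z = \sum_{v,h} e^{-E(v,h)} \\
P(h_j = 1 \mid v) &= \sigma\Big(b_j + \sum_i v_i\, w_{ij}\Big) \\
P(v_i = 1 \mid h) &= \sigma\Big(a_i + \sum_j w_{ij}\, h_j\Big)
\end{aligned}
```

Because the graph is bipartite, the hidden units are conditionally independent given v (and vice versa), which is why each Gibbs step can sample a whole layer at once.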




Contrastive divergence: run k steps of Gibbs sampling (CD-k)

Update according to the cost function, i.e. cost = T.mean(self.free_energy(self.input)) - T.mean(self.free_energy(chain_end))
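A plain-Python sketch of one CD-k update for a binary RBM follows. It is not the Theano code the cost expression comes from: Theano differentiates the free-energy difference automatically, whereas this sketch writes the resulting approximate gradient (data statistics minus chain-end statistics) out by hand. The function names, learning rate and toy data are assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_hidden(v, W, b_h, rng):
    """Sample h ~ P(h | v); hidden units are conditionally independent given v."""
    probs = [sigmoid(b_h[j] + sum(v[i] * W[i][j] for i in range(len(v))))
             for j in range(len(b_h))]
    return probs, [1 if rng.random() < p else 0 for p in probs]

def sample_visible(h, W, b_v, rng):
    """Sample v ~ P(v | h)."""
    probs = [sigmoid(b_v[i] + sum(W[i][j] * h[j] for j in range(len(h))))
             for i in range(len(b_v))]
    return probs, [1 if rng.random() < p else 0 for p in probs]

def cd_k_update(v0, W, b_h, b_v, k=1, lr=0.1, rng=random):
    """One CD-k step (k >= 1) on a single vector v0; updates parameters in place."""
    ph0, h = sample_hidden(v0, W, b_h, rng)
    vk = v0
    for _ in range(k):  # k steps of block Gibbs sampling
        _, vk = sample_visible(h, W, b_v, rng)
        phk, h = sample_hidden(vk, W, b_h, rng)
    for i in range(len(b_v)):
        for j in range(len(b_h)):
            # positive phase (data) minus negative phase (chain end)
            W[i][j] += lr * (v0[i] * ph0[j] - vk[i] * phk[j])
    for j in range(len(b_h)):
        b_h[j] += lr * (ph0[j] - phk[j])
    for i in range(len(b_v)):
        b_v[i] += lr * (v0[i] - vk[i])
    return vk  # the chain end, playing the role of chain_end above

rng = random.Random(0)
W = [[0.0] * 2 for _ in range(4)]  # 4 visible units, 2 hidden units
b_h, b_v = [0.0, 0.0], [0.0] * 4
for _ in range(200):
    for v in ([1, 1, 0, 0], [0, 0, 1, 1]):
        chain_end = cd_k_update(v, W, b_h, b_v, k=1, rng=rng)
```

Larger k gives a less biased gradient estimate at the cost of more sampling per update; k=1 is a common starting point.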

The RBMs discussed above all have binary v, h (0/1). How can real-valued data be handled?

ANS: use a Gaussian-Bernoulli RBM (GRBM).

The changes to the classic RBM above are small: only the energy function and the conditional probabilities need to be modified.
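One commonly used form of the GRBM energy and conditionals is given below. The notation is an assumption rather than the original post's: a_i is the visible bias, b_j the hidden bias, \sigma_i the standard deviation of visible unit i, and \operatorname{sigm} the logistic sigmoid.

```latex
\begin{aligned}
E(v,h) &= \sum_i \frac{(v_i - a_i)^2}{2\sigma_i^2} \;-\; \sum_j b_j h_j \;-\; \sum_{i,j} \frac{v_i}{\sigma_i}\, w_{ij}\, h_j \\
P(h_j = 1 \mid v) &= \operatorname{sigm}\Big(b_j + \sum_i \frac{v_i}{\sigma_i}\, w_{ij}\Big) \\
p(v_i \mid h) &= \mathcal{N}\Big(v_i \,;\; a_i + \sigma_i \sum_j w_{ij}\, h_j,\; \sigma_i^2\Big)
\end{aligned}
```

Compared with the binary case, the linear visible bias term becomes a quadratic, and sampling a visible unit means drawing from a Gaussian instead of a Bernoulli. In practice the data are often normalized to unit variance so that \sigma_i can be fixed to 1.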

3. DBN and RNN applied to speech

3.1 DBN

3.1.1 DBN architecture


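1. Pre-train

2. Stack the pre-trained W of these layers directly and change all the bidirectional weight arrows to top-down ones; the result is a DBN generative model

3. Add a classifier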


3.1.2 DBN-DNN for Speech Recognition

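If you read the previous post, GMM-HMM for Speech Recognition, carefully, you will see that this model differs from GMM-HMM only in the GMM part.

That is, DNN-HMM replaces the GMM (a directed model) with a DNN (built from undirected RBM layers). The advantage is that it can capture a nonlinear mapping between h and v.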
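One practical detail of the swap: a GMM gives the HMM a likelihood p(x|s), while a DNN outputs a posterior p(s|x) over HMM states. A common recipe in hybrid systems is to divide the posterior by the state prior, which by Bayes' rule yields the likelihood up to a factor that does not depend on the state. The sketch below assumes that recipe; the function name and the numbers are illustrative.

```python
import math

def scaled_log_likelihoods(posteriors, state_priors):
    """Per HMM state s: log p(s|x) - log p(s) = log p(x|s) - log p(x)."""
    return [math.log(p) - math.log(q)
            for p, q in zip(posteriors, state_priors)]

# Illustrative DNN softmax output for one frame, and state priors
# (typically estimated from state frequencies in the training alignment).
posteriors = [0.7, 0.2, 0.1]
state_priors = [0.5, 0.25, 0.25]
scores = scaled_log_likelihoods(posteriors, state_priors)
```

These scores then take the place of the GMM log-likelihoods during HMM decoding.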





3.2 RNN

3.2.1 Types of RNN


1.Fully Recurrent Network

2.Hopfield Network

3.Elman Network (Simple Recurrent networks)

4.Long short term memory network


fig. LSTM
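As a rough illustration of what the LSTM cell computes, here is one time step of a single-unit cell in plain Python. It assumes the now-common variant with a forget gate; the parameter layout, gate names and toy weights are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM cell with a forget gate.

    p maps each gate name to (input weight, recurrent weight, bias).
    """
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))        # how much new content to write
    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    o = sigmoid(pre("output"))       # how much of the cell to expose
    g = math.tanh(pre("candidate"))  # candidate cell content
    c = f * c_prev + i * g           # additive cell-state update
    h = o * math.tanh(c)
    return h, c

# Toy parameters shared by all gates (illustrative values only).
params = {name: (0.5, 0.1, 0.0)
          for name in ("input", "forget", "output", "candidate")}
h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:
    h, c = lstm_step(x, h, c, params)
```

The additive update of `c` is what lets gradients flow across many time steps, in contrast to the repeated squashing in a simple Elman network.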

3.2.2 RNN-RBM for Sequential signal Prediction

For an RNN example, see the RNN-RBM post (RNN-RBM for music composition: network architecture and code walkthrough).




Classic DNN papers on deep learning for speech:

1. Hinton, Li Deng, Dong Yu: Deep Neural Networks for Acoustic Modeling in Speech Recognition

2. Andrew Ng, NIPS 09, Unsupervised feature learning for audio classification using convolutional deep belief networks

Classic RNN papers on deep learning for speech:

1. Bengio, ICML 2012: the RNN+RBM paper, with an implementation (covered in detail in the next post)

2. Schmidhuber, JMLR 2002: a classic paper on LSTM

3. The Use of Recurrent Neural Networks in Continuous Speech Recognition



