Training a deep autoencoder or a classifier on MNIST digits: RBM training (Matlab)
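These are the notes I took on my first reading of the MATLAB RBM program. There are still many parts I don't understand, and I hope to learn, study, and discuss them together with fellow bloggers so that we can all make progress.
1. RBM reading material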
http://en.wikipedia.org/wiki/Restricted_Boltzmann_machine
http://deeplearning.net/tutorial/rbm.html
2. Basic principles of RBM training
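The rules that the code in section 3 implements can be summarized as below. The symbols are chosen here to mirror the MATLAB variables: W = vishid, a = visbiases, b = hidbiases, N = numcases, V_0 = data, P_0 = poshidprobs, V_1 = negdata (visible probabilities computed from the sampled binary states poshidstates), P_1 = neghidprobs, m = momentum (0.5 for the first 5 epochs, 0.9 afterwards), λ = weightcost, and ε_w, ε_vb, ε_hb = epsilonw, epsilonvb, epsilonhb. This is a summary read off the code, not a full derivation: V_0ᵀP_0/N estimates ⟨v_i h_j⟩ under the data and V_1ᵀP_1/N estimates it under the one-step reconstruction, and their difference is the CD-1 approximation to the log-likelihood gradient.

```latex
\begin{aligned}
E(\mathbf v,\mathbf h) &= -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i W_{ij} h_j \\
p(h_j = 1 \mid \mathbf v) &= \sigma\Big(b_j + \sum_i v_i W_{ij}\Big) \\
p(v_i = 1 \mid \mathbf h) &= \sigma\Big(a_i + \sum_j W_{ij} h_j\Big), \qquad \sigma(x) = \frac{1}{1+e^{-x}} \\
\Delta W &\leftarrow m\,\Delta W + \epsilon_w\Big(\frac{V_0^{\top} P_0 - V_1^{\top} P_1}{N} - \lambda W\Big) \\
\Delta \mathbf a &\leftarrow m\,\Delta \mathbf a + \frac{\epsilon_{vb}}{N}\sum_{n=1}^{N}\big(V_0 - V_1\big)_{n,:} \\
\Delta \mathbf b &\leftarrow m\,\Delta \mathbf b + \frac{\epsilon_{hb}}{N}\sum_{n=1}^{N}\big(P_0 - P_1\big)_{n,:} \\
W &\leftarrow W + \Delta W, \qquad \mathbf a \leftarrow \mathbf a + \Delta\mathbf a, \qquad \mathbf b \leftarrow \mathbf b + \Delta\mathbf b
\end{aligned}
```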
```matlab
% Version 1.000
%
% Code provided by Geoff Hinton and Ruslan Salakhutdinov
%
% Permission is granted for anyone to copy, use, modify, or distribute this
% program and accompanying programs and documents for any purpose, provided
% this copyright notice is retained and prominently displayed, along with
% a note saying that the original programs are available from our
% web page.
% The programs and documents are distributed without any warranty, express or
% implied.  As the programs were written for research purposes only, they have
% not been tested to the degree that would be advisable in any important
% application.  All use of these programs is entirely at the user's own risk.

% This program trains Restricted Boltzmann Machine in which
% visible, binary, stochastic pixels are connected to
% hidden, binary, stochastic feature detectors using symmetrically
% weighted connections. Learning is done with 1-step Contrastive Divergence.
% [Note: both layers are binary and stochastic, the connections between them
%  are symmetric, and learning uses a single CD step (CD-1).]
% The program assumes that the following variables are set externally:
% maxepoch  -- maximum number of epochs (full passes over all batches)
% numhid    -- number of hidden units
% batchdata -- the data that is divided into batches (numcases numdims numbatches)
%              i.e. cases per batch x feature dimension x number of batches
% restart   -- set to 1 if learning starts from beginning
%              (the weights and biases are then initialized below)

epsilonw      = 0.1;    % Learning rate for weights
epsilonvb     = 0.1;    % Learning rate for biases of visible units
epsilonhb     = 0.1;    % Learning rate for biases of hidden units
weightcost    = 0.0002; % L2 weight-decay coefficient (penalizes large weights)
initialmomentum  = 0.5; % momentum used for the first 5 epochs
finalmomentum    = 0.9; % momentum used after epoch 5

% size() returns three outputs, so batchdata is 3-D;
% the third dimension is the number of batches.
[numcases numdims numbatches]=size(batchdata);

if restart ==1,
  restart=0;
  epoch=1;

% Initializing symmetric weights and biases.
% (Preallocate a matrix for every variable before it is used.)
  vishid     = 0.1*randn(numdims, numhid); % visible-to-hidden weights: rows = input dimensions, columns = hidden units
  hidbiases  = zeros(1,numhid);            % hidden biases, one per hidden unit
  visbiases  = zeros(1,numdims);           % visible biases, one per visible unit

  poshidprobs = zeros(numcases,numhid); % positive phase: p(h=1 | data) for each case of the batch
  neghidprobs = zeros(numcases,numhid); % negative phase: p(h=1 | reconstruction) for each case
  posprods    = zeros(numdims,numhid);  % positive statistics data'*poshidprobs, i.e. <vh> under the data
  negprods    = zeros(numdims,numhid);  % negative statistics negdata'*neghidprobs, i.e. <vh> under the reconstruction
  vishidinc  = zeros(numdims,numhid);   % "inc" = increment: the previous update of vishid, kept for momentum
  hidbiasinc = zeros(1,numhid);         % previous update of hidbiases
  visbiasinc = zeros(1,numdims);        % previous update of visbiases
  batchposhidprobs=zeros(numcases,numhid,numbatches); % hidden probabilities of every batch, used as the data for the next RBM layer
end

for epoch = epoch:maxepoch,
 fprintf(1,'epoch %d\r',epoch);
 errsum=0;
 for batch = 1:numbatches,
 fprintf(1,'epoch %d batch %d\r',epoch,batch);

%%%%%%%%% START POSITIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Compute hidden probabilities from the data to obtain <vh>_0, the term that
% enters the gradient with a positive sign. For an autoencoder this is the encoding stage.
  data = batchdata(:,:,batch);
  poshidprobs = 1./(1 + exp(-data*vishid - repmat(hidbiases,numcases,1)));
  batchposhidprobs(:,:,batch)=poshidprobs; % p(h_j=1|v) for each case and hidden unit; 100 x 1000 values per batch for the first layer
  posprods    = data' * poshidprobs;
  poshidact   = sum(poshidprobs); % sum of each hidden unit's probability over the 100 cases of the batch
  posvisact = sum(data);          % sum of each visible feature over the 100 cases of the batch
%%%%%%%%% END OF POSITIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  poshidstates = poshidprobs > rand(numcases,numhid);
  % Sample binary hidden states: a unit is set to 1 when its probability exceeds
  % the matching entry of a uniform random matrix, i.e. h ~ Bernoulli(poshidprobs).

%%%%%%%%% START NEGATIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Reconstruct the visible layer from the sampled hidden states, then recompute the
% hidden probabilities to obtain <vh>_1, the term that enters with a negative sign.
% For an autoencoder this is the decoding stage. The chain is h0 -> v1 -> h1,
% started from the binary states poshidstates.
  negdata = 1./(1 + exp(-poshidstates*vishid' - repmat(visbiases,numcases,1))); % p(v=1|h0); probabilities, not samples
  neghidprobs = 1./(1 + exp(-negdata*vishid - repmat(hidbiases,numcases,1)));
  negprods  = negdata'*neghidprobs; % reconstruction statistics <vh>_1
  neghidact = sum(neghidprobs);
  negvisact = sum(negdata);
%%%%%%%%% END OF NEGATIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  err= sum(sum( (data-negdata).^2 )); % squared reconstruction error between the data (v0) and the reconstruction (v1)
  errsum = err + errsum;

   if epoch>5,
     momentum=finalmomentum;
   else
     momentum=initialmomentum;
   end; % the momentum depends on the epoch number

%%%%%%%%% UPDATE WEIGHTS AND BIASES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% With <vh>_0 and <vh>_1 computed, the increments of the parameters follow.
    vishidinc = momentum*vishidinc + ...
                epsilonw*( (posprods-negprods)/numcases - weightcost*vishid);
    visbiasinc = momentum*visbiasinc + (epsilonvb/numcases)*(posvisact-negvisact);
    hidbiasinc = momentum*hidbiasinc + (epsilonhb/numcases)*(poshidact-neghidact);

    vishid = vishid + vishidinc;
    visbiases = visbiases + visbiasinc;
    hidbiases = hidbiases + hidbiasinc;
%%%%%%%%%%%%%%%% END OF UPDATES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  end
  fprintf(1, 'epoch %4i error %6.1f  \n', epoch, errsum); % total reconstruction error of this epoch
end;
```
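To check one's reading of the loop body, the same CD-1 step can be sketched in plain Python (lists of lists, no NumPy). This is an illustrative re-implementation under stated assumptions, not Hinton and Salakhutdinov's code: the names `cd1_update`, `train_rbm`, `cross_prods`, the dict `inc` that holds the increments, and the toy data are all hypothetical. The default hyperparameters are the values set in rbm.m, and only the `restart == 1` path is mirrored.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function 1 / (1 + exp(-x))
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hidden_probs(V, W, hb):
    # p(h_j = 1 | v) for every row v of V (MATLAB: 1./(1+exp(-V*vishid - repmat(hidbiases,...))))
    return [[sigmoid(hb[j] + sum(v[i] * W[i][j] for i in range(len(v))))
             for j in range(len(hb))] for v in V]


def visible_probs(H, W, vb):
    # p(v_i = 1 | h) for every row h of H (MATLAB: 1./(1+exp(-H*vishid' - repmat(visbiases,...))))
    return [[sigmoid(vb[i] + sum(h[j] * W[i][j] for j in range(len(h))))
             for i in range(len(vb))] for h in H]


def cross_prods(A, B):
    # A' * B: products summed over the cases of the batch (numdims x numhid)
    return [[sum(a[i] * b[j] for a, b in zip(A, B)) for j in range(len(B[0]))]
            for i in range(len(A[0]))]


def col_sums(A):
    # MATLAB sum(A): sum over cases, one value per column
    return [sum(col) for col in zip(*A)]


def cd1_update(data, W, vb, hb, inc, momentum, rng,
               eps_w=0.1, eps_vb=0.1, eps_hb=0.1, weightcost=0.0002):
    """One CD-1 step on one minibatch; updates W, vb, hb and inc in place.
    Returns the squared reconstruction error of the batch (MATLAB: err)."""
    n = len(data)
    # Positive phase: hidden probabilities driven by the data
    pos_p = hidden_probs(data, W, hb)
    pos_prods = cross_prods(data, pos_p)
    # Sample binary hidden states, h ~ Bernoulli(pos_p)
    pos_states = [[1.0 if p > rng.random() else 0.0 for p in row] for row in pos_p]
    # Negative phase: visible probabilities (not samples), then hidden probabilities again
    neg_data = visible_probs(pos_states, W, vb)
    neg_p = hidden_probs(neg_data, W, hb)
    neg_prods = cross_prods(neg_data, neg_p)
    err = sum((d - r) ** 2 for drow, rrow in zip(data, neg_data)
              for d, r in zip(drow, rrow))
    # Momentum + weight-decay updates, same formulas as vishidinc/visbiasinc/hidbiasinc
    for i in range(len(vb)):
        for j in range(len(hb)):
            inc["W"][i][j] = momentum * inc["W"][i][j] + eps_w * (
                (pos_prods[i][j] - neg_prods[i][j]) / n - weightcost * W[i][j])
            W[i][j] += inc["W"][i][j]
    for i, (pv, nv) in enumerate(zip(col_sums(data), col_sums(neg_data))):
        inc["vb"][i] = momentum * inc["vb"][i] + (eps_vb / n) * (pv - nv)
        vb[i] += inc["vb"][i]
    for j, (ph, nh) in enumerate(zip(col_sums(pos_p), col_sums(neg_p))):
        inc["hb"][j] = momentum * inc["hb"][j] + (eps_hb / n) * (ph - nh)
        hb[j] += inc["hb"][j]
    return err


def train_rbm(batches, numhid, maxepoch, seed=0):
    """Hypothetical driver mirroring the epoch/batch loops of rbm.m (restart == 1)."""
    rng = random.Random(seed)
    numdims = len(batches[0][0])
    W = [[0.1 * rng.gauss(0.0, 1.0) for _ in range(numhid)] for _ in range(numdims)]
    vb, hb = [0.0] * numdims, [0.0] * numhid
    inc = {"W": [[0.0] * numhid for _ in range(numdims)],
           "vb": [0.0] * numdims, "hb": [0.0] * numhid}
    errors = []
    for epoch in range(1, maxepoch + 1):
        momentum = 0.9 if epoch > 5 else 0.5  # finalmomentum / initialmomentum
        errsum = sum(cd1_update(b, W, vb, hb, inc, momentum, rng) for b in batches)
        errors.append(errsum)
    return W, vb, hb, errors


if __name__ == "__main__":
    # Toy data (hypothetical): two complementary 6-pixel patterns, 2 batches of 4 cases
    a, b = [1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]
    W, vb, hb, errors = train_rbm([[a, b, a, b], [b, a, b, a]], numhid=4, maxepoch=200)
    print("epoch 1 error %.2f, epoch 200 error %.2f" % (errors[0], errors[-1]))
```

Keeping `inc` alive between calls is what makes momentum work: each parameter change is a decaying sum of past gradient steps, which is the role of vishidinc, visbiasinc and hidbiasinc in the MATLAB code.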