Weights and Regularization
29 posts by 4 authors
|
b_m...@live.com |
13-10-10
|
Here is the basic script I am using (with a data set from UCI regarding wine ratings).
####CODE#####################################################################
from pylearn2.models import mlp
from pylearn2.training_algorithms import sgd
from pylearn2.termination_criteria import EpochCounter

# ds is the training dataset (its construction is not shown in this post)
# create first hidden layer with 5 nodes, init weights in range -0.1 to 0.1 and
# init the bias to 1
hidden_layer = mlp.Sigmoid(layer_name='hidden1', dim=5, irange=.1, init_bias=1.)
# create second hidden layer with 2 nodes, init weights in range -0.1 to 0.1 and
# init the bias to 1
hidden_layer2 = mlp.Sigmoid(layer_name='hidden2', dim=2, irange=.1, init_bias=1.)
# create Softmax output layer
output_layer = mlp.Softmax(2, 'output', irange=.1)
# create Stochastic Gradient Descent trainer that runs for 200 epochs
trainer = sgd.SGD(learning_rate=.05, batch_size=100, termination_criterion=EpochCounter(200))
# according to the code, the last layer will be considered the output
layers = [hidden_layer, hidden_layer2, output_layer]
# create neural net that takes 11 inputs
ann = mlp.MLP(layers, nvis=11)
trainer.setup(ann, ds)
# train neural net until the termination criterion is true
while True:
    trainer.train(dataset=ds)
    ann.monitor.report_epoch()
    ann.monitor()
    if not trainer.continue_learning(ann):
        break
####END CODE####################################################################
My questions:
I. Weights. How do I see the weights from the trained model? I *think* I am adding a second hidden layer above, but when I look at ann.get_weights() the dimensions of the resulting object do not change if I remove the second hidden layer. So I question whether I am looking at the right thing. Ultimately I want to see the final weights so that I can visualize the network outside pylearn2.
II. Regularization. How do I use regularization? Specifically, how do I adjust the above code to use 1) dropout and then 2) an L2 norm penalty?
Thanks!
Brian
b_m...@live.com |
13-10-12
|
Using the ann.get_param_values() call I can now see the weight and bias values and, knowing the net architecture, answer question #1.
I would still like some quick help on how to use regularization (especially dropout) and then how to predict new cases with such a model (does the ann.fprop(theano.shared(testMatrix, name='test')).eval() call still work?).
Thanks!
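For readers who want to map a flat list of parameter arrays back onto layers, here is a minimal NumPy-only sketch. It does not call pylearn2: `param_values` is a hypothetical stand-in for what ann.get_param_values() returns, filled with random arrays shaped like the 11-5-2-2 network from the first post. The order in which a real model returns its arrays is an assumption here, which is why the sketch sorts them by dimensionality rather than by position.

```python
import numpy as np

# Hypothetical stand-in for ann.get_param_values(): arrays with the shapes an
# 11-5-2-2 network would have. The (weights, bias) ordering is an assumption.
rng = np.random.RandomState(0)
layer_sizes = [11, 5, 2, 2]
param_values = []
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    param_values.append(rng.uniform(-0.1, 0.1, size=(n_in, n_out)))  # weights
    param_values.append(np.ones(n_out))                               # biases

# 2-D arrays are weight matrices, 1-D arrays are bias vectors.
weights = [p for p in param_values if p.ndim == 2]
biases = [p for p in param_values if p.ndim == 1]

for i, (W, b) in enumerate(zip(weights, biases)):
    print("layer %d: W %s, b %s" % (i, W.shape, b.shape))
```

Removing the second hidden layer from `layer_sizes` changes the number of arrays and their shapes, which is a quick way to check that you are looking at the whole network and not just its first layer.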
Kyle Kastner |
13-10-12
|
Kyle
You received this message because you are subscribed to the Google Groups "pylearn-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pylearn-user...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
b_m...@live.com |
13-10-13
|
> I am doing something similar, and had to enable the one_hot=True to recreate the MNIST yaml results in python. What is the error you are getting?
>
> Kyle
>
Kyle,
I am not getting an error; instead I am looking to learn/confirm the proper method to train an MLP using regularization (L2 as well as dropout) and then get predictions on a new data set. I am not using yaml, though; I want a way to use pylearn2 directly in Python through its functions.
I referenced a blog that showed how to train an MLP without regularization (only a number of epochs) and then predict new data using ann.fprop, where ann is the trained MLP. I *think* I can use dropout simply by adding the cost to the SGD call like this:
sgd.SGD(learning_rate=.05, batch_size=100, termination_criterion=EpochCounter(200), cost=Dropout())
and then to predict new data I *think* I just need to call dropout_fprop instead of fprop, like this (where X_s is the new test set):
test_preds=ann.dropout_fprop(theano.shared(X_s, name='test')).eval()
But I am hoping one of the developers will confirm this is correct and explain how to add an L2 penalty, as that is escaping me currently. I am not very experienced with Python yet, so following the code is a challenge.
Kyle Kastner |
13-10-14
|
I am unsure about the need for an l2 penalty in addition to dropout, as dropout is already a very strong regularizer... what is driving the need for l2 regularization?
There is an LxReg class in the cost.py file - using that could give something useful. See https://github.com/lisa-lab/pylearn2/issues/273 for more details. I haven't used it, though, so I can't give much guidance beyond the link.
Kyle
b_m...@live.com |
13-10-14
|
> I don't know that you need to call dropout_fprop for the predictions - once the network is trained, a regular fprop should be all you need, as the model averaging is done during training - the fprop output of a dropout net *should* represent the bagged estimate of many neural nets. I am having trouble finding the reference, but I am recalling that from somewhere. Maybe some one else can help/contradict me here?
>
>
>
>
>
> In my code, I have called Dropout with an additional dictionary of parameters, so that the dropout from the visible layer is .8, while the others remain .5, as is recommended in some of the literature. The default value of .5 dropout should be OK though, so cost=Dropout() seems ok to me.
>
>
> I am unsure about the need for an l2 penalty in addition to dropout, as dropout is already a very strong regularizer... what is driving the need for l2 regularization?
>
> There is an LxReg class in the cost.py file - using that could give something useful. See https://github.com/lisa-lab/pylearn2/issues/273 for more details. I haven't used it, though, so I can't give much guidance beyond the link.
>
>
>
>
> Kyle
>
Hey Kyle,
I. I see this description of dropout_fprop from models/mlp.py so I am not sure:
def dropout_fprop(self, state_below, default_input_include_prob=0.5,
                  input_include_probs=None, default_input_scale=2.,
                  input_scales=None, per_example=True):
    """
    state_below: The input to the MLP
    Returns the output of the MLP, when applying dropout to the input and intermediate layers.
II. Regarding L2, I would not be using both; I just want to see how to do it as another option.
I saw that class. I am also thinking that here https://github.com/lisa-lab/pylearn2/blob/master/pylearn2/costs/mlp/__init__.py
there is this:
class WeightDecay(Cost):
    """
    coeff * sum(sqr(weights))
    for each set of weights.
    """
    def __init__(self, coeffs):
        """
        coeffs: a list, one element per layer, specifying the coefficient
        to multiply with the cost defined by the squared L2 norm of the weights
        for each layer.
and this
class L1WeightDecay(Cost):
    """
    coeff * sum(abs(weights))
    for each set of weights.
    """
    def __init__(self, coeffs):
        """
        coeffs: a list, one element per layer, specifying the coefficient
        to multiply with the cost defined by the L1 norm of the
        weights(lasso) for each layer.
which might be the way to go for L1 and L2 regularization.
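The two docstrings above describe simple formulas, and a small NumPy sketch may make them concrete. This is not the pylearn2 implementation; `l2_penalty` and `l1_penalty` are hypothetical helper names, and the weight matrices are made-up values chosen so the arithmetic is easy to follow.

```python
import numpy as np

def l2_penalty(weights, coeffs):
    """coeff * sum(sqr(weights)), summed over layers."""
    return sum(c * np.sum(np.square(W)) for W, c in zip(weights, coeffs))

def l1_penalty(weights, coeffs):
    """coeff * sum(abs(weights)), summed over layers."""
    return sum(c * np.sum(np.abs(W)) for W, c in zip(weights, coeffs))

# Two made-up weight matrices and one coefficient per layer.
weights = [np.array([[1.0, -2.0], [0.5, 0.0]]),
           np.array([[3.0], [-1.0]])]
coeffs = [0.005, 0.005]

print(l2_penalty(weights, coeffs))  # 0.005 * (5.25 + 10.0)
print(l1_penalty(weights, coeffs))  # 0.005 * (3.5 + 4.0)
```

Note that neither penalty looks at the data; each depends only on the weights.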
Kyle Kastner |
13-10-14
|
Kyle
b_m...@live.com |
13-10-14
|
I could not figure out how to use the WeightDecay class in my code (I am not using yaml). I tried this with no success:
trainer = sgd.SGD(learning_rate=0.005, batch_size=100,monitoring_dataset={ 'test': dt_test }, termination_criterion=EpochCounter(5000),cost=WeightDecay(coeffs=[0.005,0.005,0.005]))
With yaml, are you able to make predictions on a new dataset (and get the probabilities, not just the predicted class)?
On Monday, October 14, 2013 9:28:46 AM UTC-4, Kyle Kastner wrote:
> As far as dropout_fprop goes, I think that description matches my thoughts. You use dropout_fprop during the training stage, to apply dropout at each layer, which *effectively* creates many separate neural networks, each trained on one example. Then, once the training is all done, you can use a regular fprop, which will *effectively* give you the bagged decision result from all of these networks, by making a decision using all of the weights (see http://arxiv.org/pdf/1207.0580.pdf)
>
>
>
> In short, I think that dropout_fprop is largely internal/used during training - while fprop is used for predictions with a trained net.
>
>
> I did not see the WeightDecay/L1WeightDecay classes - I agree that those seem like the way to go. If I can get those working in my own code I will let you know.
>
>
> Kyle
>
>
>
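To illustrate what a regular forward pass at prediction time produces, and how class probabilities relate to predicted classes, here is a rough NumPy sketch of a sigmoid-sigmoid-softmax network like the one in this thread. It is not pylearn2's fprop; `predict_proba` is a hypothetical helper, and the parameters are random placeholders standing in for trained values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def predict_proba(X, hidden_params, output_params):
    """Plain forward pass: no dropout mask, every unit is used."""
    h = X
    for W, b in hidden_params:
        h = sigmoid(h.dot(W) + b)
    W_out, b_out = output_params
    return softmax(h.dot(W_out) + b_out)

# Random placeholders for trained parameters of an 11-5-2-2 network.
rng = np.random.RandomState(0)
hidden = [(rng.uniform(-0.1, 0.1, (11, 5)), np.ones(5)),
          (rng.uniform(-0.1, 0.1, (5, 2)), np.ones(2))]
output = (rng.uniform(-0.1, 0.1, (2, 2)), np.zeros(2))

X_new = rng.randn(6, 11)                  # six new cases, 11 inputs each
probs = predict_proba(X_new, hidden, output)
labels = probs.argmax(axis=1)             # predicted class per row
```

Each row of `probs` sums to 1, so the probabilities are available directly and the predicted class is just the column with the largest value.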
Ian Goodfellow |
13-10-15
|
You need to use a SumOfCosts class that adds together a Dropout cost
and a WeightDecay cost. If you train with only a WeightDecay cost it
will just make the weights go to 0.
You indeed need to use dropout_fprop at train time and regular fprop
at test time. Training using the Dropout cost will handle the calls to
dropout_fprop for you.
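The train-time versus test-time distinction can be sketched in plain NumPy. This is only an illustration of the idea for a single sigmoid layer, not pylearn2's dropout_fprop; the `forward` helper is hypothetical, and scaling by 1 / include_prob is an assumption chosen to match the default scale of 2 at an include probability of 0.5.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W, b, include_prob=None, rng=None):
    """One sigmoid layer; applies dropout to its input when include_prob is set."""
    if include_prob is not None:
        mask = rng.binomial(1, include_prob, size=x.shape)
        x = x * mask / include_prob       # scale of 2 when include_prob is 0.5
    return sigmoid(x.dot(W) + b)

rng = np.random.RandomState(0)
x = rng.randn(4, 11)
W = rng.uniform(-0.1, 0.1, size=(11, 5))
b = np.ones(5)

train_out = forward(x, W, b, include_prob=0.5, rng=rng)  # stochastic, training only
test_out = forward(x, W, b)                               # deterministic, for predictions
```

Calling `forward` without `include_prob` gives the same answer every time, which is the behavior you want when scoring new data.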
b_m...@live.com |
13-10-15
|
> You need to use a SumOfCosts class that adds together a Dropout cost
>
> and a WeightDecay cost. If you train with only a WeightDecay cost it
>
> will just make the weights go to 0.
>
>
>
> You indeed need to use dropout_fprop at train time and regular fprop
>
> at test time. Training using the Dropout cost will handle the calls to
>
> dropout_fprop for you.
>
>
>
Ian,
I. So, cost=Dropout() in the sgd call takes care of dropout, and then I use just fprop to predict the new test set?
II. How do you use just L1 or L2 regularization without dropout? Do I need to somehow add the L1 or L2 weight decay to the log likelihood?
b_m...@live.com |
13-10-15
|
>
> Ian,
>
> I. So, cost=Dropout() in the sgd call takes care of dropout and then using just fprop in the prediction of new the test set?
>
> II. How do you just use L1 or L2 regularization without dropout? Do I need to somehow add the L1 or L2 weight decay to the log lik?
I mean for I. that a user never calls dropout_fprop directly, correct? They just add cost=Dropout() to the sgd call?
Ian Goodfellow |
13-10-15
|
I. Yes. Yes to your follow up question 2.
II. Yes, the SumOfCosts class does the addition.
b_m...@live.com |
13-10-15
|
> I. Yes. Yes to your follow up question 2.
>
> II. Yes, the SumOfCosts class does the addition.
>
Thanks! Last follow-up, how would I actually accomplish II?
I tried this but received a NotImplementedError:
trainer = sgd.SGD(learning_rate=0.005, batch_size=100,monitoring_dataset={ 'test': dt_test }, termination_criterion=EpochCounter(5000),cost=SumOfCosts(costs=[WeightDecay(coeffs=[0.005,0.005,0.005])]))
Ian Goodfellow |
13-10-15
|
b_m...@live.com |
13-10-15
|
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "C:\Anaconda\lib\site-packages\pylearn2-0.1dev-py2.7.egg\pylearn2\training_algorithms\sgd.py", line 314, in train
"data_specs: %s" % str(data_specs))
NotImplementedError: Unable to train with SGD, because the cost does not actually use data from the data set. data_specs: (CompositeSpace(), ())
I can post the entire script (it is a simple 2 hidden layer mlp) if need be.
Ian Goodfellow |
13-10-15
|
b_m...@live.com |
13-10-15
|
import theano
from pylearn2.models import mlp
from pylearn2.training_algorithms import sgd
from pylearn2.termination_criteria import MonitorBased, EpochCounter
from pylearn2.costs.mlp.dropout import Dropout
from pylearn2.costs.cost import SumOfCosts, MethodCost
from pylearn2.models.mlp import WeightDecay, L1WeightDecay
from pylearn2.datasets.dense_design_matrix import DenseDesignMatrix
import numpy as np
from random import randint
from sklearn.metrics import confusion_matrix, roc_auc_score, accuracy_score
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import Binarizer
import pandas as pd

# raw string so the backslashes in the Windows path are not treated as escapes
data_path = r"C:\Users\Desktop\pylearn2\wine.csv"
X = np.loadtxt(data_path, delimiter=';', usecols=range(0, 11), skiprows=1)  # first 11 cols
y = np.loadtxt(data_path, delimiter=';', usecols=(11, 12), skiprows=1)  # two target columns

# train
X_t = X[:3000, :]
y_t = y[:3000, :]
# valid (note: rows 2500-2999 are also part of the training slice above)
X_v = X[2500:3000, :]
y_v = y[2500:3000, :]
# test
X_s = X[3000:, :]
y_s = y[3000:, :]

# center and scale inputs, fitting the scaler on the training data only
scaler = StandardScaler()
scaler.fit(X_t)
X_t = scaler.transform(X_t)
X_v = scaler.transform(X_v)
X_s = scaler.transform(X_s)


class datMake(DenseDesignMatrix):  # inherits from DenseDesignMatrix
    def __init__(self, X, y):
        super(datMake, self).__init__(X=X, y=y)


dt_train = datMake(X_t, y_t)
dt_valid = datMake(X_v, y_v)
dt_test = datMake(X_s, y_s)

# hidden layers: weights initialized in range -0.1 to 0.1, bias initialized to 1
hidden_layer = mlp.Sigmoid(layer_name='hidden1', dim=5, irange=.1, init_bias=1.)
hidden_layer2 = mlp.Sigmoid(layer_name='hidden2', dim=2, irange=.1, init_bias=1.)
output_layer = mlp.Softmax(2, 'output', irange=.1)
# the last layer in the list is treated as the output
layers = [hidden_layer, hidden_layer2, output_layer]

# An epoch is one complete run through the data: if the training set is 2000
# records and the batch size is 100, there are 20 batches in an epoch.
# This is the call that leads to the NotImplementedError: the only cost is WeightDecay.
trainer = sgd.SGD(learning_rate=0.005, batch_size=100,
                  monitoring_dataset={'test': dt_test},
                  termination_criterion=EpochCounter(5000),
                  cost=SumOfCosts(costs=[WeightDecay(coeffs=[0.005, 0.005, 0.005])]))

# create neural net that takes 11 inputs
ann = mlp.MLP(layers, nvis=11)
trainer.setup(ann, dt_train)
while True:
    trainer.train(dataset=dt_train)
    ann.monitor()
    if not trainer.continue_learning(ann):
        break

ann.get_params()
ann.get_param_values()

# predict the test set
test_preds = ann.fprop(theano.shared(X_s, name='test')).eval()
Ian Goodfellow |
13-10-15
|
b_m...@live.com |
13-10-15
|
I placed the file here: https://docs.google.com/file/d/0B9dsnio60wRoRHptdHlTZjk2RU0/edit?usp=sharing
thanks Ian!
Pascal Lamblin |
13-10-15
|
There is only "WeightDecay" in the SumOfCosts. As Ian said, this would
simply push all weights to zero.
You need to have at least a cost that actually depends on the data, such
as DropoutCost, CrossEntropy or NegativeLogLikelihood. You can also
use a MethodCost to specify a method of the model to call, and use the
return expression as the cost.
--
Pascal
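The point that a weight-decay-only cost just shrinks the weights can be seen in a few lines of NumPy. This is a toy gradient-descent loop, not pylearn2's SGD; the coefficient, learning rate, and step count simply echo the values used in the thread.

```python
import numpy as np

rng = np.random.RandomState(0)
W = rng.uniform(-0.1, 0.1, size=(11, 5))
coeff, learning_rate = 0.005, 0.005
start_norm = np.linalg.norm(W)

# The only cost is coeff * sum(W ** 2), so no data enters the update at all.
for _ in range(5000):
    grad = 2.0 * coeff * W                # gradient of coeff * sum(W ** 2)
    W = W - learning_rate * grad

end_norm = np.linalg.norm(W)
print(start_norm, end_norm)
```

Every step multiplies the weights by the same factor, 1 - 2 * coeff * learning_rate, so they can only decay toward zero. Nothing pulls them toward values that fit the targets, which is why a data-dependent term has to be part of the sum.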
Ian Goodfellow |
13-10-15
|
Brian Miner |
13-10-15
|
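Can you give an example? How would I change this:
trainer = sgd.SGD(learning_rate=0.005, batch_size=100,monitoring_dataset={ 'test': dt_test }, termination_criterion=EpochCounter(5000),cost=SumOfCosts(costs=[WeightDecay(coeffs=[0.005,0.005,0.005])]))
to incorporate the standard loss function used by the output layer?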
Thanks!
Ian Goodfellow |
13-10-16
|
b_m...@live.com |
13-10-16
|
> Just put a second cost in the list. Like Dropout() or something.
>
>
What I am struggling with, and perhaps did not explain well enough, is how to add the weight decay to the default cost that is used when sgd is called without the cost parameter at all. I don't want to combine weight decay with dropout. I want the output layer to dictate the cost, and to add the weight decay term to that.
For example, this call
sgd.SGD(learning_rate=0.005,batch_size=100,termination_criterion=EpochCounter(5000))
has some default cost. I expect it is the negative log likelihood derived from the choice of output layer.
So my question is simply what to do to add this cost to the weight decay (within SumOfCosts). There is a NegativeLogLikelihood in supervised_cost, but that seems to be deprecated.
Thanks for the time!
Ian Goodfellow |
13-10-16
|
https://github.com/lisa-lab/pylearn2/blob/master/pylearn2/costs/mlp/dropout.py#L62
All it does is compute that cost with the hidden states multiplied by
2 * dropout mask.
If you don't want dropout, then use costs.mlp.Default:
https://github.com/lisa-lab/pylearn2/blob/master/pylearn2/costs/mlp/__init__.py#L11
That will also make the last layer drive the cost.
Most of the layers implement some kind of negative log likelihood as their cost.
The NegativeLogLikelihood cost has been deprecated because it's only
the negative log likelihood for a specific model (maybe softmax? I
haven't looked at it recently) so it doesn't make sense to apply it to
other models.
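Putting the pieces of this thread together, the combination might look roughly like the sketch below. It is assembled only from names mentioned in the thread (SumOfCosts, costs.mlp.Default, WeightDecay, sgd.SGD, EpochCounter) and has not been checked against any particular pylearn2 revision, so treat the import paths and constructor arguments as assumptions. `make_trainer` is a hypothetical wrapper, used here so the pylearn2 imports only run when it is called.

```python
def make_trainer(monitoring_dataset=None):
    """Build an SGD trainer whose cost is the output-layer cost plus L2 weight decay."""
    # Imports live inside the function so the sketch can be defined and read
    # without pylearn2 installed; the paths follow those quoted in the thread.
    from pylearn2.costs.cost import SumOfCosts
    from pylearn2.costs.mlp import Default, WeightDecay
    from pylearn2.termination_criteria import EpochCounter
    from pylearn2.training_algorithms import sgd

    cost = SumOfCosts(costs=[
        Default(),                                  # data-dependent cost driven by the last layer
        WeightDecay(coeffs=[0.005, 0.005, 0.005]),  # one coefficient per layer
    ])
    return sgd.SGD(learning_rate=0.005, batch_size=100,
                   monitoring_dataset=monitoring_dataset,
                   termination_criterion=EpochCounter(5000),
                   cost=cost)
```

For the script posted earlier this would be called as make_trainer({'test': dt_test}). The important difference from the failing call is that the list now contains a cost that actually uses the data.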
Pascal Lamblin |
13-10-16
|
On Tue, Oct 15, 2013, Brian Miner wrote:
> Can you give an example? How to change this:
>
> trainer = sgd.SGD(learning_rate=0.005, batch_size=100,monitoring_dataset={ 'test': dt_test }, termination_criterion=EpochCounter(5000),cost=SumOfCosts(costs=[WeightDecay(coeffs=[0.005,0.005,0.005])]))
>
> to incorporate the standard loss function used by the output layer?
For an MLP, I think you can use:
SumOfCosts(costs=[
    MethodCost("cost_from_X"),
    WeightDecay(coeffs=[...])])
MethodCost is defined in costs/cost.py
--
Pascal
Ian Goodfellow |
13-10-16
|
thing, without needing to write cost_from_X in the base script.
Brian |
13-10-16
|
Thank you Ian and Pascal! These did the trick. Is "cost_from_X" another
way of using the default (outer-layer dependent) cost function (without
assuming MLP)?
Ian Goodfellow |
13-10-16
|
based on calling a method that you name, so if you use MethodCost and
tell it to call cost_from_X it does the same thing as Default.