二进制SVM的随机优化
指导方针
评估作业将为课程的最终成绩贡献20%。对于每个任务,标记
将给出正确使用Matlab执行任务以及您的解释和评论的信息
关于结果。每个任务在分配中将具有相同的权重。额外的2分将是
为遵守准则而给出。
一般样式该报告应为pdf文件。应该有一个包含您的姓名的标题页
和入学人数。
长度您的作业不得超过A4的四个侧面,包括标题,图形,表格,
参考文献,但不包括附录。使用在打印报告时可以阅读的标准字体
(还要注意图形标签和标题的字体)。
内容作业的主要部分应给出您执行的每个任务的结果,以及
您对这些结果的评论和结论。图和表必须包含在主
报告。
附录报告中不应包含Matlab代码,但必须将其包含在附录中。这
可以提供pdf和/或快照的代码,并且必须正确注释
提交作业应通过Vision提交。不应通过电子邮件提交。
鼓励学生与其他学生讨论使用的方法,但是您的
提交的作业必须是您自己的工作。特别是,复制作业的一部分,
无论是密谋还是评论,都是来自另一位学生的严重违纪行为。这也是违法的
允许其他学生复制您的作品,因此您的作业文件不应提供给其他人
学生。
截止日期提交截止日期为2021年3月26日;作业可能会提早提交。
逾期提交任何在提交截止日期之后但在5个工作之内提交的作业
截止日期的天数将减少30%。提交的作业
截止日期之后超过5个工作日将不会被标记。
作业分配3:二进制SVM 1/9的随机优化
奥黛丽·雷佩蒂(Audrey Repetti)博士
作业分配3
(Heriot-Watt理学硕士课程B31XN)
图1:MNIST数据集示例(http://yann.lecun.com/exdb/mnist/)。
图2:MNIST数据集示例显示了手写数字0和1。
大纲
•随机近端算法
•机器学习
•二进制分类器
1项目说明
该项目旨在说明机器学习环境中的随机近端方法。你会
需要使用Matlab实现随机梯度前后(FB)算法,以训练二进制
支持向量机(SVM)分类器。为了帮助完成此任务,您还将实施标准
FB算法。
作业分配3:二进制SVM 2/9的随机优化
奥黛丽·雷佩蒂(Audrey Repetti)博士
作业分配3
(Heriot-Watt理学硕士课程B31XN)
二进制支持向量机
二进制回归的目标是能够正确预测二进制标签(例如,是vs.否,或0
与1等)。二进制分类旨在将输入x∈R配对
N通过a到二进制输出y∈{−1,+1}
表格模型
y = dw(x)=符号
X
> wB31XN留学生程序代做、Matlab语言编程代写
</ s> </ s> </ s> </ s> </ s>
,(1)
其中>表示转置运算,而w∈R
N是需要学习的权重。恰恰,
有监督的学习任务在于找到w,以便对于给定的输入,输出将正确
由分类器预测。分类器在一对训练集中进行训练
S =
</ s> </ s> </ s> </ s> </ s>
(x`
,y`)16`6L |(∀`∈{1,..,L})x`∈R
N,y`∈{−1,+1}
。 (2)
MNIST测试集中的前21张图像的样本如图1所示。完整的MNIST
数据集可在http://yann.lecun.com/exdb/mnist/上找到,其中包含60000个大小的训练图像
28×28和10000张尺寸为28×28的测试图像,均表示介于0到9之间的手写数字。
二进制分类器将旨在为问题提供二进制答案(例如,“是”与“否”,或0与1)。
对于该数据集,可以训练分类器,例如以识别其他数字中的数字0。一个
简化版本将仅查看0和1位数字,并根据数字是否为0对其进行分类
或1.图2中给出了MNIST测试集的子样本,仅显示数字0和1。
最小化问题
学习权重向量w的经典方法是将其定义为
最小化
∈
g(w)+ h(w),(3)
其中g∈Γ0(R
N)是一个正则化函数,h∈Γ0(R
N)是损失函数。在上下文中
稀疏学习,可以选择g为1范数:
HOMEWORK ASSIGNMENT 3
(Heriot-Watt MSc Course B31XN)
HOMEWORK ASSIGNMENT 3:
STOCHASTIC OPTIMISATION FOR BINARY SVM
Guidelines
Assessment The assignment will contribute 20% to the final mark for the course. For each task, marks
will be given for correctly using Matlab to carry out the task and for your interpretation and comments
about the results. Each task will have the same weight in the assignment. Additional 2 marks will be
given for respecting the guidelines.
General style The report should be a pdf file. There should be a header page that includes your name
and matriculation number.
Length Your assignment should be NOT more than four sides of A4 including title, graphs, tables,
references, but not including the Appendix. Use a standard font that can be read when printing the report
(be also careful with the font of figure’ labels and captions).
Content The main part of your assignment should give the results for each task that you carry out and
your comments on and conclusions from these results. Figures and tables must be included in the main
report.
Appendix The report should not contain Matlab codes, but they must be included in an Appendix. The
codes can be provided as pdf and/or snapshots, and must be properly commented
Submission The assignment should be submitted through Vision. It should not be submitted by email.
Collaboration Students are encouraged to discuss the methods used with other students, but your
submitted assignment must be all your own work. In particular, copying a section of your assignment,
either plots or commentary, from another student is a serious disciplinary offence. It is also an offence to
allow another student to copy your work, so your assignment files should not be made available to other
students.
Deadline The deadline for submission is 26 March 2021; assignments may be submitted early.
Late submissions Any assignment that is submitted after the submission deadline but within 5 working
days of the deadline will be penalised by a 30% reduction in the mark. Assignments that are submitted
more than 5 working days past the deadline will not be marked.
HOMEWORK ASSIGNMENT 3: STOCHASTIC OPTIMISATION FOR BINARY SVM 1/9
Dr Audrey Repetti
HOMEWORK ASSIGNMENT 3
(Heriot-Watt MSc Course B31XN)
Figure 1: Sample of the MNIST dataset (http://yann.lecun.com/exdb/mnist/).
Figure 2: Sample of the MNIST dataset showing handwritten digits 0 and 1.
Outline
• Stochastic proximal algorithms
• Machine Learning
• Binary classifier
1 Project description
This project aims to illustrate stochastic proximal methods in the context of machine learning. You will
need to implement a stochastic gradient forward-backward (FB) algorithm with Matlab, to train a binary
support vector machine (SVM) classifier. To help with this task, you will also implement the standard
FB algorithm.
HOMEWORK ASSIGNMENT 3: STOCHASTIC OPTIMISATION FOR BINARY SVM 2/9
Dr Audrey Repetti
HOMEWORK ASSIGNMENT 3
(Heriot-Watt MSc Course B31XN)
Binary SVM
The objective in binary regression is to be able to correctly predict a binary label (e.g. yes vs. no, or 0
vs. 1, etc.). Binary classification aims to pair an input x ∈ R
N to a binary output y ∈ {−1, +1}, via a
model of the form
y = dw(x) = sign
x
>w
, (1)
where > denotes the transpose operation, and w ∈ R
N are weights that need to be learned. Precisely,
the supervised learning task consists in finding w such that for a given input, the output will be correctly
predicted by the classifier. The classifier is trained on a training set of pairs
S =
(x`
, y`)16`6L |(∀` ∈ {1, . . . , L}) x` ∈ R
N , y` ∈ {−1, +1}
. (2)
A sample of the first 21 images from the MNIST test set are shown in Figure 1. The full MNIST
dataset can be found at http://yann.lecun.com/exdb/mnist/, it contains 60000 training images of size
28 × 28, and 10000 testing images of size 28 × 28, all representing handwritten digits between 0 and 9.
A binary classifier will aim to provide a binary answer to a question (e.g. “yes” vs. “no”, or 0 vs. 1).
For this data set, the classifier could be trained, e.g., to recognise the digit 0 among the other digits. A
simplified version would be to only look at 0 and 1 digits, and classify the digits depending if they are 0
or 1. A sub-sample of the MNIST test set is given in Figure 2, only showing digits 0 and 1.
Minimisation problem
A classical approach for learning the vector of weights w is to define it as a solution to
minimise
w∈RN
g(w) + h(w), (3)
where g ∈ Γ0(R
N ) is a regularisation function, and h ∈ Γ0(R
N ) is the loss function. In the context of
sparse learning, g can be chosen to be an `1 norm:
(∀w ∈ R
N ) g(w) = λkwk1, (4)
where λ > 0. Regarding h, a classical choice is the Huber loss. Let
(∀w ∈ R
N ) h(w) = 1
L
X
L
`=1
φ(y` − x
>
` w), (5)
with φ ∈ Γ0(R) is the Huber function defined as
(∀u ∈ R) φ(u) = (
1
2
u
2
if |u| 6 δ,
δ
|u| − 1
2
δ
otherwise.
(6)
Matlab toolbox
The file Main.m contains the main commands to implement the binary classifier.
HOMEWORK ASSIGNMENT 3: STOCHASTIC OPTIMISATION FOR BINARY SVM 3/9
Dr Audrey Repetti
HOMEWORK ASSIGNMENT 3
(Heriot-Watt MSc Course B31XN)
Datasets. In this project, you will handle two datasets, both obtained from the MNIST dataset.
• dataset subMNIST.mat: Dataset that only contains digits 0 and 1.
• dataset MNIST.mat: Full MNIST dataset.
The variables X test and X train are tensors of dimension Nx × Ny × I and Nx × Ny × L, respectively,
where Nx = Ny = 28 and I (resp. L) is the total number of images contained in the testing set (resp.
training set). For instance, the i-th image in X test is given by X test(:,:,i). The variables Y test and
Y train are vectors of dimension L containing labels in {−1, +1}. The datasets have been preprocessed
such that the labels correspond to +1 if the associated the digit is 0, and the labels are −1 if the associated
digit is not 0. For instance, for the dataset dataset subMNIST.mat, the first digit in X test (i.e.
X test(:,:,1)) is 1, so the the first coefficient in Y test (i.e. Y test(1)) is −1. The second digit X test(:,:,2)
is 0, so the the second coefficient Y test(2) is +1.
Binary classifier model. You will aim to train a classifier to determine whether a hand-written digit is
a 0 or not. You will use a linear model of the form (1). This model in Matlab is implemented as follows:
where x is the image to be tested to check whether it is the digit 0 or not, and w is the variable that
needs to be trained.
In order to test the full training set with the learned variable w, the Matlab function d test will be
used, defined as follows:
To determine the percentage of errors done by the classifier, one can compute
E(w) = 100 ×
1
2
PI
i=1 |dw(xi) − yi
|
I
, (7)
where (xi)16i6I are the testing images, and (yi)16i6I are the associated labels (= +1 if the associated
digit is a 0, and = −1 if it is not a 0).
Additional information. Further details are provided below.
• op norm.m: val = op norm(A, At, im size)
– Compute the squared spectral norm of operator A
– A and At: operator and its adjoint. Must be defined as functions.
– im size: size of the input, e.g. if A applies to a variable of size [Nx,Ny], then im size=[Nx,Ny].
• huber.m: h = huber(x, delta)
– Compute the Huber function φ defined in (6).
– x can be either a scalar or a vector.
HOMEWORK ASSIGNMENT 3: STOCHASTIC OPTIMISATION FOR BINARY SVM 4/9
Dr Audrey Repetti
HOMEWORK ASSIGNMENT 3
(Heriot-Watt MSc Course B31XN)
– delta: scalar value corresponding to the parameter δ in (6).
– If x is a vector of size L, then the output of the function is a vector of size L as well.
• huber grad: g=huber grad(x,delta)
– Compute the gradient of the Huber function φ defined in (6).
– x can be either a scalar or a vector.
– delta: scalar value corresponding to the parameter δ in (6).
– If x is a vector of size L, then the output of the function is a vector of size L as well.
– This function must be updated.
• TO BE COMPLETED: The parts of the code to be completed (or changed) are highlighted with
comments, e.g.:
2 Tasks
1. Before designing a stochastic gradient forward-backward to train a classifier, you will first build
a classical forward-backward algorithm to solve (3) (without stochastic gradient). The questions
below will drive you for this task.
(a) [1 Mark]
Let X = (x`)16`6L ∈ R
N×L be the matrix containing in the `-th column the input x`
. Let
Y = (y`)16`6L ∈ R
L.
Show that
(∀w ∈ R
N ) h(w) = 1
L
ϕ(Y − X
>w) (8)
where ϕ: R
L → R is defined as
(∀u = (u`)16`6L ∈ R
L
) ϕ(u) = X
L
`=1
φ(u`), (9)
with φ defined in (6).
(b) [2 Mark]
What is the gradient of function h? What is the Lipschitz constant β of its gradient?
(c) [1 Mark]
What is the proximity operator of function g?
(d) [1 Mark]
Give the FB iterations to solve (3).
HOMEWORK ASSIGNMENT 3: STOCHASTIC OPTIMISATION FOR BINARY SVM 5/9
Dr Audrey Repetti
HOMEWORK ASSIGNMENT 3
(Heriot-Watt MSc Course B31XN)
2. [2 Marks]
You will now implement the FB algorithm with MATLAB.
• In file main.m, compute compute the Lipschitz constant of h:
• In file FB.m update the first lines corresponding to the different parameter, functions and
operators involved to run the FB algorithm to solve (3):
– gamma: step-size for FB
– g and h: compute functions g and h given in equations (4) and (5), respectively.
– grad =@(w): gradient of the smooth function, computed at w. To compute this gradient,
you also need to update the file huber grad.m, with function g=huber grad(x,delta).
– prox =@(w, T): proximity operator of the proximable function, computed at w, with
parameter T.
• In file FB.m update the FB steps:
3. [2 Marks]
Run the FB algorithm to solve problem (3), with both the two MNIST datasets. For each dataset,
comment on the results, and give a figure showing the percentage of error of the classifier obtained
during the training process, as a function of time.
4. For this task, you will aim to build a stochastic gradient forward-backward (SGFB) algorithm to
solve (3).
HOMEWORK ASSIGNMENT 3: STOCHASTIC OPTIMISATION FOR BINARY SVM 6/9
Dr Audrey Repetti
HOMEWORK ASSIGNMENT 3
(Heriot-Watt MSc Course B31XN)
(a) [1 Mark]
For every k ∈ N, let Lk ⊂ {1, . . . , D}, and Lk = ] Lk. Let XLk = (x`)`∈Lk ∈ R
Lk×N and
YLk = (y`)`∈Lk ∈ R
Lk .
In the context of SGFB, give the approximated gradient of h, only involving the functions
and operators associated with indices Lk.
(b) [1 Mark]
Give the SGFB iterations to solve problem (3), assuming that, at each iteration k ∈ N, a
subset Dk ⊂ {1, . . . , D} of the D functions is randomly selected.
(c) [1 Mark]
State clearly the convergence conditions.
5. [2 Marks]
You will now implement the SGFB algorithm with MATLAB.
• In file FB sto.m, update the first lines corresponding to the different parameters, functions,
and operators involved to run the SGFB algorithm to solve (3):
They are all the same as for FB, apart from the computation of the approximated gradient,
that should only involve the functions/operators associated with the selected indices Ind.
• In file FB sto.m update the FB steps:
Additional information:
• In file Main.m, you will need to choose the parameters p and theta:
HOMEWORK ASSIGNMENT 3: STOCHASTIC OPTIMISATION FOR BINARY SVM 7/9
Dr Audrey Repetti
HOMEWORK ASSIGNMENT 3
(Heriot-Watt MSc Course B31XN)
– p: parameter with values in ]0, 1], corresponding to the proportion of indices d ∈
{1, . . . , D} that are randomly selected at each iteration.
– theta: parameter to control convergence speed that must appear in the choice of the
step-size.
Remark: In this task, the main difficulty is to build the approximated gradient, hence finding a
suitable Matlab syntax is part of the question.
6. [2 Marks]
Run the SGFB algorithm for the MNIST dataset only containing 0 and 1 digits, considering the
following parameters: p ∈ {1‰, 1%, 10%, 50%} and θ ∈ {0.6, 0.8, 1}.
Give tables (see below) with (i) the percentage error when applying the learned classifier to the test
sets, (ii) the total time necessary to train the classifier, and (iii) the final objective value of h + g.
You may want to run the algorithm multiple times to take an average of both percentage error and
convergence time.
Remark: If you prefer, you can provide figures.
Percentage of error
θ \ p 1‰ 1% 10% 50%
0.6
0.8
1
Convergence time
θ \ p 1‰ 1% 10% 50%
0.6
0.8
1
Final objective value
θ \ p 1‰ 1% 10% 50%
0.6
0.8
1
7. [2 Marks]
Comment on the Matlab results obtained for the previous questions.
Based on your comments, choose the best values for p and θ and train the classifier with the SGFB
algorithm on the full MNIST dataset. Comment your results.
Remark: The problem being convex, the minimum objective value should be the same for all
methods. It may happen that you observe that it is not the case. This might be due to the considered
stopping criterion.
Keeping this in mind, you can use the results obtained from the classical FB algorithm as ground
truth results.
Do not change the stopping criterion, only acknowledge this fact if you observe it in your simulations.
[Additional 2 Marks] If guidelines are respected.
HOMEWORK ASSIGNMENT 3: STOCHASTIC OPTIMISATION FOR BINARY SVM 8/9
Dr Audrey Repetti
HOMEWORK ASSIGNMENT 3
(Heriot-Watt MSc Course B31XN)
Matlab help
To obtain average values in Task 6, one can use loops in Matlab to automatically change parameters and
run multiple times an algorithm:
HOMEWORK ASSIGNMENT 3: STOCHASTIC OPTIMISATION FOR BINARY SVM 9/9
请加QQ:99515681 或邮箱:99515681@qq.com WX:codehelp