Model and Cost Function
1 Model Representation
To establish notation for future use, we’ll use

- x(i) to denote the “input” variables (living area in this example), also called input features, and
- y(i) to denote the “output” or target variable that we are trying to predict (price).

A pair (x(i), y(i)) is called a training example, and the dataset that we’ll be using to learn, a list of m training examples (x(i), y(i)) for i = 1, …, m, is called a training set. Note that the superscript “(i)” in the notation is simply an index into the training set, and has nothing to do with exponentiation. We will also use

- X to denote the space of input values, and
- Y to denote the space of output values.

In this example, X = Y = ℝ.
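As a concrete sketch (the housing numbers below are assumptions for illustration, not taken from these notes), a training set of m examples can be stored as parallel arrays:

```python
import numpy as np

# Hypothetical training set: living areas in square feet (inputs x(i))
# and house prices in $1000s (targets y(i)).
x = np.array([2104.0, 1600.0, 2400.0, 1416.0])
y = np.array([400.0, 330.0, 369.0, 232.0])

m = len(x)         # number of training examples
print(x[0], y[0])  # the first training example (x(1), y(1))
```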
To describe the supervised learning problem slightly more formally, our goal is, given a training set, to learn a function h : X → Y so that h(x) is a “good” predictor for the corresponding value of y. For historical reasons, this function h is called a hypothesis. Seen pictorially, the process is therefore like this:
- regression problem: when the target variable that we’re trying to predict is continuous, such as in our housing example.
- classification problem: when y can take on only a small number of discrete values (such as if, given the living area, we wanted to predict whether a dwelling is a house or an apartment, say).
In short, this section introduces the notation for the dataset and defines h (the hypothesis), a function obtained through training that takes an input x and produces a predicted result y. It ends by introducing the linear regression equation.
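For linear regression with one variable, a minimal Python sketch of the hypothesis looks like this (the parameter values are made up for the example):

```python
def h(theta0, theta1, x):
    """Linear hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# With hypothetical parameters theta0 = 50 and theta1 = 0.1,
# a 1500 sq-ft living area maps to a predicted price of 200:
print(h(50.0, 0.1, 1500.0))  # -> 200.0
```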
2 Cost Function
The cost function is a function used to measure how accurately the predicted values match the actual values.
We can measure the accuracy of our hypothesis function by using a cost function. This takes an average difference (actually a fancier version of an average) of all the results of the hypothesis with inputs from the x’s and the actual outputs y’s.
First, we need to be clear about the difference between the hypothesis function and the cost function. When the hypothesis function is linear, i.e. the linear regression equation, it consists of two parameters: θ0 and θ1. Our job is to choose values for these two parameters that minimize the value of the cost function.
$$J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right)^2 = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x_i) - y_i\right)^2$$
To break it apart, it is $\frac{1}{2}\bar{x}$, where $\bar{x}$ is the mean of the squares of $h_\theta(x_i) - y_i$, i.e. of the differences between the predicted values and the actual values. This function is otherwise called the squared error function, or mean squared error. The mean is halved (1/2) as a convenience for the computation of gradient descent, since the derivative of the square term will cancel out the 1/2.
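To make the formula concrete, here is a minimal Python sketch of $J(\theta_0, \theta_1)$ for a linear hypothesis (the training set values are assumptions for the example):

```python
import numpy as np

def compute_cost(theta0, theta1, x, y):
    """Squared error cost: J = (1 / 2m) * sum((h_theta(x_i) - y_i)^2)."""
    m = len(x)
    predictions = theta0 + theta1 * x  # h_theta(x_i) for each example
    return ((predictions - y) ** 2).sum() / (2 * m)

# Hypothetical data: a hypothesis that fits perfectly has zero cost.
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
print(compute_cost(0.0, 1.0, x, y))  # -> 0.0
```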
The following image summarizes what the cost function does:
3 Cost Function (I)
In this simplified example, θ0 is set to 0, so the hypothesis is hθ(x) = θ1x, and the training set consists of the three points (1,1), (2,2), (3,3). When θ1 = 1, we get a slope of 1 which passes through every single data point in our model. Conversely, when θ1 = 0.5, we see the vertical distance from our fit to the data points increase. This increases our cost function to 0.58. Plotting several other points yields the following graph:

Thus, as a goal, we should try to minimize the cost function. In this case, θ1 = 1 is our global minimum.
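These numbers can be reproduced with a short sketch, assuming (as in this simplified example) θ0 = 0 and the training points (1,1), (2,2), (3,3):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
m = len(x)

# Sweep theta1 and evaluate J(theta1) with theta0 fixed at 0.
for theta1 in (0.0, 0.5, 1.0, 1.5, 2.0):
    cost = ((theta1 * x - y) ** 2).sum() / (2 * m)
    print(f"theta1 = {theta1:.1f}  ->  J = {cost:.2f}")

# theta1 = 1.0 gives J = 0.00, the global minimum;
# theta1 = 0.5 gives J = 0.58, matching the value in the text.
```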
4 Cost Function (II)
- A contour plot is a graph that contains many contour lines. A contour line of a two-variable function has a constant value at every point along the same line, so following any one of the colored ‘circles’ yields the same cost function value.
- When θ0 = 800 and θ1 = -0.15, the circled x shows the value of the cost function for the graph on the left.

Taking another h(x) and plotting its contour graph gives the following chart:

For example, the three red points found on the green line above have the same value of J(θ0, θ1); therefore, they can be found along the same line.

- When θ0 = 360 and θ1 = 0, the value of J(θ0, θ1) in the contour plot is closer to the center, which reduces the cost function error.

Giving our hypothesis function a slightly positive slope now results in a better fit to the data.
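As an illustration of how such a contour plot can be generated (the training data here is invented for the sketch), one can evaluate J over a grid of parameter pairs and draw its level curves:

```python
import numpy as np
import matplotlib.pyplot as plt

# Invented training data for the sketch.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.5, 2.5, 3.0, 4.5])
m = len(x)

# Evaluate J(theta0, theta1) on a grid of parameter pairs.
theta0_vals = np.linspace(-2.0, 4.0, 100)
theta1_vals = np.linspace(-1.0, 3.0, 100)
T0, T1 = np.meshgrid(theta0_vals, theta1_vals)
J = np.zeros_like(T0)
for i in range(T0.shape[0]):
    for j in range(T0.shape[1]):
        errors = T0[i, j] + T1[i, j] * x - y
        J[i, j] = (errors ** 2).sum() / (2 * m)

# Each contour line connects (theta0, theta1) pairs with equal cost,
# which is exactly what the red points on the green line illustrate.
plt.contour(T0, T1, J, levels=30)
plt.xlabel("theta0")
plt.ylabel("theta1")
plt.title("Contour plot of J(theta0, theta1)")
plt.show()
```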