Advanced Machine Learning Study Notes: Regression Algorithms
Linear Regression
Maximum Likelihood Estimation (MLE)
Assume each observation is generated by a linear function of its features plus an error term:

$$y^{(i)}=\theta^{T} x^{(i)}+\varepsilon^{(i)}$$
Assume the errors $\epsilon^{(i)}$ are i.i.d. Gaussian with mean $0$ and variance $\sigma^{2}$:

$$p\left(\epsilon^{(i)}\right)=\frac{1}{\sqrt{2 \pi} \sigma} \exp \left(-\frac{\left(\epsilon^{(i)}\right)^{2}}{2 \sigma^{2}}\right)$$
Equivalently, the conditional distribution of each target is

$$p\left(y^{(i)} \mid x^{(i)} ; \theta\right)=\frac{1}{\sqrt{2 \pi} \sigma} \exp \left(-\frac{\left(y^{(i)}-\theta^{T} x^{(i)}\right)^{2}}{2 \sigma^{2}}\right)$$
The likelihood of all $m$ samples is

$$L(\theta)=\prod_{i=1}^{m} p\left(y^{(i)} \mid x^{(i)} ; \theta\right)=\prod_{i=1}^{m} \frac{1}{\sqrt{2 \pi} \sigma} \exp \left(-\frac{\left(y^{(i)}-\theta^{T} x^{(i)}\right)^{2}}{2 \sigma^{2}}\right)$$
Gaussian Log-Likelihood and Least Squares
$$\begin{aligned} \ell(\theta) &=\log L(\theta) \\ &=\log \prod_{i=1}^{m} \frac{1}{\sqrt{2 \pi} \sigma} \exp \left(-\frac{\left(y^{(i)}-\theta^{T} x^{(i)}\right)^{2}}{2 \sigma^{2}}\right) \\ &=\sum_{i=1}^{m} \log \frac{1}{\sqrt{2 \pi} \sigma} \exp \left(-\frac{\left(y^{(i)}-\theta^{T} x^{(i)}\right)^{2}}{2 \sigma^{2}}\right) \\ &=m \log \frac{1}{\sqrt{2 \pi} \sigma}-\frac{1}{\sigma^{2}} \cdot \frac{1}{2} \sum_{i=1}^{m}\left(y^{(i)}-\theta^{T} x^{(i)}\right)^{2} \end{aligned}$$

Since the first term is constant in $\theta$, maximizing $\ell(\theta)$ is equivalent to minimizing the least-squares cost

$$J(\theta)=\frac{1}{2} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}$$
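A minimal numerical check of this equivalence, assuming NumPy and SciPy are available (the data here is hypothetical): minimizing the negative Gaussian log-likelihood over $\theta$ recovers the same coefficients as the least-squares solution.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
m, n = 100, 3
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])  # constant column + features
theta_true = np.array([1.0, 2.0, -3.0, 0.5])
y = X @ theta_true + rng.normal(scale=0.1, size=m)          # Gaussian noise

# Least-squares solution via the normal equations
theta_ls = np.linalg.solve(X.T @ X, X.T @ y)

# Negative Gaussian log-likelihood in theta (sigma fixed; additive constants dropped)
def neg_log_likelihood(theta):
    r = y - X @ theta
    return 0.5 * r @ r

theta_mle = minimize(neg_log_likelihood, x0=np.zeros(n + 1)).x
print(np.allclose(theta_ls, theta_mle, atol=1e-4))  # True: MLE == least squares
```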
Deriving the Closed-Form Solution for $\theta$
Stack the M samples, each N-dimensional, into a matrix X:

- Each row of X corresponds to one sample, M samples in total (measurements)
- Each column of X corresponds to one feature dimension, N dimensions in total (regressors)
- Plus one extra constant dimension, a column of all ones
Objective function:
$$J(\theta)=\frac{1}{2} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}=\frac{1}{2}(X \theta-y)^{T}(X \theta-y)$$
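A quick sanity check (hypothetical data, assuming NumPy) that the summation form and the matrix form of $J(\theta)$ agree:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
y = rng.normal(size=5)
theta = rng.normal(size=3)

r = X @ theta - y
J_sum = 0.5 * np.sum(r ** 2)  # (1/2) * sum of squared residuals
J_mat = 0.5 * r @ r           # (1/2) * (X theta - y)^T (X theta - y)
print(np.isclose(J_sum, J_mat))  # True
```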
Gradient:

$$\nabla_{\theta} J(\theta)=\nabla_{\theta}\left(\frac{1}{2}(X \theta-y)^{T}(X \theta-y)\right)=\nabla_{\theta}\left(\frac{1}{2}\left(\theta^{T} X^{T}-y^{T}\right)(X \theta-y)\right)$$
$$\begin{aligned} &=\nabla_{\theta}\left(\frac{1}{2}\left(\theta^{T} X^{T} X \theta-\theta^{T} X^{T} y-y^{T} X \theta+y^{T} y\right)\right) \\ &=\frac{1}{2}\left(2 X^{T} X \theta-X^{T} y-\left(y^{T} X\right)^{T}\right)=X^{T} X \theta-X^{T} y \longrightarrow 0 \quad \text{(set to zero to find the stationary point)} \end{aligned}$$
The optimal parameters in the least-squares sense
Closed-form solution:
$$\theta=\left(X^{T} X\right)^{-1} X^{T} y$$
If $X^{T} X$ is not invertible, or to prevent overfitting, add a $\lambda$ perturbation:
$$\theta=\left(X^{T} X+\lambda I\right)^{-1} X^{T} y$$
A quick way to remember the result:
$$X \theta=y \;\Rightarrow\; X^{T} X \theta=X^{T} y \;\Rightarrow\; \theta=\left(X^{T} X\right)^{-1} X^{T} y$$
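A minimal NumPy sketch of both closed forms (hypothetical data; the regularized variant uses an assumed $\lambda=0.1$):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 50, 4
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])  # prepend the all-ones column
y = X @ rng.normal(size=n + 1) + rng.normal(scale=0.1, size=m)

# Plain normal equations: theta = (X^T X)^{-1} X^T y
theta = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge-perturbed version: theta = (X^T X + lambda * I)^{-1} X^T y
lam = 0.1  # assumed regularization strength
theta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(n + 1), X.T @ y)
```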
Regularization and Preventing Overfitting
The penalty terms below use hyperparameters

$$\left\{\begin{array}{l}\lambda>0 \\ \rho \in[0,1]\end{array}\right.$$
- L2-norm (Ridge):

$$J(\vec{\theta})=\frac{1}{2} \sum_{i=1}^{m}\left(h_{\vec{\theta}}\left(x^{(i)}\right)-y^{(i)}\right)^{2}+\lambda \sum_{j=1}^{n} \theta_{j}^{2}$$
- L1-norm (LASSO):

$$J(\vec{\theta})=\frac{1}{2} \sum_{i=1}^{m}\left(h_{\vec{\theta}}\left(x^{(i)}\right)-y^{(i)}\right)^{2}+\lambda \sum_{j=1}^{n}\left|\theta_{j}\right|$$
- Elastic Net:

$$J(\vec{\theta})=\frac{1}{2} \sum_{i=1}^{m}\left(h_{\vec{\theta}}\left(x^{(i)}\right)-y^{(i)}\right)^{2}+\lambda\left(\rho \cdot \sum_{j=1}^{n}\left|\theta_{j}\right|+(1-\rho) \cdot \sum_{j=1}^{n} \theta_{j}^{2}\right)$$
Moore-Penrose Generalized Inverse (Pseudoinverse)
- If $A$ is a nonsingular matrix, the solution of the linear system $Ax=b$ can be written as $x=\left(A^{T} A\right)^{-1} A^{T} \cdot b$. Motivated by this solution formula, define

$$A^{+}=\left(A^{T} A\right)^{-1} A^{T}$$
- If $A$ is an invertible square matrix, $A^{+}=\left(A^{T} A\right)^{-1} A^{T}$ is exactly $A^{-1}$:

$$\left(A^{T} A\right)^{-1} A^{T}=A^{-1}\left(A^{T}\right)^{-1} A^{T}=A^{-1}$$
- When $A$ is a general (non-square) matrix, $A^{+}$ is called the generalized inverse (pseudoinverse) of $A$.
Computing the Generalized Inverse via SVD
- For an $m \times n$ matrix $A$ with SVD $A=U \cdot \Sigma \cdot V^{T}$, the generalized inverse is $A^{+}=V \cdot \Sigma^{+} \cdot U^{T}$, where $\Sigma^{+}$ is obtained by taking the reciprocal of each nonzero singular value and transposing the result. (When $\Sigma$ is square and invertible, $\Sigma^{+}$ is simply $\Sigma^{-1}$.)
- One can verify that if $A$ is an invertible $n \times n$ matrix, then $A \cdot A^{+}=I$.
- If $A$ is not square ($m \neq n$) but has full column rank, then $A^{+}=\left(A^{T} A\right)^{-1} A^{T}$, consistent with the definition above; when $A^{T} A$ is itself singular, the SVD formula still applies.
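A minimal NumPy sketch of the SVD route (hypothetical matrix), checked against `np.linalg.pinv`:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(6, 4))  # non-square, full column rank

# SVD: A = U @ diag(s) @ Vt, with U (6x6), s (4,), Vt (4x4)
U, s, Vt = np.linalg.svd(A)

# Sigma^+ : reciprocals of the nonzero singular values, transposed shape (4x6)
Sigma_plus = np.zeros((A.shape[1], A.shape[0]))
Sigma_plus[:len(s), :len(s)] = np.diag(1.0 / s)

A_plus = Vt.T @ Sigma_plus @ U.T                 # A^+ = V Sigma^+ U^T
print(np.allclose(A_plus, np.linalg.pinv(A)))    # True
```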
Gradient Descent
$$J(\theta)=\frac{1}{2} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}$$
- Initialize $\theta$ (randomly)
- Iterate along the negative gradient direction, so that each update of $\theta$ makes $J(\theta)$ smaller:
$$\theta=\theta-\alpha \cdot \frac{\partial J(\theta)}{\partial \theta}$$
$\alpha$: the learning rate (step size)
Gradient direction (derived for a single sample):
$$\begin{aligned} \frac{\partial}{\partial \theta_{j}} J(\theta) &=\frac{\partial}{\partial \theta_{j}} \frac{1}{2}\left(h_{\theta}(x)-y\right)^{2} \\ &=2 \cdot \frac{1}{2}\left(h_{\theta}(x)-y\right) \cdot \frac{\partial}{\partial \theta_{j}}\left(h_{\theta}(x)-y\right) \\ &=\left(h_{\theta}(x)-y\right) \cdot \frac{\partial}{\partial \theta_{j}}\left(\sum_{i=0}^{n} \theta_{i} x_{i}-y\right) \\ &=\left(h_{\theta}(x)-y\right) x_{j} \end{aligned}$$
- Batch gradient descent:

Repeat until convergence {
$$\theta_{j}:=\theta_{j}+\alpha \sum_{i=1}^{m}\left(y^{(i)}-h_{\theta}\left(x^{(i)}\right)\right) x_{j}^{(i)}$$
}
- Stochastic gradient descent (SGD):

Loop {
for i = 1 to m, {
$$\theta_{j}:=\theta_{j}+\alpha\left(y^{(i)}-h_{\theta}\left(x^{(i)}\right)\right) x_{j}^{(i)}$$
}
}
- Mini-batch gradient descent, a compromise between the two: each update uses a small batch $B$ of samples (see the sketch after this list):

Repeat until convergence {
for each mini-batch $B$, {
$$\theta_{j}:=\theta_{j}+\alpha \sum_{i \in B}\left(y^{(i)}-h_{\theta}\left(x^{(i)}\right)\right) x_{j}^{(i)}$$
}
}
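A compact sketch of all three update schemes on a linear model (hypothetical data; the step size, epoch count, and batch size are assumed values):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 200, 3
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])
y = X @ rng.normal(size=n + 1) + rng.normal(scale=0.1, size=m)

alpha, epochs, b = 0.01, 100, 20  # assumed step size, iterations, mini-batch size

# Batch: one update per pass, from the full-sample gradient
# (scaled by 1/m for step-size stability; the notes' update omits this factor)
theta = np.zeros(n + 1)
for _ in range(epochs):
    theta += alpha * X.T @ (y - X @ theta) / m

# SGD: one update per sample, in shuffled order
theta_sgd = np.zeros(n + 1)
for _ in range(epochs):
    for i in rng.permutation(m):
        theta_sgd += alpha * (y[i] - X[i] @ theta_sgd) * X[i]

# Mini-batch: one update per batch of b samples
theta_mb = np.zeros(n + 1)
for _ in range(epochs):
    for start in range(0, m, b):
        Xb, yb = X[start:start + b], y[start:start + b]
        theta_mb += alpha * Xb.T @ (yb - Xb @ theta_mb) / b
```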
Logistic Regression
$$h_{\theta}(x)=g\left(\theta^{T} x\right)=\frac{1}{1+e^{-\theta^{T} x}}$$
$$\begin{aligned} g^{\prime}(x)=\left(\frac{1}{1+e^{-x}}\right)^{\prime} &=\frac{e^{-x}}{\left(1+e^{-x}\right)^{2}} \\ &=\frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}}=\frac{1}{1+e^{-x}} \cdot\left(1-\frac{1}{1+e^{-x}}\right) \\ &=g(x) \cdot(1-g(x)) \end{aligned}$$
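A short numerical check of this identity, assuming NumPy (finite differences at a few hypothetical points):

```python
import numpy as np

g = lambda x: 1.0 / (1.0 + np.exp(-x))
x = np.linspace(-4, 4, 9)
h = 1e-6
num_grad = (g(x + h) - g(x - h)) / (2 * h)       # central finite difference
print(np.allclose(num_grad, g(x) * (1 - g(x))))  # True: g' = g(1-g)
```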
Logistic Regression Parameter Estimation
Assume:

$$\begin{aligned} P(y=1 \mid x ; \theta) &=h_{\theta}(x) \\ P(y=0 \mid x ; \theta) &=1-h_{\theta}(x) \end{aligned}$$
Combining both cases into one expression and forming the likelihood:

$$\begin{aligned} p(y \mid x ; \theta) &=\left(h_{\theta}(x)\right)^{y}\left(1-h_{\theta}(x)\right)^{1-y} \\ L(\theta) &=p(\vec{y} \mid X ; \theta) \\ &=\prod_{i=1}^{m} p\left(y^{(i)} \mid x^{(i)} ; \theta\right) \\ &=\prod_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)\right)^{y^{(i)}}\left(1-h_{\theta}\left(x^{(i)}\right)\right)^{1-y^{(i)}} \end{aligned}$$
$$l(\theta)=\log L(\theta)=\sum_{i=1}^{m}\left[y^{(i)} \log h\left(x^{(i)}\right)+\left(1-y^{(i)}\right) \log \left(1-h\left(x^{(i)}\right)\right)\right]$$
$$\begin{aligned} \frac{\partial l(\theta)}{\partial \theta_{j}} &=\sum_{i=1}^{m}\left(\frac{y^{(i)}}{h\left(x^{(i)}\right)}-\frac{1-y^{(i)}}{1-h\left(x^{(i)}\right)}\right) \cdot \frac{\partial h\left(x^{(i)}\right)}{\partial \theta_{j}} \\ &=\sum_{i=1}^{m}\left(\frac{y^{(i)}}{g\left(\theta^{T} x^{(i)}\right)}-\frac{1-y^{(i)}}{1-g\left(\theta^{T} x^{(i)}\right)}\right) \cdot \frac{\partial g\left(\theta^{T} x^{(i)}\right)}{\partial \theta_{j}} \\ &=\sum_{i=1}^{m}\left(\frac{y^{(i)}}{g\left(\theta^{T} x^{(i)}\right)}-\frac{1-y^{(i)}}{1-g\left(\theta^{T} x^{(i)}\right)}\right) \cdot g\left(\theta^{T} x^{(i)}\right) \cdot\left(1-g\left(\theta^{T} x^{(i)}\right)\right) \cdot \frac{\partial \theta^{T} x^{(i)}}{\partial \theta_{j}} \\ &=\sum_{i=1}^{m}\left(y^{(i)}\left(1-g\left(\theta^{T} x^{(i)}\right)\right)-\left(1-y^{(i)}\right) g\left(\theta^{T} x^{(i)}\right)\right) \cdot x_{j}^{(i)} \\ &=\sum_{i=1}^{m}\left(y^{(i)}-g\left(\theta^{T} x^{(i)}\right)\right) \cdot x_{j}^{(i)} \end{aligned}$$
The learning rule for the Logistic regression parameters:
$$\theta_{j}:=\theta_{j}+\alpha\left(y^{(i)}-h_{\theta}\left(x^{(i)}\right)\right) x_{j}^{(i)}$$
Compare this result with the conclusion for linear regression: they have exactly the same form!
Repeat until convergence {
$$\theta_{j}:=\theta_{j}+\alpha \sum_{i=1}^{m}\left(y^{(i)}-h_{\theta}\left(x^{(i)}\right)\right) x_{j}^{(i)}$$
}
Loop {
for i = 1 to m, {
$$\theta_{j}:=\theta_{j}+\alpha\left(y^{(i)}-h_{\theta}\left(x^{(i)}\right)\right) x_{j}^{(i)}$$
}
}
Reason: linear regression assumes the samples follow a Gaussian distribution, while logistic regression assumes they follow a binomial (Bernoulli) distribution, and both the binomial and the Gaussian belong to the exponential family, which is why the updates share the same form.
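A minimal sketch of this update rule in batch form (hypothetical synthetic data; the learning rate and iteration budget are assumed):

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 200, 2
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])
theta_true = np.array([-0.5, 2.0, -1.0])
y = (rng.random(m) < 1 / (1 + np.exp(-X @ theta_true))).astype(float)  # Bernoulli labels

g = lambda z: 1.0 / (1.0 + np.exp(-z))

alpha, iters = 0.1, 2000  # assumed step size and iteration budget
theta = np.zeros(n + 1)
for _ in range(iters):
    # Gradient ascent on l(theta): same form as the linear-regression update
    theta += alpha * X.T @ (y - g(X @ theta)) / m
print(theta)  # approaches theta_true as m grows
```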