I Cross-Entropy Principles
1 Information Content
The amount of information carried by an event is inversely related to the probability of that event occurring: the less likely an event, the more information its occurrence carries.
The formula is:
$$I(x)=-\log (P(x))$$
where $I(x)$ is the information content and $P(x)$ is the probability that event $x$ occurs.
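As a quick illustration, here is a minimal sketch (standard library only; the base-2 logarithm, giving information in bits, is a common convention rather than something fixed by the formula above):

```python
import math

def information_content(p: float) -> float:
    """I(x) = -log2(P(x)): information content in bits."""
    return -math.log2(p)

print(information_content(0.5))   # 1.0 bit: a fair coin flip
print(information_content(0.01))  # ~6.64 bits: a rare event carries more information
```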
2 Information Entropy (Entropy)
Information entropy is the expected value of the information content over all possible outcomes.
The formula is:
$$H(X)=-\sum_{i=1}^{n} P\left(x_{i}\right) \log \left(P\left(x_{i}\right)\right)$$
where $X$ is a discrete random variable with possible values $\left(X=x_{1}, x_{2}, \ldots, x_{n}\right)$.
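A minimal sketch of the entropy computation (assuming the distribution is given as a list of probabilities; base-2 logs again, so the result is in bits):

```python
import math

def entropy(probs: list[float]) -> float:
    """H(X) = -sum_i P(x_i) * log2(P(x_i)); zero-probability terms contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # 1.0 bit: a fair coin is maximally uncertain
print(entropy([0.9, 0.1]))  # ~0.469 bits: a biased coin is more predictable
```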
3 Relative Entropy (KL Divergence)
The KL divergence measures the difference between two probability distributions defined over the same random variable.
The formula is:
$$D_{K L}(p \| q)=\sum_{i=1}^{n} p\left(x_{i}\right) \log \left(\frac{p\left(x_{i}\right)}{q\left(x_{i}\right)}\right)$$
Here $P(x)$ denotes the true distribution of the samples and $Q(x)$ denotes the distribution predicted by the model. The smaller the KL divergence, the closer the two distributions are; training repeatedly adjusts $Q(x)$ so that its distribution approaches $P(x)$.
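A minimal sketch (assuming both distributions are given as aligned probability lists, with $q$ nonzero wherever $p$ is):

```python
import math

def kl_divergence(p: list[float], q: list[float]) -> float:
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i); 0 * log(0/q) is taken as 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.2, 0.1]                        # "true" distribution
print(kl_divergence(p, [0.6, 0.3, 0.1]))   # ~0.027: q close to p
print(kl_divergence(p, [0.1, 0.2, 0.7]))   # ~1.17: q far from p
print(kl_divergence(p, p))                 # 0.0: identical distributions
```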
4 Cross-Entropy
Cross-entropy = information entropy + relative entropy, i.e. $H(p, q)=H(p)+D_{K L}(p \| q)$:
$$H(p, q)=-\sum_{i=1}^{n} p\left(x_{i}\right) \log \left(q\left(x_{i}\right)\right)$$
Note:
$$
\begin{aligned}
D_{K L}(p \| q) &=\sum_{i=1}^{n} p\left(x_{i}\right) \log \left(\frac{p\left(x_{i}\right)}{q\left(x_{i}\right)}\right) \\
&=\sum_{i=1}^{n} p\left(x_{i}\right) \log \left(p\left(x_{i}\right)\right)-\sum_{i=1}^{n} p\left(x_{i}\right) \log \left(q\left(x_{i}\right)\right) \\
&=-H(p)+\left[-\sum_{i=1}^{n} p\left(x_{i}\right) \log \left(q\left(x_{i}\right)\right)\right]=H(p, q)-H(p)
\end{aligned}
$$
When training a network, the input data and labels are fixed, so the true distribution $P(x)$ is fixed and the entropy $H(p)$ is a constant. The smaller the KL divergence, the better the prediction; since $H(p)$ is constant, minimizing the KL divergence is equivalent to minimizing the cross-entropy, which is why the cross-entropy is used directly as the loss function.
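A quick numerical check of the identity $H(p, q)=H(p)+D_{K L}(p \| q)$ (a self-contained sketch using natural logs; $p$ and $q$ are arbitrary example distributions):

```python
import math

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.2, 0.1]  # fixed "true" distribution (the labels)
q = [0.5, 0.3, 0.2]  # model's predicted distribution

# H(p, q) = H(p) + D_KL(p || q): H(p) is constant once the data is fixed,
# so minimizing the cross-entropy minimizes the KL divergence.
print(cross_entropy(p, q))    # ~0.887
print(entropy(p) + kl(p, q))  # ~0.887, same value
```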
5 Summary
Cross-entropy originates in information theory and measures the difference between two probability distributions.
In linear regression, MSE is the usual loss function. In classification, cross-entropy is used instead: the output layer applies softmax so that the predicted values over the classes sum to 1, and the loss is then computed with cross-entropy, as in the sketch below.
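A minimal NumPy sketch of this pipeline (the logits and one-hot label are made-up examples, and the max-shift for numerical stability is a standard trick not mentioned above):

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Map raw logits to probabilities that sum to 1."""
    z = z - z.max()          # shift for numerical stability; leaves softmax unchanged
    e = np.exp(z)
    return e / e.sum()

def cross_entropy_loss(a: np.ndarray, y: np.ndarray) -> float:
    """C = -sum_i y_i * ln(a_i) for a one-hot label vector y."""
    return float(-np.sum(y * np.log(a)))

logits = np.array([2.0, 1.0, 0.1])  # raw network outputs for 3 classes
y = np.array([1.0, 0.0, 0.0])       # one-hot label: the true class is class 0

a = softmax(logits)
print(a, a.sum())                   # class probabilities, summing to 1
print(cross_entropy_loss(a, y))     # -ln(a[0]) ~ 0.417
```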
II Derivations
1 Logistic Cross-Entropy Loss
Loss function:
$$J(\theta)=-\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \log \left(h_{\theta}\left(x^{(i)}\right)\right)+\left(1-y^{(i)}\right) \log \left(1-h_{\theta}\left(x^{(i)}\right)\right)\right]$$
Gradient:
$$\frac{\partial}{\partial \theta_{j}} J(\theta)=\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}$$
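Before walking through the derivation, here is a minimal NumPy sketch of both formulas side by side (the design matrix X, labels y, and parameters theta are made-up illustrations; X carries a leading column of ones for the bias):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad(theta, X, y):
    """J(theta) and dJ/dtheta for logistic regression.

    X: (m, p+1) design matrix whose first column is all ones (bias term),
    y: (m,) labels in {0, 1}, theta: (p+1,) parameter vector.
    """
    m = len(y)
    h = sigmoid(X @ theta)                       # h_theta(x^(i)) for every sample
    J = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    grad = X.T @ (h - y) / m                     # (1/m) * sum_i (h - y) * x_j
    return J, grad

X = np.array([[1.0, 0.5], [1.0, -1.2], [1.0, 2.3]])  # m = 3 samples, p = 1 feature
y = np.array([1.0, 0.0, 1.0])
theta = np.zeros(2)
print(loss_and_grad(theta, X, y))  # J = ln(2) ~ 0.693 at theta = 0 (h = 0.5 everywhere)
```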
Derivation
For logistic regression with $m$ samples, each input $x^{(i)}=\left(1, x_{1}^{(i)}, x_{2}^{(i)}, \ldots, x_{p}^{(i)}\right)^{T}$ is a $(p+1)$-dimensional vector (the leading 1 accounts for the bias); $y^{(i)}$ denotes the class label, here 0 or 1. The model parameters are $\theta=\left(\theta_{0}, \theta_{1}, \ldots, \theta_{p}\right)^{T}$, with
$$\theta^{T} x^{(i)}:=\theta_{0}+\theta_{1} x_{1}^{(i)}+\cdots+\theta_{p} x_{p}^{(i)}.$$
The hypothesis function is defined as:
$$h_{\theta}\left(x^{(i)}\right)=\frac{1}{1+e^{-\theta^{T} x^{(i)}}}$$
The predicted class probabilities and their logs are then:
$$
\begin{gathered}
P\left(\hat{y}^{(i)}=1 \mid x^{(i)} ; \theta\right)=h_{\theta}\left(x^{(i)}\right) \\
P\left(\hat{y}^{(i)}=0 \mid x^{(i)} ; \theta\right)=1-h_{\theta}\left(x^{(i)}\right) \\
\log P\left(\hat{y}^{(i)}=1 \mid x^{(i)} ; \theta\right)=\log h_{\theta}\left(x^{(i)}\right)=\log \frac{1}{1+e^{-\theta^{T} x^{(i)}}} \\
\log P\left(\hat{y}^{(i)}=0 \mid x^{(i)} ; \theta\right)=\log \left(1-h_{\theta}\left(x^{(i)}\right)\right)=\log \frac{e^{-\theta^{T} x^{(i)}}}{1+e^{-\theta^{T} x^{(i)}}}
\end{gathered}
$$
For the $i$-th sample, the combined log-probability that the hypothesis assigns to the correct label is (since $y^{(i)} \in\{0,1\}$, the indicators $I\left\{y^{(i)}=1\right\}$ and $I\left\{y^{(i)}=0\right\}$ are simply $y^{(i)}$ and $1-y^{(i)}$):
$$
\begin{gathered}
I\left\{y^{(i)}=1\right\} \log P\left(\hat{y}^{(i)}=1 \mid x^{(i)} ; \theta\right)+I\left\{y^{(i)}=0\right\} \log P\left(\hat{y}^{(i)}=0 \mid x^{(i)} ; \theta\right) \\
=y^{(i)} \log P\left(\hat{y}^{(i)}=1 \mid x^{(i)} ; \theta\right)+\left(1-y^{(i)}\right) \log P\left(\hat{y}^{(i)}=0 \mid x^{(i)} ; \theta\right) \\
=y^{(i)} \log \left(h_{\theta}\left(x^{(i)}\right)\right)+\left(1-y^{(i)}\right) \log \left(1-h_{\theta}\left(x^{(i)}\right)\right)
\end{gathered}
$$
Averaging over all $m$ samples and negating gives the loss function:
$$J(\theta)=-\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \log \left(h_{\theta}\left(x^{(i)}\right)\right)+\left(1-y^{(i)}\right) \log \left(1-h_{\theta}\left(x^{(i)}\right)\right)\right]$$
The reason $J$ carries a negative sign: the larger the log-probability of the correct labels, the better the model explains the data, whereas a loss function should decrease as the model improves. Negating the log-probability reconciles the two, turning likelihood maximization into loss minimization.
Taking the derivative
Step 1: simplify $J(\theta)$, substituting $\log h_{\theta}\left(x^{(i)}\right)=-\log \left(1+e^{-\theta^{T} x^{(i)}}\right)$ and $\log \left(1-h_{\theta}\left(x^{(i)}\right)\right)=-\theta^{T} x^{(i)}-\log \left(1+e^{-\theta^{T} x^{(i)}}\right)$:
$$
\begin{gathered}
J(\theta)=-\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \log \left(h_{\theta}\left(x^{(i)}\right)\right)+\left(1-y^{(i)}\right) \log \left(1-h_{\theta}\left(x^{(i)}\right)\right)\right] \\
=-\frac{1}{m} \sum_{i=1}^{m}\left[-y^{(i)} \log \left(1+e^{-\theta^{T} x^{(i)}}\right)+\left(1-y^{(i)}\right)\left(-\theta^{T} x^{(i)}-\log \left(1+e^{-\theta^{T} x^{(i)}}\right)\right)\right] \\
=-\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \theta^{T} x^{(i)}-\theta^{T} x^{(i)}-\log \left(1+e^{-\theta^{T} x^{(i)}}\right)\right] \\
=-\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \theta^{T} x^{(i)}-\log e^{\theta^{T} x^{(i)}}-\log \left(1+e^{-\theta^{T} x^{(i)}}\right)\right] \\
=-\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \theta^{T} x^{(i)}-\left(\log e^{\theta^{T} x^{(i)}}+\log \left(1+e^{-\theta^{T} x^{(i)}}\right)\right)\right] \\
=-\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \theta^{T} x^{(i)}-\log \left(1+e^{\theta^{T} x^{(i)}}\right)\right]
\end{gathered}
$$
Step 2: pull the minus sign inside the sum and differentiate with respect to $\theta_{j}$:
$$
\begin{gathered}
\frac{\partial}{\partial \theta_{j}} J(\theta)=\frac{\partial}{\partial \theta_{j}}\left(\frac{1}{m} \sum_{i=1}^{m}\left[\log \left(1+e^{\theta^{T} x^{(i)}}\right)-y^{(i)} \theta^{T} x^{(i)}\right]\right) \\
=\frac{1}{m} \sum_{i=1}^{m}\left(\frac{x_{j}^{(i)} e^{\theta^{T} x^{(i)}}}{1+e^{\theta^{T} x^{(i)}}}-y^{(i)} x_{j}^{(i)}\right) \\
=\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}
\end{gathered}
$$
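One way to sanity-check this result is to compare the analytic gradient against a centered finite-difference approximation of $J$ (a self-contained sketch on made-up data):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def J(theta, X, y):
    """Logistic cross-entropy loss J(theta)."""
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(5), rng.normal(size=(5, 2))])  # bias column + 2 features
y = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
theta = rng.normal(size=3)

# Analytic gradient: (1/m) * sum_i (h_theta(x) - y) * x_j, vectorized.
analytic = X.T @ (sigmoid(X @ theta) - y) / len(y)

# Centered finite differences along each coordinate direction.
eps = 1e-6
numeric = np.array([(J(theta + eps * e, X, y) - J(theta - eps * e, X, y)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(analytic, numeric, atol=1e-6))  # True
```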
2 Softmax Cross-Entropy Loss
Loss:
$$C=-\sum_{i} y_{i} \ln a_{i}$$
$$a_{i}=\frac{e^{z_{i}}}{\sum_{k} e^{z_{k}}}, \qquad z_{i}=\sum_{j} w_{i j} x_{i j}+b$$
where $y_{i}$ is the true label for class $i$, $w_{i j}$ is the $j$-th weight of the $i$-th neuron, $b$ is the bias, $z_{i}$ is the $i$-th output of the network, and $a_{i}$ is the result of applying the softmax function to the $i$-th output.
Derivative:
$$\frac{\partial C}{\partial z_{i}}=a_{i}-y_{i}$$
Derivation: writing $C=\sum_{j} C_{j}$ with $C_{j}=-y_{j} \ln a_{j}$, the chain rule gives
$$\frac{\partial C}{\partial z_{i}}=\sum_{j}\left(\frac{\partial C_{j}}{\partial a_{j}} \frac{\partial a_{j}}{\partial z_{i}}\right)$$
$$\frac{\partial C_{j}}{\partial a_{j}}=\frac{\partial\left(-y_{j} \ln a_{j}\right)}{\partial a_{j}}=-y_{j} \frac{1}{a_{j}}$$
For $\frac{\partial a_{j}}{\partial z_{i}}$ there are two cases:
(1) $i=j$:
$$\frac{\partial a_{i}}{\partial z_{i}}=\frac{\partial\left(\frac{e^{z_{i}}}{\sum_{k} e^{z_{k}}}\right)}{\partial z_{i}}=\frac{\left(\sum_{k} e^{z_{k}}\right) e^{z_{i}}-\left(e^{z_{i}}\right)^{2}}{\left(\sum_{k} e^{z_{k}}\right)^{2}}=\left(\frac{e^{z_{i}}}{\sum_{k} e^{z_{k}}}\right)\left(1-\frac{e^{z_{i}}}{\sum_{k} e^{z_{k}}}\right)=a_{i}\left(1-a_{i}\right)$$
(2) $i \neq j$:
$$\frac{\partial a_{j}}{\partial z_{i}}=\frac{\partial\left(\frac{e^{z_{j}}}{\sum_{k} e^{z_{k}}}\right)}{\partial z_{i}}=-e^{z_{j}}\left(\frac{1}{\sum_{k} e^{z_{k}}}\right)^{2} e^{z_{i}}=-a_{i} a_{j}$$
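Together, these two cases say the softmax Jacobian is $\operatorname{diag}(a)-a a^{T}$. A quick numerical check (a self-contained sketch with made-up logits):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([0.3, -1.2, 2.0])  # made-up logits
a = softmax(z)

# Analytic Jacobian: da_j/dz_i = a_i * (1 - a_i) if i == j, else -a_i * a_j.
analytic = np.diag(a) - np.outer(a, a)

# Centered finite-difference Jacobian for comparison.
eps = 1e-6
numeric = np.array([(softmax(z + eps * e) - softmax(z - eps * e)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(analytic, numeric, atol=1e-6))  # True
```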
Putting the two cases together:
$$
\begin{aligned}
\frac{\partial C}{\partial z_{i}} &=\sum_{j}\left(\frac{\partial C_{j}}{\partial a_{j}} \frac{\partial a_{j}}{\partial z_{i}}\right)=\sum_{j \neq i}\left(\frac{\partial C_{j}}{\partial a_{j}} \frac{\partial a_{j}}{\partial z_{i}}\right)+\sum_{j=i}\left(\frac{\partial C_{j}}{\partial a_{j}} \frac{\partial a_{j}}{\partial z_{i}}\right) \\
&=\sum_{j \neq i}\left(-y_{j} \frac{1}{a_{j}}\right)\left(-a_{i} a_{j}\right)+\left(-y_{i} \frac{1}{a_{i}}\right)\left(a_{i}\left(1-a_{i}\right)\right) \\
&=\sum_{j \neq i} a_{i} y_{j}-y_{i}\left(1-a_{i}\right) \\
&=\sum_{j \neq i} a_{i} y_{j}+a_{i} y_{i}-y_{i} \\
&=a_{i} \sum_{j} y_{j}-y_{i}
\end{aligned}
$$
For a classification problem, exactly one entry of $y$ is 1 and all others are 0, so $\sum_{j} y_{j}=1$ and therefore:
$$\frac{\partial C}{\partial z_{i}}=a_{i}-y_{i}$$
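As in the logistic case, the result $\frac{\partial C}{\partial z_{i}}=a_{i}-y_{i}$ can be verified against finite differences (a self-contained sketch; the logits and one-hot label are made up):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def C(z, y):
    """Softmax cross-entropy: C = -sum_i y_i * ln(a_i)."""
    return -np.sum(y * np.log(softmax(z)))

z = np.array([0.5, 1.5, -0.3])  # made-up logits
y = np.array([0.0, 1.0, 0.0])   # one-hot label

analytic = softmax(z) - y       # dC/dz_i = a_i - y_i

eps = 1e-6
numeric = np.array([(C(z + eps * e, y) - C(z - eps * e, y)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(analytic, numeric, atol=1e-6))  # True
```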
Appendix: Derivative Formulas and Rules
Derivatives of basic elementary functions
(1) $(C)^{\prime}=0$
(2) $\left(x^{\mu}\right)^{\prime}=\mu x^{\mu-1}$
(3) $(\sin x)^{\prime}=\cos x$
(4) $(\cos x)^{\prime}=-\sin x$
(5) $(\tan x)^{\prime}=\sec ^{2} x$
(6) $(\cot x)^{\prime}=-\csc ^{2} x$
(7) $(\sec x)^{\prime}=\sec x \tan x$
(8) $(\csc x)^{\prime}=-\csc x \cot x$
(9) $\left(a^{x}\right)^{\prime}=a^{x} \ln a$
(10) $\left(e^{x}\right)^{\prime}=e^{x}$
(11) $\left(\log _{a} x\right)^{\prime}=\frac{1}{x \ln a}$
(12) $(\ln x)^{\prime}=\frac{1}{x}$
(13) $(\arcsin x)^{\prime}=\frac{1}{\sqrt{1-x^{2}}}$
(14) $(\arccos x)^{\prime}=-\frac{1}{\sqrt{1-x^{2}}}$
(15) $(\arctan x)^{\prime}=\frac{1}{1+x^{2}}$
(16) $(\operatorname{arccot} x)^{\prime}=-\frac{1}{1+x^{2}}$
Differentiation rules
Let $u=u(x)$ and $v=v(x)$ both be differentiable. Then:
(1) $(u \pm v)^{\prime}=u^{\prime} \pm v^{\prime}$
(2) $(C u)^{\prime}=C u^{\prime}$ (where $C$ is a constant)
(3) $(u v)^{\prime}=u^{\prime} v+u v^{\prime}$
(4) $\left(\frac{u}{v}\right)^{\prime}=\frac{u^{\prime} v-u v^{\prime}}{v^{2}}$
Chain rule
Let $y=f(u)$ and $u=\varphi(x)$, with both $f(u)$ and $\varphi(x)$ differentiable. Then the composite function $y=f[\varphi(x)]$ has derivative
$$\frac{d y}{d x}=\frac{d y}{d u} \cdot \frac{d u}{d x} \quad \text{or} \quad y^{\prime}=f^{\prime}(u) \cdot \varphi^{\prime}(x)$$