Cross-Entropy Loss: Principles and Derivation

I. Cross-Entropy Principles

1. Information Content

The less probable an event is, the more information its occurrence carries.
The formula is:

$$I(x) = -\log\big(P(x)\big)$$

where $I(x)$ is the information content and $P(x)$ is the probability that the event occurs.
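As a quick sanity check, here is a minimal Python sketch (log base 2 is chosen here so the unit is bits; the natural log works the same way up to a constant factor):

```python
import math

# I(x) = -log(P(x)): rarer events carry more information.
for p in (0.5, 0.25, 0.01):
    print(f"P(x) = {p:<5} -> I(x) = {-math.log2(p):.2f} bits")
# P(x) = 0.5 -> 1.00 bits; 0.25 -> 2.00 bits; 0.01 -> 6.64 bits
```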

2. Information Entropy (Entropy)

Information entropy is the expectation of the information content over all outcomes.
The formula is:

$$H(X) = -\sum_{i=1}^{n} P(x_i) \log\big(P(x_i)\big)$$

where $X$ is a discrete random variable taking values $x_1, x_2, \ldots, x_n$.
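A minimal sketch of the definition (the `entropy` helper below is illustrative, not from any particular library):

```python
import math

def entropy(probs, base=2.0):
    # H(X) = -sum_i P(x_i) * log(P(x_i)); outcomes with P(x_i) = 0 contribute 0.
    return -sum(p * math.log(p, base) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # 1.0 bit: a fair coin is maximally uncertain
print(entropy([0.99, 0.01]))  # ~0.08 bits: an almost-certain outcome carries little entropy
```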

3. Relative Entropy (KL Divergence)

The KL divergence measures the difference between two probability distributions over the same random variable.
The formula is:

$$D_{KL}(p \| q) = \sum_{i=1}^{n} p(x_i) \log\left(\frac{p(x_i)}{q(x_i)}\right)$$

Here $p(x)$ is the true distribution of the samples and $q(x)$ is the distribution predicted by the model.
The smaller the KL divergence, the closer the distributions $p(x)$ and $q(x)$ are; training repeatedly adjusts the model so that $q(x)$ approaches $p(x)$.
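The following sketch (illustrative `kl_divergence` helper, natural log) shows that a prediction closer to the true distribution yields a smaller KL divergence:

```python
import math

def kl_divergence(p, q):
    # D_KL(p || q) = sum_i p_i * log(p_i / q_i); assumes q_i > 0 wherever p_i > 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p      = [0.7, 0.2, 0.1]    # true distribution
q_far  = [0.1, 0.2, 0.7]    # poor prediction
q_near = [0.6, 0.25, 0.15]  # better prediction
print(kl_divergence(p, q_far))   # ~1.17
print(kl_divergence(p, q_near))  # ~0.02, much closer to p
print(kl_divergence(p, p))       # 0.0 for identical distributions
```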

4. Cross-Entropy

Cross-entropy = relative entropy + information entropy:

$$H(p, q) = -\sum_{i=1}^{n} p(x_i) \log\big(q(x_i)\big)$$

Note:

$$\begin{aligned} D_{KL}(p \| q) &= \sum_{i=1}^{n} p(x_i) \log\left(\frac{p(x_i)}{q(x_i)}\right) \\ &= \sum_{i=1}^{n} p(x_i) \log\big(p(x_i)\big) - \sum_{i=1}^{n} p(x_i) \log\big(q(x_i)\big) \\ &= -H\big(p(x)\big) + \left[-\sum_{i=1}^{n} p(x_i) \log\big(q(x_i)\big)\right] \end{aligned}$$

so $H(p, q) = H(p) + D_{KL}(p \| q)$. When training a network, the inputs and labels are fixed, so $p(x)$ is determined and the entropy $H(p)$ is a constant. The smaller the KL divergence, the better the prediction; we therefore want to minimize the KL divergence, and since $H(p)$ is constant, minimizing the cross-entropy loss accomplishes exactly that.
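A numerical check of the identity $H(p, q) = H(p) + D_{KL}(p \| q)$ on an arbitrary example distribution:

```python
import math

p = [0.7, 0.2, 0.1]  # true distribution (fixed by the data and labels)
q = [0.5, 0.3, 0.2]  # distribution predicted by the model

H_p  = -sum(pi * math.log(pi) for pi in p)                  # entropy of p (a constant)
D_kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))  # KL divergence
H_pq = -sum(pi * math.log(qi) for pi, qi in zip(p, q))      # cross-entropy

print(abs(H_pq - (H_p + D_kl)) < 1e-12)  # True: minimizing H(p, q) minimizes D_KL
```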

5. Summary

Cross-entropy comes from information theory and is mainly used to measure the difference between two probability distributions.
For linear regression, MSE is the usual loss function; for classification, cross-entropy is the usual choice: the output layer applies softmax so that the predicted values across the classes sum to 1, and the loss is then computed with cross-entropy, as sketched below.
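A minimal NumPy sketch of this pipeline (the logits and label here are made-up values for illustration):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtracting the max is a standard stability trick
    return e / e.sum()

def cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true is one-hot; clipping avoids log(0)
    return -np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)))

logits = np.array([2.0, 1.0, 0.1])  # hypothetical raw outputs of the last layer
y_true = np.array([1.0, 0.0, 0.0])  # the true class is class 0
probs = softmax(logits)
print(probs.sum())                   # 1.0: the predictions form a distribution
print(cross_entropy(y_true, probs))  # ~0.42; it shrinks as probs[0] grows
```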

II. Derivation

1. Cross-Entropy Loss for Logistic Regression

Formula:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\left(h_{\theta}(x^{(i)})\right) + \left(1 - y^{(i)}\right) \log\left(1 - h_{\theta}(x^{(i)})\right) \right]$$

Derivative:

$$\frac{\partial}{\partial \theta_{j}} J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left(h_{\theta}(x^{(i)}) - y^{(i)}\right) x_{j}^{(i)}$$
Derivation:

For logistic regression with $m$ training samples, each input is $x^{(i)} = \left(1, x_1^{(i)}, x_2^{(i)}, \ldots, x_p^{(i)}\right)^T$, a $(p+1)$-dimensional vector (the leading 1 accounts for the bias); $y^{(i)}$ is the class label, here 0 or 1; and the model parameters are $\theta = \left(\theta_0, \theta_1, \ldots, \theta_p\right)^T$, so that

$$\theta^T x^{(i)} := \theta_0 + \theta_1 x_1^{(i)} + \cdots + \theta_p x_p^{(i)}.$$

The hypothesis function is defined as:

$$h_{\theta}(x^{(i)}) = \frac{1}{1 + e^{-\theta^T x^{(i)}}}$$

$$\begin{aligned} P\left(\hat{y}^{(i)}=1 \mid x^{(i)}; \theta\right) &= h_{\theta}(x^{(i)}) \\ P\left(\hat{y}^{(i)}=0 \mid x^{(i)}; \theta\right) &= 1 - h_{\theta}(x^{(i)}) \\ \log P\left(\hat{y}^{(i)}=1 \mid x^{(i)}; \theta\right) &= \log h_{\theta}(x^{(i)}) = \log \frac{1}{1 + e^{-\theta^T x^{(i)}}} \\ \log P\left(\hat{y}^{(i)}=0 \mid x^{(i)}; \theta\right) &= \log\left(1 - h_{\theta}(x^{(i)})\right) = \log \frac{e^{-\theta^T x^{(i)}}}{1 + e^{-\theta^T x^{(i)}}} \end{aligned}$$
For the $i$-th sample, the log-probability that the hypothesis assigns to the correct label is:

$$\begin{aligned} & I\{y^{(i)}=1\} \log P\left(\hat{y}^{(i)}=1 \mid x^{(i)}; \theta\right) + I\{y^{(i)}=0\} \log P\left(\hat{y}^{(i)}=0 \mid x^{(i)}; \theta\right) \\ &= y^{(i)} \log P\left(\hat{y}^{(i)}=1 \mid x^{(i)}; \theta\right) + \left(1 - y^{(i)}\right) \log P\left(\hat{y}^{(i)}=0 \mid x^{(i)}; \theta\right) \\ &= y^{(i)} \log\left(h_{\theta}(x^{(i)})\right) + \left(1 - y^{(i)}\right) \log\left(1 - h_{\theta}(x^{(i)})\right) \end{aligned}$$
Averaging over the $m$ samples gives the loss function:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\left(h_{\theta}(x^{(i)})\right) + \left(1 - y^{(i)}\right) \log\left(1 - h_{\theta}(x^{(i)})\right) \right]$$

Why the negative sign in $J$: the larger the log-probability of the correct labels, the better the model fits the data; but a loss function measures error and should be smaller for a better model. To reconcile the two, the loss is defined as the negated average log-probability of the correct labels.
Taking the derivative.

Step 1:

$$\begin{aligned} J(\theta) &= -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\left(h_{\theta}(x^{(i)})\right) + \left(1-y^{(i)}\right) \log\left(1-h_{\theta}(x^{(i)})\right) \right] \\ &= -\frac{1}{m} \sum_{i=1}^{m}\left[-y^{(i)} \log\left(1+e^{-\theta^{T} x^{(i)}}\right)+\left(1-y^{(i)}\right)\left(-\theta^{T} x^{(i)}-\log\left(1+e^{-\theta^{T} x^{(i)}}\right)\right)\right] \\ &= -\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \theta^{T} x^{(i)}-\theta^{T} x^{(i)}-\log\left(1+e^{-\theta^{T} x^{(i)}}\right)\right] \\ &= -\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \theta^{T} x^{(i)}-\log e^{\theta^{T} x^{(i)}}-\log\left(1+e^{-\theta^{T} x^{(i)}}\right)\right] \\ &= -\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \theta^{T} x^{(i)}-\left(\log e^{\theta^{T} x^{(i)}}+\log\left(1+e^{-\theta^{T} x^{(i)}}\right)\right)\right] \\ &= -\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \theta^{T} x^{(i)}-\log\left(1+e^{\theta^{T} x^{(i)}}\right)\right] \end{aligned}$$
Step 2:

$$\begin{aligned} \frac{\partial}{\partial \theta_{j}} J(\theta) &= \frac{\partial}{\partial \theta_{j}}\left(\frac{1}{m} \sum_{i=1}^{m}\left[\log\left(1+e^{\theta^{T} x^{(i)}}\right)-y^{(i)} \theta^{T} x^{(i)}\right]\right) \\ &= \frac{1}{m} \sum_{i=1}^{m}\left(\frac{x_{j}^{(i)} e^{\theta^{T} x^{(i)}}}{1+e^{\theta^{T} x^{(i)}}}-y^{(i)} x_{j}^{(i)}\right) \\ &= \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right) x_{j}^{(i)} \end{aligned}$$
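To make the result concrete, the sketch below implements $J(\theta)$ and its gradient on synthetic data and verifies the closed form against central finite differences (the data, seed, and helper names are arbitrary choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(theta, X, y):
    # J(theta) = -(1/m) * sum[ y*log(h) + (1-y)*log(1-h) ]
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def grad(theta, X, y):
    # Closed form derived above: (1/m) * X^T (h - y)
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

rng = np.random.default_rng(0)
X = np.hstack([np.ones((8, 1)), rng.normal(size=(8, 2))])  # bias column + 2 features
y = rng.integers(0, 2, size=8).astype(float)
theta = rng.normal(size=3)

# Compare the analytic gradient with central finite differences.
eps = 1e-6
numeric = np.array([(loss(theta + eps * np.eye(3)[j], X, y)
                   - loss(theta - eps * np.eye(3)[j], X, y)) / (2 * eps) for j in range(3)])
print(np.allclose(grad(theta, X, y), numeric, atol=1e-6))  # True
```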

2. Softmax Cross-Entropy Loss

Formula:

$$C = -\sum_{i} y_{i} \ln a_{i}, \qquad a_{i} = \frac{e^{z_{i}}}{\sum_{k} e^{z_{k}}}, \qquad z_{i} = \sum_{j} w_{ij} x_{j} + b_{i}$$

where $y_i$ is the true label (one-hot), $w_{ij}$ is the $j$-th weight of the $i$-th output neuron, $b_i$ is its bias, $z_i$ is the $i$-th raw output of the network, and $a_i$ is the result of applying the softmax function to the $i$-th output.
Derivative:

$$\frac{\partial C}{\partial z_{i}} = a_{i} - y_{i}$$
Derivation:

$$\frac{\partial C}{\partial z_{i}} = \sum_{j}\left(\frac{\partial C_{j}}{\partial a_{j}} \frac{\partial a_{j}}{\partial z_{i}}\right)$$

$$\frac{\partial C_{j}}{\partial a_{j}} = \frac{\partial\left(-y_{j} \ln a_{j}\right)}{\partial a_{j}} = -y_{j} \frac{1}{a_{j}}$$
For $\frac{\partial a_{j}}{\partial z_{i}}$ there are two cases:

(1) $i = j$:

$$\frac{\partial a_{i}}{\partial z_{i}} = \frac{\partial}{\partial z_{i}}\left(\frac{e^{z_{i}}}{\sum_{k} e^{z_{k}}}\right) = \frac{\left(\sum_{k} e^{z_{k}}\right) e^{z_{i}} - \left(e^{z_{i}}\right)^{2}}{\left(\sum_{k} e^{z_{k}}\right)^{2}} = \left(\frac{e^{z_{i}}}{\sum_{k} e^{z_{k}}}\right)\left(1 - \frac{e^{z_{i}}}{\sum_{k} e^{z_{k}}}\right) = a_{i}\left(1 - a_{i}\right)$$

(2) $i \neq j$:

$$\frac{\partial a_{j}}{\partial z_{i}} = \frac{\partial}{\partial z_{i}}\left(\frac{e^{z_{j}}}{\sum_{k} e^{z_{k}}}\right) = -e^{z_{j}}\left(\frac{1}{\sum_{k} e^{z_{k}}}\right)^{2} e^{z_{i}} = -a_{i} a_{j}$$
Putting the two cases together:

$$\begin{aligned} \frac{\partial C}{\partial z_{i}} &= \sum_{j}\left(\frac{\partial C_{j}}{\partial a_{j}} \frac{\partial a_{j}}{\partial z_{i}}\right) = \sum_{j \neq i}\left(\frac{\partial C_{j}}{\partial a_{j}} \frac{\partial a_{j}}{\partial z_{i}}\right) + \sum_{j=i}\left(\frac{\partial C_{j}}{\partial a_{j}} \frac{\partial a_{j}}{\partial z_{i}}\right) \\ &= \sum_{j \neq i}\left(-y_{j} \frac{1}{a_{j}}\right)\left(-a_{i} a_{j}\right) + \left(-y_{i} \frac{1}{a_{i}}\right) a_{i}\left(1-a_{i}\right) \\ &= \sum_{j \neq i} a_{i} y_{j} - y_{i}\left(1-a_{i}\right) \\ &= \sum_{j \neq i} a_{i} y_{j} + a_{i} y_{i} - y_{i} \\ &= a_{i} \sum_{j} y_{j} - y_{i} \end{aligned}$$

For a classification problem, $y$ is one-hot: exactly one class has $y_i = 1$ and all the others are 0, so $\sum_{j} y_{j} = 1$ and therefore

$$\frac{\partial C}{\partial z_{i}} = a_{i} - y_{i}$$
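The same finite-difference check as before confirms the closed form for the softmax case (the logits and label below are made-up values):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ce(z, y):
    # C = -sum_i y_i * ln(a_i) with a = softmax(z)
    return -np.sum(y * np.log(softmax(z)))

z = np.array([1.5, -0.3, 0.8])
y = np.array([0.0, 1.0, 0.0])  # one-hot label

analytic = softmax(z) - y      # dC/dz_i = a_i - y_i, as derived above

eps = 1e-6
numeric = np.array([(ce(z + eps * np.eye(3)[i], y)
                   - ce(z - eps * np.eye(3)[i], y)) / (2 * eps) for i in range(3)])
print(np.allclose(analytic, numeric, atol=1e-6))  # True
```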

Appendix: Derivative Formulas and Rules

Derivatives of basic elementary functions:

(1) $(C)^{\prime} = 0$
(2) $\left(x^{\mu}\right)^{\prime} = \mu x^{\mu-1}$
(3) $(\sin x)^{\prime} = \cos x$
(4) $(\cos x)^{\prime} = -\sin x$
(5) $(\tan x)^{\prime} = \sec^{2} x$
(6) $(\cot x)^{\prime} = -\csc^{2} x$
(7) $(\sec x)^{\prime} = \sec x \tan x$
(8) $(\csc x)^{\prime} = -\csc x \cot x$
(9) $\left(a^{x}\right)^{\prime} = a^{x} \ln a$
(10) $\left(e^{x}\right)^{\prime} = e^{x}$
(11) $\left(\log_{a} x\right)^{\prime} = \frac{1}{x \ln a}$
(12) $(\ln x)^{\prime} = \frac{1}{x}$
(13) $(\arcsin x)^{\prime} = \frac{1}{\sqrt{1-x^{2}}}$
(14) $(\arccos x)^{\prime} = -\frac{1}{\sqrt{1-x^{2}}}$
(15) $(\arctan x)^{\prime} = \frac{1}{1+x^{2}}$
(16) $(\operatorname{arccot} x)^{\prime} = -\frac{1}{1+x^{2}}$
Differentiation rules:

Let $u = u(x)$ and $v = v(x)$ both be differentiable. Then:

(1) $(u \pm v)^{\prime} = u^{\prime} \pm v^{\prime}$
(2) $(C u)^{\prime} = C u^{\prime}$ ($C$ is a constant)
(3) $(u v)^{\prime} = u^{\prime} v + u v^{\prime}$
(4) $\left(\frac{u}{v}\right)^{\prime} = \frac{u^{\prime} v - u v^{\prime}}{v^{2}}$
Chain rule for composite functions:

Let $y = f(u)$ with $u = \varphi(x)$, where both $f(u)$ and $\varphi(x)$ are differentiable. Then the composite function $y = f[\varphi(x)]$ has derivative

$$\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}, \quad \text{i.e.} \quad y^{\prime} = f^{\prime}(u) \cdot \varphi^{\prime}(x)$$
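If desired, these rules can be spot-checked symbolically; a small sketch using sympy (assumed installed; sympy's `log` is the natural log):

```python
import sympy as sp

x = sp.symbols('x')
u, v = sp.Function('u')(x), sp.Function('v')(x)

# Product rule (3): (uv)' = u'v + uv'
print(sp.simplify(sp.diff(u * v, x) - (sp.diff(u, x) * v + u * sp.diff(v, x))) == 0)  # True

# Table entries (12) and (15):
print(sp.diff(sp.log(x), x))   # 1/x
print(sp.diff(sp.atan(x), x))  # 1/(x**2 + 1)
```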
