Differentiating a Function Matrix with Respect to a Matrix

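As I understand it, differentiating a function (or a matrix of functions) with respect to a matrix is mainly a notational shorthand: the partial derivatives of several multivariate functions with respect to each variable are collected into a single matrix, which is more compact to write.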

Derivative of a Function with Respect to a Matrix

Let $\mathbf{X}=(\xi_{ij})_{m\times n}$ be a matrix and let $f(\mathbf{X})=f(\xi_{11},\xi_{12},\xi_{13},\dots,\xi_{m1},\dots,\xi_{mn})$ be a function of its $mn$ entries. The derivative of $f(\mathbf{X})$ with respect to the matrix $\mathbf{X}$ is

$$\dfrac{df}{d\mathbf{X}}=\left(\dfrac{\partial f}{\partial \xi_{ij}}\right)_{m\times n}=\begin{bmatrix} \dfrac{\partial f}{\partial \xi_{11}} & \dfrac{\partial f}{\partial \xi_{12}} & \ldots & \dfrac{\partial f}{\partial \xi_{1n}} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial f}{\partial \xi_{m1}} & \dfrac{\partial f}{\partial \xi_{m2}} & \ldots & \dfrac{\partial f}{\partial \xi_{mn}} \end{bmatrix}$$
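To make the definition concrete, here is a minimal numerical sketch in plain Python. It approximates each entry $\partial f/\partial \xi_{ij}$ with a central difference and stores it at position $(i,j)$, so the result has the same $m\times n$ shape as $\mathbf{X}$. The names `matrix_derivative` and `sum_of_squares`, the step size `h`, and the test function are all assumptions made for illustration.

```python
def matrix_derivative(f, X, h=1e-6):
    """Approximate df/dX entry by entry with central differences.

    f maps an m x n nested list to a float. The result has the same
    m x n shape as X, and entry (i, j) approximates df/d(xi_ij).
    """
    m, n = len(X), len(X[0])
    grad = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            plus = [row[:] for row in X]   # copy of X with xi_ij + h
            minus = [row[:] for row in X]  # copy of X with xi_ij - h
            plus[i][j] += h
            minus[i][j] -= h
            grad[i][j] = (f(plus) - f(minus)) / (2 * h)
    return grad


def sum_of_squares(X):
    """Hypothetical test function: f(X) = sum of xi_ij^2, so df/dX = 2X."""
    return sum(v * v for row in X for v in row)


X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
G = matrix_derivative(sum_of_squares, X)
```

For this choice of $f$ the exact derivative is $2\mathbf{X}$, so `G` should agree with `2 * X` entry by entry, up to finite-difference error.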

For example, let $\mathbf{X}=(\xi_1,\xi_2,\dots,\xi_n)^{T}$ and let $f(\mathbf{X})=f(\xi_1,\xi_2,\dots,\xi_n)$ be a function of $n$ variables.
Then
$$\dfrac{df}{d\mathbf{X}}=\left(\dfrac{\partial f}{\partial \xi_1},\dfrac{\partial f}{\partial \xi_2},\dots,\dfrac{\partial f}{\partial \xi_n}\right)^{T}$$
and likewise
$$\dfrac{df}{d\mathbf{X}^{T}}=\left(\dfrac{\partial f}{\partial \xi_1},\dfrac{\partial f}{\partial \xi_2},\dots,\dfrac{\partial f}{\partial \xi_n}\right)$$
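
As a small worked instance of the column and row forms above, take $n=2$ and $f(\mathbf{X})=\xi_1^2+3\xi_1\xi_2$. This particular $f$ is chosen here purely for illustration. Then

$$\dfrac{df}{d\mathbf{X}}=\begin{bmatrix} 2\xi_1+3\xi_2 \\ 3\xi_1 \end{bmatrix},\qquad \dfrac{df}{d\mathbf{X}^{T}}=\begin{bmatrix} 2\xi_1+3\xi_2 & 3\xi_1 \end{bmatrix}$$

The entries are the same in both cases. Only the layout follows the shape of the variable being differentiated against.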

Derivative of a Function Matrix with Respect to a Matrix

Let $\mathbf{X}=(\xi_{ij})_{m\times n}$ be a matrix and let $f_{ij}(\mathbf{X})=f_{ij}(\xi_{11},\xi_{12},\xi_{13},\dots,\xi_{m1},\dots,\xi_{mn})$ $(i=1,2,\dots,r;\ j=1,2,\dots,s)$ be functions of its $mn$ entries. Define the function matrix
$$\mathbf{F}(\mathbf{X})=\begin{bmatrix} f_{11}(\mathbf{X}) & \ldots & f_{1s}(\mathbf{X}) \\ \vdots & & \vdots \\ f_{r1}(\mathbf{X}) & \ldots & f_{rs}(\mathbf{X}) \end{bmatrix}$$
Its derivative with respect to the matrix $\mathbf{X}$ is
$$\dfrac{d\mathbf{F}}{d\mathbf{X}}=\begin{bmatrix} \dfrac{\partial \mathbf{F}}{\partial \xi_{11}} & \dfrac{\partial \mathbf{F}}{\partial \xi_{12}} & \ldots & \dfrac{\partial \mathbf{F}}{\partial \xi_{1n}} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial \mathbf{F}}{\partial \xi_{m1}} & \dfrac{\partial \mathbf{F}}{\partial \xi_{m2}} & \ldots & \dfrac{\partial \mathbf{F}}{\partial \xi_{mn}} \end{bmatrix}$$
where each block is the $r\times s$ matrix
$$\dfrac{\partial \mathbf{F}}{\partial \xi_{ij}}=\begin{bmatrix} \dfrac{\partial f_{11}}{\partial \xi_{ij}} & \dfrac{\partial f_{12}}{\partial \xi_{ij}} & \ldots & \dfrac{\partial f_{1s}}{\partial \xi_{ij}} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial f_{r1}}{\partial \xi_{ij}} & \dfrac{\partial f_{r2}}{\partial \xi_{ij}} & \ldots & \dfrac{\partial f_{rs}}{\partial \xi_{ij}} \end{bmatrix}$$
so $d\mathbf{F}/d\mathbf{X}$ is an $mr\times ns$ matrix.
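
The block layout can be sketched numerically in plain Python. The code fills block $(i,j)$ of the result with a central-difference estimate of $\partial\mathbf{F}/\partial\xi_{ij}$, giving an $mr\times ns$ nested list. The helper `function_matrix_derivative`, the step size `h`, and the sample `example_F` are assumptions for illustration only.

```python
def function_matrix_derivative(F, X, h=1e-6):
    """Approximate dF/dX as an (m*r) x (n*s) nested list.

    F maps an m x n nested list to an r x s nested list. Block (i, j)
    of the result is the r x s matrix dF/d(xi_ij), estimated with
    central differences.
    """
    m, n = len(X), len(X[0])
    F0 = F(X)
    r, s = len(F0), len(F0[0])
    D = [[0.0] * (n * s) for _ in range(m * r)]
    for i in range(m):
        for j in range(n):
            plus = [row[:] for row in X]
            minus = [row[:] for row in X]
            plus[i][j] += h
            minus[i][j] -= h
            Fp, Fm = F(plus), F(minus)
            # Write the r x s block for xi_ij into rows i*r.., columns j*s..
            for a in range(r):
                for b in range(s):
                    D[i * r + a][j * s + b] = (Fp[a][b] - Fm[a][b]) / (2 * h)
    return D


def example_F(X):
    """Hypothetical 1 x 2 function matrix of a 2 x 1 column X = (x1, x2)^T."""
    x1, x2 = X[0][0], X[1][0]
    return [[x1 * x2, x1 ** 2]]


X = [[2.0], [3.0]]
D = function_matrix_derivative(example_F, X)
```

Here $m=2$, $n=1$, $r=1$, $s=2$, so `D` is $2\times 2$. Its first row approximates $\partial\mathbf{F}/\partial\xi_1=(\xi_2,\ 2\xi_1)=(3,\ 4)$ and its second row approximates $\partial\mathbf{F}/\partial\xi_2=(\xi_1,\ 0)=(2,\ 0)$.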

For example, let $\mathbf{X}=(\xi_1,\xi_2,\dots,\xi_n)^{T}$ and let $f(\mathbf{X})=f(\xi_1,\xi_2,\dots,\xi_n)$ be a function of $n$ variables.
Then
$$\dfrac{df}{d\mathbf{X}}=\left(\dfrac{\partial f}{\partial \xi_1},\dfrac{\partial f}{\partial \xi_2},\dots,\dfrac{\partial f}{\partial \xi_n}\right)^{T}$$
Differentiating this column with respect to $\mathbf{X}^{T}$ therefore gives the matrix of second partial derivatives, that is, the Hessian matrix of $f$:
$$\dfrac{d}{d\mathbf{X}^{T}}\left(\dfrac{df}{d\mathbf{X}}\right)=\begin{bmatrix} \dfrac{\partial^{2} f}{\partial \xi_{1}^{2}} & \dfrac{\partial^{2} f}{\partial \xi_{1}\partial \xi_{2}} & \cdots & \dfrac{\partial^{2} f}{\partial \xi_{1}\partial \xi_{n}} \\ \dfrac{\partial^{2} f}{\partial \xi_{2}\partial \xi_{1}} & \dfrac{\partial^{2} f}{\partial \xi_{2}^{2}} & \cdots & \dfrac{\partial^{2} f}{\partial \xi_{2}\partial \xi_{n}} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial^{2} f}{\partial \xi_{n}\partial \xi_{1}} & \dfrac{\partial^{2} f}{\partial \xi_{n}\partial \xi_{2}} & \cdots & \dfrac{\partial^{2} f}{\partial \xi_{n}^{2}} \end{bmatrix}$$
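
The second-derivative matrix above can be checked numerically by following the same two steps: first form the column $df/d\mathbf{X}$, then differentiate that column with respect to each $\xi_j$ to obtain column $j$. The sketch below does this in plain Python with nested central differences. The function names, the step sizes, and the test function `example_f` are assumptions chosen for illustration.

```python
def gradient(f, x, h=1e-5):
    """Column df/dX as a flat list: entry i approximates df/d(xi_i)."""
    g = []
    for i in range(len(x)):
        plus, minus = list(x), list(x)
        plus[i] += h
        minus[i] -= h
        g.append((f(plus) - f(minus)) / (2 * h))
    return g


def hessian(f, x, h=1e-3):
    """Differentiate the gradient column with respect to each xi_j.

    Column j of the result approximates d(df/dX)/d(xi_j), so entry
    (i, j) approximates the second partial of f in xi_i and xi_j.
    """
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for j in range(n):
        plus, minus = list(x), list(x)
        plus[j] += h
        minus[j] -= h
        gp, gm = gradient(f, plus), gradient(f, minus)
        for i in range(n):
            H[i][j] = (gp[i] - gm[i]) / (2 * h)
    return H


def example_f(x):
    """Hypothetical test function f = x1^2 * x2 + x2^3."""
    return x[0] ** 2 * x[1] + x[1] ** 3


H = hessian(example_f, [1.0, 2.0])
```

For this $f$ the exact second partials at $(1,2)$ are $2\xi_2=4$, $2\xi_1=2$ and $6\xi_2=12$, so `H` should be close to `[[4, 2], [2, 12]]` and symmetric up to rounding.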
