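0 The essence of matrix differentiation
Differentiating a matrix \(A\) with respect to a matrix \(B\) means differentiating every element of \(A\) with respect to every element of \(B\).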
Dependent variable | Independent variable | Number of derivatives |
---|---|---|
\(A_{1\times 1}\) | \(B_{1\times 1}\) | \(1\) derivative |
\(A_{m\times 1}\) | \(B_{1\times 1}\) | \(m\) derivatives |
\(A_{m\times 1}\) | \(B_{p\times 1}\) | \(m\times p\) derivatives |
\(A_{m\times n}\) | \(B_{p\times q}\) | \(m\times n\times p\times q\) derivatives |
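The counting rule in the table can be sketched in a few lines of Python. The helper name `derivative_count` is hypothetical and exists only for illustration; it multiplies the number of elements in the dependent array by the number of elements in the independent array.

```python
from math import prod


def derivative_count(shape_a, shape_b):
    """Number of scalar partial derivatives produced when every element of
    an array with shape `shape_a` is differentiated with respect to every
    element of an array with shape `shape_b`."""
    return prod(shape_a) * prod(shape_b)


# The four rows of the table, with m=3, n=2, p=4, q=5 chosen arbitrarily.
print(derivative_count((1, 1), (1, 1)))  # 1
print(derivative_count((3, 1), (1, 1)))  # 3
print(derivative_count((3, 1), (4, 1)))  # 12
print(derivative_count((3, 2), (4, 5)))  # 120
```

The layout conventions discussed later only decide how these derivatives are arranged; the count itself does not depend on the layout.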
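1 Two concepts
1.1 Scalar functions
\(\left [ f \right ]_{1\times 1}\): the function \(f\) evaluates to a single value, for example \(f(x_{1},x_{2})=2x_{1}+3x_{2}^{2}\).
1.2 Vector functions
\(\left [ f \right ]_{m\times n}\): \(f\) is an array whose entries are scalar functions, of the form \(f=\left [f_{1}(x)=x_{1},f_{2}(x)=x^{2} \right]_{1\times 2}\)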
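2 Basic principles of matrix differentiation, using denominator layout as the example
2.1 Principle one
If \(f\) is a scalar function, \(f=f(x_{1},x_{2},\ldots,x_{p})\), and \(X\) is a column vector, \(X=\begin{bmatrix}
x_{1}\\
x_{2}\\
\vdots \\
x_{p}\end{bmatrix}\), then define: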
\(\frac{\partial f}{\partial X}=\begin{bmatrix}
\frac{\partial f}{\partial x_{1}}\\
\frac{\partial f}{\partial x_{2}}\\
\vdots\\
\frac{\partial f}{\partial x_{p}}\end{bmatrix}\)
If \(f\) is a scalar function and \(X\) is a row vector, then define:
\(\frac{\partial f}{\partial X}=\begin{bmatrix} \frac{\partial f}{\partial x_{1}} &\frac{\partial f}{\partial x_{2}} & \cdots & \frac{\partial f}{\partial x_{p}} \end{bmatrix}_{1\times p}\)
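As a rough numerical illustration of principle one, the sketch below estimates the partial derivatives of the scalar function \(f(x_{1},x_{2})=2x_{1}+3x_{2}^{2}\) from section 1.1 by central differences, then arranges them as a column or as a row to match the shape of \(X\). The helper `numerical_gradient`, the step size `h`, and the evaluation point are assumptions made for this example.

```python
def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of the partial derivatives of a scalar
    function `f` at the point `x` (a list of floats)."""
    partials = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        partials.append((f(forward) - f(backward)) / (2 * h))
    return partials


def f(x):
    # f(x1, x2) = 2*x1 + 3*x2^2, the scalar function from section 1.1
    return 2 * x[0] + 3 * x[1] ** 2


partials = numerical_gradient(f, [1.0, 2.0])

# X is a column vector: the derivative is a p x 1 column.
as_column = [[d] for d in partials]
# X is a row vector: the derivative is a 1 x p row.
as_row = [partials]

print(as_column)  # approximately [[2.0], [12.0]]
print(as_row)     # approximately [[2.0, 12.0]]
```

The analytic values are \(\frac{\partial f}{\partial x_{1}}=2\) and \(\frac{\partial f}{\partial x_{2}}=6x_{2}\), which gives \(12\) at \(x_{2}=2\).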
2.2 Principle two
If \(f\) is a column vector of functions, \(f=\begin{bmatrix} f_{1}(x)\\ f_{2}(x)\\ \vdots \\ f_{m}(x) \end{bmatrix}_{m\times 1}\), and \(X\) is a scalar, then define:
\(\frac{\partial f}{\partial X}=\begin{bmatrix} \frac{\partial f_{1}(x)}{\partial X} & \frac{\partial f_{2}(x)}{\partial X} & \cdots & \frac{\partial f_{m}(x)}{\partial X} \end{bmatrix}=\frac{\partial (f^{T})}{\partial X}\)
If \(\begin{bmatrix} f \end{bmatrix}_{m\times 1}\) and \(\begin{bmatrix} X \end{bmatrix}_{p\times 1}\), then define the \(p\times m\) matrix:
\(\frac{\partial f}{\partial X}=\begin{bmatrix} \frac{\partial f}{\partial x_{1}}\\ \frac{\partial f}{\partial x_{2}}\\ \vdots \\ \frac{\partial f}{\partial x_{p}} \end{bmatrix}=\begin{bmatrix} \frac{\partial f_{1}}{\partial x_{1}} & \frac{\partial f_{2}}{\partial x_{1}} & \cdots & \frac{\partial f_{m}}{\partial x_{1}}\\ \frac{\partial f_{1}}{\partial x_{2}} & \frac{\partial f_{2}}{\partial x_{2}} & \cdots & \frac{\partial f_{m}}{\partial x_{2}}\\ \vdots & \vdots & \vdots & \vdots \\ \frac{\partial f_{1}}{\partial x_{p}} & \frac{\partial f_{2}}{\partial x_{p}} & \cdots & \frac{\partial f_{m}}{\partial x_{p}} \end{bmatrix}\)
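The \(p\times m\) arrangement above can be checked numerically. The following is a minimal sketch, assuming a hypothetical vector function `g` with \(m=3\) outputs and \(p=2\) inputs; the helper `denominator_layout_jacobian` is not from any library and simply puts \(\frac{\partial f_{j}}{\partial x_{i}}\) in row \(i\), column \(j\).

```python
def denominator_layout_jacobian(f, x, h=1e-6):
    """Central-difference estimate of the p x m matrix whose entry [i][j]
    approximates d f_j / d x_i (denominator layout)."""
    rows = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        f_plus = f(forward)
        f_minus = f(backward)
        rows.append([(a - b) / (2 * h) for a, b in zip(f_plus, f_minus)])
    return rows


def g(x):
    # Hypothetical example: m = 3 component functions of p = 2 variables.
    return [x[0] * x[1], x[0] + x[1], x[1] ** 2]


J = denominator_layout_jacobian(g, [1.0, 2.0])

print(len(J), len(J[0]))  # 2 3, i.e. p rows and m columns
for row in J:
    print(row)
# Row for x1 is approximately [2, 1, 0]; row for x2 is approximately [1, 1, 4].
```

Each row collects the derivatives of all components with respect to one variable, which is the transpose of the Jacobian used in numerator layout.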
3 Common formulas
For functions of the form \(f(A,x)\), let \(x=\begin{bmatrix} x_{1}\\ x_{2}\\ \vdots \\ x_{p} \end{bmatrix}\) be a \(p\times 1\) column vector and let \(A\) be constant with respect to \(x\). The shape of \(A\) differs from formula to formula so that each product is defined:
- If \(A=\begin{bmatrix} a_{1} &a_{2} &\cdots & a_{p} \end{bmatrix}^{T}\) is a \(p\times 1\) column vector: \(\frac{\partial A^{T}x}{\partial x}=\frac{\partial x^{T}A}{\partial x}=A\)
- If \(A\) is a \(p\times p\) matrix: \(\frac{\partial x^{T}Ax}{\partial x}=(A+A^{T})x\)
- If \(A\) is an \(m\times p\) matrix: \(\frac{\partial Ax}{\partial x}=A^{T}\)
- \(\frac{\partial x^{T}x}{\partial x}=2x\)
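The identities above can be sanity-checked with finite differences. This is only a sketch under stated assumptions: the constant vector `a`, the matrix `A`, the point `x`, and all helper functions are made up for the example, and the derivative of \(Ax\) is arranged in denominator layout as in section 2.2.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def shifted(x, i, delta):
    """Copy of x with delta added to component i."""
    y = list(x)
    y[i] += delta
    return y


def gradient(f, x, h=1e-6):
    """Central-difference partial derivatives of a scalar function f."""
    return [(f(shifted(x, i, h)) - f(shifted(x, i, -h))) / (2 * h)
            for i in range(len(x))]


def jacobian(f, x, h=1e-6):
    """Denominator layout: entry [i][j] approximates d f_j / d x_i."""
    return [[(fp - fm) / (2 * h)
             for fp, fm in zip(f(shifted(x, i, h)), f(shifted(x, i, -h)))]
            for i in range(len(x))]


a = [5.0, -1.0]                 # assumed constant column vector
A = [[1.0, 2.0], [3.0, 4.0]]    # assumed constant matrix
x = [0.5, -2.0]                 # assumed evaluation point

# d(a^T x)/dx = a
grad_linear = gradient(lambda v: dot(a, v), x)

# d(x^T A x)/dx = (A + A^T) x
grad_quadratic = gradient(lambda v: dot(v, matvec(A, v)), x)
At = transpose(A)
expected_quadratic = matvec(
    [[A[i][j] + At[i][j] for j in range(2)] for i in range(2)], x)

# d(A x)/dx = A^T
jac_linear_map = jacobian(lambda v: matvec(A, v), x)

# d(x^T x)/dx = 2x
grad_norm = gradient(lambda v: dot(v, v), x)

print(grad_linear)                          # approximately a
print(grad_quadratic, expected_quadratic)   # should agree closely
print(jac_linear_map, At)                   # should agree closely
print(grad_norm)                            # approximately 2x
```

Note that \((A+A^{T})x\) reduces to \(2Ax\) only when \(A\) is symmetric; the matrix chosen here is deliberately not symmetric.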
If \(U=\begin{bmatrix} u_{1}(x)\\ u_{2}(x)\\ \vdots \\ u_{p}(x) \end{bmatrix}\) and \(V=\begin{bmatrix} v_{1}(x)\\ v_{2}(x)\\ \vdots \\ v_{p}(x) \end{bmatrix}\) are column vectors whose entries are functions of \(x\), then:
- \(\frac{\partial U^{T}V}{\partial x}=\frac{\partial U}{\partial x}V+\frac{\partial V}{\partial x}U\)
- \(\frac{\partial (U+V)}{\partial x}=\frac{\partial U}{\partial x}+\frac{\partial V}{\partial x}\)
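The product rule for \(U^{T}V\) can be checked the same way. The sketch below assumes two hypothetical vector functions `U` and `V` with \(p=3\) components of a 2-dimensional \(x\), and uses denominator layout, so \(\frac{\partial U}{\partial x}\) is a \(2\times 3\) matrix and \(\frac{\partial U}{\partial x}V\) is a \(2\times 1\) vector.

```python
def shifted(x, i, delta):
    """Copy of x with delta added to component i."""
    y = list(x)
    y[i] += delta
    return y


def jacobian(f, x, h=1e-6):
    """Denominator layout: entry [i][j] approximates d f_j / d x_i."""
    return [[(fp - fm) / (2 * h)
             for fp, fm in zip(f(shifted(x, i, h)), f(shifted(x, i, -h)))]
            for i in range(len(x))]


def gradient(f, x, h=1e-6):
    """Central-difference partial derivatives of a scalar function f."""
    return [(f(shifted(x, i, h)) - f(shifted(x, i, -h))) / (2 * h)
            for i in range(len(x))]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def U(x):
    # Hypothetical example with p = 3 components.
    return [x[0] ** 2, x[0] * x[1], x[1]]


def V(x):
    # Hypothetical example with p = 3 components.
    return [x[1], 1.0 + x[0], x[0] * x[1]]


x = [1.0, 2.0]

# Left-hand side: differentiate the scalar U^T V directly.
lhs = gradient(lambda v: dot(U(v), V(v)), x)

# Right-hand side: (dU/dx) V + (dV/dx) U.
term_u = matvec(jacobian(U, x), V(x))
term_v = matvec(jacobian(V, x), U(x))
rhs = [s + t for s, t in zip(term_u, term_v)]

print(lhs)  # approximately [14, 7]
print(rhs)  # approximately [14, 7]
```

Here \(U^{T}V=2x_{1}^{2}x_{2}+x_{1}x_{2}+x_{1}x_{2}^{2}\), whose partial derivatives at \((1,2)\) are \(14\) and \(7\), matching both sides.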