线性判别器(LDA)
线性判别器思想:类内小,类间大;通过选择一个投影方向将 \(p\) 维数据进行投影到 一维直接上,并且在这直线可以通过一个阈值进行分类。所以我们要找 \(w\)。
首先我们做以下定义:
数据集 $ {(x_i,y_i)}_{i=1}^{N}、x \in R^p、 y \in {+1, 0}$
假设两类数据 \(x_{c_1} = \{x_i | y_i = 0\}、 x_{c_2} = \{x_i | y_i = 0\}、 |x_{c_1} = N_1|、|x_{c_2}| = N_2、N_1 + N_2 = N\)
根据求均值公式:
\[\mu = \frac{1}{N} \sum_{i = 1}^{N}x_i \]得到类心:
\[\mu_1 =\bar{x_{c_1}}= \frac{1}{N_1} \sum_{i = 1}^{N_1}x_i\\ \mu_2 =\bar{x_{c_2}}= \frac{1}{N_2} \sum_{i = 1}^{N_2}x_i \]投影到直线上类心为:
\[z_1 = \frac{1}{N_1} \sum_{i = 1}^{N_1} w^T x_i \\ z_2 = \frac{1}{N_2} \sum_{i = 1}^{N_2} w^T x_i \\ \]类心距为(类间距):
\[(z_1 - z_2)^2 = (\frac{1}{N_1} \sum_{i = 1}^{N_1} w^T x_i - \frac{1}{N_1} \sum_{i = 1}^{N_1} w^T x_i)^2 \\ = (w^T * \mu_1 - w^T * \mu_2)^2 \\ = w^T(\mu_1 - \mu_2)(\mu_1 - \mu_2)^Tw\\ 令 \quad S_b = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T \\ (z_1 - z_2)^2 = w^T * S_b * w \]其中 \(S_b\) 为类间散列矩阵。
类内距离用方差进行表示:
\[S_1 = \frac{1}{N_1} \sum_{i = 1}^{N_1}(w^Tx_i - z_1)(w^Tx_i - z_1)^T\\ S_2 = \frac{1}{N_2} \sum_{i = 1}^{N_2}(w^Tx_i - z_2)(w^Tx_i - z_2)^T\\ S_1 = \frac{1}{N_1} \sum_{i = 1}^{N_1}(w^Tx_i - \frac{1}{N_1} \sum_{i = 1}^{N_1} w^T x_i)(w^Tx_i - \frac{1}{N_1} \sum_{i = 1}^{N_1} w^T x_i)^T\\ = S_1 = \frac{1}{N_1} \sum_{i = 1}^{N_1}(w^Tx_i - w^T \mu_1)(w^Tx_i - w^T \mu_1)^T\\ = \frac{1}{N_1} w^T\sum_{i = 1}^{N_1}(x_i - \mu_1)(x_i - \mu_1)^T w\\ = w^T[\frac{1}{N_1} \sum_{i = 1}^{N_1}(x_i - \mu_1)(x_i - \mu_1)^T] w\\ 令 \quad S_{c_1} = \frac{1}{N_1} \sum_{i = 1}^{N_1}(x_i - \mu_1)(x_i - \mu_1)^T \\ S_1 = w^T S_{c_1} w\\ S_2 = w^T S_{c_2} w\\ \]由于我们需要的是类间距离大,类内距离小,所以设目标函数:
\[J(w) = \frac{(z_1 - z_2)^2}{S_1 + S_2}\\ 令 \quad S_w = S_{c_1} + S_{c_2} \\ J(w) = \frac{w^T S_b w}{w^T S_w w} \]其中 \(S_w\) 为类内散度矩阵。
下面我们对 \(J(w)\) 进行求导:
\[\frac{\partial}{\partial w}J(w) = 2S_bw(w^TS_ww)^{-1} - 2w^TS_bw(w^TS_ww)^{-2}S_W w = 0\\ \Longrightarrow S_bw(w^TS_ww) = (w^TS_bw)S_ww\\ \Longrightarrow S_ww = \frac{(w^TS_ww)}{(w^TS_bw)}S_bw\\ \Longrightarrow w = \frac{(w^TS_ww)}{(w^TS_sw)}S_w^{-1}S_bw\\ \frac{(w^TS_bw)}{(w^TS_ww)} \in R \\ \Longrightarrow w \propto S_w^{-1}S_bw = S_w^{-1}(\mu_1 - \mu_2)(\mu_1 - \mu_2)^Tw\\ \Longrightarrow w \propto S_w^{-1}(\mu_1 - \mu_2) \]另一种求解:
由于 \(J\) 与 \(w\) 的方向有关,确定方向后,与 \(w\) 的长度无关。求解过程中,分子分母会同时变化,所以首先固定分母为某一个非 \(0\) 常数,即:
\[w^TS_ww = c \]则原问题转化一个约束问题:
\[\begin{cases} max_w \quad w^T S_b w \\ s.t. \quad w^TS_ww = c \end{cases} \]引入拉格朗日乘子:
\[L(w, \lambda) = w^TS_b w - \lambda(w^TS_ww - c) \]转化为无约束问题:
\[\begin{cases} max_w max_\lambda L(w,\lambda) \\ s.t. \quad \lambda > 0 \end{cases} \]将 \(max\) 调换位置:
\[\begin{cases} max_\lambda max_w L(w,\lambda) \\ s.t. \quad \lambda > 0 \end{cases} \]将 $ L(w,\lambda)$ 对 \(w\) 进行求导:
\[\frac{\partial L(w,\lambda)}{\partial w} = (S_b + S_b^T)w - \lambda(S_w + S_w^T)w\\ = 2S_bw - 2\lambda S_ww = 0\\ S_w^{-1}S_bw = \lambda w\\ S_b w = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^Tw = \beta(\mu_1 - \mu_2) \\ w = \frac{\beta}{\lambda} S_w^{-1}(\mu_1 - \mu_2)\\ w = S_w^{-1}(\mu_1 - \mu_2) \]考虑到数值解的稳定性,在实践中通常对 \(S_w\) 进行奇异值分解。至此 \(w\) 求得。