Raki的统计学习方法笔记0xB(11)章:条件随机场

为了完成nlp-beginner任务4,所以先复习一下CRF

按顺序看以下:

模型

条件随机场是由转移特征函数和状态特征函数构成的

参数化形式:
P ( y ∣ x ) = 1 Z ( x ) exp ⁡ ( ∑ i , k λ k t k ( y y − 1 , y i , x , i ) + ∑ i , l μ l s l ( y i , x , i ) ) Z ( x ) = ∑ y exp ⁡ ( ∑ i , k λ k t k ( y y − 1 , y i , x , i ) + ∑ i , l μ l s l ( y i , x , i ) ) \begin{aligned} P(y|x)=&\frac{1}{Z(x)}\mathop{\exp}\left(\sum_{i,k}\lambda_k t_k(y_{y-1},y_i,x,i)+\sum_{i,l}\mu_ls_l(y_i,x,i) \right) \\ Z(x) =& \sum_y \mathop{\exp}\left(\sum_{i,k}\lambda_k t_k(y_{y-1},y_i,x,i)+\sum_{i,l}\mu_ls_l(y_i,x,i) \right) & \end{aligned} P(y∣x)=Z(x)=​Z(x)1​exp⎝⎛​i,k∑​λk​tk​(yy−1​,yi​,x,i)+i,l∑​μl​sl​(yi​,x,i)⎠⎞​y∑​exp⎝⎛​i,k∑​λk​tk​(yy−1​,yi​,x,i)+i,l∑​μl​sl​(yi​,x,i)⎠⎞​​​

简化形式:
f k ( y i − 1 , y i , x , i ) = { t k ( y y − 1 , y i , x , i ) , k = 1 , 2 , . . . , K 1 s l ( y i , x , i ) , k = K 1 + l ; l = 1 , 2 , . . . , K 2 f k ( y , x ) = ∑ i = 1 n f k ( y i − 1 , y i , x , i ) , k = 1 , 2 , . . . , K w k = { λ k , k = 1 , 2 , . . . , K 1 μ l , k = K 1 + l ; l = 1 , 2 , . . . , K 2 P ( y ∣ x ) = 1 Z ( x ) exp ⁡ ∑ k = 1 K w k f k ( y , x ) 若 w 和 F ( 一 个 K 个 特 征 函 数 组 成 的 向 量 ) 为 向 量 形 式 : P w ( y ∣ x ) = exp ⁡ ( w ⋅ F ( y , x ) ) Z w ( x ) \begin{aligned} & f_k(y_{i-1},y_i,x,i)=\begin{cases} t_k(y_{y-1},y_i,x,i), &k =1,2,...,K_1\\ s_l(y_i,x,i),&k=K_1+l;l=1,2,...,K_2 \end{cases} \\ & f_k(y,x)=\sum_{i=1}^nf_k(y_{i-1},y_i,x,i),k=1,2,...,K \\ & w_k=\begin{cases} \lambda_k,k=1,2,...,K_1\\ \mu_l,k=K_1+l;l=1,2,...,K_2 \end{cases} \\ & P(y|x)=\frac{1}{Z(x)}\mathop{\exp}\sum_{k=1}^Kw_kf_k(y,x) \\ & 若w和F(一个K个特征函数组成的向量)为向量形式:\\ & P_w(y|x)=\frac{\mathop{\exp}(w \cdot F(y,x))}{Z_w(x)} \end{aligned} ​fk​(yi−1​,yi​,x,i)={tk​(yy−1​,yi​,x,i),sl​(yi​,x,i),​k=1,2,...,K1​k=K1​+l;l=1,2,...,K2​​fk​(y,x)=i=1∑n​fk​(yi−1​,yi​,x,i),k=1,2,...,Kwk​={λk​,k=1,2,...,K1​μl​,k=K1​+l;l=1,2,...,K2​​P(y∣x)=Z(x)1​expk=1∑K​wk​fk​(y,x)若w和F(一个K个特征函数组成的向量)为向量形式:Pw​(y∣x)=Zw​(x)exp(w⋅F(y,x))​​

学习策略

y ∗ = arg ⁡ max ⁡ y P ( y ∣ x ) = arg ⁡ max ⁡ y exp ⁡ ( w ⋅ F ( y , x ) ) Z w ( x ) = arg ⁡ max ⁡ y exp ⁡ ( w ⋅ F ( y , x ) ) = arg ⁡ max ⁡ y ( w ⋅ F ( y , x ) ) \begin{aligned} y^* &= \mathop{\arg\max_{y}}P(y|x) \\ &=\mathop{\arg\max_{y}}\frac{\mathop{\exp}(w \cdot F(y,x))}{Z_w(x)}\\ &=\mathop{\arg\max_{y}\mathop{\exp}(w \cdot F(y,x))}\\ &=\mathop{\arg\max_{y}}(w \cdot F(y,x))\\ \end{aligned} y∗​=argymax​P(y∣x)=argymax​Zw​(x)exp(w⋅F(y,x))​=argymax​exp(w⋅F(y,x))=argymax​(w⋅F(y,x))​
条件随机场的预测问题成为求非规范化概率最大的最优路径问题:
max ⁡ y ( w ⋅ F ( y , x ) ) \begin{aligned} \mathop{\max_{y}}(w \cdot F(y,x)) \end{aligned} ymax​(w⋅F(y,x))​
为了求解最优路径,讲上式写成如下形式:
max ⁡ y ∑ i = 1 n w ⋅ F i ( y i − 1 , y i , x ) \begin{aligned} \mathop{\max_{y}}\sum_{i=1}^nw \cdot F_i(y_{i-1},y_i,x) \end{aligned} ymax​i=1∑n​w⋅Fi​(yi−1​,yi​,x)​

预测算法

输入:模型特征向量 F ( y , x ) F(y,x) F(y,x) 和权值向量 w w w,观测序列 x = ( x 1 , x 2 , . . . , x n ) x = (x_1,x_2,...,x_n) x=(x1​,x2​,...,xn​)
输出:最优路径 y ∗ = ( y 1 ∗ , y 2 ∗ , . . . , y n ∗ ) y^*=(y_1^*,y_2^*,...,y_n^*) y∗=(y1∗​,y2∗​,...,yn∗​)
(1)初试化:
δ 1 ( j ) = w ⋅ F 1 ( y 0 = s t a r t , y 1 = j , x ) , j = 1 , 2 , . . . , m \begin{aligned} \delta_1(j)=w \cdot F_1(y_0=start,y_1=j,x),j=1,2,...,m \end{aligned} δ1​(j)=w⋅F1​(y0​=start,y1​=j,x),j=1,2,...,m​
(2)递推:对 i = 2 , 3 , . . . , n i = 2, 3,...,n i=2,3,...,n
δ 1 ( j ) = max ⁡ i ≤ j ≤ m { δ i − 1 + w ⋅ F i ( y i − 1 = j , y i = l , x ) } , j = 1 , 2 , . . . , m Ψ i ( l ) = arg ⁡ max ⁡ 1 ≤ j ≤ m { δ i − 1 + w ⋅ F i ( y i − 1 = j , y i = l , x ) } , j = 1 , 2 , . . . , m \begin{aligned} & \delta_1(j)= \mathop{\max_{i\le j \le m }} \{\delta_{i-1}+w \cdot F_i(y_{i-1}=j,y_i=l,x) \},j=1,2,...,m \\ & \Psi_i(l)=\mathop{\arg\max_{1 \le j \le m}}\{\delta_{i-1}+w \cdot F_i(y_{i-1}=j,y_i=l,x) \},j=1,2,...,m \end{aligned} ​δ1​(j)=i≤j≤mmax​{δi−1​+w⋅Fi​(yi−1​=j,yi​=l,x)},j=1,2,...,mΨi​(l)=arg1≤j≤mmax​{δi−1​+w⋅Fi​(yi−1​=j,yi​=l,x)},j=1,2,...,m​
(3)终止:
max ⁡ y ( w ⋅ F ( y , x ) ) = max ⁡ i ≤ j ≤ m δ n ( j ) y n ∗ = arg ⁡ max ⁡ i ≤ j ≤ m δ n ( j ) \begin{aligned} \mathop{\max_{y}}(w &\cdot F(y,x)) = \mathop{\max_{i\le j \le m }}\delta_n(j) \\ & y_n^* = \mathop{\arg\max_{i\le j \le m }}\delta_n(j) & \end{aligned} ymax​(w​⋅F(y,x))=i≤j≤mmax​δn​(j)yn∗​=argi≤j≤mmax​δn​(j)​​
(4)返回路径:
y i ∗ = Ψ i + 1 ( y i + 1 ∗ ) , i = n − 1 , n − 2 , . . . , 1 \begin{aligned} y_i^* = \Psi_{i+1}(y_{i+1}^*), i=n-1,n-2,...,1 \end{aligned} yi∗​=Ψi+1​(yi+1∗​),i=n−1,n−2,...,1​
求得最优路径: y ∗ = ( y 1 ∗ , y 2 ∗ , . . . , y n ∗ ) y^*=(y_1^*,y_2^*,...,y_n^*) y∗=(y1∗​,y2∗​,...,yn∗​)

延伸思考

LSTM,Bert已经可以胜任序列标注问题了,为什么还需要套上一层CRF?

上一篇:Pytorch grad


下一篇:oracle逐步学习总结之oracle分页查询(基础三)