为了完成nlp-beginner任务4,所以先复习一下CRF
按顺序看以下:
- 如何轻松愉快地理解条件随机场(CRF)?
- 统计学习方法第11章
- LSTM+CRF 解析(原理篇)
模型
条件随机场是由转移特征函数和状态特征函数构成的
参数化形式:
P
(
y
∣
x
)
=
1
Z
(
x
)
exp
(
∑
i
,
k
λ
k
t
k
(
y
y
−
1
,
y
i
,
x
,
i
)
+
∑
i
,
l
μ
l
s
l
(
y
i
,
x
,
i
)
)
Z
(
x
)
=
∑
y
exp
(
∑
i
,
k
λ
k
t
k
(
y
y
−
1
,
y
i
,
x
,
i
)
+
∑
i
,
l
μ
l
s
l
(
y
i
,
x
,
i
)
)
\begin{aligned} P(y|x)=&\frac{1}{Z(x)}\mathop{\exp}\left(\sum_{i,k}\lambda_k t_k(y_{y-1},y_i,x,i)+\sum_{i,l}\mu_ls_l(y_i,x,i) \right) \\ Z(x) =& \sum_y \mathop{\exp}\left(\sum_{i,k}\lambda_k t_k(y_{y-1},y_i,x,i)+\sum_{i,l}\mu_ls_l(y_i,x,i) \right) & \end{aligned}
P(y∣x)=Z(x)=Z(x)1exp⎝⎛i,k∑λktk(yy−1,yi,x,i)+i,l∑μlsl(yi,x,i)⎠⎞y∑exp⎝⎛i,k∑λktk(yy−1,yi,x,i)+i,l∑μlsl(yi,x,i)⎠⎞
简化形式:
f
k
(
y
i
−
1
,
y
i
,
x
,
i
)
=
{
t
k
(
y
y
−
1
,
y
i
,
x
,
i
)
,
k
=
1
,
2
,
.
.
.
,
K
1
s
l
(
y
i
,
x
,
i
)
,
k
=
K
1
+
l
;
l
=
1
,
2
,
.
.
.
,
K
2
f
k
(
y
,
x
)
=
∑
i
=
1
n
f
k
(
y
i
−
1
,
y
i
,
x
,
i
)
,
k
=
1
,
2
,
.
.
.
,
K
w
k
=
{
λ
k
,
k
=
1
,
2
,
.
.
.
,
K
1
μ
l
,
k
=
K
1
+
l
;
l
=
1
,
2
,
.
.
.
,
K
2
P
(
y
∣
x
)
=
1
Z
(
x
)
exp
∑
k
=
1
K
w
k
f
k
(
y
,
x
)
若
w
和
F
(
一
个
K
个
特
征
函
数
组
成
的
向
量
)
为
向
量
形
式
:
P
w
(
y
∣
x
)
=
exp
(
w
⋅
F
(
y
,
x
)
)
Z
w
(
x
)
\begin{aligned} & f_k(y_{i-1},y_i,x,i)=\begin{cases} t_k(y_{y-1},y_i,x,i), &k =1,2,...,K_1\\ s_l(y_i,x,i),&k=K_1+l;l=1,2,...,K_2 \end{cases} \\ & f_k(y,x)=\sum_{i=1}^nf_k(y_{i-1},y_i,x,i),k=1,2,...,K \\ & w_k=\begin{cases} \lambda_k,k=1,2,...,K_1\\ \mu_l,k=K_1+l;l=1,2,...,K_2 \end{cases} \\ & P(y|x)=\frac{1}{Z(x)}\mathop{\exp}\sum_{k=1}^Kw_kf_k(y,x) \\ & 若w和F(一个K个特征函数组成的向量)为向量形式:\\ & P_w(y|x)=\frac{\mathop{\exp}(w \cdot F(y,x))}{Z_w(x)} \end{aligned}
fk(yi−1,yi,x,i)={tk(yy−1,yi,x,i),sl(yi,x,i),k=1,2,...,K1k=K1+l;l=1,2,...,K2fk(y,x)=i=1∑nfk(yi−1,yi,x,i),k=1,2,...,Kwk={λk,k=1,2,...,K1μl,k=K1+l;l=1,2,...,K2P(y∣x)=Z(x)1expk=1∑Kwkfk(y,x)若w和F(一个K个特征函数组成的向量)为向量形式:Pw(y∣x)=Zw(x)exp(w⋅F(y,x))
学习策略
y
∗
=
arg
max
y
P
(
y
∣
x
)
=
arg
max
y
exp
(
w
⋅
F
(
y
,
x
)
)
Z
w
(
x
)
=
arg
max
y
exp
(
w
⋅
F
(
y
,
x
)
)
=
arg
max
y
(
w
⋅
F
(
y
,
x
)
)
\begin{aligned} y^* &= \mathop{\arg\max_{y}}P(y|x) \\ &=\mathop{\arg\max_{y}}\frac{\mathop{\exp}(w \cdot F(y,x))}{Z_w(x)}\\ &=\mathop{\arg\max_{y}\mathop{\exp}(w \cdot F(y,x))}\\ &=\mathop{\arg\max_{y}}(w \cdot F(y,x))\\ \end{aligned}
y∗=argymaxP(y∣x)=argymaxZw(x)exp(w⋅F(y,x))=argymaxexp(w⋅F(y,x))=argymax(w⋅F(y,x))
条件随机场的预测问题成为求非规范化概率最大的最优路径问题:
max
y
(
w
⋅
F
(
y
,
x
)
)
\begin{aligned} \mathop{\max_{y}}(w \cdot F(y,x)) \end{aligned}
ymax(w⋅F(y,x))
为了求解最优路径,讲上式写成如下形式:
max
y
∑
i
=
1
n
w
⋅
F
i
(
y
i
−
1
,
y
i
,
x
)
\begin{aligned} \mathop{\max_{y}}\sum_{i=1}^nw \cdot F_i(y_{i-1},y_i,x) \end{aligned}
ymaxi=1∑nw⋅Fi(yi−1,yi,x)
预测算法
输入:模型特征向量
F
(
y
,
x
)
F(y,x)
F(y,x) 和权值向量
w
w
w,观测序列
x
=
(
x
1
,
x
2
,
.
.
.
,
x
n
)
x = (x_1,x_2,...,x_n)
x=(x1,x2,...,xn)
输出:最优路径
y
∗
=
(
y
1
∗
,
y
2
∗
,
.
.
.
,
y
n
∗
)
y^*=(y_1^*,y_2^*,...,y_n^*)
y∗=(y1∗,y2∗,...,yn∗)
(1)初试化:
δ
1
(
j
)
=
w
⋅
F
1
(
y
0
=
s
t
a
r
t
,
y
1
=
j
,
x
)
,
j
=
1
,
2
,
.
.
.
,
m
\begin{aligned} \delta_1(j)=w \cdot F_1(y_0=start,y_1=j,x),j=1,2,...,m \end{aligned}
δ1(j)=w⋅F1(y0=start,y1=j,x),j=1,2,...,m
(2)递推:对
i
=
2
,
3
,
.
.
.
,
n
i = 2, 3,...,n
i=2,3,...,n
δ
1
(
j
)
=
max
i
≤
j
≤
m
{
δ
i
−
1
+
w
⋅
F
i
(
y
i
−
1
=
j
,
y
i
=
l
,
x
)
}
,
j
=
1
,
2
,
.
.
.
,
m
Ψ
i
(
l
)
=
arg
max
1
≤
j
≤
m
{
δ
i
−
1
+
w
⋅
F
i
(
y
i
−
1
=
j
,
y
i
=
l
,
x
)
}
,
j
=
1
,
2
,
.
.
.
,
m
\begin{aligned} & \delta_1(j)= \mathop{\max_{i\le j \le m }} \{\delta_{i-1}+w \cdot F_i(y_{i-1}=j,y_i=l,x) \},j=1,2,...,m \\ & \Psi_i(l)=\mathop{\arg\max_{1 \le j \le m}}\{\delta_{i-1}+w \cdot F_i(y_{i-1}=j,y_i=l,x) \},j=1,2,...,m \end{aligned}
δ1(j)=i≤j≤mmax{δi−1+w⋅Fi(yi−1=j,yi=l,x)},j=1,2,...,mΨi(l)=arg1≤j≤mmax{δi−1+w⋅Fi(yi−1=j,yi=l,x)},j=1,2,...,m
(3)终止:
max
y
(
w
⋅
F
(
y
,
x
)
)
=
max
i
≤
j
≤
m
δ
n
(
j
)
y
n
∗
=
arg
max
i
≤
j
≤
m
δ
n
(
j
)
\begin{aligned} \mathop{\max_{y}}(w &\cdot F(y,x)) = \mathop{\max_{i\le j \le m }}\delta_n(j) \\ & y_n^* = \mathop{\arg\max_{i\le j \le m }}\delta_n(j) & \end{aligned}
ymax(w⋅F(y,x))=i≤j≤mmaxδn(j)yn∗=argi≤j≤mmaxδn(j)
(4)返回路径:
y
i
∗
=
Ψ
i
+
1
(
y
i
+
1
∗
)
,
i
=
n
−
1
,
n
−
2
,
.
.
.
,
1
\begin{aligned} y_i^* = \Psi_{i+1}(y_{i+1}^*), i=n-1,n-2,...,1 \end{aligned}
yi∗=Ψi+1(yi+1∗),i=n−1,n−2,...,1
求得最优路径:
y
∗
=
(
y
1
∗
,
y
2
∗
,
.
.
.
,
y
n
∗
)
y^*=(y_1^*,y_2^*,...,y_n^*)
y∗=(y1∗,y2∗,...,yn∗)
延伸思考
LSTM,Bert已经可以胜任序列标注问题了,为什么还需要套上一层CRF?