Hard Negatives vs. False Negatives in RS
Negative sampling should favor informative hard negative samples, but in recommender-system settings an unlabelled sample is not necessarily a true negative: it may simply never have been exposed to the user. A hard negative may therefore actually be a false negative, and oversampling hard negatives risks injecting false negatives into training.
Simplify and Robustify Negative Sampling for Implicit Collaborative Filtering (NIPS 2020)
The paper observes that while both false negatives and hard negatives tend to receive large scores, false negatives exhibit lower prediction variance. It therefore proposes a simplified and robust negative sampling scheme (SRNS): at training epoch $t$, using the score records of the previous 5 epochs, samples with both a high predicted score and a high variance are selected as hard negatives:
$$
\begin{aligned}
j &= \underset{k \in \mathcal{M}_{u}}{\arg\max}\; P_{\mathrm{pos}}(k \mid u, i)+\alpha_{t} \cdot \operatorname{std}\left[P_{\mathrm{pos}}(k \mid u, i)\right] \\
P(j \mid u, i) &= \operatorname{sigmoid}\left(r_{u i}-r_{u j}\right) \\
\operatorname{std}\left[P_{\mathrm{pos}}(k \mid u, i)\right] &= \sqrt{\sum_{s=t-5}^{t-1}\left(\left[P_{\mathrm{pos}}(k \mid u, i)\right]_{s}-\operatorname{Mean}\left[P_{\mathrm{pos}}(k \mid u, i)\right]\right)^{2} / 5} \\
\operatorname{Mean}\left[P_{\mathrm{pos}}(k \mid u, i)\right] &= \sum_{s=t-5}^{t-1}\left[P_{\mathrm{pos}}(k \mid u, i)\right]_{s} / 5
\end{aligned}
$$
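A minimal NumPy sketch of this selection rule (my own illustration, not the authors' code; the candidate pool size, score arrays, and $\alpha_t$ value are assumptions):

```python
import numpy as np

def select_hard_negative(current_scores: np.ndarray,
                         score_history: np.ndarray,
                         alpha_t: float) -> int:
    """Pick j = argmax_k P_pos(k|u,i) + alpha_t * std[P_pos(k|u,i)].

    current_scores: shape (num_candidates,), P_pos(k|u,i) at epoch t.
    score_history:  shape (num_candidates, 5), scores over epochs t-5..t-1.
    """
    # np.std divides by N=5 by default, matching the /5 in the formula above.
    std = score_history.std(axis=1)
    # High score AND high variance -> treated as hard negative, not false negative.
    return int(np.argmax(current_scores + alpha_t * std))

# Usage: 8 candidate negatives scored over the last 5 epochs.
rng = np.random.default_rng(0)
history = rng.random((8, 5))
current = rng.random(8)
print(select_hard_negative(current, history, alpha_t=0.5))
```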
Some Hard Negative Sampling Methods
The following are the baselines compared in MixGCF:
- RNS: Random negative sampling (RNS) draws negatives uniformly at random; this is what LightGCN and similar models do.
- DNS: Dynamic negative sampling (DNS) dynamically picks samples the current model scores highly, but to avoid false negatives it samples from the top 10%~20% of the score ranking rather than the very top (see the sketch after this list).
- IRGAN: combines the recommender with a GAN, turning the generator into a sampler that picks negatives to confuse the recommender.
- AdvIR: follow-up work built on IRGAN.
- MCNS: Markov chain Monte Carlo negative sampling (MCNS).
The last three methods are all fairly time-consuming.
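A hedged PyTorch sketch of DNS under one reading of the 10%~20% rule (a rank band just below the very top; `score_fn`, the candidate pool, and the band boundaries are my assumptions):

```python
import torch

def dns_sample(score_fn, user, candidates: torch.Tensor,
               lo: float = 0.10, hi: float = 0.20) -> int:
    """Score a candidate pool with the current model and draw one negative
    from the 10%-20% band of the ranking: hard, but not so top-ranked that
    it is likely a false negative."""
    scores = score_fn(user, candidates)          # (num_candidates,)
    order = torch.argsort(scores, descending=True)
    start = int(lo * len(order))
    end = max(start + 1, int(hi * len(order)))
    band = order[start:end]                      # ranks inside the band
    pick = band[torch.randint(len(band), (1,)).item()]
    return candidates[pick].item()

# Usage with a dummy dot-product scorer over random embeddings.
torch.manual_seed(0)
item_emb = torch.randn(1000, 16)
user_emb = torch.randn(16)
score = lambda u, items: item_emb[items] @ u
pool = torch.randint(0, 1000, (200,))            # random unobserved items
print(dns_sample(score, user_emb, pool))
```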
MixGCF: An Improved Training Method for Graph Neural Network-based Recommender Systems (KDD 2021)
MixGCF borrows the idea of mixup and, exploiting the layered structure of GCNs, introduces hop mixing:
Here $e_{v_x}^{(l)}$ is the layer-$l$ representation of the $m$ hard negatives obtained via DNS. Positive mixing blends some of the positive item's representation into the negative representations, artificially constructing hard negatives; with $l+1$ layer embeddings per candidate this yields $(l+1) \times m$ representations in total. In hop mixing, at each layer the hardest of the $m$ representations is selected:
$$
\mathbf{e}_{v_{x}}^{\prime(l)}=\underset{\mathbf{e}_{v_{m}}^{\prime(l)} \in \mathcal{E}^{(l)}}{\arg\max}\; f_{\mathrm{Q}}(u, l) \cdot \mathbf{e}_{v_{m}}^{\prime(l)}
$$
$f_{\mathrm{Q}}(u, l)$ is the representation of user $u$ at layer $l$. Finally, the selected negative representations across the $l$ layers are pooled into a single synthetic hard negative, as sketched below.
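A hedged PyTorch sketch of positive mixing followed by hop mixing under the shapes described above (the uniform mixing coefficients, sum pooling, and all tensor names are my assumptions, not the paper's exact code):

```python
import torch

def positive_mixing(neg_layers: torch.Tensor, pos_layers: torch.Tensor):
    """neg_layers: (m, l+1, d) per-layer embeddings of m DNS candidates.
    pos_layers:  (l+1, d) per-layer embeddings of the positive item.
    Blends a random fraction of the positive into every candidate negative."""
    alpha = torch.rand(neg_layers.size(0), neg_layers.size(1), 1)
    return alpha * pos_layers.unsqueeze(0) + (1 - alpha) * neg_layers

def hop_mixing(mixed: torch.Tensor, user_layers: torch.Tensor):
    """mixed: (m, l+1, d) mixed negatives; user_layers: (l+1, d) = f_Q(u, .).
    Per layer, keep the candidate maximizing the inner product with the
    user's layer embedding, then sum-pool the winners across layers."""
    scores = torch.einsum('mld,ld->ml', mixed, user_layers)  # (m, l+1)
    best = scores.argmax(dim=0)                              # winner per layer
    selected = mixed[best, torch.arange(mixed.size(1))]      # (l+1, d)
    return selected.sum(dim=0)                               # synthetic negative

# Usage: m=4 candidates, a 3-layer GCN (4 layer embeddings), dimension 8.
torch.manual_seed(0)
neg = torch.randn(4, 4, 8)
pos = torch.randn(4, 8)
user = torch.randn(4, 8)
fake_negative = hop_mixing(positive_mixing(neg, pos), user)
print(fake_negative.shape)  # torch.Size([8])
```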