Notes on papers read in the knowledge tracing area.
What this paper studies
The paper uses factorization machines (FMs), exploiting interactions between features, to predict whether an exercise will be answered correctly. The main question is how to construct the features. Users, Items, and Skills are one-hot id features (Skills is a multi-hot vector): the student id, the exercise id, and the vector of knowledge-component ids attached to the exercise. The key design is the Wins and Fails variables: counters of correct and incorrect past attempts per knowledge component, accumulated only over the knowledge components shared between the current exercise and the historical attempts.
Another paper by the authors (to read):
Deep Factorization Machines for Knowledge Tracing
Related Work
Drawbacks of BKT
It cannot model questions that involve multiple knowledge components.
A recent BKT variant, feature-aware student tracing (FAST), addresses this by modeling multiple knowledge components simultaneously.
DKT
Wilson, K. H.; Karklin, Y.; Han, B.; and Ekanadham, C. 2016a. Back to the basics: Bayesian extensions of IRT outperform neural networks for proficiency estimation. In Proceedings of the 9th International Conference on Educational Data Mining (EDM), 539–544.
This study shows that some factor analysis models can match the performance of DKT.
Factor Analysis
Factor analysis starts from an assumption: the observed variables x arise because of latent variables f behind them, the so-called factors; under the influence of these factors, x can be observed.
For example, if a student scores full marks in math, chemistry, and physics, we may conclude that the student has strong analytical thinking; analytical thinking is a factor, and under its influence the science-oriented scores come out high. That is factor analysis.
Two kinds of factor analysis:
- Exploratory factor analysis: we do not know how many factors lie behind a set of variables and try to discover them with this method.
- Confirmatory factor analysis: we have already hypothesized the factors behind the variables and use this method to verify whether the hypothesis holds.
Item Response Theory
Representative model: the Rasch model
- $θ_i$: the ability of student i (the student bias)
- $d_j$: the difficulty of question j (the question bias)
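In its standard form, the Rasch model predicts the probability that student i answers question j correctly as:

$$\Pr(\text{correct}) = \sigma(θ_i - d_j)$$

where $\sigma$ is the logistic function.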
Wilson, K. H.; Xiong, X.; Khajah, M.; Lindsey, R. V.; Zhao, S.; Karklin, Y.; Van Inwegen, E. G.; Han, B.; Ekanadham, C.; Beck, J. E.; et al. 2016b. Estimating student proficiency: Deep learning is not the panacea. Presented at the Workshop on Machine Learning for Education, Neural Information Processing Systems.
This study shows that even without temporal features, IRT can outperform DKT, possibly because DKT has too many parameters and overfits easily.
Multidimensional Item Response Theory (MIRT)
- $θ_i$: the multidimensional ability of student i
- $d_j$: the multidimensional discrimination of item j
- $δ_j$: the easiness of item j (the item bias)
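With the parameters above (both $θ_i$ and $d_j$ living in $\mathbb{R}^d$), MIRT predicts:

$$\Pr(\text{correct}) = \sigma(⟨θ_i, d_j⟩ + δ_j)$$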
Additive factor model (AFM)
It takes into account the number of attempts a learner has made on an item.
- $β_k$: the bias for skill k
- $γ_k$: the bias for each opportunity of learning skill k
- $N_{ik}$: the number of times student i attempted a question that requires skill k
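Putting these together, AFM models the probability of a correct answer to question j as:

$$\operatorname{logit} \Pr(\text{correct}) = \sum_{k \in KC(j)} β_k + γ_k N_{ik}$$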
AFM papers to read:
Cen, H.; Koedinger, K.; and Junker, B. 2006. Learning factors analysis – a general method for cognitive model evaluation and improvement. In International Conference on Intelligent Tutoring Systems, 164–175. Springer.
Cen, H.; Koedinger, K.; and Junker, B. 2008. Comparing two IRT models for conjunctive skills. In International Conference on Intelligent Tutoring Systems, 796–798. Springer.
Performance factor analysis model (PFA)
It counts positive and negative attempts separately.
- $β_k$: the bias for skill k
- $γ_k$ ($δ_k$): the bias for each opportunity of learning skill k after a successful (unsuccessful) attempt
- $W_{ik}$ ($F_{ik}$): the number of successes (failures) of student i over questions that require skill k
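PFA thus replaces the single attempt counter of AFM with separate success and failure counters:

$$\operatorname{logit} \Pr(\text{correct}) = \sum_{k \in KC(j)} β_k + γ_k W_{ik} + δ_k F_{ik}$$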
PFA paper to read:
Pavlik, P. I.; Cen, H.; and Koedinger, K. R. 2009. Performance factors analysis – a new alternative to knowledge tracing. In Proceedings of the 2009 Conference on Artificial Intelligence in Education: Building Learning Systems that Care: From Knowledge Representation to Affective Modelling, 531–538. IOS Press.
AFM is in fact a special case of PFA, namely the case where $γ_k = δ_k$.
Wilson et al. 2016b (cited above).
Xiong, X.; Zhao, S.; Inwegen, E. V.; and Beck, J. 2016. Going deeper with deep knowledge tracing. In Proceedings of the 9th International Conference on Educational Data Mining (EDM), 545–550.
These studies show that DKT and PFA have comparable performance.
Factorization Machines
Thai-Nghe, N.; Drumond, L.; Horváth, T.; and Schmidt-Thieme, L. 2012. Using factorization machines for student modeling. In Proceedings of FactMod 2012 at the 20th Conference on User Modeling, Adaptation, and Personalization (UMAP 2012).
Sweeney, M.; Lester, J.; Rangwala, H.; and Johri, A. 2016. Next-term student performance prediction: A recommender systems approach. JEDM - Journal of Educational Data Mining 8(1):22–51.
The two papers above use FMs for regression problems in student modeling.
This paper uses FMs for the classification problem in student modeling.
Knowledge Tracing Machines
KTMs model the binary outcome of an event (correct or incorrect) with a sparse set of weights over all the features involved in that event. The features involved in an event are encoded by a sparse vector x of length N, such that $x_i > 0$ only if the event involves feature i, $1 ≤ i ≤ N$. For each event encoded by x, the probability p(x) of observing a positive outcome satisfies:

$$ψ(p(x)) = µ + \sum_{i=1}^{N} w_i x_i + \sum_{1 ≤ i < j ≤ N} x_i x_j ⟨v_i, v_j⟩$$

where ψ is a link function (logit or probit), and:

- $µ$: a global bias
- $w = (w_1, \ldots, w_N)$: the vector of feature biases
- $V$: the matrix of embeddings $v_i$, $i = 1, \ldots, N$

Each feature i is thus modeled by a bias $w_i ∈ \mathbb{R}$ and an embedding $v_i ∈ \mathbb{R}^d$ for some dimension d.
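As a minimal sketch of the prediction rule above (illustrative code, not the paper's implementation), the pairwise term can be computed in O(Nd) using the standard FM identity:

```python
import numpy as np

def ktm_predict_proba(x, mu, w, V):
    """Probability of a correct outcome for one event encoding x.

    x: (N,) feature vector; mu: global bias; w: (N,) feature biases;
    V: (N, d) feature embeddings. Uses the identity
    sum_{i<j} x_i x_j <v_i, v_j> = 0.5 * (||V^T x||^2 - sum_i x_i^2 ||v_i||^2).
    """
    linear = mu + w @ x
    Vx = V.T @ x                                        # shape (d,)
    pairwise = 0.5 * (Vx @ Vx - np.sum((x ** 2)[:, None] * V ** 2))
    return 1.0 / (1.0 + np.exp(-(linear + pairwise)))   # psi = logit link
```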
Features
- Users: n students, described by n features; $x_i = 1$ when student i is involved in the observation, all other entries 0 (one-hot vector)
- Items: m questions, described by m features; $x_{n+j} = 1$ when question j is involved in the observation, all other entries 0
- Skills: s skills, described by s features; the set of skills involved in question j is denoted KC(j)
- Attempts: s features; counters of how many attempts the student has made at questions involving each skill
- Wins and Fails: s features each; an attempt at a skill counts toward Wins if the outcome was correct, toward Fails otherwise
- Extra side information: the student's school id, teacher id, and whether the test taken is low-stakes or high-stakes

The data is encoded with these features (the worked example below uses Users, Items, Skills, Wins and Fails, so the feature length is N = n + m + 3s).
Worked example: in round 1, User 2 answers Item 2 with outcome 1; the item involves Skills 1 and 2.
In round 2, User 2 answers Item 2 with outcome 0; because of round 1, Wins 1 and 2 are now 1 for this round.
In round 3, User 2 answers Item 2 with outcome 1; because of the previous rounds, Wins 1 and 2 are 1 and Fails 1 and 2 are 1 for this round.
The encoding continues in this way (a toy sketch follows).
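A toy sketch of this encoding (hypothetical helper names, assuming 0-based ids and the N = n + m + 3s layout above):

```python
import numpy as np

def encode_event(user, item, skills, wins, fails, n, m, s):
    """Encode one (student, question) event as a length n + m + 3s vector.

    skills: the skill ids in KC(item); wins/fails: (n, s) arrays of
    per-student, per-skill counters accumulated from past events.
    """
    x = np.zeros(n + m + 3 * s)
    x[user] = 1.0                              # Users block (one-hot)
    x[n + item] = 1.0                          # Items block (one-hot)
    for k in skills:
        x[n + m + k] = 1.0                     # Skills block
        x[n + m + s + k] = wins[user, k]       # Wins block
        x[n + m + 2 * s + k] = fails[user, k]  # Fails block
    return x

def update_counters(user, skills, outcome, wins, fails):
    """After observing the outcome, update the counters for future events."""
    for k in skills:
        if outcome == 1:
            wins[user, k] += 1
        else:
            fails[user, k] += 1
```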
Relation to Existing Models
With ψ = logit and d = 0, only biases are learned for the features, with no embeddings. Under different encodings, KTMs recover classic models:

- Relation to IRT: encode the pair (student i, question j), i.e. $x_k = 1$ when k = i or k = n + j. If the n student features have bias $w_i = θ_i − µ$ and the m question features have bias $−d_j$, the KTM becomes the 1-PL IRT model, i.e. the Rasch model. In this case $w = (θ_1 − µ, \ldots, θ_n − µ, −d_1, \ldots, −d_m)$.
- Relation to AFM and PFA: encode skills, wins and fails at the skill level. Let $(q_{jk})_{1≤j≤m, 1≤k≤s}$ be the binary mapping between questions and skills. Assuming $w = (β_1, \ldots, β_s, γ_1, \ldots, γ_s, δ_1, \ldots, δ_s)$, the encoding of "student i attempted question j" is given by $x = (q_{j1}, \ldots, q_{js}, q_{j1}W_{i1}, \ldots, q_{js}W_{is}, q_{j1}F_{i1}, \ldots, q_{js}F_{is})$, where $W_{ik}$ and $F_{ik}$ are the counters of successful and failed attempts at the skill level. The KTM then becomes the PFA model.
- Relation to MIRT: take d > 0; the embeddings are $V = (θ_1, \ldots, θ_n, d_1, \ldots, d_m)$, and everything else is as in the IRT case.
Training
KTMs are trained by minimizing the negative log-likelihood (NLL) over all S observed samples.
- $X = (x_i)_{1≤i≤S}$: the sample features
- $y = (y_i)_{1≤i≤S} ∈ \{0, 1\}^S$: the observed outcomes
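For binary outcomes the NLL takes the standard cross-entropy form:

$$NLL = -\sum_{i=1}^{S} \left( y_i \log p(x_i) + (1 - y_i) \log(1 - p(x_i)) \right)$$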
To guide training and avoid overfitting, priors are assumed on the model parameters:
- bias $w_k$: $w_k ∼ \mathcal{N}(µ, 1/λ)$
- embedding component $v_{kf}$, $f = 1, \ldots, d$: $v_{kf} ∼ \mathcal{N}(µ, 1/λ)$
- $µ$ and $λ$: regularization parameters, following the hyperpriors $µ ∼ \mathcal{N}(0, 1)$ and $λ ∼ Γ(1, 1)$
Thanks to these hyperpriors, the regularization parameters need not be tuned by hand. When ψ = probit, i.e. the inverse CDF of the normal distribution, the model can be fitted with Gibbs sampling.
The model is learned using the MCMC Gibbs sampler implementation of libFM in C++, through the pywFM Python wrapper.
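A hedged sketch of the fitting call (parameter names follow the pywFM README; the dummy data, dimension, and iteration count are assumptions, and libFM must be installed with the LIBFM_PATH environment variable set):

```python
import numpy as np
import pywFM

# Dummy stand-ins: real inputs are the sparse event encodings built above.
X_train = np.random.rand(4, 6); y_train = [1, 0, 1, 1]
X_test = np.random.rand(2, 6);  y_test = [0, 1]

# MCMC (Gibbs sampling) is pywFM's default learning method, as in the paper.
fm = pywFM.FM(task='classification',  # binary outcome: correct / incorrect
              k2=20,                  # embedding dimension d; k2=0 keeps only biases
              num_iter=300)           # Gibbs sampling iterations

model = fm.run(X_train, y_train, X_test, y_test)
print(model.predictions)              # predicted probabilities for the test events
```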
KTMs also allow visualizing the learned embeddings.
Datasets
- Temporal datasets: Assistments
- Non-temporal datasets: Castor, ECPE, Fraction, TIMSS
Experimental Results
Summary
This paper introduces KTMs, showing how several classic EDM models fall out of them when applied to the classification problem of knowledge tracing. Even when the observed data are sparse, KTMs can estimate user and item parameters and provide better predictions than existing models.
Future research directions
Improve the encoding of features in KTMs according to how the data is collected:
- Are the observations made at skill level or problem level?
- Does it make sense to count the number of attempts at item level or at skill level?
- What are extra sources of information that may raise better understanding of the observations?