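While learning data mining tools, I use four tools below to study the same dataset.
Data description: the data records the elective courses chosen by 15 students. The syllabus offers 10 courses to choose from, and the actual selections are listed below. They are saved as an ARFF data file named TestStudenti.arff. I use the Apriori algorithm, hoping to mine association rules from the students' course selections.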
@relation test_studenti
@attribute Arbori_binari_de_cautare {TRUE, FALSE}
@attribute Arbori_optimali {TRUE, FALSE}
@attribute Arbori_echilibrati_in_inaltime {TRUE, FALSE}
@attribute Arbori_Splay {TRUE, FALSE}
@attribute Arbori_rosu_negru {TRUE, FALSE}
@attribute Arbori_2_3 {TRUE, FALSE}
@attribute Arbori_B {TRUE, FALSE}
@attribute Arbori_TRIE {TRUE, FALSE}
@attribute Sortare_topologica {TRUE, FALSE}
@attribute Algoritmul_Dijkstra {TRUE, FALSE}
@data
TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE
TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE
FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE
FALSE,TRUE,FALSE,FALSE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE
TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE
TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE,FALSE
FALSE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE
TRUE,FALSE,TRUE,TRUE,TRUE,FALSE,TRUE,TRUE,TRUE,FALSE
FALSE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE,FALSE
TRUE,FALSE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE,FALSE,TRUE
FALSE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,TRUE,TRUE
TRUE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE,TRUE,FALSE,TRUE
FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE,FALSE,TRUE
TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE
TRUE,TRUE,FALSE,FALSE,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE
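As a quick sanity check on the data above, the following minimal Java sketch counts how many of the 15 students took each course. The class name CourseCounts and the helper countTrue are hypothetical names introduced here for illustration, and the rows are copied from the @data section rather than read from the file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CourseCounts {
    // Attribute names in the order of the @attribute declarations.
    static final String[] COURSES = {
        "Arbori_binari_de_cautare", "Arbori_optimali", "Arbori_echilibrati_in_inaltime",
        "Arbori_Splay", "Arbori_rosu_negru", "Arbori_2_3", "Arbori_B",
        "Arbori_TRIE", "Sortare_topologica", "Algoritmul_Dijkstra"
    };

    // The 15 rows of the @data section, copied verbatim.
    static final String[] ROWS = {
        "TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE",
        "TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE",
        "FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE",
        "FALSE,TRUE,FALSE,FALSE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE",
        "TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE",
        "TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE,FALSE",
        "FALSE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE",
        "TRUE,FALSE,TRUE,TRUE,TRUE,FALSE,TRUE,TRUE,TRUE,FALSE",
        "FALSE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE,FALSE",
        "TRUE,FALSE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE,FALSE,TRUE",
        "FALSE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,TRUE,TRUE",
        "TRUE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE,TRUE,FALSE,TRUE",
        "FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE,FALSE,TRUE",
        "TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE",
        "TRUE,TRUE,FALSE,FALSE,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE"
    };

    /** For each course, counts the rows whose value in that column is TRUE. */
    static Map<String, Integer> countTrue(String[] courses, String[] rows) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String course : courses) {
            counts.put(course, 0);
        }
        for (String row : rows) {
            String[] cells = row.split(",");
            for (int i = 0; i < courses.length; i++) {
                if (cells[i].trim().equals("TRUE")) {
                    counts.merge(courses[i], 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Print one line per course: name and number of students who took it.
        countTrue(COURSES, ROWS).forEach((course, n) ->
            System.out.printf("%-32s %2d / %d%n", course, n, ROWS.length));
    }
}
```

Every student takes Arbori_TRIE (15 of 15) and only 2 take Sortare_topologica, which is worth keeping in mind when reading the rules that Apriori reports.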
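(1) Weka usage example
For the Apriori algorithm, set minSupport = 50% and the minimum confidence to 50% as well. In Weka, the path is Explorer -> Open file (TestStudenti.arff) -> Associate, then click to configure the parameters.
After the algorithm finishes, we get the following result: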
Best rules found:
 1. Sortare_topologica=FALSE 13 ==> Arbori_TRIE=TRUE 13 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
 2. Arbori_rosu_negru=TRUE 11 ==> Arbori_TRIE=TRUE 11 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
 3. Arbori_optimali=TRUE 10 ==> Arbori_TRIE=TRUE 10 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
 4. Arbori_optimali=TRUE 10 ==> Sortare_topologica=FALSE 10 <conf:(1)> lift:(1.15) lev:(0.09) [1] conv:(1.33)
 5. Arbori_echilibrati_in_inaltime=TRUE 10 ==> Arbori_TRIE=TRUE 10 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
 6. Arbori_optimali=TRUE Sortare_topologica=FALSE 10 ==> Arbori_TRIE=TRUE 10 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
 7. Arbori_optimali=TRUE Arbori_TRIE=TRUE 10 ==> Sortare_topologica=FALSE 10 <conf:(1)> lift:(1.15) lev:(0.09) [1] conv:(1.33)
 8. Arbori_optimali=TRUE 10 ==> Arbori_TRIE=TRUE Sortare_topologica=FALSE 10 <conf:(1)> lift:(1.15) lev:(0.09) [1] conv:(1.33)
 9. Arbori_binari_de_cautare=TRUE 9 ==> Arbori_TRIE=TRUE 9 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
10. Arbori_B=FALSE 9 ==> Arbori_TRIE=TRUE 9 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
11. Arbori_rosu_negru=TRUE Sortare_topologica=FALSE 9 ==> Arbori_TRIE=TRUE 9 <conf:(1)> lift:(1) lev:(0) [0] conv:(0)
12. Arbori_TRIE=TRUE 15 ==> Sortare_topologica=FALSE 13 <conf:(0.87)> lift:(1) lev:(0) [0] conv:(0.67)
13. Arbori_rosu_negru=TRUE 11 ==> Sortare_topologica=FALSE 9 <conf:(0.82)> lift:(0.94) lev:(-0.04) [0] conv:(0.49)
14. Arbori_rosu_negru=TRUE Arbori_TRIE=TRUE 11 ==> Sortare_topologica=FALSE 9 <conf:(0.82)> lift:(0.94) lev:(-0.04) [0] conv:(0.49)
15. Arbori_rosu_negru=TRUE 11 ==> Arbori_TRIE=TRUE Sortare_topologica=FALSE 9 <conf:(0.82)> lift:(0.94) lev:(-0.04) [0] conv:(0.49)
16. Sortare_topologica=FALSE 13 ==> Arbori_optimali=TRUE 10 <conf:(0.77)> lift:(1.15) lev:(0.09) [1] conv:(1.08)
17. Arbori_TRIE=TRUE Sortare_topologica=FALSE 13 ==> Arbori_optimali=TRUE 10 <conf:(0.77)> lift:(1.15) lev:(0.09) [1] conv:(1.08)
18. Sortare_topologica=FALSE 13 ==> Arbori_optimali=TRUE Arbori_TRIE=TRUE 10 <conf:(0.77)> lift:(1.15) lev:(0.09) [1] conv:(1.08)
19. Arbori_TRIE=TRUE 15 ==> Arbori_rosu_negru=TRUE 11 <conf:(0.73)> lift:(1) lev:(0) [0] conv:(0.8)
20. Sortare_topologica=FALSE 13 ==> Arbori_rosu_negru=TRUE 9 <conf:(0.69)> lift:(0.94) lev:(-0.04) [0] conv:(0.69)
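Looking at the first result, we can read off an association rule: if a student does not take the Sortare topologica course, then he or she takes the Arbori TRIE course. The confidence of this rule is 100%, so it holds for every such student in the dataset. Its lift, however, is 1: all 15 students take Arbori TRIE (see rule 12), so the rule tells us nothing beyond that fact.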
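To make the conf, lift and lev columns concrete, here is a minimal Java sketch that recomputes them for rule 1 and rule 4 directly from the 15 rows, using the standard definitions: confidence = count(A and B) / count(A), lift = confidence / P(B), leverage = P(A and B) - P(A) * P(B). The class RuleMetrics, the method metrics and the one-character-per-attribute row encoding are assumptions made for this illustration and are not part of Weka. Weka's conv column is left out.

```java
public class RuleMetrics {
    // Column indices, following the order of the @attribute declarations.
    static final int OPTIMALI = 1, TRIE = 7, SORTARE = 8;

    // The 15 @data rows, one character per attribute (T = TRUE, F = FALSE).
    static final String[] ROWS = {
        "TTTTFFTTFF", "TTTTTTFTFF", "FTTTFFFTFT", "FTFFTFTTFT", "TTFTTFTTFT",
        "TFTFFTTTFF", "FTFTTFTTFT", "TFTTTFTTTF", "FTTTTFFTFF", "TFTFTTFTFT",
        "FFTFTFFTTT", "TFFTTTFTFT", "FTTFTTFTFT", "TTTFFTFTFF", "TTFFTTFTFF"
    };

    /**
     * Metrics for the single-item rule (colA = valA) ==> (colB = valB).
     * Returns {confidence, lift, leverage}.
     */
    static double[] metrics(int colA, char valA, int colB, char valB) {
        int n = ROWS.length, a = 0, b = 0, ab = 0;
        for (String row : ROWS) {
            boolean hasA = row.charAt(colA) == valA;
            boolean hasB = row.charAt(colB) == valB;
            if (hasA) a++;
            if (hasB) b++;
            if (hasA && hasB) ab++;
        }
        double confidence = (double) ab / a;
        double lift = confidence / ((double) b / n);
        double leverage = (double) ab / n - ((double) a / n) * ((double) b / n);
        return new double[] {confidence, lift, leverage};
    }

    static void print(String label, double[] m) {
        System.out.printf("%s conf=%.2f lift=%.2f lev=%.2f%n", label, m[0], m[1], m[2]);
    }

    public static void main(String[] args) {
        // Rule 1: Sortare_topologica=FALSE ==> Arbori_TRIE=TRUE
        print("rule 1:", metrics(SORTARE, 'F', TRIE, 'T'));
        // Rule 4: Arbori_optimali=TRUE ==> Sortare_topologica=FALSE
        print("rule 4:", metrics(OPTIMALI, 'T', SORTARE, 'F'));
    }
}
```

Running it prints conf=1.00 lift=1.00 lev=0.00 for rule 1 and conf=1.00 lift=1.15 lev=0.09 for rule 4, matching the Weka output. Rule 4 is the more informative of the two, because its lift is above 1.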
(2) Using Weka in my Java code
The Java code is shown below. Running the program gives the same result as above.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import weka.associations.Apriori;
import weka.core.Instances;

public class Main {
    public static void main(String[] args) {
        // Load the ARFF file into a Weka Instances object.
        Instances data = null;
        try {
            BufferedReader reader = new BufferedReader(new FileReader("TestStudenti.arff"));
            data = new Instances(reader);
            reader.close();
            data.setClassIndex(data.numAttributes() - 1);
        } catch (IOException e) {
            e.printStackTrace();
            return; // nothing to mine if the file could not be read
        }

        // Apriori parameters.
        double deltaValue = 0.05;               // step by which the minimum support is lowered
        double lowerBoundMinSupportValue = 0.1; // lowest minimum support to try
        double minMetricValue = 0.5;            // minimum value of the rule metric (confidence by default)
        int numRulesValue = 20;                 // number of rules to report
        double upperBoundMinSupportValue = 1.0; // minimum support to start from
        String resultapriori;

        Apriori apriori = new Apriori();
        apriori.setDelta(deltaValue);
        apriori.setLowerBoundMinSupport(lowerBoundMinSupportValue);
        apriori.setNumRules(numRulesValue);
        apriori.setUpperBoundMinSupport(upperBoundMinSupportValue);
        apriori.setMinMetric(minMetricValue);

        // Mine the association rules.
        try {
            apriori.buildAssociations(data);
        } catch (Exception e) {
            e.printStackTrace();
        }

        // Print the rules in the same textual form as the Weka Explorer.
        resultapriori = apriori.toString();
        System.out.println(resultapriori);
    }
}