贝叶斯网络R语言学习笔记1
2021年7月19日19:54:12
一、创建贝叶斯网络结构
1、创建空网络
贝叶斯网络的图结构存储在bn对象中,可以通过三种表示来创建bn对象,即the arc set of the graph, its adjacency matrix or a model formula(边集、邻接矩阵、模型公式)。此外,可以通过empty.graph()
和random.graph()
函数创建空网络结构或随机网络结构。
# 产生空网络
library(bnlearn)
e = empty.graph(LETTERS[1:6])
e
class(e)
# 结果
Random/Generated Bayesian network
model:
[A][B][C][D][E][F]
nodes: 6
arcs: 0
undirected arcs: 0
directed arcs: 0
average markov blanket size: 0.00
average neighbourhood size: 0.00
average branching factor: 0.00
generation algorithm: Empty
> class(e)
[1] "bn"
通常empty.graph()默认只产生一个bn
对象,可以指定参数num,使之产生一批网络对象。
empty.graph(LETTERS[1:6], num = 2) #此处指定num = 2,产生两个bn对象
2、创建网络结构
①特定的边集
要产生边集,首先需要指定一个边集的矩阵。以下代码产生一个矩阵,两列,按行填充,行名为from,to。
arc.set = matrix(c("A", "C", "B", "F", "C", "F"),
ncol = 2, byrow = TRUE,
dimnames = list(NULL, c("from", "to")))
arc.set
from to
[1,] "A" "C"
[2,] "B" "F"
[3,] "C" "F"
将产生的边集arc.set传给bn
对象使用arcs()
函数
arcs(e) = arc.set
e
Random/Generated Bayesian network
model:
[A][B][D][E][C|A][F|B:C]
nodes: 6
arcs: 3
undirected arcs: 0
directed arcs: 3
average markov blanket size: 1.33
average neighbourhood size: 1.00
average branching factor: 0.50
generation algorithm: Empty
对于arcs()
函数,其会对边集做一定的检查,来看是否符合上下条件。
bogus = matrix(c("X", "Y", "W", "Z"),
ncol = 2, byrow = TRUE,
dimnames = list(NULL, c("from", "to")))
bogus
from to
[1,] "X" "Y"
[2,] "W" "Z"
将bogus
传给bn
对象e
,会出现报错
arcs(e) = bogus
Error in check.arcs(value, nodes = names(x$nodes)) :
node(s) 'X' 'W' 'Y' 'Z' not present in the graph.
cycle = matrix(c("A", "C", "C", "B", "B", "A"),
ncol = 2, byrow = TRUE,
dimnames = list(NULL, c("from", "to")))
cycle
from to
[1,] "A" "C"
[2,] "C" "B"
[3,] "B" "A"
arcs(e) = cycle
Error in `arcs<-`(`*tmp*`, value = c("A", "C", "B", "C", "B", "A")) :
the specified network contains cycles.
arcs(e,ignore.cycles = TRUE) = cycle
Error in `arcs<-`(`*tmp*`, ignore.cycles = TRUE, value = c("A", "C", "B", :
参数没有用(ignore.cycles = TRUE)
> acyclic(e) # check whether the graph is acyclic/completely directed.检测是否无环
[1] TRUE
> loops = matrix(c("A", "A", "B", "B", "C", "D"),
- ncol = 2, byrow = TRUE,
- dimnames = list(NULL, c("from", "to")))
> loops
from to
[1,] "A" "A"
[2,] "B" "B"
[3,] "C" "D"
> arcs(e) = loops
Error in check.arcs(value, nodes = names(x$nodes)) :
invalid arcs that are actually loops:
A -> A
B -> B
> edges = matrix(c("A", "B", "B", "A", "C", "D"),
+ ncol = 2, byrow = TRUE,
+ dimnames = list(NULL, c("from", "to")))
> edges
from to
[1,] "A" "B"
[2,] "B" "A"
[3,] "C" "D"
> arcs(e) = edges
> e
Random/Generated Bayesian network
model:
[partially directed graph]
nodes: 6
arcs: 2
undirected arcs: 1 #这里有个非直接环弧
directed arcs: 1
average markov blanket size: 0.67
average neighbourhood size: 0.67
average branching factor: 0.17
generation algorithm: Empty
②特定的的邻接矩阵
使用amat()
函数可以用矩阵来产生图结构。
我们首先需要做的是产生一个矩阵,矩阵的元素为0/1整数(注意这里使用0L
和1L
,0和1也是可以的,但是会占更多的内存,这在大数据集尤为重要),1L就代表存在一条弧。
> adj = matrix(0L, ncol = 6, nrow = 6,
+ dimnames = list(LETTERS[1:6], LETTERS[1:6]))
> adj
A B C D E F
A 0 0 0 0 0 0
B 0 0 0 0 0 0
C 0 0 0 0 0 0
D 0 0 0 0 0 0
E 0 0 0 0 0 0
F 0 0 0 0 0 0
> adj["A", "C"] = 1L
> adj["B", "F"] = 1L
> adj["C", "F"] = 1L
> adj["D", "E"] = 1L
> adj["A", "E"] = 1L
> adj
A B C D E F
A 0 0 1 0 1 0
B 0 0 0 0 0 1
C 0 0 0 0 0 1
D 0 0 0 0 1 0
E 0 0 0 0 0 0
F 0 0 0 0 0 0
接着使用amat()
函数将弧(arcs)引入到图中
> amat(e) = adj
> e
Random/Generated Bayesian network
model:
[A][B][D][C|A][E|A:D][F|B:C]
nodes: 6
arcs: 5
undirected arcs: 0
directed arcs: 5
average markov blanket size: 2.33
average neighbourhood size: 1.67
average branching factor: 0.83
generation algorithm: Empty
注意:amat()
函数和arcs()
函数一样,都会检查错误的图结构
③特定的模型公式
模型公式的格式来源于deal包,定义如下
- 每个节点在方括号中
- 每个节点的父节点在列在管道符后(级竖杆符号
|
),父节点间用冒号:
隔开,但是都是在一个方括号中
可以采用两种方式来定义图结构
使用model2network()
函数,不用先产生空图
> model2network("[A][C][B|A][D|C][F|A:B:C][E|F]")
Random/Generated Bayesian network
model:
[A][C][B|A][D|C][F|A:B:C][E|F]
nodes: 6
arcs: 6
undirected arcs: 0
directed arcs: 6
average markov blanket size: 2.67
average neighbourhood size: 2.00
average branching factor: 1.00
generation algorithm: Empty
使用modelstring()
函数和一个已存在的bn
对象,类似之前的arcs()
和amat()
> modelstring(e) = "[A][C][B|A][D|C][F|A:B:C][E|F]"
> e
Random/Generated Bayesian network
model:
[A][C][B|A][D|C][F|A:B:C][E|F]
nodes: 6
arcs: 6
undirected arcs: 0
directed arcs: 6
average markov blanket size: 2.67
average neighbourhood size: 2.00
average branching factor: 1.00
generation algorithm: Empty
注意:model2network()
和modelstring()
函数同amat()
函数、arcs()
函数一样,都会检查错误的图结构。
3、创建一个或多个随机结构
①特定节点顺序
默认情况下,random.graph()
函数产生的图和提供的节点顺序一致;在每个弧中,尾部节点排在头部节点的前面。弧是独立的样本,其包含概率由prob
参数指定,prob
参数的默认值设置为使图中平均有与节点一样多的弧。
> random.graph(LETTERS[1:6], prob = 0.1) #顺序是从A到F
Random/Generated Bayesian network
model:
[A][B][D][E][C|B][F|C] #这里可以看出,[B]是排在[C|B]前面的
nodes: 6
arcs: 2
undirected arcs: 0
directed arcs: 2
average markov blanket size: 0.67
average neighbourhood size: 0.67
average branching factor: 0.33
generation algorithm: Full Ordering
arc sampling probability: 0.1
和empty.graph()
函数一样,我们也可以指定num
参数来产生一批图,这样就会产生一个列表,可以使用lapply()
来很好地批量计算。
②以均匀概率从连通的有向无环图空间中抽样
除了使用默认的方法method = "ordered"
,还可以使用method = "ic-dag"
来从具有均匀概率的连通的有向无环图空间中抽样,使用 Ide & Cozman MCMC sampler
请注意,建议的老化长度(length of the burn-in scales ?)与节点数量成二次函数关系,因此使用num
批量生成多个图要比一次生成一个图高效得多。
> random.graph(LETTERS[1:6], num = 2, method = "ic-dag")
[[1]]
Random/Generated Bayesian network
model:
[C][E][A|E][B|A:E][D|B:E][F|A:C:D]
nodes: 6
arcs: 8
undirected arcs: 0
directed arcs: 8
average markov blanket size: 3.67
average neighbourhood size: 2.67
average branching factor: 1.33
generation algorithm: Ide & Cozman's Multiconnected DAGs
burn in length: 216
maximum in-degree: Inf
maximum out-degree: Inf
maximum degree: Inf
[[2]]
Random/Generated Bayesian network
model:
[E][A|E][B|A:E][C|A][D|B:E][F|A:C:D]
nodes: 6
arcs: 9
undirected arcs: 0
directed arcs: 9
average markov blanket size: 3.67
average neighbourhood size: 3.00
average branching factor: 1.50
generation algorithm: Ide & Cozman's Multiconnected DAGs
burn in length: 216
maximum in-degree: Inf
maximum out-degree: Inf
maximum degree: Inf
此方法对生成的图的结构接受几个可选的约束,如下面的参数所指定:
- burn.in: the number of iterations for the algorithm to converge to a stationary (and uniform) probability distribution.收敛迭代的次数
- every: return only one graph every number of steps instead of all the generated graphs. High values of every result in a more diverse set.每个步骤只返回一个图,而不是所有生成的图。every 的高值会导致更多样化的集合。
- max.degree: the maximum degree of a node.
- max.in.degree: the maximum in-degree.
- max.out.degree: the maximum out-degree.
举例
random.graph(LETTERS[1:6], num = 2, method = "ic-dag", burn.in = 10^4,
+ every = 50, max.degree = 3)
[[1]]
Random/Generated Bayesian network
model:
[B][F|B][A|F][C|A:F][D|A:B][E|B:C]
nodes: 6
arcs: 8
undirected arcs: 0
directed arcs: 8
average markov blanket size: 3.33
average neighbourhood size: 2.67
average branching factor: 1.33
generation algorithm: Ide & Cozman's Multiconnected DAGs
burn in length: 10000
maximum in-degree: Inf
maximum out-degree: Inf
maximum degree: 3
[[2]]
Random/Generated Bayesian network
model:
[A][C|A][F|C][D|F][B|D:F][E|B:C]
nodes: 6
arcs: 7
undirected arcs: 0
directed arcs: 7
average markov blanket size: 2.67
average neighbourhood size: 2.33
average branching factor: 1.17
generation algorithm: Ide & Cozman's Multiconnected DAGs
burn in length: 10000
maximum in-degree: Inf
maximum out-degree: Inf
maximum degree: 3
③以均匀概率从有向无环图空间抽样
Melançon的 MCMC算法以均匀概率从有向无环图空间抽样(不需要连接not necessarily connected),指定method = "melancon"
即可。
> random.graph(LETTERS[1:6], method = "melancon")
Random/Generated Bayesian network
model:
[A][B|A][F|A][C|A:B][E|B][D|B:C:E]
nodes: 6
arcs: 8
undirected arcs: 0
directed arcs: 8
average markov blanket size: 3.00
average neighbourhood size: 2.67
average branching factor: 1.33
generation algorithm: Melancon's Uniform Probability DAGs
burn in length: 216
maximum in-degree: Inf
maximum out-degree: Inf
maximum degree: Inf
至于可选的参数,它与Ide & Cozman算法是相同的。
4、总结
- 创建三种网络:空网络、特定网络、随机网络
- 创建特定网络网络的三种方法:边集、邻接矩阵、模型公式
- 存在的问题:对随机网络不太懂,为什么要创建随机网络,是给我们看看有这些节点可以产生什么样的结构吗?
- 随机网络的取样是啥意思?