贝叶斯网络R语言学习笔记1

贝叶斯网络R语言学习笔记1

2021年7月19日19:54:12

一、创建贝叶斯网络结构

1、创建空网络

贝叶斯网络的图结构存储在bn对象中,可以通过三种表示来创建bn对象,即the arc set of the graph, its adjacency matrix or a model formula(边集、邻接矩阵、模型公式)。此外,可以通过empty.graph()random.graph()函数创建网络结构或随机网络结构。

# 产生空网络
library(bnlearn)
e = empty.graph(LETTERS[1:6])
e
class(e)
# 结果
  Random/Generated Bayesian network

  model:
   [A][B][C][D][E][F] 
  nodes:                                 6 
  arcs:                                  0 
    undirected arcs:                     0 
    directed arcs:                       0 
  average markov blanket size:           0.00 
  average neighbourhood size:            0.00 
  average branching factor:              0.00 

  generation algorithm:                  Empty 
 
 > class(e)
[1] "bn"

通常empty.graph()默认只产生一个bn对象,可以指定参数num,使之产生一批网络对象。

empty.graph(LETTERS[1:6], num = 2) #此处指定num = 2,产生两个bn对象

2、创建网络结构

①特定的边集

要产生边集,首先需要指定一个边集的矩阵。以下代码产生一个矩阵,两列,按行填充,行名为from,to。

arc.set = matrix(c("A", "C", "B", "F", "C", "F"),
            	ncol = 2, byrow = TRUE,
          		dimnames = list(NULL, c("from", "to")))
arc.set

     from to 
[1,] "A"  "C"
[2,] "B"  "F"
[3,] "C"  "F"

将产生的边集arc.set传给bn对象使用arcs()函数

arcs(e) = arc.set
e
  Random/Generated Bayesian network

  model:
   [A][B][D][E][C|A][F|B:C] 
  nodes:                                 6 
  arcs:                                  3 
    undirected arcs:                     0 
    directed arcs:                       3 
  average markov blanket size:           1.33 
  average neighbourhood size:            1.00 
  average branching factor:              0.50 

  generation algorithm:                  Empty 

对于arcs()函数,其会对边集做一定的检查,来看是否符合上下条件。

  • 边的标签要和图里的标签一致
bogus = matrix(c("X", "Y", "W", "Z"),
               ncol = 2, byrow = TRUE,
               dimnames = list(NULL, c("from", "to")))
bogus
     from to 
[1,] "X"  "Y"
[2,] "W"  "Z"

bogus传给bn对象e,会出现报错

arcs(e) = bogus

Error in check.arcs(value, nodes = names(x$nodes)) : 
  node(s) 'X' 'W' 'Y' 'Z' not present in the graph.
  • 不能引入环(e.g. A → B → C → A) ,除非我们设置ignore.cycles = TURE
cycle = matrix(c("A", "C", "C", "B", "B", "A"),
               ncol = 2, byrow = TRUE,
               dimnames = list(NULL, c("from", "to")))
cycle

     from to 
[1,] "A"  "C"
[2,] "C"  "B"
[3,] "B"  "A"

arcs(e) = cycle

Error in `arcs<-`(`*tmp*`, value = c("A", "C", "B", "C", "B", "A")) : 
  the specified network contains cycles.

arcs(e,ignore.cycles = TRUE) = cycle

Error in `arcs<-`(`*tmp*`, ignore.cycles = TRUE, value = c("A", "C", "B",  : 
  参数没有用(ignore.cycles = TRUE)

> acyclic(e) # check whether the graph is acyclic/completely directed.检测是否无环
[1] TRUE
  • 不能引入圈 (e.g. A → A).
> loops = matrix(c("A", "A", "B", "B", "C", "D"),
-                ncol = 2, byrow = TRUE,
-                dimnames = list(NULL, c("from", "to")))
> loops
    from to 
[1,] "A"  "A"
[2,] "B"  "B"
[3,] "C"  "D"
> arcs(e) = loops
Error in check.arcs(value, nodes = names(x$nodes)) : 
 invalid arcs that are actually loops:
  A -> A 
  B -> B 
  • 可以通过在边集中包括双向边引入非直接边(e.g. A → B and B → A),保证双向均未引入环,设置ignore.cycles会覆盖这个检查。
> edges = matrix(c("A", "B", "B", "A", "C", "D"),
+                  ncol = 2, byrow = TRUE,
+                  dimnames = list(NULL, c("from", "to")))
> edges
     from to 
[1,] "A"  "B"
[2,] "B"  "A"
[3,] "C"  "D"
> arcs(e) = edges
> e
  Random/Generated Bayesian network

  model:
    [partially directed graph]
  nodes:                                 6 
  arcs:                                  2 
    undirected arcs:                     1 #这里有个非直接环弧 
    directed arcs:                       1 
  average markov blanket size:           0.67 
  average neighbourhood size:            0.67 
  average branching factor:              0.17 

  generation algorithm:                  Empty 

②特定的的邻接矩阵

使用amat()函数可以用矩阵来产生图结构。
我们首先需要做的是产生一个矩阵,矩阵的元素为0/1整数(注意这里使用0L1L,0和1也是可以的,但是会占更多的内存,这在大数据集尤为重要),1L就代表存在一条弧。

> adj = matrix(0L, ncol = 6, nrow = 6,
+              dimnames = list(LETTERS[1:6], LETTERS[1:6]))
> adj
  A B C D E F
A 0 0 0 0 0 0
B 0 0 0 0 0 0
C 0 0 0 0 0 0
D 0 0 0 0 0 0
E 0 0 0 0 0 0
F 0 0 0 0 0 0
> adj["A", "C"] = 1L
> adj["B", "F"] = 1L
> adj["C", "F"] = 1L
> adj["D", "E"] = 1L
> adj["A", "E"] = 1L
> adj
  A B C D E F
A 0 0 1 0 1 0
B 0 0 0 0 0 1
C 0 0 0 0 0 1
D 0 0 0 0 1 0
E 0 0 0 0 0 0
F 0 0 0 0 0 0

接着使用amat()函数将弧(arcs)引入到图中

> amat(e) = adj
> e

  Random/Generated Bayesian network

  model:
   [A][B][D][C|A][E|A:D][F|B:C] 
  nodes:                                 6 
  arcs:                                  5 
    undirected arcs:                     0 
    directed arcs:                       5 
  average markov blanket size:           2.33 
  average neighbourhood size:            1.67 
  average branching factor:              0.83 

  generation algorithm:                  Empty 

注意:amat()函数和arcs()函数一样,都会检查错误的图结构

③特定的模型公式

模型公式的格式来源于deal包,定义如下

  • 每个节点在方括号中
  • 每个节点的父节点在列在管道符后(级竖杆符号|),父节点间用冒号:隔开,但是都是在一个方括号中
    可以采用两种方式来定义图结构
使用model2network()函数,不用先产生空图
> model2network("[A][C][B|A][D|C][F|A:B:C][E|F]")

  Random/Generated Bayesian network

  model:
   [A][C][B|A][D|C][F|A:B:C][E|F] 
  nodes:                                 6 
  arcs:                                  6 
    undirected arcs:                     0 
    directed arcs:                       6 
  average markov blanket size:           2.67 
  average neighbourhood size:            2.00 
  average branching factor:              1.00 

  generation algorithm:                  Empty 
使用modelstring()函数和一个已存在的bn对象,类似之前的arcs()amat()
> modelstring(e) = "[A][C][B|A][D|C][F|A:B:C][E|F]"
> e

  Random/Generated Bayesian network

  model:
   [A][C][B|A][D|C][F|A:B:C][E|F] 
  nodes:                                 6 
  arcs:                                  6 
    undirected arcs:                     0 
    directed arcs:                       6 
  average markov blanket size:           2.67 
  average neighbourhood size:            2.00 
  average branching factor:              1.00 

  generation algorithm:                  Empty 

注意:model2network()modelstring()函数同amat()函数、arcs()函数一样,都会检查错误的图结构。

3、创建一个或多个随机结构

①特定节点顺序

默认情况下,random.graph()函数产生的图和提供的节点顺序一致;在每个弧中,尾部节点排在头部节点的前面。弧是独立的样本,其包含概率由prob参数指定,prob参数的默认值设置为使图中平均有与节点一样多的弧。

> random.graph(LETTERS[1:6], prob = 0.1) #顺序是从A到F

  Random/Generated Bayesian network

  model:
   [A][B][D][E][C|B][F|C]   #这里可以看出,[B]是排在[C|B]前面的
  nodes:                                 6 
  arcs:                                  2 
    undirected arcs:                     0 
    directed arcs:                       2 
  average markov blanket size:           0.67 
  average neighbourhood size:            0.67 
  average branching factor:              0.33 

  generation algorithm:                  Full Ordering 
  arc sampling probability:              0.1 

empty.graph()函数一样,我们也可以指定num 参数来产生一批图,这样就会产生一个列表,可以使用lapply() 来很好地批量计算。

②以均匀概率从连通的有向无环图空间中抽样

除了使用默认的方法method = "ordered",还可以使用method = "ic-dag"来从具有均匀概率的连通的有向无环图空间中抽样,使用 Ide & Cozman MCMC sampler
请注意,建议的老化长度(length of the burn-in scales ?)与节点数量成二次函数关系,因此使用num批量生成多个图要比一次生成一个图高效得多。

> random.graph(LETTERS[1:6], num = 2, method = "ic-dag")
[[1]]

  Random/Generated Bayesian network

  model:
   [C][E][A|E][B|A:E][D|B:E][F|A:C:D] 
  nodes:                                 6 
  arcs:                                  8 
    undirected arcs:                     0 
    directed arcs:                       8 
  average markov blanket size:           3.67 
  average neighbourhood size:            2.67 
  average branching factor:              1.33 

  generation algorithm:                  Ide & Cozman's Multiconnected DAGs 
  burn in length:                        216 
  maximum in-degree:                     Inf 
  maximum out-degree:                    Inf 
  maximum degree:                        Inf 


[[2]]

  Random/Generated Bayesian network

  model:
   [E][A|E][B|A:E][C|A][D|B:E][F|A:C:D] 
  nodes:                                 6 
  arcs:                                  9 
    undirected arcs:                     0 
    directed arcs:                       9 
  average markov blanket size:           3.67 
  average neighbourhood size:            3.00 
  average branching factor:              1.50 

  generation algorithm:                  Ide & Cozman's Multiconnected DAGs 
  burn in length:                        216 
  maximum in-degree:                     Inf 
  maximum out-degree:                    Inf 
  maximum degree:                        Inf 

此方法对生成的图的结构接受几个可选的约束,如下面的参数所指定:

  • burn.in: the number of iterations for the algorithm to converge to a stationary (and uniform) probability distribution.收敛迭代的次数
  • every: return only one graph every number of steps instead of all the generated graphs. High values of every result in a more diverse set.每个步骤只返回一个图,而不是所有生成的图。every 的高值会导致更多样化的集合。
  • max.degree: the maximum degree of a node.
  • max.in.degree: the maximum in-degree.
  • max.out.degree: the maximum out-degree.
    举例
random.graph(LETTERS[1:6], num = 2, method = "ic-dag", burn.in = 10^4,
+              every = 50, max.degree = 3)
[[1]]

  Random/Generated Bayesian network

  model:
   [B][F|B][A|F][C|A:F][D|A:B][E|B:C] 
  nodes:                                 6 
  arcs:                                  8 
    undirected arcs:                     0 
    directed arcs:                       8 
  average markov blanket size:           3.33 
  average neighbourhood size:            2.67 
  average branching factor:              1.33 

  generation algorithm:                  Ide & Cozman's Multiconnected DAGs 
  burn in length:                        10000 
  maximum in-degree:                     Inf 
  maximum out-degree:                    Inf 
  maximum degree:                        3 


[[2]]

  Random/Generated Bayesian network

  model:
   [A][C|A][F|C][D|F][B|D:F][E|B:C] 
  nodes:                                 6 
  arcs:                                  7 
    undirected arcs:                     0 
    directed arcs:                       7 
  average markov blanket size:           2.67 
  average neighbourhood size:            2.33 
  average branching factor:              1.17 

  generation algorithm:                  Ide & Cozman's Multiconnected DAGs 
  burn in length:                        10000 
  maximum in-degree:                     Inf 
  maximum out-degree:                    Inf 
  maximum degree:                        3 

③以均匀概率从有向无环图空间抽样

Melançon的 MCMC算法以均匀概率从有向无环图空间抽样(不需要连接not necessarily connected),指定method = "melancon"即可。

> random.graph(LETTERS[1:6], method = "melancon")

  Random/Generated Bayesian network

  model:
   [A][B|A][F|A][C|A:B][E|B][D|B:C:E] 
  nodes:                                 6 
  arcs:                                  8 
    undirected arcs:                     0 
    directed arcs:                       8 
  average markov blanket size:           3.00 
  average neighbourhood size:            2.67 
  average branching factor:              1.33 

  generation algorithm:                  Melancon's Uniform Probability DAGs 
  burn in length:                        216 
  maximum in-degree:                     Inf 
  maximum out-degree:                    Inf 
  maximum degree:                        Inf 

至于可选的参数,它与Ide & Cozman算法是相同的。

4、总结

  1. 创建三种网络:空网络、特定网络、随机网络
  2. 创建特定网络网络的三种方法:边集、邻接矩阵、模型公式
  3. 存在的问题:对随机网络不太懂,为什么要创建随机网络,是给我们看看有这些节点可以产生什么样的结构吗?
  4. 随机网络的取样是啥意思?
上一篇:Algorithm第四版算法 C++实现(十二)——使用邻接矩阵法构造无向图


下一篇:数学建模(8)多元线性回归模型