CS188 Project 4: Inference in Bayes Nets 4,5,6

Question 4 (4 points): Eliminate

1.要求

  实现消元的函数,该函数接收一个factor和一个待消除的因子。返回消除了该因子后的新的factor。
  提示:
    1.需返回一个新的factor。
    2.该函数可以用来求边缘分布,如:
        eliminate(P(X,Y|Z),Y)=P(X|Z)
        eliminate(P(X,Y|Z),X)=P(Y|Z)
    3.返回的factor的variableDomainsDict和原factor一样。

2.分析

通过上面的要求可知,就是要我们实现一个求边缘分布的函数。
所谓的消元就是求边缘分布来实现消除一个变量。
问题就变成了求边缘分布。问题解决步骤为:
  1.构造表示边缘分布的newFactor
  2.为newFactor的联合概率表赋值:
        将原来的factor联合概率表中的边缘分布项相同的相加合并。
        合并后的表即为边缘分布的联合概率表。
  3.返回边缘分布。

3.代码

点击查看代码
    def eliminate(factor, eliminationVariable):
        """
        Question 4: Your eliminate implementation

        Input factor is a single factor.
        Input eliminationVariable is the variable to eliminate from factor.
        eliminationVariable must be an unconditioned variable in factor.

        You should calculate the set of unconditioned variables and conditioned
        variables for the factor obtained by eliminating the variable
        eliminationVariable.

        Return a new factor where all of the rows mentioning
        eliminationVariable are summed with rows that match
        assignments on the other variables.

        Useful functions:
        Factor.getAllPossibleAssignmentDicts
        Factor.getProbability
        Factor.setProbability
        Factor.unconditionedVariables
        Factor.conditionedVariables
        Factor.variableDomainsDict
        """
        # autograder tracking -- don't remove
        if not (callTrackingList is None):
            callTrackingList.append(('eliminate', eliminationVariable))

        # typecheck portion
        if eliminationVariable not in factor.unconditionedVariables():
            print("Factor failed eliminate typecheck: ", factor)
            raise ValueError("Elimination variable is not an unconditioned variable " \
                            + "in this factor\n" +
                            "eliminationVariable: " + str(eliminationVariable) + \
                            "\nunconditionedVariables:" + str(factor.unconditionedVariables()))

        if len(factor.unconditionedVariables()) == 1:
            print("Factor failed eliminate typecheck: ", factor)
            raise ValueError("Factor has only one unconditioned variable, so you " \
                    + "can't eliminate \nthat variable.\n" + \
                    "eliminationVariable:" + str(eliminationVariable) + "\n" +\
                    "unconditionedVariables: " + str(factor.unconditionedVariables()))

        "*** YOUR CODE HERE ***"

        unconditionedVar = factor.unconditionedVariables()
        unconditionedVar.remove(eliminationVariable)
        newFactor = Factor(unconditionedVar, factor.conditionedVariables(), factor.variableDomainsDict())

        # 求边缘分布, 将相同的合并
        for assignment in factor.getAllPossibleAssignmentDicts():
            result = 0
            newAssignment = copy.deepcopy(assignment)
            newAssignment.pop(eliminationVariable)
            for assignment2 in factor.getAllPossibleAssignmentDicts():
                if set(newAssignment.items()).issubset(set(assignment2.items())):
                    result += factor.getProbability(assignment2)
            newFactor.setProbability(newAssignment, result)

        return newFactor
    
    return eliminate

Question 5 (4 points): Normalize

1.要求

   要求实现一个正则化函数,该函数接收一个待正则化的factor,返回正则化后的factor。如果factor中的entry的概率值为0,则返回None。
  提示:
    1.需返回一个新的factor。
    2.正则化不应该影响到概率分布。
    3.返回的factor的variableDomainsDict和原factor一样。

2.分析

   什么是正则化?就是在保证数据相对大小不变的情况下缩小数据的scale,这里的要求是使得总概率相加为1。
   什么叫不影响概率分布?即概率大小关系保持不变。
   如何实现?让每个概率值除以总概率值作为它的正则化后的值。

3.代码

点击查看代码
  def normalize(factor):
    """
    Question 5: Your normalize implementation

    Input factor is a single factor.

    The set of conditioned variables for the normalized factor consists
    of the input factor's conditioned variables as well as any of the
    input factor's unconditioned variables with exactly one entry in their
    domain.  Since there is only one entry in that variable's domain, we
    can either assume it was assigned as evidence to have only one variable
    in its domain, or it only had one entry in its domain to begin with.
    This blurs the distinction between evidence assignments and variables
    with single value domains, but that is alright since we have to assign
    variables that only have one value in their domain to that single value.

    Return a new factor where the sum of the all the probabilities in the table is 1.
    This should be a new factor, not a modification of this factor in place.

    If the sum of probabilities in the input factor is 0,
    you should return None.

    This is intended to be used at the end of a probabilistic inference query.
    Because of this, all variables that have more than one element in their
    domain are assumed to be unconditioned.
    There are more general implementations of normalize, but we will only
    implement this version.

    Useful functions:
    Factor.getAllPossibleAssignmentDicts
    Factor.getProbability
    Factor.setProbability
    Factor.unconditionedVariables
    Factor.conditionedVariables
    Factor.variableDomainsDict
    """

    # typecheck portion
    variableDomainsDict = factor.variableDomainsDict()
    for conditionedVariable in factor.conditionedVariables():
        if len(variableDomainsDict[conditionedVariable]) > 1:
            print("Factor failed normalize typecheck: ", factor)
            raise ValueError("The factor to be normalized must have only one " + \
                            "assignment of the \n" + "conditional variables, " + \
                            "so that total probability will sum to 1\n" +
                            str(factor))

    "*** YOUR CODE HERE ***"

    unconditionedList = factor.unconditionedVariables().copy()
    unconditionedVar = factor.unconditionedVariables()
    conditionedVar = factor.conditionedVariables()
    domain = factor.variableDomainsDict()

    # The set of conditioned variables for the normalized factor consists
    # of the input factor's conditioned variables as well as any of the
    # input factor's unconditioned variables with exactly one entry in their
    # domain.
    for var in unconditionedList:
        if len(domain[var]) == 1:
            unconditionedVar.remove(var)
            conditionedVar.add(var)

    newFactor = Factor(unconditionedVar, conditionedVar , domain)
    sum = 0
    for assignment in factor.getAllPossibleAssignmentDicts():
        sum += factor.getProbability(assignment)

    for assignment in factor.getAllPossibleAssignmentDicts():
        newFactor.setProbability(assignment, factor.getProbability(assignment)/sum)

    return newFactor

Question 6 (4 points): Variable Elimination

1.要求

  实现函数inferenceByVariableElimination,该函数输入待消元的factor,返回一个消元后的factor。
  提示:
    1.该函数需迭代消除所有的隐藏变量,直到消到只剩一个evidence变量。
    2.返回的factor的联合概率表的概率之和应该等于1。
    3.查看inference.py中的推断ByEnumeration函数,了解如何使用所需函数的示例。(提醒:枚举推理首先连接所有变量,然后消除所有隐藏变量。相反,变量消除通过迭代所有隐藏变量交错连接和消除,并在移动到下一个隐藏变量之前对单个隐藏变量执行连接和消除)。
    4.需考虑输入factor只有一个unconditional variable的情况。

2.分析

   这里使用到了变量消元法,变量消除法的思想很简单,就是对联合概率不断求和消除其中的变量,最后得到边缘分布
   我们只需要迭代使用第四问中实现的消元函数,将输入的factor消元到只有一个变量即可。
   变量消元法计算的复杂度和消元的顺序有关系,因此,要按照输入的eliminationOrder来进行消元。

3.代码

点击查看代码
      def inferenceByVariableElimination(bayesNet, queryVariables, evidenceDict, eliminationOrder):
        """
        Question 6: Your inference by variable elimination implementation

        This function should perform a probabilistic inference query that
        returns the factor:

        P(queryVariables | evidenceDict)

        It should perform inference by interleaving joining on a variable
        and eliminating that variable, in the order of variables according
        to eliminationOrder.  See inferenceByEnumeration for an example on
        how to use these functions.

        You need to use joinFactorsByVariable to join all of the factors
        that contain a variable in order for the autograder to
        recognize that you performed the correct interleaving of
        joins and eliminates.

        If a factor that you are about to eliminate a variable from has
        only one unconditioned variable, you should not eliminate it
        and instead just discard the factor.  This is since the
        result of the eliminate would be 1 (you marginalize
        all of the unconditioned variables), but it is not a
        valid factor.  So this simplifies using the result of eliminate.

        The sum of the probabilities should sum to one (so that it is a true
        conditional probability, conditioned on the evidence).

        bayesNet:         The Bayes Net on which we are making a query.
        queryVariables:   A list of the variables which are unconditioned
                          in the inference query.
        evidenceDict:     An assignment dict {variable : value} for the
                          variables which are presented as evidence
                          (conditioned) in the inference query.
        eliminationOrder: The order to eliminate the variables in.

        Hint: BayesNet.getAllCPTsWithEvidence will return all the Conditional
        Probability Tables even if an empty dict (or None) is passed in for
        evidenceDict. In this case it will not specialize any variable domains
        in the CPTs.

        Useful functions:
        BayesNet.getAllCPTsWithEvidence
        normalize
        eliminate
        joinFactorsByVariable
        joinFactors
        """

        # this is for autograding -- don't modify
        joinFactorsByVariable = joinFactorsByVariableWithCallTracking(callTrackingList)
        eliminate             = eliminateWithCallTracking(callTrackingList)
        if eliminationOrder is None: # set an arbitrary elimination order if None given
            eliminationVariables = bayesNet.variablesSet() - set(queryVariables) -\
                                   set(evidenceDict.keys())
            eliminationOrder = sorted(list(eliminationVariables))

        "*** YOUR CODE HERE ***"
        factors = bayesNet.getAllCPTsWithEvidence(evidenceDict)
        for v in eliminationOrder:
            factors, new_factor = joinFactorsByVariable(factors, v)
            if len(new_factor.unconditionedVariables()) > 1:
                temp_factor = eliminate(new_factor, v)
                factors.append(temp_factor)
        newFactor = joinFactors(factors)
        return normalize(newFactor)

总结

这三个题目的解答中,使用到了变量消元方法,通过将联合分布中的变量不断消元,逐步使得推理变得简化,当消元到只剩一个条件变量时,这时候条件分布就很容易得到了。
这些内容在课堂上其实都有讲过,只是,通过这个实验,体会到了它的具体实现和实际作用。

上一篇:美国国会听证会探讨“深度伪造(deepfake)”风险及对策


下一篇:CS188-pj4-Project 4: Inference in Bayes Nets