CS188-pj2-Multiagent

1. Adversarial Search

  Adversarial search is also called game-tree search.

  In AI it can be defined as: a two-player, zero-sum game with complete information, deterministic outcomes, and alternating turns (chess, for example).

  Zero-sum game: one side's gain is exactly the other side's loss; for me to get the best possible outcome, my opponent must get the worst possible one.

  Since it is a game, we can model it (a minimal sketch follows the list below):

    Initial state: the configuration the game starts from.

    Successor function: implements decisions, i.e., the transitions between states.

    Terminal test: checks whether the game is over; a state in which the game has ended is called a terminal state.

    Evaluation function: judges the result of the game: who wins, who loses, and by how much.
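
  As a concrete illustration of this formulation, the four elements can be written as an abstract interface. This is only a minimal sketch: the class and method names below (Game, initial_state, successors, is_terminal, utility) are illustrative and are not part of the CS188 project code.

# A minimal sketch of the game model described above; the names are
# illustrative only and do not appear in the CS188 project code.
class Game:
    def initial_state(self):
        """Initial state: the configuration the game starts from."""
        raise NotImplementedError

    def successors(self, state):
        """Successor function: yields (action, next_state) pairs reachable
        from `state` by the player to move."""
        raise NotImplementedError

    def is_terminal(self, state):
        """Terminal test: True when the game is over in `state`."""
        raise NotImplementedError

    def utility(self, state):
        """Evaluation function: the result of a terminal state (who won and
        by how much), from the maximizing player's point of view."""
        raise NotImplementedError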

 

2. MiniMax - a Ruthless God

  The Minimax algorithm is commonly used in two-player competitive games and programs, such as board games.

  Minimax is a zero-sum algorithm: one side picks the option that maximizes its own advantage, while the other side picks the option that minimizes its opponent's advantage.

  MiniMax is also a pessimistic algorithm: it assumes the opponent never makes a mistake, and it looks for the move that minimizes the loss against such a perfect opponent.

MiniMax algorithm pseudocode:

function minimax(node, depth)
    if node is a terminal node or depth = 0              // termination condition
        return the heuristic value of node
    if the adversary is to play at node                   // perfect opponent: always picks what is best for itself
        let α := +∞
        foreach child of node
            α := min(α, minimax(child, depth-1))
    else {we are to play at node}                         // our turn: pick what is best for us
        let α := -∞
        foreach child of node
            α := max(α, minimax(child, depth-1))
    return α

  How do we apply MiniMax to a game with more than two players?

    Abstract the participants into our side and the opponent side.

    Multiple opponents simply mean that when it is the opponents' turn, several moves are made in a row.

    In each round, our side moves once and the opponents move n-1 times (where n is the number of participants).

    In effect, the single minimizing value is replaced by a vector of values, one per opponent.

  CS-188 MiniMax agent implementation:

class MinimaxAgent(MultiAgentSearchAgent):
    """
        Your minimax agent (question 2)
    """
    def getAction(self, gameState):
        """
        Returns the minimax action from the current gameState using self.depth
        and self.evaluationFunction.

        Here are some method calls that might be useful when implementing minimax.

        gameState.getLegalActions(agentIndex):
        Returns a list of legal actions for an agent
        agentIndex=0 means Pacman, ghosts are >= 1

        gameState.generateSuccessor(agentIndex, action):
        Returns the successor game state after an agent takes an action

        gameState.getNumAgents():
        Returns the total number of agents in the game

        gameState.isWin():
        Returns whether or not the game state is a winning state

        gameState.isLose():
        Returns whether or not the game state is a losing state
        """
        "*** YOUR CODE HERE ***"

        # Indices of the ghost agents (Pacman is agent index 0)
        Ghosts = [i for i in range(1, gameState.getNumAgents())]
        
        # Check whether the game is over or the search depth limit has been reached
        def terminate(state, d):
            return state.isWin() or state.isLose() or d == self.depth

        # Compute the min value of a node (a ghost's turn)
        def min_value(state, d, ghost): 
            if terminate(state, d):
                return self.evaluationFunction(state)
            # A min node's value is initialized to +infinity
            v = float('inf')    
            for action in state.getLegalActions(ghost):
                if ghost == Ghosts[-1]:
                    v = min(v, max_value(state.generateSuccessor(ghost, action), d + 1))
                else:
                    v = min(v, min_value(state.generateSuccessor(ghost, action), d, ghost + 1))
            return v
        
        # Compute the max value of a node (Pacman's turn)
        def max_value(state, d):  
            if terminate(state, d):
                return self.evaluationFunction(state)
            # A max node's value is initialized to -infinity
            v = -float('inf')
            for action in state.getLegalActions(0):
                v = max(v, min_value(state.generateSuccessor(0, action), d, 1))
            return v
        # Pair each legal root action with its minimax value
        res = [(action, min_value(gameState.generateSuccessor(0, action), 0, 1)) for action in gameState.getLegalActions(0)]
        # Sort by value in ascending order
        res.sort(key=lambda k: k[1])
        # Return the action with the largest value
        return res[-1][0]

 


 

3. Alpha-Beta Pruning

  Alpha-beta pruning is a way to find the optimal minimax move while avoiding the search of subtrees of moves that can never be selected.

    Alpha: the best (largest) lower bound found so far for the maximizing player, initialized to negative infinity.

    Beta: the best (smallest) upper bound found so far for the minimizing player, initialized to positive infinity.

  Since MiniMax assumes a perfect opponent, our choice is the maximum achievable under the smallest upper bound the opponent will allow,

  so a path is only worth exploring if it satisfies α ≤ value ≤ β.
  Any path that violates this condition can be pruned and never needs to be expanded.

  Pruning only affects computational efficiency, never the final result (a small demonstration follows the pseudocode below).

function alphabeta(node, depth, α, β, maximizingPlayer)   // node = game node, depth = current depth, maximizingPlayer = the maximizing player
     if depth = 0 or node is a terminal node
         return the heuristic value of the node
     if we are to play at node
         v := -∞
         foreach child of node
             v := max(v, alphabeta(child, depth - 1, α, β, FALSE)) // child = child node
             α := max(α, v)
             if β ≤ α
                 break // β cut-off
         return v
     else {the adversary is to play at node}
         v := ∞
         foreach child of node
             v := min(v, alphabeta(child, depth - 1, α, β, TRUE))
             β := min(β, v)
             if β ≤ α
                 break // α cut-off
         return v
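
  To make the last claim concrete (pruning only saves work; the value at the root is unchanged), here is a minimal self-contained sketch. The tiny hand-built tree, the leaf counters, and the standalone function names are illustrative only and are not part of the CS188 code; the alpha-beta logic mirrors the pseudocode above.

import math

# Leaves are numbers; internal nodes are lists of children.
# The root is a max node, its children are min nodes.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

def minimax(node, maximizing, stats):
    if not isinstance(node, list):                 # leaf: return its value
        stats['leaves'] += 1
        return node
    values = [minimax(child, not maximizing, stats) for child in node]
    return max(values) if maximizing else min(values)

def alphabeta(node, maximizing, alpha, beta, stats):
    if not isinstance(node, list):                 # leaf: return its value
        stats['leaves'] += 1
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, False, alpha, beta, stats))
            alpha = max(alpha, v)
            if beta <= alpha:                      # β cut-off
                break
        return v
    else:
        v = math.inf
        for child in node:
            v = min(v, alphabeta(child, True, alpha, beta, stats))
            beta = min(beta, v)
            if beta <= alpha:                      # α cut-off
                break
        return v

s1 = {'leaves': 0}
s2 = {'leaves': 0}
print(minimax(tree, True, s1), s1['leaves'])                           # value 3, evaluates all 9 leaves
print(alphabeta(tree, True, -math.inf, math.inf, s2), s2['leaves'])    # value 3, evaluates only 7 leaves

  Both searches return the same root value; alpha-beta simply skips leaves that cannot change it.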

 CS-188 MiniMax agent implementation with alpha-beta pruning:

class AlphaBetaAgent(MultiAgentSearchAgent):
    """
    Your minimax agent with alpha-beta pruning (question 3)
    """

    def getAction(self, gameState):
        """
        Returns the minimax action using self.depth and self.evaluationFunction
        """
        "*** YOUR CODE HERE ***"
        
        now_value = -float('inf')
        # α is initialized to negative infinity
        alpha = -float('inf')
        # β is initialized to positive infinity
        beta = float('inf')
        nextAction = Directions.STOP

        # Among all legal root actions, pick the one with the largest value
        legal_actions = gameState.getLegalActions(0).copy()
        for next_action in legal_actions:
            nextState = gameState.generateSuccessor(0, next_action)
            next_value = self.get_node_value(nextState, 0, 1, alpha, beta)
            if next_value > now_value:
                now_value, nextAction = next_value, next_action
            alpha = max(alpha, now_value)
        return nextAction

    def get_node_value(self, gameState, cur_depth=0, agent_index=0, alpha=-1e10, beta=1e10):
        """
        Using self-defined function, alpha_value(), beta_value() to choose the most appropriate action
        Only when it's the final state, can we get the value of each node, using the self.evaluationFunction(gameState)
        Otherwise we just get the alpha/beta value we defined here.
        """
        max_party = [0,]
        min_party = list(range(1, gameState.getNumAgents()))
        
        # Depth limit or terminal state reached: return the evaluation
        if cur_depth == self.depth or gameState.isLose() or gameState.isWin():
            return self.evaluationFunction(gameState)
        # The current agent is the maximizer: compute a max value and try to raise α
        elif agent_index in max_party:
            return self.alpha_value(gameState, cur_depth, agent_index, alpha, beta)
        # The current agent is a minimizer: compute a min value and try to lower β
        elif agent_index in min_party:
            return self.beta_value(gameState, cur_depth, agent_index, alpha, beta)
        else:
            print('Errors occur in your party division !!! ')

    def alpha_value(self, gameState, cur_depth, agent_index, alpha=-float('inf'), beta=float('inf')):
        # The max node's value v starts at negative infinity
        v = -float('inf')
        legal_actions = gameState.getLegalActions(agent_index)
        for index, action in enumerate(legal_actions):
            next_v = self.get_node_value(gameState.generateSuccessor(agent_index, action), cur_depth, agent_index + 1, alpha, beta)
            v = max(v, next_v)
            # v already exceeds β: the min ancestor will never allow this branch, prune
            if v > beta:
                return v
            alpha = max(alpha, v)
        return v

    def beta_value(self, gameState, cur_depth, agent_index, alpha=-float('inf'), beta=float('inf')):
        # The min node's value v starts at positive infinity
        v = float('inf')
        legal_actions = gameState.getLegalActions(agent_index)
        # Multi-player game: one Pacman and several ghosts
        for index, action in enumerate(legal_actions):
            if agent_index == gameState.getNumAgents() - 1:
                # Last ghost: the next node is Pacman's turn at the next depth
                next_v = self.get_node_value(gameState.generateSuccessor(agent_index, action), cur_depth + 1, 0, alpha, beta)
                v = min(v, next_v)
                # v is already below α: the max ancestor will never allow this branch, prune
                if v < alpha:
                    return v
            else:
                # Not the last ghost: the next ghost moves at the same depth
                next_v = self.get_node_value(gameState.generateSuccessor(agent_index, action), cur_depth, agent_index + 1, alpha, beta)
                v = min(v, next_v)
                # v is already below α: prune
                if v < alpha:
                    return v
            beta = min(beta, v)
        return v

 


 

 

 

 

4. ExpectiMax

  MiniMax assumes the opponent never makes a mistake, but that is not very realistic.

  Moreover, when we allow for the possibility that the opponent makes mistakes, the agent can often make better decisions.

  The Expectimax search algorithm is a game-theoretic algorithm for maximizing expected utility. It is a variant of the Minimax algorithm.

  Minimax assumes that the opponent (the minimizer) plays optimally, whereas Expectimax does not.

  Expectimax introduces chance nodes; alternating chance nodes with max (and min) nodes makes the opponent less "perfect" (a small numeric sketch appears after the pseudocode below).

function expectiminimax(node, depth)
    if node is a terminal node or depth = 0
        return the heuristic value of node
    if the adversary is to play at node
        // Return value of minimum-valued child node
        let α := +∞
        foreach child of node
            α := min(α, expectiminimax(child, depth-1))
    else if we are to play at node
        // Return value of maximum-valued child node
        let α := -∞
        foreach child of node
            α := max(α, expectiminimax(child, depth-1))
    else if random event at node
        // Return weighted average of all child nodes' values
        let α := 0
        foreach child of node
            α := α + (Probability[child] × expectiminimax(child, depth-1))
    return α
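
  A small numeric sketch of the difference between the two opponent models. The branch names and payoffs below are made up for illustration and have nothing to do with the Pacman game: against a perfect opponent the "safe" branch is better (worst case 3 vs. 0), but if the opponent picks uniformly at random, the "risky" branch has the higher expected value (50 vs. 3.5), so Expectimax prefers it.

# Illustrative only: compare the root decision under the two opponent models.
branches = {
    'safe':  [3, 4],      # possible outcomes after the opponent's reply
    'risky': [0, 100],
}

def minimax_value(outcomes):
    # Perfect opponent: it always picks the outcome that is worst for us.
    return min(outcomes)

def expectimax_value(outcomes):
    # Random opponent: every legal reply is equally likely.
    return sum(outcomes) / len(outcomes)

print(max(branches, key=lambda a: minimax_value(branches[a])))      # 'safe'   (3 > 0)
print(max(branches, key=lambda a: expectimax_value(branches[a])))   # 'risky'  (50 > 3.5)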

  CS-188 ExpectiMax agent implementation:

class ExpectimaxAgent(MultiAgentSearchAgent):
    """
      Your expectimax agent (question 4)
    """
    def getAction(self, gameState):
        """
        Returns the expectimax action using self.depth and self.evaluationFunction

        All ghosts should be modeled as choosing uniformly at random from their
        legal moves.
        """
        "*** YOUR CODE HERE ***"

        maxValue = -float('inf')
        maxAction = Directions.STOP

        for action in gameState.getLegalActions(agentIndex=0):
            sucState = gameState.generateSuccessor(action=action, agentIndex=0)
            sucValue = self.expNode(sucState, currentDepth=0, agentIndex=1)
            if sucValue > maxValue:
                maxValue = sucValue
                maxAction = action
        return maxAction
    
    # Check whether the search should stop (win, loss, or depth limit)
    def terminate(self, state, d):
        return state.isWin() or state.isLose() or d == self.depth
    
    # Compute the max value of a node (Pacman's turn)
    def maxNode(self, gameState, currentDepth):
        if self.terminate(gameState, currentDepth):
            return self.evaluationFunction(gameState)
        maxValue = -float('inf')
        # The max value is taken over the values of the successor chance nodes
        for action in gameState.getLegalActions(agentIndex=0):
            sucState = gameState.generateSuccessor(action=action, agentIndex=0)
            sucValue = self.expNode(sucState, currentDepth=currentDepth, agentIndex=1)
            if sucValue > maxValue:
                maxValue = sucValue
        return maxValue

    # Compute the value of a chance node (a ghost's turn)
    def expNode(self, gameState, currentDepth, agentIndex):
        if self.terminate(gameState, currentDepth):
            return self.evaluationFunction(gameState)
        numAction = len(gameState.getLegalActions(agentIndex=agentIndex))
        totalValue = 0.0
        numAgent = gameState.getNumAgents()
        for action in gameState.getLegalActions(agentIndex=agentIndex):
            sucState = gameState.generateSuccessor(agentIndex=agentIndex, action=action)
            if agentIndex == numAgent - 1:
                sucValue = self.maxNode(sucState, currentDepth=currentDepth + 1)
            # Every ghost gets its own chance layer (like replacing the single min value with a vector)
            else:
                sucValue = self.expNode(sucState, currentDepth=currentDepth, agentIndex=agentIndex + 1)
            totalValue += sucValue
        # Return the average of the successor values (ghosts move uniformly at random)
        return totalValue / numAction

 


 

 

 

 

 

 5. betterEvaluationFunction

  Apart from adversarial search, is there really no other way to approach this?

  In fact, if you play Pacman yourself, you will quickly notice

  that all we need to do is eat all the food as fast as possible without being touched by a ghost (stay as close to food and as far from ghosts as possible).

  So we can assign weights to food and ghosts: when the next step leads toward food, give Pacman a reward that tells it to move in that direction.

  When the next step would run into a ghost, give Pacman a penalty that tells it not to go that way.

  This is easy to implement by making the food weight positive and the ghost weight negative.

  The magnitude of each weight corresponds to how tempting food is and how frightening ghosts are, and both can be tuned as parameters.

  At the same time, don't forget: after Pacman eats a power pellet it no longer needs to fear the ghosts; at that point it can play aggressively and actively hunt them.

def betterEvaluationFunction(currentGameState):
    """
      Your extreme ghost-hunting, pellet-nabbing, food-gobbling, unstoppable
      evaluation function (question 5).
    """
    newPos = currentGameState.getPacmanPosition()
    newFood = currentGameState.getFood()
    newGhostStates = currentGameState.getGhostStates()

    WEIGHT_FOOD = 10.0               # weight for food
    WEIGHT_GHOST = -10.0             # weight for (active) ghosts
    WEIGHT_SCARED_GHOST = 100.0      # weight for scared ghosts
    
    # Start from the current game score
    score = currentGameState.getScore()
    # Distances from Pacman to every remaining food pellet
    distancesToFoodList = [util.manhattanDistance(newPos, foodPos) for foodPos in newFood.asList()]
    # The closer the nearest food, the larger the bonus
    if len(distancesToFoodList) > 0:
        score += WEIGHT_FOOD / min(distancesToFoodList)
    else:
        score += WEIGHT_FOOD

    for ghost in newGhostStates:
        distance = manhattanDistance(newPos, ghost.getPosition())
        if distance > 0:
            # The closer a scared ghost, the larger the bonus: encourage Pacman to hunt it
            if ghost.scaredTimer > 0:
                score += WEIGHT_SCARED_GHOST / distance
            # Otherwise, keep away from the ghost
            else:
                score += WEIGHT_GHOST / distance
        # Pacman shares a square with a ghost: the worst possible outcome
        else:
            return -float('inf')  # Pacman is dead at this point

    return score

# Abbreviation
better = betterEvaluationFunction

 

  The choice of weights affects the score; the results of several tuning runs are listed below:

  1. WeightFood=10, WeightGhost=-1, WeightScaredGhost=100:

 

 


 

  2. WeightFood=5, WeightGhost=-1, WeightScaredGhost=100:

 


 

  3. WeightFood=10, WeightGhost=-10, WeightScaredGhost=100:

 


 

  4. WeightFood=5, WeightGhost=-10, WeightScaredGhost=50:


 

 

  5. WeightFood=5, WeightGhost=-10, WeightScaredGhost=100:


 

  From these runs we can see that:

    the second weight setting performs best.

 
