Fuzzy Q-learning and Dynamic Fuzzy Q-learning

Introduction

In the reinforcement learning paradigm, an agent receives from its environment a scalar reward value called \(reinforcement\). This feedback is rather poor: it can be boolean (true, false) or fuzzy (bad, fair, very good, ...), and, moreover, it may be delayed. A sequence of control actions is often executed before receiving any information on the quality of the whole sequence. Therefore, it is difficult to evaluate the contribution of an individual action.

Q-learning

Q-learning is a form of competitive learning which provides agents with the capability of learning to act optimally by evaluating the consequences of actions. Q-learning keeps a Q-function which attempts to estimate the discounted future reinforcement for taking actions from given states. A Q-function is a mapping from state-action pairs to predicted reinforcement. In order to explain the method, we adopt the implementation proposed by Bersini.
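
To make the role of the Q-function concrete, the following is a minimal tabular Q-learning sketch in Python. It is illustrative only and is not the Bersini implementation described below; the class name, the epsilon-greedy action choice, and the parameters alpha, gamma and epsilon are assumptions introduced for the example.

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning sketch. Names and parameters (alpha, gamma,
# epsilon) are assumptions for illustration, not taken from the paper.
class QLearner:
    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(float)   # maps (state, action) -> predicted reinforcement
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state):
        """Epsilon-greedy choice over the discrete action set."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reinforcement, next_state):
        """One temporal-difference update of the Q-function."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_error = reinforcement + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error
```

The update implements \(Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]\): the predicted reinforcement for a state-action pair is moved toward the observed reinforcement plus the discounted best prediction for the next state.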

  1. The state space, \(U\subset R^{n}\), is partitioned into hypercubes or cells. Among these cells we can distinguish: (a) one particular cell, called the target cell, to which the quality value +1 is assigned; (b) a subset of cells, called the viability zone, that the process must not leave, with quality value 0 (this notion of viability zone comes from Aubin and eliminates strong constraints on a reference trajectory for the process); (c) the remaining cells, called the failure zone, with quality value -1. A sketch of this partitioning is given after the list.
  2. In each cell, a set of \(J\) agents compete to control a process. With \(M\) cells, the agent \(j\), \(j \in \{1,\ldots,J\}\),
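
As an illustration of the partitioning in step 1, the sketch below maps a state to its hypercube cell and assigns the quality values +1, 0 and -1. The uniform grid, the bounds, and the helper names cell_index and quality are assumptions made for the example, not part of the original formulation.

```python
import numpy as np

# Illustrative sketch (assumed helpers, not from the paper): partition a
# bounded state space U into a uniform grid of hypercube cells and assign
# quality values +1 (target cell), 0 (viability zone), -1 (failure zone).
def cell_index(x, lower, upper, cells_per_dim):
    """Map a state x in U to the index tuple of its hypercube cell."""
    x = np.asarray(x, dtype=float)
    ratio = (x - lower) / (upper - lower)              # normalise each coordinate to [0, 1]
    idx = np.floor(ratio * cells_per_dim).astype(int)
    return tuple(int(v) for v in np.clip(idx, 0, cells_per_dim - 1))

def quality(cell, target_cell, viability_cells):
    """Quality value of a cell: +1 target, 0 viable, -1 failure."""
    if cell == target_cell:
        return +1
    if cell in viability_cells:
        return 0
    return -1

# Example: a 2-D state space [0, 1] x [0, 1] split into a 10 x 10 grid,
# with an arbitrary viability zone around the target cell (5, 5).
lower, upper, n = np.array([0.0, 0.0]), np.array([1.0, 1.0]), 10
viable = {(i, j) for i in range(3, 8) for j in range(3, 8)}
c = cell_index([0.42, 0.87], lower, upper, n)
print(c, quality(c, target_cell=(5, 5), viability_cells=viable))   # -> (4, 8) -1
```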