正则语言与自动机 1 - Regular Language with DFA and NFA

Regular Languages

Finite Automata

Finite automata are good models for computers with an extremely limited amount of memory. The controller moves from state to state, depending on the input it receives. Finite automata and their probabilistic counterpart Markov Chains are useful tools when we are attempting to recognize patterns in data.

Formal Definition of a finite automaton

​ A finite automaton is a list of those five objects: set of states, input alphabet, rules for moving, start state, and accept states.

Definition A finite automaton is a 5-tuple \((Q,\sum,\delta,q_0,F)\), where:

  • \(Q\) is a finite set called the state,
  • \(\sum\) is a finite set called the alphabet,
  • \(\delta: Q\times\sum\rightarrow Q\) is the transition function,
  • \(q_0\in Q\) is the start state, and
  • \(F\subseteq Q\) is the set of accept states.

Definition If \(A\) is the set of all strings that machine \(M\) accepts, we say that \(A\) is the language of machine \(M\) and write \(L(M)=A\). We say that \(M\) recognizes \(A\) or that \(M\) accepts \(A\). A machine may accepts several strings, but it always recognizes only one language. If the machine accepts no strings, it still recognizes only one language the empty language \(\emptyset\).

Formal Definition of Computation

​ Let \(M=(Q,\sum,\delta,q_0,F)\) be a finite automaton and let \(w=w_1w_2\cdots w_n\) be a string where each \(w_i\) is a member of the alphabet \(\sum\). Then \(M\) accepts \(w\) if a sequence of states \(r_0,r_1,\cdots, r_n\) in \(Q\) exists with three conditions:

  1. \(r_0=q_0\)
  2. \(\delta(r_i,w_{i+1})=r_{i+1}\), for \(i=0,1,\cdots,n-1\), and
  3. \(r_n\in F\).

We say that \(M\) recognizes language \(A\) if \(A=\{w|M\quad accepts\quad w\}\).

Definition A language is called a regular language if some finite automaton recognizes it.

The Regular Operations

Definition Let \(A\) and \(B\) be languages. We define the regular operations union, concatenation, and star as follows:

  • Union: \(A\cup B=\{x|x\in A\textit{ or }x\in B\}\).
  • Concatenation: \(A\circ B=\{xy|x\in A\quad and \quad y\in B\}\)
  • Star: \(A^*=\{x_1x_2,\cdots,x_k|k\geq 0\textit{ and each }x_i\in A\}\). The empty string \(\epsilon\) is always a member of \(A^*\), no matter what \(A\) is. The set can be treated as all possible combination of the substring in \(A\).

Generally speaking, a collection of objects is closed under some operation if applying that operation to members of the collection returns an object still in the collection.

Theorem The class of regular language is closed under the union operation and the concatenation operation.

Nondeterminism

When the machine is in a given state and reads the next input symbol, we know what the next state will be -- it is determined. We call this deterministic computation. In a nondeterministic machine, several choices may exist for the next state at any point.

Every state of deterministic finite automaton (DFA) always has exactly one exiting transition arrow for each symbol in alphabet. In an nondeterministic finite automaton (NFA), a state may have zero, one, or many exiting arrows for each alphabet symbol, including \(\epsilon\).

How NFA compute?

​ After reading that symbol, the machine splits into multiple copies of itself and follows all the possibilities in parallel. Each copy of the machine takes one of the possible ways to proceed and continues as before. Finally, if any one of these copies of the machine is in an accept state at the end of the input, the NFA accepts the input string.

Formal Definition of a Nondeterministic finite automaton

In a DFA, the transition function takes a state and an input symbol and produces the next state. In an NFA, the transition function takes a state and an input symbol or the empty string and produce the set of possible next states.

For any set \(Q\) we write \(P(Q)\) to be the collection of all subsets of \(Q\). Here \(P(Q)\) is called the power set of \(Q\). For any alphabet \(\sum\) we write \(\sum_\epsilon\) to be \(\sum\cup\{\epsilon\}\).

Definition A nondeterministic finite automaton is a 5-tuple \((Q,\sum,\delta,q_0,F)\) where:

  • \(Q\) is a finite set of states,
  • \(\sum\) is a finite set called the alphabet,
  • \(\delta: Q\times\sum_\epsilon\rightarrow P(Q)\) is the transition function,
  • \(q_0\in Q\) is the start state, and
  • \(F\subseteq Q\) is the set of accept states.

Equivalence of NFAs and DFAs

DFA and NFA recognize the same class of languages. We say that two machines are equivalent if they recognize the same language.

Theorem Every NFA has an equivalent deterministic finite automaton.

Proof

Let \(N=(Q, \sum, \delta, q_0, F)\) be the NFA recognizing some language \(A\). We construct a DFA \(M=(Q',\sum,\delta',q_0',F')\) recognizing \(A\). Let's first consider the easier case wherein \(N\) has no \(\epsilon\) arrows.

  1. \(Q'=P(Q)\).

  2. For \(R\in Q'\) and \(a\in\sum\), let \(\delta'(R,a)=\{q\in Q|q\in\delta(r,a) \textit{ for some } r\in R\}\), or simply,

    \[\delta'(R,a)=\bigcup_{r\in R}\delta(r,a) \]

  3. \(q_0'=\{q_0\}\).

  4. \(F'=\{R\in Q'|R \textit{ contains an accept state of } N\}\).

To consider the \(\epsilon\) errors, we define \(E(R)\) to be the collection of states that can be reached from members of \(R\) by going only along \(\epsilon\) arrows, including the members of \(R\) themselves.

  • The new transition function can be written as:

\[\delta'(R,a)=\{q\in Q|q\in E(\delta(r,a)) \textit{ for some }r\in R\} \]

  • Changing \(q_0'\) to be \(E(\{q_0\})\).

Corollary A language is regular if and only if some NFA recognizes it.

Definition The class of regular language is closed under the union, concatenation and star operation.

Regular Expression

Definition Say that \(R\) is a regular expression if \(R\) is:

  • \(a\) for some \(a\) in the alphabet \(\sum\),
  • \(\epsilon\),
  • \(\emptyset\), the empty language
  • \((R_1\cup R_2)\), where \(R_1\) and \(R_2\) are regular expressions,
  • \((R_1\circ R_2)\), where \(R_1\) and \(R_2\) are regular expressions, or
  • \((R_1^*)\), where \(R_1\) is a regular repression.

The star operation is done first, followed by concatenation and finally union, unless parentheses change the usual order. For convenience, we let \(R^+=RR^*\) and we write \(L(R)\) to be the language of \(R\).

Theorem A language is regular (recognized by DFA/NFA) if and only if some regular expression describes it.

Theorem If a language is described by a regular expression, then it is regular; If a language is regular, then it is described by a regular expression.

Nonregular Languages

Theorem Pumping lemma: If \(A\) is a regular language, then there is a number \(p\) (the pumping length) where if \(s\) is any string in \(A\) of length at least \(p\), then \(s\) may be divided into three pieces, \(s=xyz\), satisfying the following conditions:

  1. for each \(i\geq 0, xy^iz\in A\),
  2. \(|y|>0\), and
  3. \(|xy|\leq p\).

This theorem states that all regular languages have a special property. If we can show that a language does not have this property, we are guaranteed that it is not regular. The property states that all strings in the language can be pumped if they are least as long as a certain special value, called the pumping length. That means each such string contains a section that can be repeated any number of times with the resulting string remaining in the language.

Reference

Introduction to the theory of computation, 3rd Edition, Michael Sipser

上一篇:OpenCL vector index


下一篇:unity shader入门(二)顶点函数和片元函数的编写