[binaryterms.com] A compiler transforms a program from its source language to a target language without changing the meaning of the program. ('Transform' is just a fancier, more mathematical word for 'translate'.) For example, from
for (int i = 1, j = 2; j >= 0; j = j - 1) i = i + 1;
to
( 0, 0, 0, 0, 0, 0, 0, 1 ), # store a 1
( 0, 0, 1, 0, 0, 0, 1, 0 ), # store b 2
( 0, 1, 0, 0, 0, 0, 0, 1 ), # add a 1
( 0, 1, 1, 1, 1, 1, 1, 1 ), # add b -1
( 1, 0, 0, 0, 0, 0, 1, 0 ), # jlz 2
( 1, 1, 0, 1, 1, 1, 0, 1 ), # jmp -3
( 1, 1, 0, 0, 0, 0, 0, 0 ), # jmp 0
Those 0s and 1s are bits (binary digits). Written end to end, 00000001001000100100000101111111100000101101110111000000 is called machine code; machines are too stupid to speak anything else.
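To make the encoding concrete, here is a minimal Python sketch that decodes and runs the seven instructions above. Everything about the machine is an assumption inferred from the comments in the listing, not a description of real hardware: each instruction is taken to be a 2-bit opcode, a 1-bit register selector (0 = a, 1 = b), and a 5-bit two's-complement operand; `jlz` is taken to test the result of the most recent `add`; jumps are taken to be relative to the current instruction; and `jmp 0` is treated as a halt. The names `decode` and `run` are hypothetical.

```python
# Hypothetical 8-bit machine inferred from the listing: each instruction is
# 2 opcode bits, 1 register bit (0 = a, 1 = b), and a 5-bit signed operand.
PROGRAM = [
    (0, 0, 0, 0, 0, 0, 0, 1),  # store a 1
    (0, 0, 1, 0, 0, 0, 1, 0),  # store b 2
    (0, 1, 0, 0, 0, 0, 0, 1),  # add a 1
    (0, 1, 1, 1, 1, 1, 1, 1),  # add b -1
    (1, 0, 0, 0, 0, 0, 1, 0),  # jlz 2
    (1, 1, 0, 1, 1, 1, 0, 1),  # jmp -3
    (1, 1, 0, 0, 0, 0, 0, 0),  # jmp 0
]

OPCODES = {0b00: "store", 0b01: "add", 0b10: "jlz", 0b11: "jmp"}


def decode(bits):
    """Split one 8-bit tuple into (mnemonic, register, operand)."""
    opcode = bits[0] * 2 + bits[1]
    register = "ab"[bits[2]]
    operand = int("".join(map(str, bits[3:])), 2)
    if operand >= 16:  # 5-bit two's complement: 16..31 mean -16..-1
        operand -= 32
    return OPCODES[opcode], register, operand


def run(program, max_steps=1000):
    """Execute the program and return the final register values."""
    regs = {"a": 0, "b": 0}
    last = 0  # result of the most recent add (assumed to drive jlz)
    pc = 0    # index of the current instruction
    for _ in range(max_steps):
        op, reg, n = decode(program[pc])
        if op == "store":
            regs[reg] = n
            pc += 1
        elif op == "add":
            regs[reg] += n
            last = regs[reg]
            pc += 1
        elif op == "jlz":
            # Assumed: jump n instructions ahead if the last result was < 0.
            pc += n if last < 0 else 1
        else:  # jmp
            if n == 0:  # jumping to itself: treat as halt
                break
            pc += n
    return regs


if __name__ == "__main__":
    for bits in PROGRAM:
        print(decode(bits))
    print(run(PROGRAM))
```

Under these assumptions the run ends with a = 4 and b = -1, the same values i and j hold when the for loop exits.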
A compiler is typically used to transform a program in a high-level language into a program in a low-level language such as machine code. Java, C, C++, C#, Perl, and Python are high-level languages.
The compiler operates in phases. Each phase takes the representation of the program produced by the previous phase and transforms it one step closer to the target program.
Lexical analysis is the first phase in the analysis of the source program. The source program is a stream of characters; it is the input to the lexer, which outputs a sequence of tokens. Consider a small assignment statement that calculates the simple interest on a principal amount.
Interest = Principal * Rate * Time
The assignment statement above can be broken into the following tokens: Identifier, Assignment_Symbol, Identifier, Operator, Identifier, Operator, Identifier.
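A minimal sketch of such a lexer in Python, using the standard `re` module. The token names follow the list above; the regular expressions, the `tokenize` function, and the choice of which characters count as operators are assumptions made for illustration.

```python
import re

# One named group per token kind. The patterns here do not overlap,
# so their order does not matter.
TOKEN_SPEC = [
    ("Identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("Assignment_Symbol", r"="),
    ("Operator", r"[+\-*/]"),
    ("Whitespace", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(source):
    """Turn a stream of characters into a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "Whitespace":  # whitespace only separates tokens
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for kind, text in tokenize("Interest = Principal * Rate * Time"):
        print(kind, text)
```

Running it on the interest statement yields the seven token kinds listed above, each paired with the text it matched.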
The syntax analysis phase is also referred to as hierarchical analysis or parsing. In natural language, too, we analyze the constituents of a sentence and study how individual words are formed. The tokens are grouped into grammatical phrases, which are finally assembled into a parse tree.
How do you write a parser? GNU Bison is a parser generator. Bison reads a specification of a context-free language and generates a parser in C, C++, or Java. You can find some examples by searching for "bison calculator example". Bison is not a toy; the following projects are known to use, or to have used, Bison:
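To illustrate how tokens are grouped into phrases, here is a small hand-written recursive-descent parser in Python for assignment statements like the interest example. It is only a sketch: the grammar in the docstring, the requirement that tokens be separated by spaces, the nested-tuple tree shape (a simplified parse tree, closer to an abstract syntax tree), and the name `parse_assignment` are all assumptions made for illustration.

```python
def parse_assignment(source):
    """Parse 'name = expr' into a nested-tuple tree.

    Assumed grammar (tokens must be separated by spaces):
        assignment : IDENT '=' expr
        expr       : term (('+' | '-') term)*
        term       : IDENT (('*' | '/') IDENT)*
    """
    tokens = source.split()
    pos = 0

    def peek():
        # Look at the next token without consuming it.
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        # Consume and return the next token.
        nonlocal pos
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return token

    def identifier():
        token = advance()
        if not token.isidentifier():
            raise SyntaxError(f"expected identifier, got {token!r}")
        return token

    def term():
        # '*' and '/' bind tighter than '+' and '-', and group to the left.
        node = identifier()
        while peek() in ("*", "/"):
            operator = advance()
            node = (operator, node, identifier())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            operator = advance()
            node = (operator, node, term())
        return node

    target = identifier()
    if advance() != "=":
        raise SyntaxError("expected '='")
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


if __name__ == "__main__":
    print(parse_assignment("Interest = Principal * Rate * Time"))
```

Each grammar rule becomes one function, and the nesting of the returned tuples mirrors the nesting of the phrases: `Principal * Rate` is grouped first, and that phrase is then multiplied by `Time`.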
* MySQL and PostgreSQL
* CMake
* The Bash shell uses a yacc grammar for parsing command input.
* GCC started out using Bison, but switched to a hand-written recursive-descent parser for C++ in 2004 (version 3.4), and for C and Objective-C in 2006 (version 4.1).
* The Go programming language (GC) used Bison, but switched to a hand-written scanner and parser in version 1.5.
* Perl 5 uses a Bison-generated parser starting in 5.10.
* The PHP programming language (Zend Parser).
* Bison's own grammar parser is generated by Bison.