以下是我尝试为其生成解析器的EBNF格式(大部分-实际语法记录为here)语法:
expr = lambda_expr_list $;
lambda_expr_list = [ lambda_expr_list "," ] lambda_expr;
lambda_expr = conditional_expr [ "->" lambda_expr ];
conditional_expr = boolean_or_expr [ "if" conditional_expr "else" conditional_expr ];
boolean_or_expr = [ boolean_or_expr "or" ] boolean_xor_expr;
boolean_xor_expr = [ boolean_xor_expr "xor" ] boolean_and_expr;
boolean_and_expr = [ boolean_and_expr "and" ] boolean_not_expr;
boolean_not_expr = [ "not" ] relation;
relation = [ relation ( "=="
| "!="
| ">"
| "<="
| "<"
| ">="
| [ "not" ] "in"
| "is" [ "not" ] ) ] bitwise_or_expr;
bitwise_or_expr = [ bitwise_or_expr "|" ] bitwise_xor_expr;
bitwise_xor_expr = [ bitwise_xor_expr "^" ] bitwise_and_expr;
bitwise_and_expr = [ bitwise_and_expr "&" ] bitwise_shift_expr;
bitwise_shift_expr = [ bitwise_shift_expr ( "<<"
| ">>" ) ] subtraction_expr;
subtraction_expr = [ subtraction_expr "-" ] addition_expr;
addition_expr = [ addition_expr "+" ] division_expr;
division_expr = [ division_expr ( "/"
| "\\" ) ] multiplication_expr;
multiplication_expr = [ multiplication_expr ( "*"
| "%" ) ] negative_expr;
negative_expr = [ "-" ] positive_expr;
positive_expr = [ "+" ] bitwise_not_expr;
bitwise_not_expr = [ "~" ] power_expr;
power_expr = slice_expr [ "**" power_expr ];
slice_expr = member_access_expr { subscript };
subscript = "[" slice_defn_list "]";
slice_defn_list = [ slice_defn_list "," ] slice_defn;
slice_defn = lambda_expr
| [ lambda_expr ] ":" [ [ lambda_expr ] ":" [ lambda_expr ] ];
member_access_expr = [ member_access_expr "." ] function_call_expr;
function_call_expr = atom { parameter_list };
parameter_list = "(" [ lambda_expr_list ] ")";
atom = identifier
| scalar_literal
| nary_literal;
identifier = /[_A-Za-z][_A-Za-z0-9]*/;
scalar_literal = float_literal
| integer_literal
| boolean_literal;
float_literal = point_float_literal
| exponent_float_literal;
point_float_literal = /[0-9]+?\.[0-9]+|[0-9]+\./;
exponent_float_literal = /([0-9]+|[0-9]+?\.[0-9]+|[0-9]+\.)[eE][+-]?[0-9]+/;
integer_literal = dec_integer_literal
| oct_integer_literal
| hex_integer_literal
| bin_integer_literal;
dec_integer_literal = /[1-9][0-9]*|0+/;
oct_integer_literal = /0[oO][0-7]+/;
hex_integer_literal = /0[xX][0-9a-fA-F]+/;
bin_integer_literal = /0[bB][01]+/;
boolean_literal = "true"
| "false";
nary_literal = tuple_literal
| list_literal
| dict_literal
| string_literal
| byte_string_literal;
tuple_literal = "(" [ lambda_expr_list ] ")";
list_literal = "[" [ ( lambda_expr_list
| list_comprehension ) ] "]";
list_comprehension = lambda_expr "for" lambda_expr_list "in" lambda_expr [ "if" lambda_expr ];
dict_literal = "{" [ ( dict_element_list
| dict_comprehension ) ] "}";
dict_element_list = [ dict_element_list "," ] dict_element;
dict_element = lambda_expr ":" lambda_expr;
dict_comprehension = dict_element "for" lambda_expr_list "in" lambda_expr [ "if" lambda_expr ];
string_literal = /[uU]?[rR]?(\u0027(\\.|[^\\\r\n\u0027])*\u0027|\u0022(\\.|[^\\\r\n\u0022])*\u0022)/;
byte_string_literal = /[bB][rR]?(\u0027(\\[\u0000-\u007F]|[\u0000-\u0009\u000B-\u000C\u000E-\u0026\u0028-\u005B\u005D-\u007F])*\u0027|\u0022(\\[\u0000-\u007F]|[\u0000-\u0009\u000B-\u000C\u000E-\u0021\u0023-\u005B\u005D-\u007F])*\u0022)/;
我用来生成解析器的工具是Grako,它生成了一个经过修改的Packrat解析器,声称支持直接和间接的左递归.
当我在此字符串上运行生成的解析器时:
input.filter(e -> e[0] in ['t', 'T']).map(e -> (e.len().str(), e)).map(e -> '(Line length: ' + e[0] + ') ' + e[1]).list()
我收到以下错误:
grako.exceptions.FailedParse: (1:13) Expecting end of text. :
input.filter(e -> e[0] in ['t', 'T']).map(e -> (e.len().str(), e)).map(e -> '(Line length: ' + e[0] + ') ' + e[1]).list()
^
expr
调试显示,解析器似乎到达了第一个e [0]的末尾,然后再也没有回退到/到达它将尝试与in令牌匹配的点.
我的语法是否存在问题,以至于左递归支持的Packrat解析器将对此失败?还是应该在Grako问题追踪器上提交问题?
解决方法:
这可能是语法错误,但错误消息并未告诉您它实际发生的位置.完成语法后,我总是做的是在整个代码中嵌入cut(〜)元素(在if,operator,open括号等似乎合理的关键字之后).
cut元素使Grako生成的解析器提交到解析树中最接近的选项中采用的选项.这样,它将使解析器在实际上无法解析的表达式上报告失败,而不是让if解析器在if开始时失败.
语法中的一些错误很难发现,为此,我只是通过解析跟踪来找出解析器在输入中走了多远,以及为什么解析器无法继续下去.
我不会在PEG解析器上使用左递归进行专业工作,尽管对于简单,学术性的工作可能很好.
boolean_or_expr = boolean_xor_expr {"or" boolean_xor_expr};
然后可以在语义动作中处理关联性.
另请参阅针对Grako的issue 49下的讨论.它说,用于支持左递归的算法将不会总是在最终的AST中产生预期的关联性.