[swarthmore cs75] Compiler 6 – Fer-de-lance

课程回顾

Swarthmore学院16年开的编译系统课,总共10次大作业。本随笔记录了相关的课堂笔记以及第8次大作业。

  • First-class function:

    It treats functions as first-class citizens. This means the language supports passing functions as arguments to other functions, returning them as the values from other functions, and assigning them to variables or storing them in data structures.

    下面的例子在Python中可以运行,最终求值得到7。因为env中绑定的只有args和left variables,要怎么才能在well_formed_e中检查错误?

    [swarthmore cs75] Compiler 6 – Fer-de-lance

    函数的表示:堆区地址, 101;其中堆中存储了:arity(参数个数)和code address(label的地址)

    [swarthmore cs75] Compiler 6 – Fer-de-lance

    在运行时,检查函数参数个数是否正确。

    [swarthmore cs75] Compiler 6 – Fer-de-lance

    下图展示了如何将lambda表达式编译位汇编代码。

    [swarthmore cs75] Compiler 6 – Fer-de-lance

    如何编译嵌套的表达式:lambda m: lambda n: n > m,这里m应该如何处理?一种思路就是把lambda表达式中的unbound value都加入到堆中。如下图所示:

    [swarthmore cs75] Compiler 6 – Fer-de-lance

    下图展示了编译一个lambda表达式并调用的完整实现:

    [swarthmore cs75] Compiler 6 – Fer-de-lance

    [swarthmore cs75] Compiler 6 – Fer-de-lance

编程作业

本次的大作业是实现Fer-de-lance编程语言, aka FDL, aka Functions Defined by Lambdas,支持anonymous, first-class function。

  • 实现细节

    • 抽象语法
      type program = expr
      
      type expr =

    ...

    | ELambda of string list * expr

    | EApp of expr * expr list

    type cexpr =

    ...

    | CLambda of string list * aexpr

    | CApp of immexpr * immexpr list

    + 堆布局
    函数在内存中有下面的布局:arity为实参数目;code ptr是匿名函数的地址;var1..varn为free variables;
    如果函数freevars是3个,则需要加上padding(ESI末尾地址+ 4 & 0xFFFFFFF8),这可以保证新的ESI地址从000开始。

    | arity | code ptr | var1 | var2 | ... | varn | (maybe padding) |

    下面的例子展示了下面lambda函数在堆中的存储方式。

    let x = 10 in

    let y = 12 in

    let f = (lambda z: x + y + z) in

    f(5)

    注:这里并不需要添加padding,只是需要计算一下新的ESI地址。

    | 1 |
    | 20 | 24 | |

    + 计算并存储Free Variables
    相对于lambda表达式,y是free variable

    (lambda(y): x + y)

    相对于lambda表达式,z是free variable

    (lambda(y): let x = 10 in x + y + z)

    相对于lambda表达式,x是free variable

    let x = 10 in

    let f = (lambda(y): x + y) in

    f(10)

    计算出Free Variable后,需要存储到堆中,生成的汇编代码格式如下:

    jmp after1

    temp_closure_1:



    after1:

    mov [esi],

    mov [esi + 4], temp_closure_1

    mov [esi + 8],

    ... and so on for each variable to store

    mov eax, esi

    add eax, 5

    add esi,

    计算Free Variable的具体实现如下:

    let rec contains x ids =

    match ids with

    | [] -> false

    | elt::xs -> (x = elt) || (contains x xs)

    let rec add_set x ids =

    if contains x ids then ids

    else x::ids

    and helper_immexpr ie env =

    match ie with

    | ImmId(name) -> if contains name env then [] else [name]

    | _ -> []

    and helper_cexpr ce env =

    match ce with

    | CPrim1(op, e) -> helper_immexpr e env

    | CPrim2(op, left, right) -> helper_immexpr left env @ helper_immexpr right env

    | CApp(f, iargs) -> helper_immexpr f env @ List.flatten (List.map (fun iarg -> helper_immexpr iarg env) iargs)

    | CPair(left, right) -> helper_immexpr left env @ helper_immexpr right env

    | CLambda(ids, body) -> helper_aexpr body ids

    | CIf(cond, els, thn) -> helper_immexpr cond env @ helper_aexpr els env @ helper_aexpr thn env

    | CImmExpr(i) -> helper_immexpr i env

    and helper_aexpr ae env =

    match ae with

    | ALet(x, bind, body) -> helper_cexpr bind env @ helper_aexpr body (add_set x env)

    | ACExpr(ce) -> helper_cexpr ce env

    and freevars e =

    (* TODO: fill in freevars *)

    List.sort_uniq compare (helper_aexpr e [])

    + 恢复Free Variables
    在Caller中:需要稍微改变一下calling convention,在压入实际参数后,额外压入一个匿名函数的地址。

    mov eax, [ebp-4] ;; (or wherever the variable f happens to be)

    <code to check that eax is tagged 101, and has arity 2>

    push 8

    push 10

    push eax

    mov eax, [eax - 1] ;; the address of the code pointer for the function value

    call eax ;; call the function

    add esp, 12 ;; since we pushed two arguments and the function value, adjust esp by 12

    在Callee中:可以通过这个匿名函数的地址,访问堆中的Free Variables,然后放入到当前栈帧的局部变量(需要稍微改变一下stack_index)。
    这样函数体中的变量可以通过形参和Free Variables的stack_index取到所有在内存中的数据。

    temp_closure_1:



    mov eax, <function value?>

    mov ecx, [eax + 3]
    mov [ebp - 4], ecx
    mov ecx, [eax + 7]
    mov [ebp - 8], ecx
    ... and so on .
    Caller和Callee的具体实现如下:

    let rec acompile_step (s : cexpr) (si : int) (env : int envt) (freevars : string list) : instruction list =

    (* TODO: fill this in yourself this time )

    match s with

    | CApp(f, iargs) ->

    let prelude = acompile_imm f si env in

    let arity = List.length iargs in

    let checked = prelude @ check_fun @

    prelude @ (check_arity arity) in

    let argpushes = List.rev_map (fun a -> IPush(Sized(DWORD_PTR, acompile_imm_arg a si env))) iargs in

    let esp_dist = 4 * arity in

    checked @ prelude @ argpushes @ [

    IPush(Reg(EAX));

    IMov(Reg(EAX), RegOffset(-1, EAX));

    ICall(Reg(EAX));

    ] @ [

    IAdd(Reg(ESP), Const(4 + esp_dist))

    ]

    | CLambda(ids, body) ->

    let label_after = gen_temp "after" in

    let label_temp_closure = gen_temp "closure" in

    let arity = List.length ids in

    let formal_parameters = List.mapi (fun i name -> (name, -i-3)) ids in

    let free_parameters = List.map (fun name ->

    match find env name with

    | Some(n) -> (name, n)

    | None -> failwith ("Unbound identifier in compile: " ^ name)) freevars in

    [

    IJmp(Label(label_after));

    ILabel(label_temp_closure);

    ] @

    [ (
    code for body of closure )

    IPush(Reg(EBP));

    IMov(Reg(EBP), Reg(ESP));

    ISub(Reg(ESP), Const(4 * ((List.length free_parameters) + (count_vars body))));

    IMov(Reg(EAX), RegOffset(8, EBP));

    ] @

    (
    Restoring saved variables to [ebp-4], [ebp-8].... )

    List.flatten (List.mapi (fun i (name, n) ->

    [

    IMov(Reg(ECX), RegOffset(3 + 4 * i, EAX));

    IMov(RegOffset(-4
    (i+1), EBP), Reg(ECX));

    ]) free_parameters) @

    let new_env = (List.mapi (fun i (name, _) -> (name, i+1)) free_parameters) @ formal_parameters in

    (acompile_expr body si new_env freevars) @

    [

    IMov(Reg(ESP), Reg(EBP));

    IPop(Reg(EBP));

    IRet; (* end )

    ] @

    (


    Functions are stored in memory with the following layout:

    -----------------------------------------------------------------

    | arity | code ptr | var1 | var2 | ... | varn | (maybe padding) |

    -----------------------------------------------------------------

    )

    [

    ILabel(label_after);

    IMov(RegOffset(0, ESI), Sized(DWORD_PTR, Const(arity)));

    IMov(RegOffset(4, ESI), Sized(DWORD_PTR, Label(label_temp_closure)));

    ] @

    (
    mov [esi + 8],

    ... and so on for each variable to store )

    List.flatten (List.mapi (fun i (name, n) ->

    [

    IMov(Reg(EAX), RegOffset(-4 * n ,EBP));

    IMov(RegOffset(8 + 4 * i, ESI), Reg(EAX));

    ]

    ) free_parameters) @

    [

    IMov(Reg(EAX), Reg(ESI));

    IAdd(Reg(EAX), Const(5));

    IAdd(Reg(ESI), Const(8 + 4 * List.length free_parameters)); (
    heap offset amount )

    IAdd(Reg(ESI), Const(4)); (
    If the function stored three variables instead of two, then padding would be needed *)

    IAnd(Reg(ESI), HexConst(0xFFFFFFF8));

    ]

  • 测试代码

    let f = (lambda x: x+2) in
    let g = (lambda h: h(5)) in
    g(f) // 输出:7 let x = 10 in
    let f = (lambda y: x + y) in
    f(15) // 输出:25 let x = 10, w = 27, y = 12 in
    let f = (lambda z: x + y + z) in
    f(13) // 输出:35 let add = (lambda x, y: x + y) in
    add(5, 6) // 输出11
  • 可能出现的错误:

/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/lib/libSystem.tbd:4:18: error: unknown enumerated scalar

platform: zippered

^~~~~~~~

file '/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/lib/libSystem.tbd'

clang: error: linker command failed with exit code 1 (use -v to see invocation)

⤇ sudo installer -pkg /Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.14.pkg -target /

参考资料

starter-fer-de-lance

上一篇:Eclipse上Spring-tool的安装


下一篇:GO语言的进阶之路-go的程序结构以及包简介