重读&笔记系列-《深入理解计算机系统》第三章-part1,part2

Machine Programming I

Machine Programming I: Basic

History of Intel processors and architectures

C, assembly, machine code

Definition

  • Architecture: also ISA instruction set architecture) The parts of a processor design that one needs to understand or write assembly/machine code.

    • Example: instruction set specification, registers.
  • MicroArchitecture: Implementation of the architecure.

    • Example: cache sizes and core frequency.
  • Code Forms:

    • Machine Code: The byte-level programs that a processor executes.
    • Assembly Code: A text representation of machine code
  • Example ISAs:

    • Intel: x86, IA32, Itanium, x86-64
    • ARM: Used in almost all mobile phones

Programmer-Visible State

  • PC: Program counter
    • Address of next instruction
    • Called “RIP” (x86-64)
  • Register file
    • Heavily used program data
  • Condition codes
    • Store status information about most recent architecture or logical operation
    • Used for conditional branching
  • Memory
    • Byte addressable array
    • Code and user data
    • Stack to support procedures

Turning C into Object Code

  • Code in file p1.c p2.c
  • Compile with command: gcc -Og p1.c p2.c -o p
    • Use basic optimizations(-Og) [New to recent version of GCC]
    • Put resulting binary in file p

Compiling Into Assembly

C Code (sum.c)

long plus(long x, long y);

void sumstore(long x, long y, long *dest)
{
    long t = plus(x, y);
    *dest = t;
}

Generated x86-64 Assembly

sumstore:
	pushq	%rbx
	movq	%rdx, %rbx
	call 	plus
	movq	%rax, (%rbx)
	popq	%rbx
	ret

Obtain(on shark machine) with command

gcc -Og -S sum.c

Produces file sum.s

Assembly Characteristics: Data Types

  • "Integers" data of 1, 2, 4 or 8 bytes
    • Data value
    • Addresses(untyped pointers)
  • Floating point data of 4, 8 or 10 bytes
  • Code: Byte sequences encoding series of instructions
  • No aggregate types such as arrays or structures
    • Just contiguously allocated bytes in memory

Assembly Characteristics: Operations

  • Perform arithmetic function on register or memory data

  • Transfer data between memory and register

    • Load data from memory into register
    • Store register data into memory
  • Transfer control

    • Unconditional jumps to/from procedures
    • Conditional branches

Object Code

  • Assembler
    • Translates .s into .o
    • Binary encoding of each instruction
    • Nearly-complete image of executable code
    • Missing linkages between code in different files
  • Linker
    • Resolves references between files
    • Combines with static run-time libraries
      • E.g., code for malloc, printf
    • Some libraries are dynamically linked
      • Linking occurs when program begin execution

Machine Instruction Example

  • C code

    *dest = t;
    
    • Store value t where designated by dest
  • Assambly

    movq %rax, (%rbx)
    
    • Move 8-byte value to memory

      • Quad words in x86-64 parlance
    • Operands:

      t: Register %rax

      dest: Register %rbx

      *dest: Memory M[%rbx]

  • Object Code

    0x40059e:	48 89 03
    
    • 3-byte instruction
    • Stored at address 0x40059e

Disassembling Object Code

  • Disassembled
0000000000400595 <sumstore>:
	400595:	53					push 	%rbx
	400596:	48 89 d3			mov		%rdx, %rbx
	400599:	e8 f2 ff ff ff		callq	400590 <plus>
	40059e:	48 89 03			mov		%rax, (%rbx)
	4005a1:	5b					pop		%rbx
	4005a2:	c3					retq
  • Disassembler

    objdump -d sum
    
    • Useful tool for examining object code
    • Analyzes bit pattern of series of instructions
    • Produces approximate rendition of assembly code
    • Can be run on either a.out or .o file

    Alternate Disassembly

    • Within gdb Debugger

      ​ gdb sum

      ​ disassemble sumstore

      • Disassemble procedure

        x/14xb sumstore

      • Examine the 14 bytes starting at sumstore

What Can be Disassembled?

  • Anything that can be interpreted as executable code
  • Disassembler examines bytes and reconstructs assembly

Assembly Basics: Registers, operands, move

Moving Data

  • Moving Data

    moveq Source, Dest:

  • Operand Types

    • Immediate: Constant integer data
      • Example: $0x400, $-533
      • Like C constant, but prefixed with ‘$’
      • Encoded with 1, 2, or 4 bytes
    • Register: One of 16 integer registers
      • Example: %rax, %r13
      • But %rsp reserved for special use
      • Others have special uses for particular instructions
    • Memory: 8 consecutive bytes of memory at address given by register
      • Simplest example: (%rax)
      • Various other “address modes”

moveq Operand Combinations

Source -> Dest Src, Dest C Analog
Imm -> Reg moveq $0x4, %rax temp = 0x4;
Imm -> Mem moveq %-147, (%rax) *p = -147;
Reg -> Reg moveq %rax, %rdx temp2 = temp1;
Reg -> Mem moveq %rax, (%rdx) *p = temp;
Mem -> Reg moveq (%rax), rdx temp = *p;

Simple Memory Addressing Modes

  • Normal ® Mem[Reg[R]]

    • Register R specifies memory address

    • Aha! Pointer dereferencing in C

      moveq (%rcx), %rax

  • Displacement D® Mem[Reg[R]+D]

    • Register R specifies start of memory region

    • Constant displacement D specifies offset

      moveq 8(%rbp), %rdx

Complete Memory Addressing Modes

  • Most General Form D(Rb, Ri, S) Mem[Reg[Rb]+S*Reg[Ri]+D]

    • D: Constant “displacement” 1, 2 or 4 bytes
    • Rb: Base register: Any of 16 integer registers
    • Ri: Index register: Any, except for %rsp
    • S: Scale: 1, 2, 4, or 8 (why these numbers?)
  • Special Cases

    ​ (Rb, Ri) Mem[Reg[Rb]+Reg[Ri]]

    ​ D(Rb, Ri) Mem[Reg[Rb]+Reg[Ri]+D]

    ​ (Rb, Ri, S) Mem[Reg[Rb]+S*Reg[Ri]]

Address Computation Instruction

  • leaq Src, Dst

    • Src is address mode expression
    • Set Dst to address denoted by expression
  • Uses

    • Computing addresses without a memory reference
      • E.g., translation of p = &x[i];
    • Computing arithmetic expressions of the form x + k*y
      • k = 1, 2, 4, or 8
  • Example

    long m12(long x) {
        return x*12;
    }
    
    leaq (%rdi, %rdi, 2), %rax 		# t <- x + x*2
    salq $2, %rax				 	# return t
    

Some Arithmetic Operations

  • Two Operand Instructions:

    Format Computation
    addq Src, Dest Dest = Dest + Src
    subq Src, Dest Dest = Dest - Src
    imulq Src, Dest Dest = Dest * Src
    salq Src, Dest Dest = Dest << Src
    sarq Src, Dest Dest = Dest >> Src
    shrq Src, Dest Dest = Dest >> Src
    xorq Src, Dest Dest = Dest ^ Src
    andq Src, Dest Dest = Dest & Src
    orq Src, Dest Dest = Dest | Src
  • Watch out for argument order!

  • No distinction between signed and unsigned int (why?)

  • One Operand Instructions

    incq Dest Dest = Dest + 1

    decq Dest Dest = Dest - 1

    negq Dest Dest = -Dest

    notq Dest Dest = ~Dest

  • See book for more instructions

Summary

  • History of Intel processors and architectures
    • Evolutionary design leads to many quirks and artifacts
  • C, assembly, machine code
    • New forms of visible state: program counter, registers, …
    • Complier must transform statements, expressions, procedures into low-level instruction sequences
  • Assembly Basics: Register, operands, move
    • The x86-64 move instructions cover wide range of data movement forms
  • Arithmetic
    • C compiler will figure out different instruction combinations to carry out computation

Machine-Level Programming II: Control

Control: Condition codes

  • Information about currently executing program
    • Temporary data (%rax, …)
    • Location of runtime stack (%rsp)
    • Location of current code control point (%rip, …)
    • Status of recent tests (CF, ZF, SF, OF)

condition Code (Implicit Setting)

  • Single bit registers

    • CF Carry Flag (for unsigned)
    • SF Sign Flag (for signed)
    • ZF Zero Flag
    • OF Overflow Flag (for signed)
  • Implicitly set (think of it as side effect) by arithmetic operations

    • Example: addq Src, Dest -> t = a+b

    • CF set if carry out from most significant bit (unsigned overflow)

    • ZF set if t == 0

    • SF set if t < 0 (as signed)

    • Of set if two’s -complement (signed) overflow

      (a > 0 && b > 0 && t < 0) || (a < 0 && b < 0 && t >= 0)

  • Not set by leaq instruction

  • Explicit Setting by Compare Instruction

    • cmpq Src2, Src1

    • cmpq b, a like computing a-b without setting destination

    • CF set if carry out from most significant bit (used for unsigned comparisons)

    • ZF set if a == b

    • SF set if (a-b) < 0 (as signed)

    • OF set if two’s -complement (signed) overflow

      (a > 0 && b > 0 && (a-b) < 0) || (a < 0 && b < 0 && (a-b) >= 0)

  • Explicit Seting by Test instruction

    • testq Src2, Src1

      • testq b, a like computing a&b without setting destination
    • Sets condition codes based on value of Src1 & Src2

    • Useful to have one of the operands be a mask

    • ZF set when a&b == 0

    • SF set when a&b < 0

  • SetX Instructions

    • Set low-order byte of destination to 0 or 1 based on combinations of condition codes

    • Does not alter remaining 7 bytes

      SetX Condition Description
      sete ZF Equal / Zero
      setne ~ZF Not Equal / Not Zero
      sets SF Negative
      setns ~SF Nonnegative
      setg ~(SF^OF) & ~ZF Greater (Signed)
      setge ~(SF^OF) Greater or Equal (Signed)
      setl (SF^OF) Less (Signed)
      setle (SF^OF) | ZF Less or Equal (Signed)
      seta CF&ZF Above (unsigned)
      setb CF Below (unsigned)

Reading Condition Codes (Cont.)

  • SetX Instructions:

    • Set single byte based on combination of condition codes
  • One of addressable byte registers

    • Does not alter remaining bytes

    • Typically use movzbl to finish job

      • 32-bit instruction also set upper 32bits to 0
      int gt (long x, long y) 
      {
          return x > y;
      }
      
      # %rdi is x, %rsi is y, %rax is Return value
      cmpq	%rsi, %rdi		# Compare x:y
      setg	%al			# Set when >
      movzbl	%al, %eax	# Zero rset of %rax
      ret
      

Conditional branches

Jumping

  • jX Instructions

    • Jump to different part of code depending on condition codes

      jX Condition Description
      jmp 1 Unconditional
      je ZF Equal / Zero
      jne ~ZF Not Equal / Not Zero
      js SF Negative
      jns ~SF Nonnegative
      jg ~(SF^OF) & ~ZF Greater (Signed)
      jge ~(SF^OF) Greater or Equal (Signed)
      jl (SF^OF) Less (Signed)
      jle (SF^OF) | ZF Less or Equal (Signed)
      ja ~CF & ZF Above (unsigned)
      jb CF Below (unsigned)

Conditional Branch Example (Old Style)

  • Generation

    shark> gcc -Og -S -fno-if-conversion control.c

    long absdiff (long x, long y)
    {
        long result;
        if (x > y)
            result = x-y;
        else
            result = y-x;
        return result;
    }
    
    # x in %rdi, y in %rsi
    absdiff:
    	cmpq	%rsi, %rdi		# x:y
    	jle		.L4
    	movq	%rdi, %rax
    	subq	%rsi, %rax
    	ret
    .L4:		# x <= y
    	movq	%rsi, %rax
    	subq	%rdi, %rax
    	ret
    

Expressing with Goto Code

  • C allows goto statement

  • Jump to position designated by label

    long absdiff (long x, long y)
    {
        long result;
        if (x > y)
            result = x-y;
        else
            result = y-x;
        return result;
    }
    
    long absdiff_j (long x, long y)
    {
        long result;
        int ntest = x <= y;
        if (ntest) goto Else;
        result = x-y;
        goto Done;
        Else:
            result = y-x;
        Done:
        	return result;
    }
    

General Conditional Expression Translation (Using Branches)

C code

val = Test ? Then_Expr : Else_Expr;

Goto Version

	ntest = !Test;
	if (ntest) goto Else;
	val = Then_Expr;
	goto Done;
Else:
	val = Else_Expr;
Done:
	...
  • Create separate code regions for then & else expressions
  • Execute appropriate one

Using Conditional Moves

  • Conditional Move Instructions

    • Instruction supports:

      if (Test) Dest <- Src

    • Supported in post-1995 x86 processors

    • GCC tries to use them

      • But, only when know to be safe
  • Why?

    • Branches are very disruptive to instruction flow through pipelines
    • Conditional moves do not require control transfer
val = Test ? Then_Expr : Else_Expr;
result = Then_Expr;
eval = Else_Expr;
nt = !Test;
if (nt) result = eval;
return result;

Conditional Move Example

long absdiff (long x, long y)
{
    long result;
    if (x > y)
        result = x-y;
    else
        result = y-x;
    return result;
}
# x in %rdi, y in %rsi
absdiff:
	movq	%rdi, %rax	# x
	subq	%rsi, %rax	# result = x-y
	movq	%rsi, %rdx
	subq	%rdi, %rdx	# eval = y-x
	cmpq	%rsi, %rdi	# x:y
	cmovle	%rdx, %rax	# if <=, result = eval
	ret

Bad Cases for Conditional Move

Expensive Computations

val = Test(x) ? Hard1(x) : Hard2(x);
  • Both values get computed
  • Only makes sense when computations are very simple

Risky Computations

val = p ? *p : 0;
  • Both values get computed
  • May have undesirable effects

Computations with side effects

val = x > 0 ? x*=7 : x+=3;
  • Both values get computed
  • Must be side-effect free

Loops

“Do-While” Loop Example

C code

long pcount_do (unsigned long x)
{
    long result = 0;
    do {
        result += x & 0x1;
        x >>= 1;
    } while (x);
    return result;
}

Goto Version

long pcount_goto (unsigned long x)
{
    long result = 0;
loop:
	result += x & 0x1;
    x >>= 1;
    if (x) goto loop;
    return result;
}
  • count number of 1’s in argument x (“popcount”)
  • Use conditional branch to either continue looping or to exit loop

“Do-While” Loop Compilation

# x in %rdi
movl	$0, %eax	# result = 0
.L2:
movq 	%rdi, %rax	# loop:
andl	$1, %edx	# t = x & 0x1
addq	%rdx, %rax	# result += t
shrq	%rdi		# x >>= 1
jne		L2			# if (x)
rep; ret

General “Do-While” Translation

C Code

do
	Body
	while (Test);

Goto Version

loop:
	Body
    if (Test)
    goto loop
  • Body: {

    ​ Statement1;

    ​ Statement2;

    ​ …

    ​ Statementn;

    }

General “While” Translation #1

  • “Jump-to-middle” translation
  • Used with -Og

While version

while (Test)
    Body

Goto Version

	goto test;
loop:
	Body
test:
	if (Test)
        goto loop;
done:

While Loop Example #1

C Code

long pcount_while (unsigned long x)
{
    long result = 0;
    while (x) {
        result += x & 0x1;
        x >>= 1;
    }
    return result;
}

Jump to Middle

long pcount_goto_jtm (unsigned long x)
{
    long result = 0;
    goto test;
loop:
    result += x & 0x1;
    x >>= 1;
test:
    if (x) goto loop;
    return result;
}
  • Compare to do-while version of function
  • Initial goto starts loop at test

General “While” Translation #2

While Version

while (Test)
    Body

Do-While Version

if (!Test)
    goto done;
do
    Body
    while (Test);
done:

Goto Version

if (!Test)
    goto done;
loop:
	Body
    if (Test)
        goto loop;
done:
  • “Do-while” conversion
  • Used with -O1

While Loop Example #2

C Code

long pcount_while (unsigned long x)
{
    long result = 0;
    while (x) {
        result += x & 0x1;
        x >>= 1;
    }
    return result;
}

Do-While Version

long pcount_goto_dw (unsigned long x)
{
    long result = 0;
    if (!x) goto done;
loop:
    result += x & 0x1;
    x >>= 1;
    if (x) goto loop;
done:
    return result;
}
  • Compare to do-while version of function
  • Initial conditional guards entrance(入口) to loop

“For” Loop Form

General Form

for (Init; Test; Update)
    Body
#define WSIZE 8*sizeof(int)
long pcount_for (unsigned long x)
{
    size_t i;
    long result = 0;
    for (i = 0; i < WSIZE; i++)
    {
        unsigned bit = (x >> i) & 0x1;
        result += bit;
    }
    return result;
}

Init

i = 0

Test

i < WSIZE

Update

i++

Body

{
    unsigned bit = (x >> i) & 0x1;
	result += bit;
}

“For” Loop -> While Loop

For Version

for (Init; Test; Update)
	Body

While Version

Init;
while (Test) {
    Body
    Update;
}

For-While Conversion

long pcount_for_while (unsigned long x)
{
    size_t i;
    long result = 0;
    i = 0;
    while (i < WSIZE)
    {
        unsigned bit = (x >> i) & 0x1;
        result += bit;
        i++;
    }
    return result;
}

“For” Loop Do-While Conversion

C Code

long pcount_for (unsigned long x)
{
    size_t i;
    long result = 0;
    for (i = 0; i < WSIZE; i++)
    {
        unsigned bit = (x >> i) & 0x1;
        result += bit;
    }
    return result;
}

Goto Version

long pcount_for (unsigned long x)
{
    size_t i;
    long result = 0;
    i = 0;	//Init
loop:
    {	//Body
        unsigned bit = (x >> i) & 0x1;	
        result += bit;
    }
    i++;	//Update
    if (i < WSIZE)	//Test
        goto loop;
done:
    return result;
}
  • Init test can be optimized away

Switch Statements

Switch Statement Example

long switch_eg (long x, long y, long z)
{
    long w = 1;
    switch (x) {
    case 1:
    	e = y*z;
        break;
   	case 2:
    	w = y/z;
        // Fall Through
    case 3:
        w += z;
        break;
    case 5:
    case 6:
        w -= z;
        break;
    default:
    	w = 2;
    }
}
  • Multiple case labels
    • Here: 5 & 6
  • Fall through cases
    • Here: 2
  • Missing cases
    • Here: 4

Jump Table Structure

Switch Form

switch (x) {
case val_0:
	Block 0  
case val_1:
	Block 1    
	...
case val_n-1:
	Block n-1     
}    

Jump Table

jtab:
	Targ0
	Targ1
	Targ2
	...
	Targn-1

Jump Targets

Targ0:
	Code Block 0
Targ1:
	Code Block 1
Targ2:
	Code Block 2
...
Targn-1:
	Code Block n-1

Translation (Extended C)

goto *JTab[x];

Switch Statement Example

long switch_eg (long x, long y, long z)
{
    long w = 1;
    switch (x) {
    	...
    }
    return w;
}

Setup:

# x in %rdi, y in %rsi, z in %rdx
switch_eg:
	movq	%rdx, %rcx
	cmpq	$6, %rdi	# x:6
	ja		.L8			# 使用无符号值比较,负数可以视为很大的值 Use default
	jmp		*.L4(, %rdi, 8)	# goto *JTab[x]

What range of values take default?

Jump table

.section	.rodata
	.align 8
.L4
	.quad	.L8	# x = 0
	.quad	.L3	# x = 1
	.quad	.L5	# x = 2
	.quad	.L9	# x = 3
	.quad	.L8	# x = 4
	.quad	.L7	# x = 5
	.quad	.L7	# x = 6

Code Blocks (x == 1)

switch (x) {
case 1:		// .L3
	w = y*z;
    break;
    ...
}
# x in %rdi, y in %rsi, z in %rdx
.L3
	movq	%rsi, %rax	# y
	imulq	%rdx, %rax	# y*z
	ret

Code Blocks (x == 2, x == 3)

switch (x) {
case 2:	
	w = y/z;
    //Fall Through
case 3:	
	w += z;
    break;
    ...
}
# x in %rdi, y in %rsi, z in %rdx
.L5						# Case 2
	movq	$rsi, $rax
	cqto
	idivq	$rcx		# y/z
	jmp		.L6			# goto merge
.L9						# Case 3
	movl	%1, %eax	# w = 1
.L6						# merge:
	addq	%rcx, %rax	# w += z
	ret

Code Blocks (x == 5, x == 6, default)

switch (x) {
case 5:		// .L7
case 6:		// .L7
	w -= z;
    break;
default: 	// .L8
    w = 2
}
# x in %rdi, y in %rsi, z in %rdx
.L7						# Case 5,6
	movql	$1, $eax	# w = 1
	subq	%rdx, %rax	# w -= z
	ret
.L8						# Default:
	movl	$2, %eax	# 2
	ret

Summary

  • C Control
    • if-then-else
    • do-while
    • while, for
    • switch
  • Assembler Control
    • Conditional jump
    • Conditional move
    • Indirect jump (via jump tables)
    • Compiler generates code sequence to implement more complex control
  • Standard Techniques
    • Loop converted to do-while or jump-to-middle form
    • Large switch statements use jump tables
    • Sparse switch statements may use decision trees (if-elseif-elseif-else)
上一篇:SROP例题


下一篇:redis6.0.5之Rax阅读笔记1-相关数据结构和部分辅助函数