Machine Programming I
Machine Programming I: Basic
History of Intel processors and architectures
C, assembly, machine code
Definition
-
Architecture: also ISA instruction set architecture) The parts of a processor design that one needs to understand or write assembly/machine code.
- Example: instruction set specification, registers.
-
MicroArchitecture: Implementation of the architecure.
- Example: cache sizes and core frequency.
-
Code Forms:
- Machine Code: The byte-level programs that a processor executes.
- Assembly Code: A text representation of machine code
-
Example ISAs:
- Intel: x86, IA32, Itanium, x86-64
- ARM: Used in almost all mobile phones
Programmer-Visible State
-
PC: Program counter
- Address of next instruction
- Called “RIP” (x86-64)
-
Register file
- Heavily used program data
-
Condition codes
- Store status information about most recent architecture or logical operation
- Used for conditional branching
-
Memory
- Byte addressable array
- Code and user data
- Stack to support procedures
Turning C into Object Code
- Code in file p1.c p2.c
- Compile with command:
gcc -Og p1.c p2.c -o p
- Use basic optimizations(-Og) [New to recent version of GCC]
- Put resulting binary in file p
Compiling Into Assembly
C Code (sum.c)
long plus(long x, long y);
void sumstore(long x, long y, long *dest)
{
long t = plus(x, y);
*dest = t;
}
Generated x86-64 Assembly
sumstore:
pushq %rbx
movq %rdx, %rbx
call plus
movq %rax, (%rbx)
popq %rbx
ret
Obtain(on shark machine) with command
gcc -Og -S sum.c
Produces file sum.s
Assembly Characteristics: Data Types
-
"Integers" data of 1, 2, 4 or 8 bytes
- Data value
- Addresses(untyped pointers)
- Floating point data of 4, 8 or 10 bytes
- Code: Byte sequences encoding series of instructions
-
No aggregate types such as arrays or structures
- Just contiguously allocated bytes in memory
Assembly Characteristics: Operations
-
Perform arithmetic function on register or memory data
-
Transfer data between memory and register
- Load data from memory into register
- Store register data into memory
-
Transfer control
- Unconditional jumps to/from procedures
- Conditional branches
Object Code
-
Assembler
- Translates .s into .o
- Binary encoding of each instruction
- Nearly-complete image of executable code
- Missing linkages between code in different files
-
Linker
- Resolves references between files
- Combines with static run-time libraries
- E.g., code for malloc, printf
- Some libraries are dynamically linked
- Linking occurs when program begin execution
Machine Instruction Example
-
C code
*dest = t;
- Store value
t
where designated bydest
- Store value
-
Assambly
movq %rax, (%rbx)
-
Move 8-byte value to memory
- Quad words in x86-64 parlance
-
Operands:
t: Register %rax
dest: Register %rbx
*dest: Memory M[%rbx]
-
-
Object Code
0x40059e: 48 89 03
- 3-byte instruction
- Stored at address 0x40059e
Disassembling Object Code
- Disassembled
0000000000400595 <sumstore>:
400595: 53 push %rbx
400596: 48 89 d3 mov %rdx, %rbx
400599: e8 f2 ff ff ff callq 400590 <plus>
40059e: 48 89 03 mov %rax, (%rbx)
4005a1: 5b pop %rbx
4005a2: c3 retq
-
Disassembler
objdump -d sum
- Useful tool for examining object code
- Analyzes bit pattern of series of instructions
- Produces approximate rendition of assembly code
- Can be run on either
a.out
or.o
file
Alternate Disassembly
-
Within gdb Debugger
gdb sum
disassemble sumstore
-
Disassemble procedure
x/14xb sumstore
-
Examine the 14 bytes starting at sumstore
-
What Can be Disassembled?
- Anything that can be interpreted as executable code
- Disassembler examines bytes and reconstructs assembly
Assembly Basics: Registers, operands, move
Moving Data
-
Moving Data
moveq Source, Dest:
-
Operand Types
-
Immediate: Constant integer data
- Example: $0x400, $-533
- Like C constant, but prefixed with ‘$’
- Encoded with 1, 2, or 4 bytes
-
Register: One of 16 integer registers
- Example: %rax, %r13
- But %rsp reserved for special use
- Others have special uses for particular instructions
-
Memory: 8 consecutive bytes of memory at address given by register
- Simplest example: (%rax)
- Various other “address modes”
-
Immediate: Constant integer data
moveq Operand Combinations
Source -> Dest | Src, Dest | C Analog |
---|---|---|
Imm -> Reg | moveq $0x4, %rax |
temp = 0x4; |
Imm -> Mem | moveq %-147, (%rax) |
*p = -147; |
Reg -> Reg | moveq %rax, %rdx |
temp2 = temp1; |
Reg -> Mem | moveq %rax, (%rdx) |
*p = temp; |
Mem -> Reg | moveq (%rax), rdx |
temp = *p; |
Simple Memory Addressing Modes
-
Normal ® Mem[Reg[R]]
-
Register R specifies memory address
-
Aha! Pointer dereferencing in C
moveq (%rcx), %rax
-
-
Displacement D® Mem[Reg[R]+D]
-
Register R specifies start of memory region
-
Constant displacement D specifies offset
moveq 8(%rbp), %rdx
-
Complete Memory Addressing Modes
-
Most General Form D(Rb, Ri, S) Mem[Reg[Rb]+S*Reg[Ri]+D]
- D: Constant “displacement” 1, 2 or 4 bytes
- Rb: Base register: Any of 16 integer registers
- Ri: Index register: Any, except for %rsp
- S: Scale: 1, 2, 4, or 8 (why these numbers?)
-
Special Cases
(Rb, Ri) Mem[Reg[Rb]+Reg[Ri]]
D(Rb, Ri) Mem[Reg[Rb]+Reg[Ri]+D]
(Rb, Ri, S) Mem[Reg[Rb]+S*Reg[Ri]]
Address Computation Instruction
-
leaq Src, Dst
- Src is address mode expression
- Set Dst to address denoted by expression
-
Uses
- Computing addresses without a memory reference
- E.g., translation of
p = &x[i];
- E.g., translation of
- Computing arithmetic expressions of the form x + k*y
- k = 1, 2, 4, or 8
- Computing addresses without a memory reference
-
Example
long m12(long x) { return x*12; }
leaq (%rdi, %rdi, 2), %rax # t <- x + x*2 salq $2, %rax # return t
Some Arithmetic Operations
-
Two Operand Instructions:
Format Computation addq Src, Dest
Dest = Dest + Src
subq Src, Dest
Dest = Dest - Src
imulq Src, Dest
Dest = Dest * Src
salq Src, Dest
Dest = Dest << Src
sarq Src, Dest
Dest = Dest >> Src
shrq Src, Dest
Dest = Dest >> Src
xorq Src, Dest
Dest = Dest ^ Src
andq Src, Dest
Dest = Dest & Src
orq Src, Dest
Dest = Dest | Src
-
Watch out for argument order!
-
No distinction between signed and unsigned int (why?)
-
One Operand Instructions
incq Dest Dest = Dest + 1
decq Dest Dest = Dest - 1
negq Dest Dest = -Dest
notq Dest Dest = ~Dest
-
See book for more instructions
Summary
-
History of Intel processors and architectures
- Evolutionary design leads to many quirks and artifacts
-
C, assembly, machine code
- New forms of visible state: program counter, registers, …
- Complier must transform statements, expressions, procedures into low-level instruction sequences
-
Assembly Basics: Register, operands, move
- The x86-64 move instructions cover wide range of data movement forms
-
Arithmetic
- C compiler will figure out different instruction combinations to carry out computation
Machine-Level Programming II: Control
Control: Condition codes
-
Information about currently executing program
- Temporary data (%rax, …)
- Location of runtime stack (%rsp)
- Location of current code control point (%rip, …)
- Status of recent tests (CF, ZF, SF, OF)
condition Code (Implicit Setting)
-
Single bit registers
- CF Carry Flag (for unsigned)
- SF Sign Flag (for signed)
- ZF Zero Flag
- OF Overflow Flag (for signed)
-
Implicitly set (think of it as side effect) by arithmetic operations
-
Example: addq Src, Dest -> t = a+b
-
CF set if carry out from most significant bit (unsigned overflow)
-
ZF set if t == 0
-
SF set if t < 0 (as signed)
-
Of set if two’s -complement (signed) overflow
(a > 0 && b > 0 && t < 0) || (a < 0 && b < 0 && t >= 0)
-
-
Not set by leaq instruction
-
Explicit Setting by Compare Instruction
-
cmpq Src2, Src1
-
cmpq b, a like computing a-b without setting destination
-
CF set if carry out from most significant bit (used for unsigned comparisons)
-
ZF set if a == b
-
SF set if (a-b) < 0 (as signed)
-
OF set if two’s -complement (signed) overflow
(a > 0 && b > 0 && (a-b) < 0) || (a < 0 && b < 0 && (a-b) >= 0)
-
-
Explicit Seting by Test instruction
-
testq Src2, Src1
- testq b, a like computing a&b without setting destination
-
Sets condition codes based on value of Src1 & Src2
-
Useful to have one of the operands be a mask
-
ZF set when a&b == 0
-
SF set when a&b < 0
-
-
SetX Instructions
-
Set low-order byte of destination to 0 or 1 based on combinations of condition codes
-
Does not alter remaining 7 bytes
SetX Condition Description sete ZF Equal / Zero setne ~ZF Not Equal / Not Zero sets SF Negative setns ~SF Nonnegative setg ~(SF^OF) & ~ZF Greater (Signed) setge ~(SF^OF) Greater or Equal (Signed) setl (SF^OF) Less (Signed) setle (SF^OF) | ZF Less or Equal (Signed) seta CF&ZF Above (unsigned) setb CF Below (unsigned)
-
Reading Condition Codes (Cont.)
-
SetX Instructions:
- Set single byte based on combination of condition codes
-
One of addressable byte registers
-
Does not alter remaining bytes
-
Typically use movzbl to finish job
- 32-bit instruction also set upper 32bits to 0
int gt (long x, long y) { return x > y; }
# %rdi is x, %rsi is y, %rax is Return value cmpq %rsi, %rdi # Compare x:y setg %al # Set when > movzbl %al, %eax # Zero rset of %rax ret
-
Conditional branches
Jumping
-
jX Instructions
-
Jump to different part of code depending on condition codes
jX Condition Description jmp 1 Unconditional je ZF Equal / Zero jne ~ZF Not Equal / Not Zero js SF Negative jns ~SF Nonnegative jg ~(SF^OF) & ~ZF Greater (Signed) jge ~(SF^OF) Greater or Equal (Signed) jl (SF^OF) Less (Signed) jle (SF^OF) | ZF Less or Equal (Signed) ja ~CF & ZF Above (unsigned) jb CF Below (unsigned)
-
Conditional Branch Example (Old Style)
-
Generation
shark> gcc -Og -S -fno-if-conversion control.c
long absdiff (long x, long y) { long result; if (x > y) result = x-y; else result = y-x; return result; }
# x in %rdi, y in %rsi absdiff: cmpq %rsi, %rdi # x:y jle .L4 movq %rdi, %rax subq %rsi, %rax ret .L4: # x <= y movq %rsi, %rax subq %rdi, %rax ret
Expressing with Goto Code
-
C allows goto statement
-
Jump to position designated by label
long absdiff (long x, long y) { long result; if (x > y) result = x-y; else result = y-x; return result; }
long absdiff_j (long x, long y) { long result; int ntest = x <= y; if (ntest) goto Else; result = x-y; goto Done; Else: result = y-x; Done: return result; }
General Conditional Expression Translation (Using Branches)
C code
val = Test ? Then_Expr : Else_Expr;
Goto Version
ntest = !Test;
if (ntest) goto Else;
val = Then_Expr;
goto Done;
Else:
val = Else_Expr;
Done:
...
- Create separate code regions for then & else expressions
- Execute appropriate one
Using Conditional Moves
-
Conditional Move Instructions
-
Instruction supports:
if (Test) Dest <- Src
-
Supported in post-1995 x86 processors
-
GCC tries to use them
- But, only when know to be safe
-
-
Why?
- Branches are very disruptive to instruction flow through pipelines
- Conditional moves do not require control transfer
val = Test ? Then_Expr : Else_Expr;
result = Then_Expr;
eval = Else_Expr;
nt = !Test;
if (nt) result = eval;
return result;
Conditional Move Example
long absdiff (long x, long y)
{
long result;
if (x > y)
result = x-y;
else
result = y-x;
return result;
}
# x in %rdi, y in %rsi
absdiff:
movq %rdi, %rax # x
subq %rsi, %rax # result = x-y
movq %rsi, %rdx
subq %rdi, %rdx # eval = y-x
cmpq %rsi, %rdi # x:y
cmovle %rdx, %rax # if <=, result = eval
ret
Bad Cases for Conditional Move
Expensive Computations
val = Test(x) ? Hard1(x) : Hard2(x);
- Both values get computed
- Only makes sense when computations are very simple
Risky Computations
val = p ? *p : 0;
- Both values get computed
- May have undesirable effects
Computations with side effects
val = x > 0 ? x*=7 : x+=3;
- Both values get computed
- Must be side-effect free
Loops
“Do-While” Loop Example
C code
long pcount_do (unsigned long x)
{
long result = 0;
do {
result += x & 0x1;
x >>= 1;
} while (x);
return result;
}
Goto Version
long pcount_goto (unsigned long x)
{
long result = 0;
loop:
result += x & 0x1;
x >>= 1;
if (x) goto loop;
return result;
}
- count number of 1’s in argument x (“popcount”)
- Use conditional branch to either continue looping or to exit loop
“Do-While” Loop Compilation
# x in %rdi
movl $0, %eax # result = 0
.L2:
movq %rdi, %rax # loop:
andl $1, %edx # t = x & 0x1
addq %rdx, %rax # result += t
shrq %rdi # x >>= 1
jne L2 # if (x)
rep; ret
General “Do-While” Translation
C Code
do
Body
while (Test);
Goto Version
loop:
Body
if (Test)
goto loop
-
Body: {
Statement1;
Statement2;
…
Statementn;
}
General “While” Translation #1
- “Jump-to-middle” translation
- Used with -Og
While version
while (Test)
Body
Goto Version
goto test;
loop:
Body
test:
if (Test)
goto loop;
done:
While Loop Example #1
C Code
long pcount_while (unsigned long x)
{
long result = 0;
while (x) {
result += x & 0x1;
x >>= 1;
}
return result;
}
Jump to Middle
long pcount_goto_jtm (unsigned long x)
{
long result = 0;
goto test;
loop:
result += x & 0x1;
x >>= 1;
test:
if (x) goto loop;
return result;
}
- Compare to do-while version of function
- Initial goto starts loop at test
General “While” Translation #2
While Version
while (Test)
Body
Do-While Version
if (!Test)
goto done;
do
Body
while (Test);
done:
Goto Version
if (!Test)
goto done;
loop:
Body
if (Test)
goto loop;
done:
- “Do-while” conversion
- Used with -O1
While Loop Example #2
C Code
long pcount_while (unsigned long x)
{
long result = 0;
while (x) {
result += x & 0x1;
x >>= 1;
}
return result;
}
Do-While Version
long pcount_goto_dw (unsigned long x)
{
long result = 0;
if (!x) goto done;
loop:
result += x & 0x1;
x >>= 1;
if (x) goto loop;
done:
return result;
}
- Compare to do-while version of function
- Initial conditional guards entrance(入口) to loop
“For” Loop Form
General Form
for (Init; Test; Update)
Body
#define WSIZE 8*sizeof(int)
long pcount_for (unsigned long x)
{
size_t i;
long result = 0;
for (i = 0; i < WSIZE; i++)
{
unsigned bit = (x >> i) & 0x1;
result += bit;
}
return result;
}
Init
i = 0
Test
i < WSIZE
Update
i++
Body
{
unsigned bit = (x >> i) & 0x1;
result += bit;
}
“For” Loop -> While Loop
For Version
for (Init; Test; Update)
Body
While Version
Init;
while (Test) {
Body
Update;
}
For-While Conversion
long pcount_for_while (unsigned long x)
{
size_t i;
long result = 0;
i = 0;
while (i < WSIZE)
{
unsigned bit = (x >> i) & 0x1;
result += bit;
i++;
}
return result;
}
“For” Loop Do-While Conversion
C Code
long pcount_for (unsigned long x)
{
size_t i;
long result = 0;
for (i = 0; i < WSIZE; i++)
{
unsigned bit = (x >> i) & 0x1;
result += bit;
}
return result;
}
Goto Version
long pcount_for (unsigned long x)
{
size_t i;
long result = 0;
i = 0; //Init
loop:
{ //Body
unsigned bit = (x >> i) & 0x1;
result += bit;
}
i++; //Update
if (i < WSIZE) //Test
goto loop;
done:
return result;
}
- Init test can be optimized away
Switch Statements
Switch Statement Example
long switch_eg (long x, long y, long z)
{
long w = 1;
switch (x) {
case 1:
e = y*z;
break;
case 2:
w = y/z;
// Fall Through
case 3:
w += z;
break;
case 5:
case 6:
w -= z;
break;
default:
w = 2;
}
}
- Multiple case labels
- Here: 5 & 6
- Fall through cases
- Here: 2
- Missing cases
- Here: 4
Jump Table Structure
Switch Form
switch (x) {
case val_0:
Block 0
case val_1:
Block 1
...
case val_n-1:
Block n-1
}
Jump Table
jtab:
Targ0
Targ1
Targ2
...
Targn-1
Jump Targets
Targ0:
Code Block 0
Targ1:
Code Block 1
Targ2:
Code Block 2
...
Targn-1:
Code Block n-1
Translation (Extended C)
goto *JTab[x];
Switch Statement Example
long switch_eg (long x, long y, long z)
{
long w = 1;
switch (x) {
...
}
return w;
}
Setup:
# x in %rdi, y in %rsi, z in %rdx
switch_eg:
movq %rdx, %rcx
cmpq $6, %rdi # x:6
ja .L8 # 使用无符号值比较,负数可以视为很大的值 Use default
jmp *.L4(, %rdi, 8) # goto *JTab[x]
What range of values take default?
Jump table
.section .rodata
.align 8
.L4
.quad .L8 # x = 0
.quad .L3 # x = 1
.quad .L5 # x = 2
.quad .L9 # x = 3
.quad .L8 # x = 4
.quad .L7 # x = 5
.quad .L7 # x = 6
Code Blocks (x == 1)
switch (x) {
case 1: // .L3
w = y*z;
break;
...
}
# x in %rdi, y in %rsi, z in %rdx
.L3
movq %rsi, %rax # y
imulq %rdx, %rax # y*z
ret
Code Blocks (x == 2, x == 3)
switch (x) {
case 2:
w = y/z;
//Fall Through
case 3:
w += z;
break;
...
}
# x in %rdi, y in %rsi, z in %rdx
.L5 # Case 2
movq $rsi, $rax
cqto
idivq $rcx # y/z
jmp .L6 # goto merge
.L9 # Case 3
movl %1, %eax # w = 1
.L6 # merge:
addq %rcx, %rax # w += z
ret
Code Blocks (x == 5, x == 6, default)
switch (x) {
case 5: // .L7
case 6: // .L7
w -= z;
break;
default: // .L8
w = 2
}
# x in %rdi, y in %rsi, z in %rdx
.L7 # Case 5,6
movql $1, $eax # w = 1
subq %rdx, %rax # w -= z
ret
.L8 # Default:
movl $2, %eax # 2
ret
Summary
- C Control
- if-then-else
- do-while
- while, for
- switch
- Assembler Control
- Conditional jump
- Conditional move
- Indirect jump (via jump tables)
- Compiler generates code sequence to implement more complex control
- Standard Techniques
- Loop converted to do-while or jump-to-middle form
- Large switch statements use jump tables
- Sparse switch statements may use decision trees (if-elseif-elseif-else)