Exercise 4 is fairly simple, so I skipped it.
Exercise 5
Background from the MIT 6.828 Lab 1 website:
ELF binary: When you compile and link a C program such as the JOS kernel, the compiler transforms each C source ('.c') file into an object ('.o') file containing assembly language instructions encoded in the binary format expected by the hardware. The linker then combines all of the compiled object files into a single binary image such as obj/kern/kernel, which in this case is a binary in the ELF format ("Executable and Linkable Format").
An ELF binary starts with a fixed-length ELF header, followed by a variable-length program header listing each of the program sections to be loaded, which include:
.text: The program’s executable instructions.
.rodata: Read-only data, such as ASCII string constants produced by the C compiler. (We will not bother setting up the hardware to prohibit writing, however.)
.data: The data section holds the program’s initialized data, such as global variables declared with initializers like int x = 5;
When the linker computes the memory layout of a program, it reserves space for uninitialized global variables, such as int x;, in a section called .bss that immediately follows .data in memory. C requires that “uninitialized” global variables start with a value of zero. Thus there is no need to store contents for .bss in the ELF binary; instead, the linker records just the address and size of the .bss section. The loader or the program itself must arrange to zero the .bss section.
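The .bss rule above boils down to: copy the bytes that are actually present in the file, then zero-fill up to the in-memory size. A minimal C sketch of that single step, assuming a flat destination buffer (load_segment is a hypothetical helper, not a function from JOS):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: place one loadable region at dst.
 * The first filesz bytes come from the file image (e.g. .text, .data);
 * the remaining memsz - filesz bytes are zeroed (e.g. .bss), because
 * the ELF file does not store any contents for them. */
void load_segment(uint8_t *dst, const uint8_t *src, size_t filesz, size_t memsz)
{
    assert(filesz <= memsz);                  /* memsz always covers the file bytes */
    memcpy(dst, src, filesz);                 /* initialized contents */
    memset(dst + filesz, 0, memsz - filesz);  /* "uninitialized" globals start at 0 */
}
```

Whether this zeroing happens in the loader or in the program itself is a design choice; the text above only requires that one of them does it.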
VMA (virtual address / link address):
The link address of a section is the memory address from which the section expects to execute. The linker encodes the link address in the binary in various ways, such as when the code needs the address of a global variable, with the result that a binary usually won’t work if it is executing from an address that it is not linked for.
LMA (physical address / load address):
The load address of a section is the memory address at which that section should be loaded into memory.
Typically, the link and load addresses are the same.
For the kernel, however, unlike the boot loader, these two addresses aren't the same: the kernel tells the boot loader to load it into memory at a low address (1 megabyte), but it expects to execute from a high address.
The boot loader uses the ELF program headers to decide how to load the sections. The program headers specify which parts of the ELF object to load into memory and the destination address each should occupy.
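A sketch of how a loader can use the program headers, in the spirit of the boot loader but not a copy of JOS's boot/main.c (which reads the kernel from disk). It assumes a Linux/glibc host for the Elf32_Ehdr/Elf32_Phdr types in <elf.h>, keeps the whole ELF image in a byte array, and uses a second byte array as simulated physical memory; load_elf is a hypothetical name. Each LOAD segment is placed at p_paddr (the load address), not at p_vaddr (the link address):

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical loader: copy every PT_LOAD segment of an in-memory
 * ELF32 image into simulated physical memory at its load address
 * (p_paddr), zero-filling p_memsz - p_filesz bytes (the .bss part).
 * Returns e_entry, or 0 if the image does not start with the ELF magic. */
uint32_t load_elf(const uint8_t *image, uint8_t *phys, size_t phys_size)
{
    Elf32_Ehdr eh;
    memcpy(&eh, image, sizeof eh);            /* copy to avoid unaligned reads */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0)
        return 0;

    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf32_Phdr ph;
        memcpy(&ph, image + eh.e_phoff + i * sizeof ph, sizeof ph);
        if (ph.p_type != PT_LOAD)
            continue;                          /* e.g. skip the STACK header */
        assert(ph.p_filesz <= ph.p_memsz);
        assert((size_t)ph.p_paddr + ph.p_memsz <= phys_size);
        memcpy(phys + ph.p_paddr, image + ph.p_offset, ph.p_filesz);
        memset(phys + ph.p_paddr + ph.p_filesz, 0, ph.p_memsz - ph.p_filesz);
    }
    return eh.e_entry;                         /* where execution should start */
}
```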
The BIOS loads the boot sector into memory starting at address 0x7c00, so this is the boot sector’s load address. This is also where the boot sector executes from, so this is also its link address. We set the link address by passing -Ttext 0x7C00 to the linker in boot/Makefrag, so the linker will produce the correct memory addresses in the generated code.
You can use objdump -x to list all the headers; the program headers marked LOAD are the ones that get loaded into memory.
Exercise 5 asks:
Trace through the first few instructions of the boot loader again and identify the first instruction that would “break” or otherwise do the wrong thing if you were to get the boot loader’s link address wrong.
Then change the link address in boot/Makefrag to something wrong, run make clean, recompile the lab with make, and trace into the boot loader again to see what happens.
Don’t forget to change the link address back and make clean again afterward!
Following the hint, the BIOS loads the boot sector into memory starting at 0x7c00. Open boot/Makefrag, which contains the following rule:
$(OBJDIR)/boot/boot: $(BOOT_OBJS)
	@echo + ld boot/boot
	$(V)$(LD) $(LDFLAGS) -N -e start -Ttext 0x7C00 -o $@.out $^
	$(V)$(OBJDUMP) -S $@.out >$@.asm
	$(V)$(OBJCOPY) -S -O binary -j .text $@.out $@
	$(V)perl boot/sign.pl $(OBJDIR)/boot/boot
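First, change 0x7C00 to a wrong value such as 0x8900, run make clean, then rebuild and start QEMU with make qemu-gdb and attach GDB with make gdb. Set a breakpoint at 0x7c00, because the BIOS still loads the boot sector at 0x7c00 regardless of the link address, and single-step with stepi. Execution ends up stuck at the instruction ljmp $PROT_MODE_CSEG, $protcseg, as shown below: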
(gdb)
[ 0:7c2a] => 0x7c2a: mov %eax,%cr0
0x00007c2a in ?? ()
(gdb)
[ 0:7c2d] => 0x7c2d: ljmp $0x8,$0x8932
0x00007c2d in ?? ()
(gdb)
[ 0:7c2d] => 0x7c2d: ljmp $0x8,$0x8932
0x00007c2d in ?? ()
(gdb)
[ 0:7c2d] => 0x7c2d: ljmp $0x8,$0x8932
0x00007c2d in ?? ()
(gdb)
[ 0:7c2d] => 0x7c2d: ljmp $0x8,$0x8932
0x00007c2d in ?? ()
(gdb)
It turns out the real problem is an earlier instruction:
0x7c1e: lgdtw -0x769c
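Compare it with the same instruction before the link address was changed:
0x7c1e: lgdtw 0x7c64 // corresponds to lgdt gdtdesc in boot.S
This instruction loads the 6 bytes of data at 0x7c64 into GDTR, which is essential for the switch from real mode to protected mode. An explanation of lgdt can be found here:
https://www.fermimn.edu.it/linux/quarta/x86/lgdt.htm
Let's print the 6 bytes at 0x7c64: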
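Why gdb shows a negative number for the broken build: in 16-bit mode the lgdtw operand here is encoded as a 16-bit absolute address, and gdb displays it as a signed value. Assuming gdtdesc sits 0x64 bytes into the boot sector (which is what the correct 0x7c64 implies), the arithmetic can be sketched as follows; GDTDESC_OFFSET, gdtdesc_link_addr and as_signed16 are names made up for this illustration:

```c
#include <stdint.h>

/* Offset of gdtdesc from the start of the boot sector, inferred from
 * the correct build: 0x7c64 - 0x7c00 = 0x64. */
#define GDTDESC_OFFSET 0x64u

/* 16-bit address the linker writes into "lgdt gdtdesc"
 * for a given -Ttext link address. */
uint16_t gdtdesc_link_addr(uint16_t link_base)
{
    return (uint16_t)(link_base + GDTDESC_OFFSET);
}

/* The same 16 bits viewed as a signed displacement, which is how gdb
 * printed the operand. (Converting 0x8000..0xffff to int16_t is
 * implementation-defined before C23; mainstream compilers wrap it
 * as two's complement.) */
int16_t as_signed16(uint16_t addr)
{
    return (int16_t)addr;
}
```

With the wrong link address 0x8900 the encoded address is 0x8964, which as a signed 16-bit value is -0x769c. So the lgdtw reads its 6-byte operand from 0x8964 instead of from the real gdtdesc at 0x7c64.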
(gdb) x/6b 0x7c64
0x7c64: 0x17 0x00 0x4c 0x89 0x00 0x00
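This in turn makes the target address of the later ljmp wrong as well. I haven't fully worked out the internal details of why, so I'll leave that as an open question here.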
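The 6 bytes printed above can be decoded as the lgdt pseudo-descriptor: a 16-bit limit followed by a 32-bit base, both little-endian. A small sketch (struct gdtr and decode_gdtr are my own illustrative names):

```c
#include <stdint.h>

/* The 48-bit operand of lgdt: 16-bit table limit, then 32-bit base.
 * With a 16-bit operand size (lgdtw), the CPU uses only the low
 * 24 bits of the base; the top byte here is 0x00 anyway. */
struct gdtr {
    uint16_t limit;
    uint32_t base;
};

/* Decode 6 raw bytes (little-endian), e.g. the output of x/6b. */
struct gdtr decode_gdtr(const uint8_t b[6])
{
    struct gdtr g;
    g.limit = (uint16_t)(b[0] | (b[1] << 8));
    g.base  = (uint32_t)b[2] | ((uint32_t)b[3] << 8) |
              ((uint32_t)b[4] << 16) | ((uint32_t)b[5] << 24);
    return g;
}
```

For the bytes above this gives limit 0x17 (3 descriptors * 8 bytes - 1) and base 0x894c, which is 0x8900 + 0x4c. As far as I can tell, even this loaded copy of gdtdesc therefore points at where the GDT would be under the wrong link address, not at the GDT actually sitting in memory at 0x7c4c.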
Exercise 6
We can use objdump -f obj/kern/kernel to see the entry point, e_entry, which holds the link address of the entry point in the program: the memory address in the program's text section at which the program should begin executing.
Exercise 6 asks:
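e_entry is a field of the fixed-length ELF header, so it can also be read directly. A sketch that reads it from an in-memory 32-bit ELF image, assuming a Linux/glibc host for the Elf32_Ehdr type in <elf.h> (elf32_entry is a hypothetical helper name):

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return e_entry of a 32-bit ELF image held in memory,
 * or 0 if the buffer is too short or is not ELF32. */
uint32_t elf32_entry(const uint8_t *image, size_t len)
{
    Elf32_Ehdr eh;
    if (len < sizeof eh)
        return 0;
    memcpy(&eh, image, sizeof eh);   /* copy to avoid unaligned reads */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS32)
        return 0;
    return eh.e_entry;
}
```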
We can examine memory using GDB's x command. The GDB manual has full details, but for now, it is enough to know that the command x/Nx ADDR prints N words of memory at ADDR. (Note that both 'x's in the command are lowercase.) Warning: The size of a word is not a universal standard. In GNU assembly, a word is two bytes (the 'w' in xorw, which stands for word, means 2 bytes).
Reset the machine (exit QEMU/GDB and start them again). Examine the 8 words of memory at 0x00100000 at the point the BIOS enters the boot loader, and then again at the point the boot loader enters the kernel. Why are they different? What is there at the second breakpoint? (You do not really need to use QEMU to answer this question. Just think.)
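When reading the output, note that GDB's x command uses a unit size of w (4 bytes) initially, and afterwards the last unit size you gave it; b, h, w and g select 1, 2, 4 and 8 bytes explicitly. So x/8x below prints eight 4-byte values, and on x86 each value is assembled little-endian from memory. A sketch of that assembly step (le32 is a hypothetical helper):

```c
#include <stdint.h>

/* Interpret 4 bytes of memory as one little-endian 32-bit word,
 * the way x/Nx with unit size w shows it on x86. */
uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

For example, the bytes 02 b0 ad 1b in memory show up as the single word 0x1badb002.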
First, set breakpoint 1 with break *0x7c00. At that point, the 8 words at 0x100000 are:
(gdb) x/8x 0x100000
0x100000: 0x00000000 0x00000000 0x00000000 0x00000000
0x100010: 0x00000000 0x00000000 0x00000000 0x00000000
Next, set breakpoint 2 with break *0x10000c (the first instruction of the kernel):
(gdb) x/8x 0x100000
0x100000: 0x1badb002 0x00000000 0xe4524ffe 0x7205c766
0x100010: 0x34000004 0x0000b812 0x220f0011 0xc0200fd8
The contents changed because, between the two breakpoints, the boot loader loaded the kernel into memory.
Run objdump -x obj/kern/kernel to view all the headers:
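The first three words at 0x100000 look like a Multiboot header: magic 0x1badb002, flags 0, and a checksum chosen so that the three sum to zero modulo 2^32. This is my reading, assuming JOS's kern/entry.S begins the kernel with such a header; a quick check (MULTIBOOT_MAGIC and multiboot_header_ok are illustrative names):

```c
#include <stdint.h>

#define MULTIBOOT_MAGIC 0x1badb002u

/* A Multiboot (v1) header is valid when the magic matches and
 * magic + flags + checksum wraps around to 0 in 32-bit arithmetic. */
int multiboot_header_ok(uint32_t magic, uint32_t flags, uint32_t checksum)
{
    return magic == MULTIBOOT_MAGIC &&
           (uint32_t)(magic + flags + checksum) == 0;
}
```

With the words shown above, 0x1badb002 + 0x00000000 + 0xe4524ffe = 0x100000000, which truncates to 0, so the check passes.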
Program Header:
    LOAD off    0x00001000 vaddr 0xf0100000 paddr 0x00100000 align 2**12
         filesz 0x0000716c memsz 0x0000716c flags r-x
    LOAD off    0x00009000 vaddr 0xf0108000 paddr 0x00108000 align 2**12
         filesz 0x0000a948 memsz 0x0000a948 flags rw-
   STACK off    0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**4
         filesz 0x00000000 memsz 0x00000000 flags rwx
Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00001917  f0100000  00100000  00001000  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .rodata       00000714  f0101920  00101920  00002920  2**5
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .stab         00003889  f0102034  00102034  00003034  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .stabstr      000018af  f01058bd  001058bd  000068bd  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .data         0000a300  f0108000  00108000  00009000  2**12
                  CONTENTS, ALLOC, LOAD, DATA
  5 .bss          00000648  f0112300  00112300  00013300  2**5
                  CONTENTS, ALLOC, LOAD, DATA
  6 .comment      0000002d  00000000  00000000  00013948  2**0
                  CONTENTS, READONLY
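So what is stored at 0x100000 should be the .text section: its LMA is 00100000, and it is the start of the first LOAD segment (paddr 0x00100000).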
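This conclusion can be checked mechanically against the Sections table: find the section whose LMA range [LMA, LMA + Size) contains the address. A sketch with the loadable rows copied from the objdump output above (struct section, kernel_sections and section_at_lma are illustrative names, not from JOS):

```c
#include <stddef.h>
#include <stdint.h>

struct section {
    const char *name;
    uint32_t vma;   /* link address */
    uint32_t lma;   /* load (physical) address */
    uint32_t size;
};

/* Rows copied from `objdump -x obj/kern/kernel` above. */
static const struct section kernel_sections[] = {
    { ".text",    0xf0100000u, 0x00100000u, 0x00001917u },
    { ".rodata",  0xf0101920u, 0x00101920u, 0x00000714u },
    { ".stab",    0xf0102034u, 0x00102034u, 0x00003889u },
    { ".stabstr", 0xf01058bdu, 0x001058bdu, 0x000018afu },
    { ".data",    0xf0108000u, 0x00108000u, 0x0000a300u },
    { ".bss",     0xf0112300u, 0x00112300u, 0x00000648u },
};

/* Name of the section loaded at physical address pa, or NULL if none. */
const char *section_at_lma(uint32_t pa)
{
    size_t n = sizeof kernel_sections / sizeof kernel_sections[0];
    for (size_t i = 0; i < n; i++) {
        const struct section *s = &kernel_sections[i];
        if (pa >= s->lma && pa - s->lma < s->size)
            return s->name;
    }
    return NULL;
}
```

Both 0x100000 and the breakpoint address 0x10000c fall inside .text. Every row also has VMA - LMA = 0xf0000000, which matches the split between the high link address and the low load address described earlier.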