6.828 lab1

Booting a PC

Exercise 1. Familiarize yourself with the assembly language materials available on the 6.828 reference page. You don't have to read them now, but you'll almost certainly want to refer to some of this material when reading and writing x86 assembly.

We do recommend reading the section "The Syntax" in Brennan's Guide to Inline Assembly. It gives a good (and quite brief) description of the AT&T assembly syntax we'll be using with the GNU assembler in JOS.


Exercise 2. Use GDB's si (Step Instruction) command to trace into the ROM BIOS for a few more instructions, and try to guess what it might be doing. You might want to look at Phil Storrs I/O Ports Description, as well as other materials on the 6.828 reference materials page. No need to figure out all the details - just the general idea of what the BIOS is doing first.

0xffff0:	ljmp   $0xf000,$0xe05b	# jump to an earlier location in the BIOS

0xfe05b:	cmpl   $0x0,%cs:0x6ac8
0xfe062:	jne    0xfd2e1
0xfe066:	xor    %dx,%dx
0xfe068:	mov    %dx,%ss	# set %ss to 0
0xfe06a:	mov    $0x7000,%esp 
0xfe070:	mov    $0xf34c2,%edx
0xfe076:	jmp    0xfd15c

0xfd15c:	mov    %eax,%ecx
0xfd15f:	cli	# disable interrupts
0xfd160:	cld    
0xfd161:	mov    $0x8f,%eax
0xfd167:	out    %al,$0x70
0xfd169:	in     $0x71,%al
0xfd16b:	in     $0x92,%al
0xfd16d:	or     $0x2,%al
0xfd16f:	out    %al,$0x92
0xfd171:	lidtw  %cs:0x6ab8
0xfd177:	lgdtw  %cs:0x6a74		# load the GDT
0xfd17d:	mov    %cr0,%eax
0xfd180:	or     $0x1,%eax
0xfd184:	mov    %eax,%cr0
0xfd187:	ljmpl  $0x8,$0xfd18f	# far jump reloads %cs; now in 32-bit protected mode
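The addresses in this trace follow from real-mode address arithmetic, where the physical address is segment * 16 + offset. A minimal sketch of that calculation (the helper name real_mode_addr is made up for illustration):

```c
#include <stdint.h>

/* Hypothetical helper: real-mode physical address = segment * 16 + offset. */
uint32_t real_mode_addr(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

With CS = 0xf000 and IP = 0xfff0 this gives 0xffff0, where the first ljmp sits, and the jump target $0xf000,$0xe05b works out to 0xfe05b, the next address in the trace.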

Exercise 3. Take a look at the lab tools guide, especially the section on GDB commands. Even if you're familiar with GDB, this includes some esoteric GDB commands that are useful for OS work.

Set a breakpoint at address 0x7c00, which is where the boot sector will be loaded. Continue execution until that breakpoint. Trace through the code in boot/boot.S, using the source code and the disassembly file obj/boot/boot.asm to keep track of where you are. Also use the x/i command in GDB to disassemble sequences of instructions in the boot loader, and compare the original boot loader source code with both the disassembly in obj/boot/boot.asm and GDB.

Trace into bootmain() in boot/main.c, and then into readsect(). Identify the exact assembly instructions that correspond to each of the statements in readsect(). Trace through the rest of readsect() and back out into bootmain(), and identify the begin and end of the for loop that reads the remaining sectors of the kernel from the disk. Find out what code will run when the loop is finished, set a breakpoint there, and continue to that breakpoint. Then step through the remainder of the boot loader.

Several kinds of registers:

  • General-purpose register: %eax, %ebx, %ecx, %edx, %edi, %esi, %ebp, %esp, plus the program counter %eip
  • Control register: %cr0, %cr2, %cr3, %cr4
  • Debug register: %dr0, %dr1, %dr2, %dr3
  • Segment register: %cs, %ds, %es, %fs, %gs, %ss
  • Global and local descriptor table pseudo-register: %gdtr, %ldtr

  1. The boot loader starts at address 0x7c00.

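  2. Disable interrupts. An interrupt is how a hardware device invokes an OS function (the interrupt handler). The BIOS may have installed its own interrupt handlers while initializing the hardware, and those are no longer appropriate once the boot loader is running.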

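  3. Zero the segment registers %ds, %es, and %ss. The processor is still in real mode, which has eight 16-bit general-purpose registers, yet it sends 20-bit addresses to memory. The segment registers %cs (instruction fetch), %ds (data), %es, and %ss (stack) supply the extra bits that extend a 16-bit address to 20 bits.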

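  4. x86 instructions use logical addresses, each made of a segment selector and an offset. The segment is usually implicit, so programs work with the offset only. The segmentation hardware translates segment:offset into a linear address. Paging is not enabled yet, so the linear address is used directly as the physical address.

    (In xv6, logical and linear addresses are identical.)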

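  5. segment:offset can produce a 21-bit physical address, but only address lines A0-A19 are in use and A20 is held at 0 by default. To reach addresses at or above 1 MB, A20 has to be enabled.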

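  6. Real mode's 16-bit registers make it awkward to use more than 65536 bytes of memory and impossible to use more than 1 MB, so 32-bit protected mode is needed to use more. In protected mode, a segment register holds an index into a segment descriptor table. Each table entry specifies a base physical address, a maximum virtual address (the limit), and permission bits.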

    The boot loader sets up the segment descriptor table gdt, which is used to translate logical addresses into linear addresses.

  7. The lgdt instruction loads the processor's gdtr register from gdtdesc, which holds the size and start address of gdt.

    Protected mode is then enabled by setting CR0_PE_ON in %cr0.

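  8. Enabling protected mode does not immediately change how the processor translates logical addresses to physical ones. Only when a new value is loaded into a segment register does the processor read the GDT and update its internal segmentation settings.

    %cs cannot be modified directly, so the code executes an ljmp to make %cs refer to the code descriptor entry in gdt. That entry describes a 32-bit code segment, which switches the processor into 32-bit mode.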

  9. The first thing done in 32-bit mode is to initialize the data segment registers. Logical addresses now map directly to physical addresses.

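  10. Before any C code runs, a stack has to be set up in an unused region of memory. The boot loader occupies 0x7c00-0x7e00 (512 bytes), so 0x7c00 ($start) is used as the top of the stack, which grows down from there toward 0x0000, away from the boot loader. Once the stack is set up, the code calls bootmain.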

  11. bootmain's job is to find the kernel (an ELF binary) on disk. To get the ELF header, bootmain reads the first 4096 bytes (SECTSIZE*8) of the kernel image into memory at 0x10000 (ELFHDR).

  12. Next comes a quick check that ELFHDR is a valid ELF header.

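  13. e_phoff holds the offset of the program header table and e_phnum holds the number of entries in it. Iterating from ph to eph loads every program segment into memory.

    ph->p_pa is the physical address to load the segment at, ph->p_memsz is the size of the segment in memory, and ph->p_offset is the position of the segment in the file.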

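  14. The kernel is compiled and linked so that, as kernel.ld shows, it starts at virtual address 0xf0100000. That is a very high address, close to the top of the 32-bit address space. Virtual addresses cannot be translated to physical ones yet, however, and the kernel actually resides at physical address 0x00100000. kernel.ld sets the ELF paddr to start at 0x00100000, so the boot loader copies the kernel to that address.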

  15. Finally, the boot loader calls the kernel's entry point (the address where the kernel starts running), which is 0x10000c.
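Steps 6 to 9 can be pictured with a small model of the translation the segmentation hardware performs once the GDT is loaded. This is only a sketch: the struct, the table, and the function name are simplified stand-ins, assuming the flat three-entry layout (null, code, data) with base 0 and a 4 GB limit.

```c
#include <stdint.h>

#define GDT_ENTRIES 3

/* Simplified segment descriptor: only the fields needed for translation. */
struct seg_desc {
    uint32_t base;   /* linear address where the segment starts */
    uint32_t limit;  /* highest valid offset within the segment */
};

/* Assumed flat layout: null, code, data. */
static const struct seg_desc flat_gdt[GDT_ENTRIES] = {
    { 0, 0 },            /* index 0: null descriptor */
    { 0, 0xffffffffu },  /* index 1: code, selector 0x08 */
    { 0, 0xffffffffu },  /* index 2: data, selector 0x10 */
};

/*
 * Translate selector:offset to a linear address.
 * Returns 0 on success, -1 if the selector or offset is invalid.
 */
int logical_to_linear(uint16_t selector, uint32_t offset, uint32_t *linear)
{
    uint16_t index = selector >> 3;  /* the low 3 bits are TI and RPL */

    if (index == 0 || index >= GDT_ENTRIES)
        return -1;
    if (offset > flat_gdt[index].limit)
        return -1;
    *linear = flat_gdt[index].base + offset;
    return 0;
}
```

Because every usable segment has base 0, the linear address always equals the offset, which is why logical addresses map directly to physical addresses after step 9.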


  • At what point does the processor start executing 32-bit code? What exactly causes the switch from 16- to 32-bit mode?

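The ljmp on line 55 of boot.S. With CR0_PE_ON already set in %cr0, this far jump reloads %cs with a 32-bit code segment, and from then on the processor executes 32-bit code.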

  • What is the last instruction of the boot loader executed, and what is the first instruction of the kernel it just loaded?

Line 60 of main.c calls e_entry, which jumps to 0x10000c.

kernel.asm shows that the instruction at 0x10000c is movw $0x1234,0x472.

  • Where is the first instruction of the kernel?

0x10000c

  • How does the boot loader decide how many sectors it must read in order to fetch the entire kernel from disk? Where does it find this information?

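It reads the ELF header. In that header, e_phoff locates the start of the program header table and e_phnum gives the number of entries. Each entry's p_offset and p_memsz then tell the boot loader which part of the disk to read for that segment.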
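How a segment's byte range turns into a sector count can be sketched as follows. The rounding mirrors what the lab's readseg is understood to do; it assumes 512-byte sectors and a kernel image that starts at disk sector 1, and the function name sectors_for_segment is hypothetical.

```c
#include <stdint.h>

#define SECTSIZE 512u

/*
 * Given a segment's load address, byte count, and byte offset within
 * the kernel image, compute the first disk sector and return the
 * number of sectors a readseg-style loop would read.
 */
uint32_t sectors_for_segment(uint32_t pa, uint32_t count, uint32_t offset,
                             uint32_t *first_sector)
{
    uint32_t end_pa = pa + count;
    uint32_t n = 0;

    pa &= ~(SECTSIZE - 1);                  /* round down to a sector boundary */
    *first_sector = offset / SECTSIZE + 1;  /* image assumed to start at sector 1 */
    for (; pa < end_pa; pa += SECTSIZE)
        n++;
    return n;
}
```

Reading the 4096-byte ELF header region at offset 0 comes out as 8 sectors starting at sector 1.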


Exercise 4. Read about programming with pointers in C. The best reference for the C language is The C Programming Language by Brian Kernighan and Dennis Ritchie (known as 'K&R'). We recommend that students purchase this book (here is an Amazon Link) or find one of MIT's 7 copies.

Read 5.1 (Pointers and Addresses) through 5.5 (Character Pointers and Functions) in K&R. Then download the code for pointers.c, run it, and make sure you understand where all of the printed values come from. In particular, make sure you understand where the pointer addresses in printed lines 1 and 6 come from, how all the values in printed lines 2 through 4 get there, and why the values printed in line 5 are seemingly corrupted.

There are other references on pointers in C (e.g., A tutorial by Ted Jensen that cites K&R heavily), though not as strongly recommended.

Warning: Unless you are already thoroughly versed in C, do not skip or even skim this reading exercise. If you do not really understand pointers in C, you will suffer untold pain and misery in subsequent labs, and then eventually come to understand them the hard way. Trust us; you don't want to find out what "the hard way" is.

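The tricky part is the output on line 5.

The statement c = (int *) ((char *) c + 1); first casts c to a char pointer, so adding 1 advances it by only 1 byte, and then casts it back to an int pointer.

Set a breakpoint on it. Before the statement runs, c holds the address 0x7ffee842c8b4; inspect memory there.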

After the statement runs, c holds 0x7ffee842c8b5. Assigning through c now changes not only a[1] but also a[2], because the 4-byte store straddles the two ints.
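The corrupted values can be reproduced without a misaligned store by working on the bytes directly. The sketch below assumes 4-byte little-endian ints and that the array holds 200, 400, 301, 302 just before the store of 500, as in the lab's pointers.c; the function name is made up.

```c
#include <stdint.h>

/*
 * Emulate "*(int *)((char *)&a[1] + 1) = value" for little-endian
 * 4-byte ints by editing a byte image of the array.
 */
void store_one_byte_past(int32_t a[4], int32_t value)
{
    unsigned char bytes[16];
    uint32_t v = (uint32_t)value;
    int i;

    /* Lay the array out as little-endian bytes. */
    for (i = 0; i < 16; i++)
        bytes[i] = (unsigned char)((uint32_t)a[i / 4] >> (8 * (i % 4)));

    /* The store begins 1 byte into a[1], that is, at byte offset 5. */
    for (i = 0; i < 4; i++)
        bytes[5 + i] = (unsigned char)(v >> (8 * i));

    /* Read the ints back from the modified bytes. */
    for (i = 0; i < 4; i++)
        a[i] = (int32_t)((uint32_t)bytes[4 * i] |
                         (uint32_t)bytes[4 * i + 1] << 8 |
                         (uint32_t)bytes[4 * i + 2] << 16 |
                         (uint32_t)bytes[4 * i + 3] << 24);
}
```

The low byte of a[1] (0x90 from 400) survives and the top three bytes become f4 01 00, giving 0x0001f490 = 128144. The low byte of a[2] is overwritten with 00, leaving 0x00000100 = 256.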


Exercise 5. Trace through the first few instructions of the boot loader again and identify the first instruction that would "break" or otherwise do the wrong thing if you were to get the boot loader's link address wrong. Then change the link address in boot/Makefrag to something wrong, run make clean, recompile the lab with make, and trace into the boot loader again to see what happens. Don't forget to change the link address back and make clean again afterward!

First change 0x7C00 in boot/Makefrag to any other value. The VMA of the .text section changes accordingly.

But the BIOS lives in ROM and still loads the boot loader at 0x7c00, which a GDB breakpoint confirms.

The first few instructions run fine, but the problem appears at 0x7c1e: lgdtw 0x7e64. This instruction loads gdtr from gdtdesc, which holds the physical address and size of the GDT.

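gdtdesc is referenced by an absolute address that is fixed at link time. Because Makefrag was changed, that address is now 0x7e64. Printing the memory there shows all zeros, which is clearly wrong: the .word in gdtdesc should be 0x17, not 0.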

What does gdtdesc contain when Makefrag is left alone? In that case gdtdesc is at 0x7c64, and printing the memory there shows 0x17.
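The mismatch comes down to two additions. In the sketch below, the offset 0x64 of gdtdesc within the image and the wrong link base 0x7e00 are inferred from the addresses 0x7c64 and 0x7e64 quoted in these notes, and all three function names are hypothetical.

```c
#include <stdint.h>

#define LOAD_ADDR 0x7c00u  /* where the BIOS always places the boot sector */

/* Address the linker bakes into an absolute reference such as lgdt's operand. */
uint32_t linked_addr(uint32_t link_base, uint32_t offset_in_image)
{
    return link_base + offset_in_image;
}

/* Address where those bytes actually are once the BIOS has loaded them. */
uint32_t loaded_addr(uint32_t offset_in_image)
{
    return LOAD_ADDR + offset_in_image;
}

/* GDT limit field: table size in bytes minus one (8 bytes per descriptor). */
uint16_t gdt_limit(unsigned num_descriptors)
{
    return (uint16_t)(num_descriptors * 8 - 1);
}
```

With the correct link base, linked_addr(0x7c00, 0x64) and loaded_addr(0x64) agree on 0x7c64. With 0x7e00 the instruction points at 0x7e64 while the data still sits at 0x7c64. The expected 0x17 is gdt_limit(3), the limit of a three-entry GDT.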


Exercise 6. We can examine memory using GDB's x command. The GDB manual has full details, but for now, it is enough to know that the command x/Nx ADDR prints N words of memory at ADDR. (Note that both 'x's in the command are lowercase.) Warning: The size of a word is not a universal standard. In GNU assembly, a word is two bytes (the 'w' in xorw, which stands for word, means 2 bytes).

Reset the machine (exit QEMU/GDB and start them again). Examine the 8 words of memory at 0x00100000 at the point the BIOS enters the boot loader, and then again at the point the boot loader enters the kernel. Why are they different? What is there at the second breakpoint? (You do not really need to use QEMU to answer this question. Just think.)

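When the BIOS enters the boot loader, the kernel has not yet been loaded into memory. By the time the boot loader enters the kernel, it has copied the kernel to 0x00100000, so the second examination shows the beginning of the kernel image.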


Exercise 7. Use QEMU and GDB to trace into the JOS kernel and stop at the movl %eax, %cr0. Examine memory at 0x00100000 and at 0xf0100000. Now, single step over that instruction using the stepi GDB command. Again, examine memory at 0x00100000 and at 0xf0100000. Make sure you understand what just happened.

What is the first instruction after the new mapping is established that would fail to work properly if the mapping weren't in place? Comment out the movl %eax, %cr0 in kern/entry.S, trace into it, and see if you were right.

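Just before movl %eax, %cr0 executes, entry_pgdir is not yet in effect (CR0_PG has not been set), so address translation has not started and 0xf0100000 is treated as a physical address. The machine does not have that much memory, so the value read there is 0.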

After si, entry_pgdir is in effect. It maps virtual addresses 0xf0000000-0xf0400000 onto physical addresses 0x00000000-0x00400000, and maps virtual addresses 0x00000000-0x00400000 onto the same physical addresses.

For every virtual address outside these two ranges, entry_pgdir has no mapping to physical memory yet.
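The effect of the two mappings can be written down as a small translation function. This models only the ranges described in these notes, not the page table format itself, and the name entry_translate is made up.

```c
#include <stdint.h>

#define KERNBASE 0xf0000000u
#define MAP_SIZE 0x00400000u  /* 4 MB covered by each of the two mappings */

/*
 * Translate a virtual address under the two mappings described above.
 * Returns 0 and stores the physical address on success, -1 if unmapped.
 */
int entry_translate(uint32_t va, uint32_t *pa)
{
    if (va < MAP_SIZE) {                    /* identity mapping of the low 4 MB */
        *pa = va;
        return 0;
    }
    if (va >= KERNBASE && va - KERNBASE < MAP_SIZE) {
        *pa = va - KERNBASE;                /* high mapping onto the low 4 MB */
        return 0;
    }
    return -1;                              /* no mapping for this address */
}
```

Both 0x00100000 and 0xf0100000 translate to physical 0x00100000, which is why the two memory dumps become identical after the si.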


Exercise 8. We have omitted a small fragment of code - the code necessary to print octal numbers using patterns of the form "%o". Find and fill in this code fragment.

		// (unsigned) octal
		case 'o':
			// Fetch the argument as an unsigned value and print it in base 8.
			num = getuint(&ap, lflag);
			base = 8;
			goto number;
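Setting base = 8 works because the shared number-printing path emits digits in whatever base it is handed. A standalone sketch of that idea, written as an iterative conversion rather than the kernel's own routine (format_unsigned is a hypothetical name):

```c
#include <stddef.h>

/*
 * Write num in the given base (2..16) into out as a NUL-terminated
 * string and return the number of digits. out needs room for 65 bytes
 * in the worst case (64 binary digits plus the terminator).
 */
size_t format_unsigned(unsigned long long num, unsigned base, char *out)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[64];
    size_t n = 0, i;

    if (base < 2 || base > 16) {   /* reject unsupported bases */
        out[0] = '\0';
        return 0;
    }
    do {                           /* least significant digit first */
        tmp[n++] = digits[num % base];
        num /= base;
    } while (num != 0);

    for (i = 0; i < n; i++)        /* reverse into the output buffer */
        out[i] = tmp[n - 1 - i];
    out[n] = '\0';
    return n;
}
```

For example, 64 in base 8 comes out as the three digits 100.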

  1. Explain the interface between printf.c and console.c. Specifically, what function does console.c export? How is this function used by printf.c?

putch in printf.c is implemented by calling cputchar, which console.c exports.

vcprintf passes putch to vprintfmt as a function pointer.

  2. Explain the following from console.c:

    if (crt_pos >= CRT_SIZE) {
      int i;
      memmove(crt_buf, crt_buf + CRT_COLS, (CRT_SIZE - CRT_COLS)*sizeof(uint16_t));
      for (i = CRT_SIZE - CRT_COLS; i < CRT_SIZE; i++)
        crt_buf[i] = 0x0700 | ' ';
      crt_pos -= CRT_COLS;
    }

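When crt_pos runs past the end of the console buffer (CRT_SIZE), the screen scrolls up by one line: every row moves up one row, the last row is filled with blanks, and crt_pos moves back by one row (CRT_COLS).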
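The same logic on a tiny 3x4 "screen" makes the effect easy to check by hand. ROWS, COLS, and scroll_if_full are illustrative names, not part of console.c.

```c
#include <stdint.h>
#include <string.h>

#define ROWS 3                     /* tiny screen for illustration */
#define COLS 4
#define SIZE (ROWS * COLS)
#define BLANK (0x0700 | ' ')       /* attribute byte 0x07 with a space */

/* Scroll buf up one row if pos ran off the end; returns the new pos. */
int scroll_if_full(uint16_t buf[SIZE], int pos)
{
    int i;

    if (pos >= SIZE) {
        /* Move rows 1..ROWS-1 up to rows 0..ROWS-2. */
        memmove(buf, buf + COLS, (SIZE - COLS) * sizeof(uint16_t));
        /* Blank the last row. */
        for (i = SIZE - COLS; i < SIZE; i++)
            buf[i] = BLANK;
        pos -= COLS;
    }
    return pos;
}
```

If the buffer holds the letters a through l and pos is 12, the call leaves rows "efgh" and "ijkl" followed by a blank row and returns 8, the start of that blank row.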

  3. For the following questions you might wish to consult the notes for Lecture 2. These notes cover GCC's calling convention on the x86.

    Trace the execution of the following code step-by-step:

    int x = 1, y = 3, z = 4;
    cprintf("x %d, y %x, z %d\n", x, y, z);
    
    • In the call to cprintf(), to what does fmt point? To what does ap point?
    • List (in order of execution) each call to cons_putc, va_arg, and vcprintf. For cons_putc, list its argument as well. For va_arg, list what ap points to before and after the call. For vcprintf list the values of its two arguments.

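fmt points to the first argument of cprintf, the format string "x %d, y %x, z %d\n". ap points to the first of the variable-length arguments that follow it, that is, to the address of x on the stack.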
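The way ap advances can be seen in ordinary C with <stdarg.h>. The sketch below is not cprintf; it is a hypothetical helper that walks a format string and pulls one int per %d or %x, the way the formatting loop consumes its arguments.

```c
#include <stdarg.h>

/*
 * Walk fmt and, for every %d or %x, pull one int with va_arg and store
 * it in out. Returns how many ints were consumed (at most max).
 */
int collect_int_args(int *out, int max, const char *fmt, ...)
{
    va_list ap;
    int n = 0;

    va_start(ap, fmt);             /* ap now refers to the first variadic argument */
    for (; *fmt != '\0'; fmt++) {
        if (fmt[0] == '%' && (fmt[1] == 'd' || fmt[1] == 'x')) {
            if (n < max)
                out[n++] = va_arg(ap, int);  /* read one int and advance ap */
            fmt++;                 /* skip the conversion character */
        }
    }
    va_end(ap);
    return n;
}
```

Called with the format string from the question and the values 1, 3, 4, it collects exactly those three ints in order, so ap steps from x to y to z.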

  4. Run the following code.

    unsigned int i = 0x00646c72;
    cprintf("H%x Wo%s", 57616, &i);
    

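He110 World

57616 in hexadecimal is 0xe110. On a little-endian machine the bytes of 0x00646c72 are stored in memory as 72 ('r'), 6c ('l'), 64 ('d'), 00 (NUL), so %s prints "rld".

On a big-endian machine, i would have to be changed to 0x726c6400 to get the same output. 57616 would not need to change.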
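Both byte orders can be written out explicitly, independent of the host machine. The two helper names below are made up for the sketch.

```c
#include <stdint.h>

/* Store v into out[0..3] least significant byte first (little-endian). */
void store_le32(uint32_t v, unsigned char out[4])
{
    int i;

    for (i = 0; i < 4; i++)
        out[i] = (unsigned char)(v >> (8 * i));
}

/* Store v into out[0..3] most significant byte first (big-endian). */
void store_be32(uint32_t v, unsigned char out[4])
{
    int i;

    for (i = 0; i < 4; i++)
        out[i] = (unsigned char)(v >> (8 * (3 - i)));
}
```

store_le32(0x00646c72, ...) and store_be32(0x726c6400, ...) produce the same four bytes: 'r', 'l', 'd', 0.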

  5. In the following code, what is going to be printed after 'y='? (note: the answer is not a specific value.) Why does this happen?

    cprintf("x=%d y=%d", 3);
    

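cprintf receives only one argument after the format string, but the format asks for two ints. The second va_arg reads whatever 4 bytes happen to sit on the stack just above the 3, so the value after y= is unpredictable.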


Exercise 9. Determine where the kernel initializes its stack, and exactly where in memory its stack is located. How does the kernel reserve space for its stack? And at which "end" of this reserved area is the stack pointer initialized to point to?

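In kernel.asm, the top of the stack is 0xf010f000. kern/entry.S reserves KSTKSIZE bytes for the stack with .space in the .data section and sets %esp to bootstacktop, the high end of that region, since the stack grows down.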

memlayout.h gives a detailed diagram of the memory layout.
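Given the top of the stack, the other end of the reserved area follows from the stack size. The sketch assumes KSTKSIZE is 8 pages of 4096 bytes, as recalled from inc/memlayout.h; check the header in your tree. stack_bottom is a hypothetical helper.

```c
#include <stdint.h>

#define PGSIZE   4096u
#define KSTKSIZE (8 * PGSIZE)   /* assumed value; verify against inc/memlayout.h */

/* Lowest address of the reserved stack area, given its top. */
uint32_t stack_bottom(uint32_t stack_top)
{
    return stack_top - KSTKSIZE;
}
```

Under that assumption, a stack top of 0xf010f000 puts the reserved area at 0xf0107000-0xf010f000.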


Exercise 10. To become familiar with the C calling conventions on the x86, find the address of the test_backtrace function in obj/kern/kernel.asm, set a breakpoint there, and examine what happens each time it gets called after the kernel starts. How many 32-bit words does each recursive nesting level of test_backtrace push on the stack, and what are those words?

Note that, for this exercise to work properly, you should be using the patched version of QEMU available on the tools page or on Athena. Otherwise, you'll have to manually translate all breakpoint and memory addresses to linear addresses.


Exercise 11. Implement the backtrace function as specified above. Use the same format as in the example, since otherwise the grading script will be confused. When you think you have it working right, run make grade to see if its output conforms to what our grading script expects, and fix it if it doesn't. After you have handed in your Lab 1 code, you are welcome to change the output format of the backtrace function any way you like.

If you use read_ebp(), note that GCC may generate "optimized" code that calls read_ebp() before mon_backtrace()'s function prologue, which results in an incomplete stack trace (the stack frame of the most recent function call is missing). While we have tried to disable optimizations that cause this reordering, you may want to examine the assembly of mon_backtrace() and make sure the call to read_ebp() is happening after the function prologue.

int
mon_backtrace(int argc, char **argv, struct Trapframe *tf)
{
	uint32_t *ebp = (uint32_t *) read_ebp();
	uint32_t return_address;

	cprintf("Stack backtrace:\n");
	// Follow the chain of saved frame pointers until it ends at 0.
	while (ebp) {
		// ebp[0] is the caller's ebp, ebp[1] the return address,
		// and ebp[2..6] the first five arguments.
		return_address = ebp[1];
		cprintf("  ebp %08x eip %08x args %08x %08x %08x %08x %08x\n",
			ebp, return_address, ebp[2], ebp[3], ebp[4], ebp[5], ebp[6]);
		ebp = (uint32_t *) ebp[0];
	}
	return 0;
}
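The frame-pointer walk can be tried outside the kernel on a simulated stack, where array indices stand in for addresses. This is only a model of the loop; walk_frames and its layout are made up for illustration.

```c
#include <stdint.h>

/*
 * Walk a chain of saved frame pointers inside a simulated stack.
 * stack[i] models the 32-bit word at "address" i, and a frame pointer
 * of 0 ends the chain. Stores each frame's return address in eips and
 * returns the number of frames visited (at most max).
 */
int walk_frames(const uint32_t *stack, uint32_t ebp, uint32_t *eips, int max)
{
    int n = 0;

    while (ebp != 0 && n < max) {
        eips[n++] = stack[ebp + 1];  /* return address sits just above the saved ebp */
        ebp = stack[ebp];            /* follow the link to the caller's frame */
    }
    return n;
}
```

With a frame at index 2 whose saved ebp is 5, and a frame at index 5 whose saved ebp is 0, the walk visits two frames and stops, just as the kernel loop stops at the frame whose saved ebp is 0.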

Exercise 12. Modify your stack backtrace function to display, for each eip, the function name, source file name, and line number corresponding to that eip.

In debuginfo_eip, where do __STAB_* come from? This question has a long answer; to help you to discover the answer, here are some things you might want to do:

  • look in the file kern/kernel.ld for __STAB_*
  • run objdump -h obj/kern/kernel
  • run objdump -G obj/kern/kernel
  • run gcc -pipe -nostdinc -O2 -fno-builtin -I. -MD -Wall -Wno-format -DJOS_KERNEL -gstabs -c -S kern/init.c, and look at init.s.
  • see if the bootloader loads the symbol table in memory as part of loading the kernel binary

Complete the implementation of debuginfo_eip by inserting the call to stab_binsearch to find the line number for an address.

Add a backtrace command to the kernel monitor, and extend your implementation of mon_backtrace to call debuginfo_eip and print a line for each stack frame of the form:

K> backtrace
Stack backtrace:
  ebp f010ff78  eip f01008ae  args 00000001 f010ff8c 00000000 f0110580 00000000
         kern/monitor.c:143: monitor+106
  ebp f010ffd8  eip f0100193  args 00000000 00001aac 00000660 00000000 00000000
         kern/init.c:49: i386_init+59
  ebp f010fff8  eip f010003d  args 00000000 00000000 0000ffff 10cf9a00 0000ffff
         kern/entry.S:70: <unknown>+0
K> 

Each line gives the file name and line within that file of the stack frame's eip, followed by the name of the function and the offset of the eip from the first instruction of the function (e.g., monitor+106 means the return eip is 106 bytes past the beginning of monitor).

Be sure to print the file and function names on a separate line, to avoid confusing the grading script.

Tip: printf format strings provide an easy, albeit obscure, way to print non-null-terminated strings like those in STABS tables. printf("%.*s", length, string) prints at most length characters of string. Take a look at the printf man page to find out why this works.

You may find that some functions are missing from the backtrace. For example, you will probably see a call to monitor() but not to runcmd(). This is because the compiler in-lines some function calls. Other optimizations may cause you to see unexpected line numbers. If you get rid of the -O2 from GNUMakefile, the backtraces may make more sense (but your kernel will run more slowly).

int
mon_backtrace(int argc, char **argv, struct Trapframe *tf)
{
	uint32_t *ebp = (uint32_t *) read_ebp();
	uint32_t return_address;
	struct Eipdebuginfo info;

	cprintf("Stack backtrace:\n");
	while (ebp) {
		return_address = ebp[1];
		cprintf("  ebp %08x eip %08x args %08x %08x %08x %08x %08x\n",
			ebp, return_address, ebp[2], ebp[3], ebp[4], ebp[5], ebp[6]);

		// Look up the file, line, and function for the return address.
		debuginfo_eip((uintptr_t) return_address, &info);
		cprintf("        %s:%d: ", info.eip_file, info.eip_line);
		// eip_fn_name is not NUL-terminated at the end of the name,
		// so print exactly eip_fn_namelen characters.
		cprintf("%.*s", info.eip_fn_namelen, info.eip_fn_name);
		cprintf("+%d\n", return_address - info.eip_fn_addr);

		ebp = (uint32_t *) ebp[0];
	}
	return 0;
}
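The "%.*s" trick from the tip can be checked with the standard library. The sample string "monitor:F(0,1)" is only an illustration of a stab-style name that continues past the function name, and format_fn_offset is a hypothetical helper.

```c
#include <stdio.h>

/*
 * Format "name+offset", where name is the first namelen characters of
 * a string that is not terminated at the end of the name.
 * Returns the value of snprintf.
 */
int format_fn_offset(char *out, size_t outsz,
                     const char *stab_name, int namelen, unsigned offset)
{
    return snprintf(out, outsz, "%.*s+%u", namelen, stab_name, offset);
}
```

With namelen 7 and offset 106, the sample string is cut after "monitor" and the result is monitor+106.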