xv6 Textbook Translation: Appendix B, The Boot Loader

Appendix B

Figure B-1 The relationship between logical, linear, and physical addresses.

The boot loader

When an x86 PC boots, it starts executing a program called the BIOS, which is stored in non-volatile memory on the motherboard. The BIOS's job is to prepare the hardware and then transfer control to the operating system. Specifically, it transfers control to code loaded from the boot sector, the first 512-byte sector of the boot disk. The boot sector contains the boot loader: instructions that load the kernel into memory. The BIOS loads the boot sector at memory address 0x7c00 and then jumps (sets the processor's %ip) to that address. When the boot loader begins executing, the processor is simulating an Intel 8088, and the loader's job is to put the processor in a more modern operating mode, to load the xv6 kernel from disk into memory, and then to transfer control to the kernel. The xv6 boot loader comprises two source files, one written in a combination of 16-bit and 32-bit x86 assembly (bootasm.S; (8900)) and one written in C (bootmain.c; (9000)).

Code: Assembly bootstrap

The first instruction in the boot loader is cli (8912), which disables processor interrupts. Interrupts are a way for hardware devices to invoke operating system functions called interrupt handlers. The BIOS is a tiny operating system, and it might have set up its own interrupt handlers as part of initializing the hardware. But the BIOS isn't running anymore (the boot loader is), so it is no longer appropriate or safe to handle interrupts from hardware devices. When xv6 is ready (in Chapter 3), it will re-enable interrupts.

The processor is in real mode, in which it simulates an Intel 8088. In real mode there are eight 16-bit general-purpose registers, but the processor sends 20 bits of address to memory. The segment registers %cs, %ds, %es, and %ss provide the additional bits necessary to generate 20-bit memory addresses from 16-bit registers. When a program refers to a memory address, the processor automatically adds 16 times the value of one of the segment registers; these registers are 16 bits wide. Which segment register is used is usually implicit in the kind of memory reference: instruction fetches use %cs, data reads and writes use %ds, and stack reads and writes use %ss.
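
As a rough illustration of the real-mode translation described above, the following C sketch computes the address the processor forms from a segment register and a 16-bit offset. The function name real_mode_address is made up for this example and does not appear in xv6.

```c
#include <stdint.h>

/* Hypothetical helper (not part of xv6): the address a real-mode
 * processor forms from a 16-bit segment register and a 16-bit offset.
 * The segment value is multiplied by 16 (shifted left by 4 bits) and
 * added to the offset. */
static uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

Under this model, segment 0x0000 with offset 0x7c00 and segment 0x07c0 with offset 0x0000 both name the byte at 0x7c00, where the BIOS places the boot sector.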

Xv6 pretends that an x86 instruction uses a virtual address for its memory operands, but an x86 instruction actually uses a logical address (see Figure B-1). A logical address consists of a segment selector and an offset, and is sometimes written as segment:offset. More often, the segment is implicit and the program only directly manipulates the offset. The segmentation hardware performs the translation described above to generate a linear address. If the paging hardware is enabled (see Chapter 2), it translates linear addresses to physical addresses; otherwise the processor uses linear addresses as physical addresses.

The boot loader does not enable the paging hardware; the logical addresses that it uses are translated to linear addresses by the segmentation hardware, and then used directly as physical addresses. Xv6 configures the segmentation hardware to translate logical to linear addresses without change, so that they are always equal. For historical reasons we have used the term virtual address to refer to addresses manipulated by programs; an xv6 virtual address is the same as an x86 logical address, and is equal to the linear address to which the segmentation hardware maps it. Once paging is enabled, the only interesting address mapping in the system will be linear to physical.

The BIOS does not guarantee anything about the contents of %ds, %es, %ss, so the first order of business after disabling interrupts is to set %ax to zero and then copy that zero into %ds, %es, and %ss (8915-8918). A virtual segment:offset can yield a 21-bit physical address, but the Intel 8088 could only address 20 bits of memory, so it discarded the top bit: 0xffff0 + 0xffff = 0x10ffef, but virtual address 0xffff:0xffff on the 8088 referred to physical address 0x0ffef. Some early software relied on the hardware ignoring the 21st address bit, so when Intel introduced processors with more than 20 bits of physical address, IBM provided a compatibility hack that is a requirement for PC-compatible hardware. If the second bit of the keyboard controller's output port is low, the 21st physical address bit is always cleared; if high, the 21st bit acts normally. The boot loader must enable the 21st address bit using I/O to the keyboard controller on ports 0x64 and 0x60 (8920-8936).
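
The wraparound arithmetic above can be checked with a small C model. This is only a sketch: the function physical_address and its a20_enabled parameter are invented for this example, and the code models the effect of the 21st address bit rather than the keyboard-controller I/O that the boot loader actually performs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model (not xv6 code) of the 21st address bit.
 * When a20_enabled is false, bit 20 of the address is forced to zero,
 * which reproduces the 8088's wraparound at one megabyte. */
static uint32_t physical_address(uint16_t segment, uint16_t offset,
                                 bool a20_enabled)
{
    uint32_t addr = ((uint32_t)segment << 4) + offset;
    if (!a20_enabled)
        addr &= ~(UINT32_C(1) << 20);   /* clear the 21st address bit */
    return addr;
}
```

With the bit disabled, 0xffff:0xffff lands at 0x0ffef as on the 8088; with it enabled, the same pair reaches 0x10ffef.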

Real mode's 16-bit general-purpose and segment registers make it awkward for a program to use more than 65,536 bytes of memory, and impossible to use more than a megabyte. x86 processors since the 80286 have a protected mode, which allows physical addresses to have many more bits, and (since the 80386) a "32-bit" mode that causes registers, virtual addresses, and most integer arithmetic to be carried out with 32 bits rather than 16. The xv6 boot sequence enables protected mode and 32-bit mode as follows.

In protected mode, a segment register is an index into a segment descriptor table (see Figure B-2). Each table entry specifies a base physical address, a maximum virtual address called the limit, and permission bits for the segment. These permissions are the protection in protected mode: the kernel can use them to ensure that a program uses only its own memory.

Figure B-2: Segments in protected mode.

xv6 makes almost no use of segments; it uses the paging hardware instead, as Chapter 2 describes. The boot loader sets up the segment descriptor table gdt (8982-8985) so that all segments have a base address of zero and the maximum possible limit (four gigabytes). The table has a null entry, one entry for executable code, and one entry for data. The code segment descriptor has a flag set that indicates that the code should run in 32-bit mode (0660). With this setup, when the boot loader enters protected mode, logical addresses map one-to-one to physical addresses. The boot loader executes an lgdt instruction (8941) to load the processor's global descriptor table (GDT) register with the value gdtdesc (8987-8989), which points to the table gdt.
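
To make the descriptor layout more concrete, here is a hedged C sketch that packs a base, a limit, and type bits into the 64-bit x86 segment descriptor format. The helper make_descriptor is invented for this example; the STA_ names follow the constants the boot loader's assembly uses, but treat the whole block as an illustration under those assumptions, not as the xv6 source.

```c
#include <stdint.h>

/* Segment type bits in the x86 descriptor format. */
#define STA_X 0x8   /* executable segment */
#define STA_W 0x2   /* writeable (data segments) */
#define STA_R 0x2   /* readable (code segments) */

/* Hypothetical helper: build a descriptor for a ring-0 code or data
 * segment with 4 KB granularity and the 32-bit flag set. The limit
 * is given in bytes and stored in 4 KB units. */
static uint64_t make_descriptor(uint32_t base, uint32_t limit, uint8_t type)
{
    uint64_t d = 0;
    d |= (uint64_t)((limit >> 12) & 0xffff);              /* limit bits 0-15  */
    d |= (uint64_t)(base & 0xffff) << 16;                 /* base bits 0-15   */
    d |= (uint64_t)((base >> 16) & 0xff) << 32;           /* base bits 16-23  */
    d |= (uint64_t)(0x90 | type) << 40;                   /* present, code/data, type */
    d |= (uint64_t)(0xC0 | ((limit >> 28) & 0xf)) << 48;  /* 4 KB units, 32-bit, limit bits 16-19 */
    d |= (uint64_t)((base >> 24) & 0xff) << 56;           /* base bits 24-31  */
    return d;
}
```

Calling make_descriptor(0, 0xffffffff, STA_X | STA_R) gives a code entry and make_descriptor(0, 0xffffffff, STA_W) a data entry, both with base zero and a four-gigabyte limit; the null entry is simply all zero bits.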

Once it has loaded the GDT register, the boot loader enables protected mode by setting the lowest bit (CR0_PE, value 1) in register %cr0 (8942-8944). Enabling protected mode does not immediately change how the processor translates logical to physical addresses; it is only when one loads a new value into a segment register that the processor reads the GDT and changes its internal segmentation settings. One cannot directly modify %cs, so instead the code executes an ljmp (far jump) instruction (8953), which allows a code segment selector to be specified. The jump continues execution at the next line (8956) but in doing so sets %cs to refer to the code descriptor entry in gdt. That descriptor describes a 32-bit code segment, so the processor switches into 32-bit mode. The boot loader has nursed the processor through an evolution from 8088 through 80286 to 80386.
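
The two values involved in this switch can be sketched in C. The helpers gdt_selector and enable_protected_mode are invented for this example, and the entry indexes 1 and 2 are an assumption based on the table order given earlier (null, code, data); the real work is done by assembly instructions that C cannot express portably.

```c
#include <stdint.h>

#define CR0_PE    0x00000001u  /* protection enable bit of %cr0 */
#define SEG_KCODE 1            /* assumed index of the code entry in gdt */
#define SEG_KDATA 2            /* assumed index of the data entry in gdt */

/* Hypothetical helper: a segment selector holds the table index in its
 * upper 13 bits; the low 3 bits (table indicator and privilege level)
 * are zero for a ring-0 GDT entry. */
static uint16_t gdt_selector(uint16_t index)
{
    return (uint16_t)(index << 3);
}

/* Hypothetical helper: the new %cr0 value with protected mode enabled,
 * leaving every other bit unchanged. */
static uint32_t enable_protected_mode(uint32_t cr0)
{
    return cr0 | CR0_PE;
}
```

Under these assumptions the far jump would load %cs with selector 0x08, and the data segment registers would later receive selector 0x10.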

The boot loader's first action in 32-bit mode is to initialize the data segment registers with SEG_KDATA (8958-8961). Logical addresses now map directly to physical addresses. The only step left before executing C code is to set up a stack in an unused region of memory. The memory from 0xa0000 to 0x100000 is typically littered with device memory regions, and the xv6 kernel expects to be placed at 0x100000. The boot loader itself is at 0x7c00 through 0x7e00 (512 bytes). Essentially any other section of memory would be a fine location for the stack. The boot loader chooses 0x7c00 (known in this file as $start) as the top of the stack; the stack will grow down from there, toward 0x0000, away from the boot loader.

Finally the boot loader calls the C function bootmain (8968) . Bootmain’s job is to load and run the kernel. It only returns if something has gone wrong. In that case, the code sends a few output words on port 0x8a00 (8970-8976) . On real hardware, there is no device connected to that port, so this code does nothing. If the boot loader is running inside a PC simulator, port 0x8a00 is connected to the simulator itself and can transfer control back to the simulator. Simulator or not, the code then executes an infinite loop (8977-8978) . A real boot loader might attempt to print an error message first.

Code: C bootstrap

The C part of the boot loader, bootmain.c (9000) , expects to find a copy of the kernel executable on the disk starting at the second sector. The kernel is an ELF format binary, as we have seen in Chapter 2. To get access to the ELF headers, bootmain loads the first 4096 bytes of the ELF binary (9014) . It places the in-memory copy at address 0x10000.

The next step is a quick check that this probably is an ELF binary, and not an uninitialized disk. For each program segment described in the ELF headers, bootmain reads the segment's content starting from the disk location off bytes after the start of the ELF header, and writes it to memory starting at address paddr. Bootmain calls readseg to load data from disk (9038) and calls stosb to zero the remainder of the segment (9040). Stosb (0492) uses the x86 instruction rep stosb to initialize every byte of a block of memory.
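
The check and the load can be sketched in portable C as follows. This is a simplified stand-in, not the code in bootmain.c: looks_like_elf and load_segment are invented names, the "disk" is an ordinary byte array, and memcpy and memset take the place of readseg and stosb. The parameters filesz and memsz are assumed to correspond to the file size and in-memory size recorded for a segment in the ELF headers.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: an ELF file starts with the four bytes
 * 0x7f 'E' 'L' 'F'. */
static int looks_like_elf(const uint8_t *header)
{
    static const uint8_t magic[4] = { 0x7f, 'E', 'L', 'F' };
    return memcmp(header, magic, sizeof magic) == 0;
}

/* Hypothetical in-memory stand-in for readseg followed by stosb:
 * copy filesz bytes found off bytes into the image to dest, then
 * zero the remaining memsz - filesz bytes of the segment. */
static void load_segment(uint8_t *dest, const uint8_t *image,
                         uint32_t off, uint32_t filesz, uint32_t memsz)
{
    memcpy(dest, image + off, filesz);
    if (memsz > filesz)
        memset(dest + filesz, 0, memsz - filesz);
}
```

The zero fill matters when a segment occupies more memory than it does on disk, which is typically how uninitialized data is represented.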

The kernel has been compiled and linked so that it expects to find itself at virtual addresses starting at 0x80100000. Thus, function call instructions must mention destination addresses that look like 0x801xxxxx; you can see examples in kernel.asm. This address is configured in kernel.ld. 0x80100000 is a relatively high address, towards the end of the 32-bit address space; Chapter 2 explains the reasons for this choice. There may not be any physical memory at such a high address. Once the kernel starts executing, it will set up the paging hardware to map virtual addresses starting at 0x80100000 to physical addresses starting at 0x00100000; the kernel assumes that there is physical memory at this lower address. At this point in the boot process, however, paging is not enabled. Instead, kernel.ld specifies that the ELF paddr start at 0x00100000, which causes the boot loader to copy the kernel to the low physical addresses to which the paging hardware will eventually point.
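
The relationship between the two address ranges is a fixed offset, which the following C sketch spells out. The constant 0x80000000 is inferred here from the two addresses in the text (0x80100000 and 0x00100000), and the function names are made up for this example.

```c
#include <stdint.h>

/* Assumed offset between the kernel's virtual addresses and the
 * physical addresses they are mapped to, derived from
 * 0x80100000 - 0x00100000. */
#define KERNBASE 0x80000000u

/* Hypothetical helpers converting between the two views. */
static uint32_t virt_to_phys(uint32_t va)
{
    return va - KERNBASE;
}

static uint32_t phys_to_virt(uint32_t pa)
{
    return pa + KERNBASE;
}
```

For instance, virt_to_phys(0x80100000) yields 0x00100000, the physical address at which the boot loader places the start of the kernel.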

The boot loader’s final step is to call the kernel’s entry point, which is the instruction at which the kernel expects to start executing. For xv6 the entry address is 0x10000c:

# objdump -f kernel

kernel:     file format elf32-i386
architecture: i386, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x0010000c

By convention, the _start symbol specifies the ELF entry point, which is defined in the file entry.S (1036) . Since xv6 hasn’t set up virtual memory yet, xv6’s entry point is the physical address of entry (1040) .

Real world

The boot loader described in this appendix compiles to around 470 bytes of machine code, depending on the optimizations used when compiling the C code. In order to fit in that small amount of space, the xv6 boot loader makes a major simplifying assumption, that the kernel has been written to the boot disk contiguously starting at sector 1. More commonly, kernels are stored in ordinary file systems, where they may not be contiguous, or are loaded over a network. These complications require the boot loader to be able to drive a variety of disk and network controllers and understand various file systems and network protocols. In other words, the boot loader itself must be a small operating system. Since such complicated boot loaders certainly won’t fit in 512 bytes, most PC operating systems use a two-step boot process. First, a simple boot loader like the one in this appendix loads a full-featured boot-loader from a known disk location, often relying on the less space-constrained BIOS for disk access rather than trying to drive the disk itself. Then the full loader, relieved of the 512-byte limit, can implement the complexity needed to locate, load, and execute the desired kernel. Perhaps a more modern design would have the BIOS directly read a larger boot loader from the disk (and start it in protected and 32-bit mode).

This appendix is written as if the only thing that happens between power on and the execution of the boot loader is that the BIOS loads the boot sector. In fact the BIOS does a huge amount of initialization in order to make the complex hardware of a modern computer look like a traditional standard PC.

Exercises

1. Due to sector granularity, the call to readseg in the text is equivalent to readseg((uchar*)0x100000, 0xb500, 0x1000). In practice, this sloppy behavior turns out not to be a problem. Why doesn't the sloppy readsect cause problems?

2. something about BIOS lasting longer + security problems

3. Suppose you wanted bootmain() to load the kernel at 0x200000 instead of 0x100000, and you did so by modifying bootmain() to add 0x100000 to the va of each ELF section. Something would go wrong. What?

4. It seems potentially dangerous for the boot loader to copy the ELF header to memory at the arbitrary location 0x10000. Why doesn’t it call malloc to obtain the memory it needs?
