Chapter 0
Operating system interfaces
The job of an operating system is to share a computer among multiple programs and to provide a more useful set of services than the hardware alone supports. The operating system manages and abstracts the low-level hardware, so that, for example, a word processor need not concern itself with which type of disk hardware is being used. It also multiplexes the hardware, allowing many programs to share the computer and run (or appear to run) at the same time. Finally, operating systems provide controlled ways for programs to interact, so that they can share data or work together.
An operating system provides services to user programs through an interface. Designing a good interface turns out to be difficult. On the one hand, we would like the interface to be simple and narrow because that makes it easier to get the implementation right. On the other hand, we may be tempted to offer many sophisticated features to applications. The trick in resolving this tension is to design interfaces that rely on a few mechanisms that can be combined to provide much generality.
This book uses a single operating system as a concrete example to illustrate operating system concepts. That operating system, xv6, provides the basic interfaces introduced by Ken Thompson and Dennis Ritchie’s Unix operating system, as well as mimicking Unix’s internal design. Unix provides a narrow interface whose mechanisms combine well, offering a surprising degree of generality. This interface has been so successful that modern operating systems—BSD, Linux, Mac OS X, Solaris, and even, to a lesser extent, Microsoft Windows—have Unix-like interfaces. Understanding xv6 is a good start toward understanding any of these systems and many others.
As shown in Figure 0-1, xv6 takes the traditional form of a kernel, a special program that provides services to running programs. Each running program, called a process, has memory containing instructions, data, and a stack. The instructions implement the program’s computation. The data are the variables on which the computation acts. The stack organizes the program’s procedure calls.
When a process needs to invoke a kernel service, it invokes a procedure call in the operating system interface. Such a procedure is called a system call. The system call enters the kernel; the kernel performs the service and returns. Thus a process alternates between executing in user space and kernel space.
The kernel uses the CPU’s hardware protection mechanisms to ensure that each process executing in user space can access only its own memory. The kernel executes with the hardware privileges required to implement these protections; user programs execute without those privileges. When a user program invokes a system call, the hardware raises the privilege level and starts executing a pre-arranged function in the kernel.
The collection of system calls that a kernel provides is the interface that user programs see. The xv6 kernel provides a subset of the services and system calls that Unix kernels traditionally offer. Figure 0-2 lists all xv6’s system calls.
The rest of this chapter outlines xv6’s services—processes, memory, file descriptors, pipes, and file system—and illustrates them with code snippets and discussions of how the shell uses them. The shell’s use of system calls illustrates how carefully they have been designed.
The shell is an ordinary program that reads commands from the user and executes them, and is the primary user interface to traditional Unix-like systems. The fact that the shell is a user program, not part of the kernel, illustrates the power of the system call interface: there is nothing special about the shell. It also means that the shell is easy to replace; as a result, modern Unix systems have a variety of shells to choose from, each with its own user interface and scripting features. The xv6 shell is a simple implementation of the essence of the Unix Bourne shell. Its implementation can be found at line (8350).
Processes and memory
An xv6 process consists of user-space memory (instructions, data, and stack) and per-process state private to the kernel. Xv6 can time-share processes: it transparently switches the available CPUs among the set of processes waiting to execute. When a process is not executing, xv6 saves its CPU registers, restoring them when it next runs the process. The kernel associates a process identifier, or pid, with each process. A process may create a new process using the fork system call. Fork creates a new process, called the child process, with exactly the same memory contents as the calling process, called the parent process. Fork returns in both the parent and the child. In the parent, fork returns the child’s pid; in the child, it returns zero. For example, consider the following program fragment:
    int pid = fork();
    if(pid > 0){
      printf("parent: child=%d\n", pid);
      pid = wait();
      printf("child %d is done\n", pid);
    } else if(pid == 0){
      printf("child: exiting\n");
      exit();
    } else {
      printf("fork error\n");
    }
The exit system call causes the calling process to stop executing and to release resources such as memory and open files. The wait system call returns the pid of an exited child of the current process; if none of the caller’s children has exited, wait waits for one to do so. In the example, the output lines
    parent: child=1234
    child: exiting
might come out in either order, depending on whether the parent or child gets to its printf call first. After the child exits the parent’s wait returns, causing the parent to print
    child 1234 is done
Note that the parent and child were executing with different memory and different registers: changing a variable in one does not affect the other.
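The separation of parent and child memory can be checked directly. The following is a minimal sketch in POSIX C rather than xv6 C (it uses waitpid and _exit, and an exit status, none of which appear in the fragment above); the helper name parent_value_after_child_write is hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child overwrites its copy of x; the function
 * returns the parent's copy afterwards, or -1 on error. */
int parent_value_after_child_write(void)
{
    int x = 1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;                /* fork failed */
    if (pid == 0) {
        x = 42;                   /* changes only the child's copy */
        _exit(x);                 /* report the child's value as its exit status */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    /* The child saw 42 in its own memory... */
    assert(WIFEXITED(status) && WEXITSTATUS(status) == 42);
    return x;                     /* ...while the parent's x was never touched */
}
```

Calling parent_value_after_child_write() yields 1: the assignment in the child is invisible to the parent.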
The exec system call replaces the calling process’s memory with a new memory image loaded from a file stored in the file system. The file must have a particular format, which specifies which part of the file holds instructions, which part is data, at which instruction to start, etc. xv6 uses the ELF format, which Chapter 2 discusses in more detail. When exec succeeds, it does not return to the calling program; instead, the instructions loaded from the file start executing at the entry point declared in the ELF header. Exec takes two arguments: the name of the file containing the executable and an array of string arguments. For example:
    char *argv[3];

    argv[0] = "echo";
    argv[1] = "hello";
    argv[2] = 0;
    exec("/bin/echo", argv);
    printf("exec error\n");
This fragment replaces the calling program with an instance of the program /bin/echo running with the argument list echo hello. Most programs ignore the first argument, which is conventionally the name of the program.
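Because a successful exec never returns, a program that wants to keep running usually calls it in a forked child. Here is a sketch of that pattern in POSIX C, assuming execv as the stand-in for xv6's exec; the helper name run_program and the use of 127 as the "exec failed" status are our own choices, not part of xv6.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` with argument list `argv` in a child.
 * Returns the child's exit status, 127 if exec failed in the child,
 * or -1 if fork or wait failed. */
int run_program(const char *path, char *const argv[])
{
    assert(argv[0] != NULL);      /* argv[0] is conventionally the program name */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(path, argv);        /* on success, this line is the last one run here */
        _exit(127);               /* reached only if exec failed */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With the argv array from the fragment above, run_program("/bin/echo", argv) would run echo hello in the child while the caller keeps its own memory image.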
The xv6 shell uses the above calls to run programs on behalf of users. The main structure of the shell is simple; see main (8501). The main loop reads the input on the command line using getcmd. Then it calls fork, which creates a copy of the shell process. The parent shell calls wait, while the child process runs the command. For example, if the user had typed “echo hello” at the prompt, runcmd would have been called with “echo hello” as the argument. runcmd (8406) runs the actual command. For “echo hello”, it would call exec (8426). If exec succeeds then the child will execute instructions from echo instead of runcmd. At some point echo will call exit, which will cause the parent to return from wait in main (8501). You might wonder why fork and exec are not combined in a single call; we will see later that having separate calls for creating a process and loading a program is a clever design.
Xv6 allocates most user-space memory implicitly: fork allocates the memory required for the child’s copy of the parent’s memory, and exec allocates enough memory to hold the executable file. A process that needs more memory at run-time (perhaps for malloc) can call sbrk(n) to grow its data memory by n bytes; sbrk returns the location of the new memory.
Xv6 does not provide a notion of users or of protecting one user from another; in Unix terms, all xv6 processes run as root.
I/O and File descriptors
A file descriptor is a small integer representing a kernel-managed object that a process may read from or write to. A process may obtain a file descriptor by opening a file, directory, or device, or by creating a pipe, or by duplicating an existing descriptor. For simplicity we’ll often refer to the object a file descriptor refers to as a “file”; the file descriptor interface abstracts away the differences between files, pipes, and devices, making them all look like streams of bytes.
Internally, the xv6 kernel uses the file descriptor as an index into a per-process table, so that every process has a private space of file descriptors starting at zero. By convention, a process reads from file descriptor 0 (standard input), writes output to file descriptor 1 (standard output), and writes error messages to file descriptor 2 (standard error). As we will see, the shell exploits the convention to implement I/O redirection and pipelines. The shell ensures that it always has three file descriptors open (8507), which are by default file descriptors for the console.
The read and write system calls read bytes from and write bytes to open files named by file descriptors. The call read(fd, buf, n) reads at most n bytes from the file descriptor fd, copies them into buf, and returns the number of bytes read. Each file descriptor that refers to a file has an offset associated with it. Read reads data from the current file offset and then advances that offset by the number of bytes read: a subsequent read will return the bytes following the ones returned by the first read. When there are no more bytes to read, read returns zero to signal the end of the file.
The call write(fd, buf, n) writes n bytes from buf to the file descriptor fd and returns the number of bytes written. Fewer than n bytes are written only when an error occurs. Like read, write writes data at the current file offset and then advances that offset by the number of bytes written: each write picks up where the previous one left off.
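The way read and write each advance the file offset can be seen with a scratch file. This is a sketch in POSIX C under a few assumptions: mkstemp and lseek are POSIX calls that this chapter does not list for xv6, the /tmp path is an assumed scratch location, and the helper name read_back_in_two_chunks is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: write "hello " and "world" with two write calls,
 * rewind, and read the data back with two read calls.
 * `first` must hold 7 bytes and `second` 6; both are NUL-terminated.
 * Returns 0 on success, -1 on any error. */
int read_back_in_two_chunks(char first[7], char second[6])
{
    assert(first != NULL && second != NULL);
    char path[] = "/tmp/offset-demo-XXXXXX";   /* assumed scratch location */
    int fd = mkstemp(path);                    /* opened for reading and writing */
    if (fd < 0)
        return -1;
    unlink(path);                              /* the descriptor keeps the file alive */

    int ok = write(fd, "hello ", 6) == 6       /* offset moves 0 -> 6 */
          && write(fd, "world", 5) == 5        /* offset moves 6 -> 11 */
          && lseek(fd, 0, SEEK_SET) == 0;      /* rewind (POSIX, not in this chapter) */

    char probe;
    ok = ok
      && read(fd, first, 6) == 6               /* bytes 0..5 */
      && read(fd, second, 5) == 5              /* picks up at byte 6 */
      && read(fd, &probe, 1) == 0;             /* nothing left: end of file */
    close(fd);
    if (!ok)
        return -1;
    first[6] = '\0';
    second[5] = '\0';
    return 0;
}
```

On success the second buffer holds world, not a repeat of hello: the second read started where the first one stopped.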
The following program fragment (which forms the essence of cat) copies data from its standard input to its standard output. If an error occurs, it writes a message to the standard error.
    char buf[512];
    int n;

    for(;;){
      n = read(0, buf, sizeof buf);
      if(n == 0)
        break;
      if(n < 0){
        fprintf(2, "read error\n");
        exit();
      }
      if(write(1, buf, n) != n){
        fprintf(2, "write error\n");
        exit();
      }
    }
The important thing to note in the code fragment is that cat doesn’t know whether it is reading from a file, console, or a pipe. Similarly cat doesn’t know whether it is printing to a console, a file, or whatever. The use of file descriptors and the convention that file descriptor 0 is input and file descriptor 1 is output allows a simple implementation of cat.
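To make the point concrete, the same loop can be written as a function that takes its two descriptors as parameters, so the caller decides whether they name files, pipes, or the console. This is a sketch in POSIX C; copy_fd is a hypothetical name and is not taken from the xv6 sources.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: copy bytes from descriptor `in` to descriptor `out`
 * until read reports end of file. Returns the number of bytes copied,
 * or -1 on a read or write error. */
long copy_fd(int in, int out)
{
    assert(in >= 0 && out >= 0);
    char buf[512];
    long total = 0;
    for (;;) {
        ssize_t n = read(in, buf, sizeof buf);
        if (n == 0)
            return total;         /* end of input */
        if (n < 0)
            return -1;            /* read error */
        if (write(out, buf, (size_t)n) != n)
            return -1;            /* short write treated as an error, as in the fragment */
        total += n;
    }
}
```

copy_fd(0, 1) behaves like the fragment above; passing the two ends of different pipes works equally well, because the loop never asks what kind of object it is reading or writing.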
The close system call releases a file descriptor, making it free for reuse by a future open, pipe, or dup system call (see below). A newly allocated file descriptor is always the lowest-numbered unused descriptor of the current process.
File descriptors and fork interact to make I/O redirection easy to implement. Fork copies the parent’s file descriptor table along with its memory, so that the child starts with exactly the same open files as the parent. The system call exec replaces the calling process’s memory but preserves its file table. This behavior allows the shell to implement I/O redirection by forking, reopening chosen file descriptors, and then execing the new program. Here is a simplified version of the code a shell runs for the command cat <input.txt:
    char *argv[2];

    argv[0] = "cat";
    argv[1] = 0;
    if(fork() == 0) {
      close(0);
      open("input.txt", O_RDONLY);
      exec("cat", argv);
    }
After the child closes file descriptor 0, open is guaranteed to use that file descriptor for the newly opened input.txt: 0 will be the smallest available file descriptor. Cat then executes with file descriptor 0 (standard input) referring to input.txt.
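The lowest-numbered-descriptor rule that this redirection relies on can be observed in isolation. The sketch below is POSIX C, not xv6 code; the helper name fd_after_closing_stdin is hypothetical, and the work is done in a child so that the caller's own standard input is left alone.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: in a child process, close descriptor 0 and open
 * `path` read-only. Returns the descriptor number that open chose,
 * or -1 on error. */
int fd_after_closing_stdin(const char *path)
{
    assert(path != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(0);                          /* 0 is now the lowest unused descriptor */
        int fd = open(path, O_RDONLY);
        _exit(fd < 0 ? 255 : fd);          /* pass the number back as exit status */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 255 ? -1 : WEXITSTATUS(status);
}
```

For any readable path the result is 0, which is why the shell does not need a separate call to say "make this file my standard input".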
The code for I/O redirection in the xv6 shell works in exactly this way (8430). Recall that at this point in the code the shell has already forked the child shell and that runcmd will call exec to load the new program. Now it should be clear why it is a good idea that fork and exec are separate calls. This separation allows the shell to fix up the child process before the child runs the intended program.
Although fork copies the file descriptor table, each underlying file offset is shared between parent and child. Consider this example:
    if(fork() == 0) {
      write(1, "hello ", 6);
      exit();
    } else {
      wait();
      write(1, "world\n", 6);
    }
At the end of this fragment, the file attached to file descriptor 1 will contain the data hello world. The write in the parent (which, thanks to wait, runs only after the child is done) picks up where the child’s write left off. This behavior helps produce sequential output from sequences of shell commands, like (echo hello; echo world) > output.txt.
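The shared offset can be verified by pointing the same pattern at a scratch file and reading the result back. This is a sketch in POSIX C; mkstemp and lseek are POSIX calls used here for convenience, the /tmp path is an assumed scratch location, and fork_then_write is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child writes "hello ", then the parent writes
 * "world\n" through the same inherited descriptor. The 12 bytes found in
 * the file are copied into `out` (13 bytes, NUL-terminated).
 * Returns 0 on success, -1 on error. */
int fork_then_write(char out[13])
{
    assert(out != NULL);
    char path[] = "/tmp/fork-offset-XXXXXX";   /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0)
        _exit(write(fd, "hello ", 6) == 6 ? 0 : 1);   /* advances the shared offset to 6 */

    int status = 0;
    int ok = waitpid(pid, &status, 0) == pid
          && WIFEXITED(status) && WEXITSTATUS(status) == 0
          && write(fd, "world\n", 6) == 6             /* lands at offset 6, not 0 */
          && lseek(fd, 0, SEEK_SET) == 0
          && read(fd, out, 12) == 12;
    close(fd);
    if (!ok)
        return -1;
    out[12] = '\0';
    return 0;
}
```

If the offsets were private to each process, the parent's write would start at 0 and overwrite the child's data; instead the file ends up holding both pieces in order.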
The dup system call duplicates an existing file descriptor, returning a new one that refers to the same underlying I/O object. Both file descriptors share an offset, just as the file descriptors duplicated by fork do. This is another way to write hello world into a file:
    fd = dup(1);
    write(1, "hello ", 6);
    write(fd, "world\n", 6);
Two file descriptors share an offset if they were derived from the same original file descriptor by a sequence of fork and dup calls. Otherwise file descriptors do not share offsets, even if they resulted from open calls for the same file. Dup allows shells to implement commands like this: ls existing-file non-existing-file > tmp1 2>&1. The 2>&1 tells the shell to give the command a file descriptor 2 that is a duplicate of descriptor 1. Both the name of the existing file and the error message for the non-existing file will show up in the file tmp1. The xv6 shell doesn’t support I/O redirection for the error file descriptor, but now you know how to implement it.
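The difference between a duplicated descriptor and a second open of the same file can be checked by asking each descriptor for its current offset. The sketch below is POSIX C; it relies on lseek(fd, 0, SEEK_CUR) to report an offset, which is a POSIX facility not described in this chapter, and offsets_after_write is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: write 6 bytes through one descriptor, then report
 * the offset seen through a dup of it and through an independent open of
 * the same file. Returns 0 on success, -1 on error. */
int offsets_after_write(long *dup_offset, long *open_offset)
{
    assert(dup_offset != NULL && open_offset != NULL);
    char path[] = "/tmp/dup-offset-XXXXXX";    /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    int dupfd = dup(fd);                       /* derived from fd: shares its offset */
    int openfd = open(path, O_RDONLY);         /* separate open: has its own offset */
    unlink(path);

    int ok = dupfd >= 0 && openfd >= 0 && write(fd, "hello ", 6) == 6;
    if (ok) {
        *dup_offset = (long)lseek(dupfd, 0, SEEK_CUR);
        *open_offset = (long)lseek(openfd, 0, SEEK_CUR);
    }
    if (dupfd >= 0)
        close(dupfd);
    if (openfd >= 0)
        close(openfd);
    close(fd);
    return ok ? 0 : -1;
}
```

The duplicate reports offset 6 and the separately opened descriptor reports 0. This is the property 2>&1 depends on: with two independent opens of tmp1, output written through descriptors 1 and 2 would overwrite each other instead of interleaving.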
Pipes
A pipe is a small kernel buffer exposed to processes as a pair of file descriptors, one for reading and one for writing. Writing data to one end of the pipe makes that data available for reading from the other end of the pipe. Pipes provide a way for processes to communicate.
The following example code runs the program wc with standard input connected to the read end of a pipe.
    int p[2];
    char *argv[2];

    argv[0] = "wc";
    argv[1] = 0;
    pipe(p);
    if(fork() == 0) {
      close(0);
      dup(p[0]);
      close(p[0]);
      close(p[1]);
      exec("/bin/wc", argv);
    } else {
      write(p[1], "hello world\n", 12);
      close(p[0]);
      close(p[1]);
    }
The program calls pipe, which creates a new pipe and records the read and write file descriptors in the array p. After fork, both parent and child have file descriptors referring to the pipe. The child dups the read end onto file descriptor 0, closes the file descriptors in p, and execs wc. When wc reads from its standard input, it reads from the pipe. The parent writes to the write end of the pipe and then closes both of its file descriptors.
If no data is available, a read on a pipe waits for either data to be written or all file descriptors referring to the write end to be closed; in the latter case, read will return 0, just as if the end of a data file had been reached. The fact that read blocks until it is impossible for new data to arrive is one reason that it’s important for the child to close the write end of the pipe before executing wc above: if one of wc’s file descriptors referred to the write end of the pipe, wc would never see end-of-file.
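The end-of-file rule can be exercised within a single process. This is a sketch in POSIX C; drain_pipe_until_eof is a hypothetical name, and the 512-byte limit is our own precaution so that the single write cannot fill the pipe and block.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: put `len` bytes of `msg` into a fresh pipe, close
 * the write end, and count the bytes read before read returns 0.
 * Returns that count, or -1 on error. */
long drain_pipe_until_eof(const char *msg, size_t len)
{
    assert(msg != NULL && len <= 512);     /* small enough to fit in the pipe buffer */
    int p[2];
    if (pipe(p) < 0)
        return -1;
    if (write(p[1], msg, len) != (ssize_t)len) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    close(p[1]);          /* no write end remains open, so end of file can be reported */

    char buf[64];
    long total = 0;
    ssize_t n;
    while ((n = read(p[0], buf, sizeof buf)) > 0)
        total += n;       /* buffered data is still delivered first */
    close(p[0]);
    return n < 0 ? -1 : total;
}
```

If the close(p[1]) line were removed, the loop would read the data and then block forever in read, since a write end (the caller's own) would still be open. That is the same hang wc would suffer in the example above.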
The xv6 shell implements pipelines such as grep fork sh.c | wc -l in a manner similar to the above code (8450). The child process creates a pipe to connect the left end of the pipeline with the right end. Then it calls runcmd for the left end of the pipeline and runcmd for the right end, and waits for the left and the right ends to finish, by calling wait twice. The right end of the pipeline may be a command that itself includes a pipe (e.g., a | b | c), which itself forks two new child processes (one for b and one for c). Thus, the shell may create a tree of processes. The leaves of this tree are commands and the interior nodes are processes that wait until the left and right children complete. In principle, you could have the interior nodes run the left end of a pipeline, but doing so correctly would complicate the implementation.
Pipes may seem no more powerful than temporary files: the pipeline
echo hello world | wc
could be implemented without pipes as
echo hello world >/tmp/xyz; wc </tmp/xyz
There are at least three key differences between pipes and temporary files. First, pipes automatically clean themselves up; with the file redirection, a shell would have to be careful to remove /tmp/xyz when done. Second, pipes can pass arbitrarily long streams of data, while file redirection requires enough free space on disk to store all the data. Third, pipes allow for synchronization: two processes can use a pair of pipes to send messages back and forth to each other, with each read blocking its calling process until the other process has sent data with write.
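The third point, synchronization through a pair of pipes, can be sketched as a small request/reply exchange. The code is POSIX C and the protocol (the child answers each byte with that byte plus one) is invented purely for illustration; round_trips is a hypothetical name.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: exchange `rounds` one-byte messages with a child
 * over two pipes. Returns the number of correct replies, or -1 on error. */
int round_trips(int rounds)
{
    assert(rounds >= 0 && rounds < 255);
    int to_child[2], to_parent[2];
    if (pipe(to_child) < 0)
        return -1;
    if (pipe(to_parent) < 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(to_child[1]);                 /* child only reads this pipe */
        close(to_parent[0]);                /* and only writes this one */
        unsigned char c;
        while (read(to_child[0], &c, 1) == 1) {     /* blocks until the parent sends */
            c = (unsigned char)(c + 1);
            if (write(to_parent[1], &c, 1) != 1)
                _exit(1);
        }
        _exit(0);                           /* read returned 0: parent closed its end */
    }
    close(to_child[0]);
    close(to_parent[1]);
    int done = 0;
    for (int i = 0; i < rounds; i++) {
        unsigned char msg = (unsigned char)i, reply = 0;
        if (write(to_child[1], &msg, 1) != 1)
            break;
        if (read(to_parent[0], &reply, 1) != 1)     /* blocks until the child replies */
            break;
        if (reply != (unsigned char)(i + 1))
            break;
        done++;
    }
    close(to_child[1]);                     /* lets the child's read return 0 */
    close(to_parent[0]);
    waitpid(pid, NULL, 0);
    return done;
}
```

Neither process polls or sleeps: each blocking read is what keeps the two sides in step, which a temporary file could not provide.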
File system
The xv6 file system provides data files, which are uninterpreted byte arrays, and directories, which contain named references to data files and other directories. Xv6 implements directories as a special kind of file. The directories form a tree, starting at a special directory called the root. A path like /a/b/c refers to the file or directory named c inside the directory named b inside the directory named a in the root directory /. Paths that don’t begin with / are evaluated relative to the calling process’s current directory, which can be changed with the chdir system call. Both these code fragments open the same file (assuming all the directories involved exist):
    chdir("/a");
    chdir("b");
    open("c", O_RDONLY);

    open("/a/b/c", O_RDONLY);
The first fragment changes the process’s current directory to /a/b; the second neither refers to nor modifies the process’s current directory.
There are multiple system calls to create a new file or directory: mkdir creates a new directory, open with the O_CREATE flag creates a new data file, and mknod creates a new device file. This example illustrates all three:
    mkdir("/dir");
    fd = open("/dir/file", O_CREATE|O_WRONLY);
    close(fd);
    mknod("/console", 1, 1);
Mknod creates a file in the file system, but the file has no contents. Instead, the file’s metadata marks it as a device file and records the major and minor device numbers (the two arguments to mknod), which uniquely identify a kernel device. When a process later opens the file, the kernel diverts read and write system calls to the kernel device implementation instead of passing them to the file system.
fstat retrieves information about the object a file descriptor refers to. It fills in a struct stat, defined in stat.h as:
    #define T_DIR  1   // Directory
    #define T_FILE 2   // File
    #define T_DEV  3   // Device

    struct stat {
      short type;  // Type of file
      int dev;     // File system's disk device
      uint ino;    // Inode number
      short nlink; // Number of links to file
      uint size;   // Size of file in bytes
    };
A file’s name is distinct from the file itself; the same underlying file, called an inode, can have multiple names, called links. The link system call creates another file system name referring to the same inode as an existing file. This fragment creates a new file named both a and b.
    open("a", O_CREATE|O_WRONLY);
    link("a", "b");
Reading from or writing to a is the same as reading from or writing to b. Each inode is identified by a unique inode number. After the code sequence above, it is possible to determine that a and b refer to the same underlying contents by inspecting the result of fstat: both will return the same inode number (ino), and the nlink count will be set to 2.
The unlink system call removes a name from the file system. The file’s inode and the disk space holding its content are only freed when the file’s link count is zero and no file descriptors refer to it. Thus adding
unlink("a");
to the last code sequence leaves the inode and file content accessible as b.
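The link counts described above can be observed on a POSIX system. The sketch below differs from the xv6 fragments in a few labeled ways: it uses O_CREAT (the POSIX spelling of xv6's O_CREATE), the path-based stat call instead of fstat, and a scratch directory made with mkdtemp under /tmp, which is an assumed location. The helper name link_then_unlink is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create a, link it as b, then unlink a.
 * Reports whether a and b named the same inode, and the link count seen
 * through b before and after the unlink. Returns 0 on success, -1 on error. */
int link_then_unlink(int *same_inode, int *links_with_both, int *links_after_unlink)
{
    assert(same_inode && links_with_both && links_after_unlink);
    char dir[] = "/tmp/link-demo-XXXXXX";      /* assumed scratch location */
    if (mkdtemp(dir) == NULL)
        return -1;
    char a[64], b[64];
    snprintf(a, sizeof a, "%s/a", dir);
    snprintf(b, sizeof b, "%s/b", dir);

    int ok = 0;
    struct stat sa, sb;
    int fd = open(a, O_CREAT | O_WRONLY, 0600);
    if (fd >= 0) {
        close(fd);
        if (link(a, b) == 0 && stat(a, &sa) == 0 && stat(b, &sb) == 0) {
            *same_inode = (sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino);
            *links_with_both = (int)sb.st_nlink;        /* two names, one inode */
            if (unlink(a) == 0 && stat(b, &sb) == 0) {
                *links_after_unlink = (int)sb.st_nlink; /* inode survives as b */
                ok = 1;
            }
        }
    }
    unlink(a);                                 /* clean up; failures are harmless here */
    unlink(b);
    rmdir(dir);
    return ok ? 0 : -1;
}
```

The expected pattern is the one the text describes: the same inode number under both names, a link count of 2 while both names exist, and a count of 1 once a is removed.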
Furthermore,
    fd = open("/tmp/xyz", O_CREATE|O_RDWR);
    unlink("/tmp/xyz");
is an idiomatic way to create a temporary inode that will be cleaned up when the process closes fd or exits.
Xv6 commands for file system operations are implemented as user-level programs such as mkdir, ln, rm, etc. This design allows anyone to extend the shell with new user commands. In hindsight this plan seems obvious, but other systems designed at the time of Unix often built such commands into the shell (and built the shell into the kernel).
One exception is cd, which is built into the shell (8516). cd must change the current working directory of the shell itself. If cd were run as a regular command, then the shell would fork a child process, the child process would run cd, and cd would change the child’s working directory. The parent’s (i.e., the shell’s) working directory would not change.
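The reason cd cannot be an ordinary command can be demonstrated directly: a chdir made in a child leaves the parent's working directory untouched. This is a sketch in POSIX C; getcwd is a POSIX call not listed for xv6 in this chapter, and the helper name is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: have a child process chdir to `dir`, then compare
 * the parent's working directory before and after.
 * Returns 1 if the parent's directory is unchanged, 0 if it changed,
 * -1 on error (including a failed chdir in the child). */
int parent_cwd_unchanged_after_child_chdir(const char *dir)
{
    assert(dir != NULL);
    char before[4096], after[4096];
    if (getcwd(before, sizeof before) == NULL)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(chdir(dir) == 0 ? 0 : 1);    /* changes only the child's directory */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid
        || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    if (getcwd(after, sizeof after) == NULL)
        return -1;
    return strcmp(before, after) == 0;
}
```

The function returns 1 for any directory the child can enter, which is exactly the situation a forked cd command would be in: it would succeed and then exit, having changed nothing for the shell.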
Real world
Unix’s combination of the “standard” file descriptors, pipes, and convenient shell syntax for operations on them was a major advance in writing general-purpose reusable programs. The idea sparked a whole culture of “software tools” that was responsible for much of Unix’s power and popularity, and the shell was the first so-called “scripting language.” The Unix system call interface persists today in systems like BSD, Linux, and Mac OS X.
Modern kernels provide many more system calls, and many more kinds of kernel services, than xv6. For the most part, modern Unix-derived operating systems have not followed the early Unix model of exposing devices as special files, like the console device file discussed above. The authors of Unix went on to build Plan 9, which applied the “resources are files” concept to modern facilities, representing networks, graphics, and other resources as files or file trees.
The file system abstraction has been a powerful idea, most recently applied to network resources in the form of the World Wide Web. Even so, there are other models for operating system interfaces. Multics, a predecessor of Unix, abstracted file storage in a way that made it look like memory, producing a very different flavor of interface. The complexity of the Multics design had a direct influence on the designers of Unix, who tried to build something simpler.
This book examines how xv6 implements its Unix-like interface, but the ideas and concepts apply to more than just Unix. Any operating system must multiplex processes onto the underlying hardware, isolate processes from each other, and provide mechanisms for controlled inter-process communication. After studying xv6, you should be able to look at other, more complex operating systems and see the concepts underlying xv6 in those systems as well.