Use Reentrant Functions for Safer Signal Handling
How and when to employ reentrancy to keep your code bug-free
Dipak Jha (dipakjha@in.ibm.com), Software Engineer, IBM
Date: 20 Jan 2005
Summary: If your code involves concurrent access to functions, whether by threads or by processes, you can run into problems caused by non-reentrant functions. This article uses code samples to show how anomalies can result when reentrancy is not ensured, especially where signals are involved. It includes five recommended programming practices, along with a discussion of a proposed compiler model in which the compiler front end deals with reentrancy.
In the early days of programming, non-reentrancy was not a threat to programmers; functions were not accessed concurrently, and there were no interrupts. In many older implementations of the C language, functions were expected to work in an environment of single-threaded processes.
Now, however, concurrent programming is common practice, and you need to be aware of the pitfalls. This article describes some potential problems caused by non-reentrant functions in parallel and concurrent programming. Signal generation and handling in particular add extra complexity. Because signals are asynchronous, it is difficult to track down the bugs that occur when a signal handler calls a non-reentrant function.
This article:
- Defines reentrancy and points to the POSIX list of reentrant functions
- Provides examples to show problems caused by non-reentrancy
- Suggests ways to ensure reentrancy of the underlying function
- Discusses dealing with reentrancy at the compiler level
What is reentrancy?
A reentrant function is one that can be used by more than one task concurrently without fear of data corruption. Conversely, a non-reentrant function cannot be shared by more than one task unless mutual exclusion is ensured, either by using a semaphore or by disabling interrupts during critical sections of code. A reentrant function can be interrupted at any time and resumed later without loss of data. Reentrant functions either use local variables or protect their data when global variables are used.
A reentrant function:
- Does not hold static data over successive calls
- Does not return a pointer to static data; all data is provided by the caller of the function
- Uses local data or ensures protection of global data by making a local copy of it
- Must not call any non-reentrant functions
Don't confuse reentrancy with thread safety. From the programmer's perspective, these are two separate concepts: a function can be reentrant, thread-safe, both, or neither. In general, non-reentrant functions cannot be used safely by multiple threads. Moreover, it may be impossible to make a non-reentrant function thread-safe.
IEEE Std 1003.1 lists 118 reentrant UNIX® functions (the standard's term is async-signal-safe), which aren't duplicated here. See Resources for a link to the list at unix.org.
The rest of the functions are non-reentrant for one or more of the following reasons:
- They call malloc or free (or similar functions)
- They are known to use static data structures
- They are part of the standard I/O library, many implementations of which use global data structures
Signals and non-reentrant functions
A signal is a software interrupt that lets a programmer handle an asynchronous event. To send a signal to a process, the kernel sets a bit in the signal field of the process table entry, corresponding to the type of signal received. The ANSI C prototype of the signal function is:
void (*signal (int sigNum, void (*sigHandler)(int))) (int);
Or, in another representation:
typedef void sigHandler(int);
sigHandler *signal(int, sigHandler *);
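As a minimal sketch of the typedef form in use, the following installs a handler through a pointer of type sigHandler * and then puts the previous disposition back, using the fact that signal returns the old handler. It assumes a POSIX system (for SIGUSR1) and uses raise() so the signal arrives at a known point; on_usr1, got_signal, and demo_signal are illustrative names, not from the article.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>

typedef void sigHandler(int);            /* function type, as in the text */

/* Set only by the handler; a volatile sig_atomic_t is safe to write there. */
static volatile sig_atomic_t got_signal = 0;

/* Illustrative handler: just records which signal arrived. */
static void on_usr1(int signum)
{
    got_signal = signum;
}

/* Illustrative helper: installs on_usr1 through a sigHandler pointer, raises
   SIGUSR1 once, then restores the previous disposition. Returns the signal
   number the handler saw, or -1 if signal() failed. */
int demo_signal(void)
{
    sigHandler *old = signal(SIGUSR1, on_usr1);

    if (old == SIG_ERR)
        return -1;
    raise(SIGUSR1);                      /* the handler runs before raise() returns */
    signal(SIGUSR1, old);                /* put the previous handler back */
    return got_signal;
}
```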
When a process handles a signal that it catches, the normal sequence of instructions being executed by the process is temporarily interrupted, and the instructions in the signal handler run instead. If the signal handler returns, the process resumes the normal sequence of instructions it was executing when the signal was caught.
In the signal handler, however, you can't tell what the process was executing when the signal was caught. What if the process was in the middle of allocating additional memory on its heap using malloc, and you call malloc from the signal handler? Or suppose the process was in the middle of a function that manipulates a global data structure, and the signal handler calls the same function. In the case of malloc, havoc can result for the process, because malloc usually maintains a linked list of all the areas it has allocated, and it may have been in the middle of changing this list.
An interrupt can even be delivered between the beginning and end of a C operation that requires multiple instructions. At the programmer level, the operation may appear atomic (that is, it cannot be divided into smaller operations), but it might actually take more than one processor instruction to complete. For example, take this piece of C code:
temp += 1;
On an x86 processor, that statement might compile to:
mov ax,[temp]
inc ax
mov [temp],ax
This is clearly not an atomic operation.
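Since even temp += 1 can be split like this, a common defensive pattern (not taken from the original article) is to limit the handler to a single store into a volatile sig_atomic_t flag and to do the multi-instruction update in the normal flow of the program. A minimal sketch, assuming POSIX sigaction; pending, ticks, on_alarm, install_tick_handler, and poll_ticks are hypothetical names:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* The only object the handler touches (hypothetical name). */
static volatile sig_atomic_t pending = 0;

/* Updated only by normal program flow, never inside the handler. */
static long ticks = 0;

static void on_alarm(int signum)
{
    (void)signum;
    pending = 1;                  /* a single store to a sig_atomic_t object */
}

/* Hypothetical helper: installs on_alarm for SIGALRM; returns 0 on success. */
int install_tick_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGALRM, &sa, NULL);
}

/* Hypothetical helper called from the main loop: folds a pending tick into
   the counter, so the non-atomic "ticks += 1" never runs inside the handler. */
long poll_ticks(void)
{
    if (pending) {
        pending = 0;
        ticks += 1;
    }
    return ticks;
}
```

Because a flag cannot count, two SIGALRMs that arrive before the next poll_ticks call are folded into one tick; the sketch trades exact counting for never touching ticks inside the handler.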
Listing 1 shows what can happen if a signal handler runs in the middle of modifying a variable:
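#include <signal.h>
#include <stdio.h>
#include <unistd.h>                 /* alarm() */
/* volatile keeps the compiler from optimizing away the stores in main() */
volatile struct two_int { int a, b; } data;
void signal_handler(int signum) {
    printf("%d, %d\n", data.a, data.b);
    alarm(1);                       /* re-arm the one-second timer */
}
int main(void) {
    static struct two_int zeros = { 0, 0 }, ones = { 1, 1 };
    signal(SIGALRM, signal_handler);
    data = zeros;
    alarm(1);
    while (1) { data = zeros; data = ones; }
}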
Listing 1. Running a signal handler while modifying a variable
This program fills data with zeros, ones, zeros, ones, and so on, alternating forever. Meanwhile, once per second, the alarm signal handler prints the current contents. (Calling printf in the handler is safe in this program, because printf is certainly not being called outside the handler when the signal arrives.) What output do you expect from this program? You might expect it to print either 0, 0 or 1, 1, but the actual output looks like this:
0, 0
1, 1
(Skipping some output...)
0, 1
1, 1
1, 0
1, 0
...
On most machines, storing a new value in data takes several instructions, and the value is stored one word at a time. If the signal is delivered between these instructions, the handler might find that data.a is 0 and data.b is 1, or vice versa. On the other hand, if we compile and run this code on a machine where an object's value can be stored in one instruction that cannot be interrupted, the handler will always print 0, 0 or 1, 1.
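Because the output above depends on timing, the following sketch (an illustration, not the author's code) makes the effect deterministic: raise() stands in for the alarm and fires halfway through a two-word update. It also previews the remedy from Practice 5 by optionally blocking the signal around the update. It assumes POSIX sigaction and sigprocmask; SIGUSR1, snapshot, and update_with_signal are hypothetical choices.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

struct two_int { int a, b; };

static volatile struct two_int data;              /* written by the main code */
static volatile sig_atomic_t seen_a, seen_b;      /* written by the handler */

/* Hypothetical handler: records the two words as it finds them. */
static void snapshot(int signum)
{
    (void)signum;
    seen_a = data.a;
    seen_b = data.b;
}

/* Hypothetical helper: stores {1, 1} into data, raising SIGUSR1 between the
   two word stores. If protect is nonzero, SIGUSR1 is blocked for the whole
   update. The values the handler observed come back through *a and *b. */
void update_with_signal(int protect, int *a, int *b)
{
    struct sigaction sa;
    sigset_t block, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = snapshot;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    data.a = 0;
    data.b = 0;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (protect)
        sigprocmask(SIG_BLOCK, &block, &old);

    data.a = 1;
    raise(SIGUSR1);   /* unblocked: handler runs now; blocked: signal stays pending */
    data.b = 1;

    if (protect)
        sigprocmask(SIG_SETMASK, &old, NULL);   /* a pending SIGUSR1 is delivered here */

    *a = seen_a;
    *b = seen_b;
}
```

With protect set to 0, the handler sees data.a as 1 and data.b as 0; with protect set to 1, the signal stays pending until both words are written, so the handler sees 1, 1.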
Another complication with signals is that you cannot be sure, just by running test cases, that your code is free of signal bugs. This is because signals are generated asynchronously.
Non-reentrant functions and static variables
Suppose that the signal handler uses gethostbyname, which is non-reentrant. This function returns its value in a static object:
static struct hostent host; /* result stored here */
It reuses the same object on every call. In the following example, if the signal happens to arrive during a call to gethostbyname in main, or even after the call while the program is still using the value, the handler's call will clobber the value that the program asked for.
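#include <netdb.h>
#include <signal.h>

/* Placeholder host names, for illustration only */
static const char *hostNameOne = "host-one.example.com";
static const char *hostNameTwo = "host-two.example.com";

void sig_handler(int signo);

int main(void)
{
    struct hostent *hostPtr;
    //...
    signal(SIGALRM, sig_handler);
    //...
    hostPtr = gethostbyname(hostNameOne);
    //...
    return 0;
}

void sig_handler(int signo)
{
    struct hostent *hostPtr;
    //...
    /* call to gethostbyname may clobber the value stored during the call
       inside main() */
    hostPtr = gethostbyname(hostNameTwo);
    //...
}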
Listing 2. Risky use of gethostbyname
However, if the program does not use gethostbyname or any other function that returns information in the same object, or if it always blocks signals around each use, you're safe.
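The clobbering, and the "block signals around each use" remedy, can be reproduced without any network access. In this sketch fake_lookup is a hypothetical stand-in for gethostbyname that, like it, returns a pointer to one static object; raise() plays the part of a signal arriving while the caller still holds that pointer, and SIGUSR1 is an arbitrary choice. It assumes POSIX sigaction and sigprocmask, and the other names are hypothetical as well.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for gethostbyname(): like it, the result lives in a
   single static object that every call reuses. */
static const char *fake_lookup(const char *name)
{
    static char result[64];
    size_t i;

    for (i = 0; name[i] != '\0' && i < sizeof result - 1; i++)
        result[i] = name[i];
    result[i] = '\0';
    return result;
}

/* The handler's own lookup overwrites the shared static result. */
static void on_usr1(int signum)
{
    (void)signum;
    fake_lookup("host-two");
}

static void install_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);
}

/* Hypothetical helper: looks up "host-one" and copies the result into out.
   If protect is nonzero, SIGUSR1 is blocked until the result is copied. */
void lookup_and_copy(int protect, char *out, size_t n)
{
    sigset_t block, old;
    const char *p;

    install_handler();
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (protect)
        sigprocmask(SIG_BLOCK, &block, &old);

    p = fake_lookup("host-one");
    raise(SIGUSR1);                  /* a "signal" arrives while p is still in use */
    snprintf(out, n, "%s", p);       /* copy the (possibly clobbered) result */

    if (protect)
        sigprocmask(SIG_SETMASK, &old, NULL);  /* handler runs now; our copy is safe */
}
```

Without protection the copy reads "host-two"; with SIGUSR1 blocked until the copy is made, it reads "host-one", and the handler's pending call simply runs after the result is no longer needed.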
Many library functions return values in a fixed object, always reusing the same object, and they can all cause the same problem. If a function uses and modifies an object that you supply, it is potentially non-reentrant; two calls can interfere if they use the same object.
A similar case arises when you do I/O using streams. Suppose the signal handler prints a message with fprintf, and the program was in the middle of an fprintf call using the same stream when the signal was delivered. Both the signal handler's message and the program's data could be corrupted, because both calls operate on the same data structure: the stream itself.
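One common way around the shared-stream problem, not described in the original, is for the handler to bypass stdio entirely and call write(), which POSIX lists as async-signal-safe. In this sketch the message goes to a pipe instead of stderr so that it can be checked; log_fd, on_usr1, and demo_handler_write are hypothetical names, and SIGUSR1 is raised synchronously with raise().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static int log_fd = -1;   /* hypothetical: set before the handler is installed */

static void on_usr1(int signum)
{
    static const char msg[] = "caught SIGUSR1\n";
    int saved_errno = errno;              /* write() may change errno */
    ssize_t rc;

    (void)signum;
    /* write() is async-signal-safe; fprintf() on a shared FILE * is not */
    rc = write(log_fd, msg, sizeof msg - 1);
    (void)rc;
    errno = saved_errno;
}

/* Hypothetical helper: points log_fd at a pipe, raises SIGUSR1 once, and reads
   the handler's message back into buf (NUL-terminated). Returns the number of
   bytes read, or -1 on error. */
ssize_t demo_handler_write(char *buf, size_t n)
{
    int fds[2];
    struct sigaction sa;
    ssize_t got = -1;

    if (n == 0 || pipe(fds) != 0)
        return -1;
    log_fd = fds[1];

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == 0) {
        raise(SIGUSR1);                   /* the handler runs before raise() returns */
        got = read(fds[0], buf, n - 1);
        if (got >= 0)
            buf[got] = '\0';
    }
    close(fds[0]);
    close(fds[1]);
    return got;
}
```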
Things become even more complicated when you use a third-party library, because you often cannot tell which parts of the library are reentrant and which are not. As with the standard library, such a library can have many functions that return values in fixed objects, always reusing the same objects, which makes those functions non-reentrant.
The good news is that many vendors have taken the initiative to provide reentrant versions of the standard C library. Read the documentation that comes with any given library to find out whether the prototypes, and therefore the usage, of the standard library functions have changed.
Practices to ensure reentrancy
Sticking to these five best practices will help you maintain reentrancy in your programs.
Practice 1
Returning a pointer to static data may cause a function to be non-reentrant. For example, a strToUpper function, which converts a string to uppercase, could be implemented as follows:
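#include <ctype.h>

#define STRING_SIZE_LIMIT 256   /* assumed value; the original does not define it */

char *strToUpper(const char *str)
{
    /* Returning a pointer to static data makes this function non-reentrant */
    static char buffer[STRING_SIZE_LIMIT];
    int index;

    /* stop early rather than overflow buffer */
    for (index = 0; str[index] && index < STRING_SIZE_LIMIT - 1; index++)
        buffer[index] = (char)toupper((unsigned char)str[index]);
    buffer[index] = '\0';
    return buffer;
}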
Listing 3. Non-reentrant version of strToUpper
You can implement a reentrant version of this function by changing its prototype so that the caller provides storage for the output string:
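#include <ctype.h>

/* out_str must have room for strlen(in_str) + 1 characters */
char *strToUpper_r(const char *in_str, char *out_str)
{
    int index;

    for (index = 0; in_str[index] != '\0'; index++)
        out_str[index] = (char)toupper((unsigned char)in_str[index]);
    out_str[index] = '\0';
    return out_str;
}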
Listing 4. Reentrant version of strToUpper
Having the calling function provide the output storage ensures that the function is reentrant. Note that the name follows the common convention of marking the reentrant version of a function with an "_r" suffix, as in strtok_r.
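The difference between Listings 3 and 4 shows up as soon as two results are needed at the same time. This self-contained sketch repeats both functions (with an assumed STRING_SIZE_LIMIT of 256 and a bounds check added to the static version) and keeps two results alive; first_result_survives is a hypothetical helper.

```c
#include <ctype.h>
#include <string.h>

#define STRING_SIZE_LIMIT 256    /* assumed; the original does not give a value */

/* Listing 3: every call shares one static buffer */
char *strToUpper(const char *str)
{
    static char buffer[STRING_SIZE_LIMIT];
    int index;

    for (index = 0; str[index] && index < STRING_SIZE_LIMIT - 1; index++)
        buffer[index] = (char)toupper((unsigned char)str[index]);
    buffer[index] = '\0';
    return buffer;
}

/* Listing 4: the caller supplies the output storage */
char *strToUpper_r(const char *in_str, char *out_str)
{
    int index;

    for (index = 0; in_str[index] != '\0'; index++)
        out_str[index] = (char)toupper((unsigned char)in_str[index]);
    out_str[index] = '\0';
    return out_str;
}

/* Hypothetical check: converts two strings, keeps both results, and returns 1
   if the first result is still intact after the second call. */
int first_result_survives(int use_reentrant)
{
    char out1[STRING_SIZE_LIMIT], out2[STRING_SIZE_LIMIT];
    const char *first;

    if (use_reentrant) {
        first = strToUpper_r("alpha", out1);
        strToUpper_r("beta", out2);
    } else {
        first = strToUpper("alpha");
        strToUpper("beta");              /* overwrites the buffer first points to */
    }
    return strcmp(first, "ALPHA") == 0;
}
```

first_result_survives(0) returns 0, because the second strToUpper call overwrites the buffer that first points to; first_result_survives(1) returns 1, because each call writes into storage its caller owns.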
Practice 2
A function that remembers the state of its data between calls is non-reentrant. Different threads can call the function in succession and modify the data without informing the other threads that are using it. If a function needs to maintain the state of some data over successive calls, such as a working buffer or a pointer, the caller should provide this data.
In the following example, a function returns the successive lowercase characters of a string. As with the strtok subroutine, the string is provided only on the first call. The function returns '\0' when it reaches the end of the string. The function could be implemented as follows:
#include <ctype.h>
#include <stddef.h>

char getLowercaseChar(char *str)
{
    static char *buffer;    /* state kept across calls: non-reentrant */
    static int index;
    char c = '\0';

    /* stores the working string on first call only */
    if (str != NULL) {
        buffer = str;
        index = 0;
    }
    /* searches for the next lowercase character */
    while ((c = buffer[index]) != '\0') {
        index++;
        if (islower((unsigned char)c))
            break;
    }
    return c;
}
Listing 5. Non-reentrant version of getLowercaseChar
This function is not reentrant, because it stores the state of its variables between calls. To make it reentrant, the static data (the index variable) must be maintained by the caller. The reentrant version of the function could be implemented like this:
#include <ctype.h>

char getLowercaseChar_r(const char *str, int *pIndex)
{
    char c = '\0';

    /* no initialization here: the caller must set *pIndex to 0 before the first call */
    /* searches for the next lowercase character */
    while ((c = str[*pIndex]) != '\0') {
        (*pIndex)++;
        if (islower((unsigned char)c))
            break;
    }
    return c;
}
Listing 6. Reentrant version of getLowercaseChar
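Because each caller now owns its own index, two scans can be interleaved without disturbing each other, which Listing 5 cannot do, since both scans would share its static buffer and index. A usage sketch that repeats Listing 6 so that it compiles on its own; interleave_lowercase is a hypothetical helper:

```c
#include <ctype.h>
#include <stddef.h>

/* Listing 6: the caller owns the scan position through pIndex */
char getLowercaseChar_r(const char *str, int *pIndex)
{
    char c = '\0';

    while ((c = str[*pIndex]) != '\0') {
        (*pIndex)++;
        if (islower((unsigned char)c))
            break;
    }
    return c;
}

/* Hypothetical helper: alternates between two scans, appending each lowercase
   character found to out (NUL-terminated, at most n - 1 characters; n > 0). */
void interleave_lowercase(const char *s1, const char *s2, char *out, size_t n)
{
    int i1 = 0, i2 = 0;          /* one independent index per scan */
    size_t k = 0;
    char c1, c2;

    do {
        c1 = getLowercaseChar_r(s1, &i1);
        c2 = getLowercaseChar_r(s2, &i2);
        if (c1 != '\0' && k + 1 < n)
            out[k++] = c1;
        if (c2 != '\0' && k + 1 < n)
            out[k++] = c2;
    } while (c1 != '\0' || c2 != '\0');
    out[k] = '\0';
}
```

For example, interleaving "aBc" and "XyZw" produces "aycw".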
Practice 3
On most systems, malloc and free are not reentrant, because they use a static data structure that records which memory blocks are free. As a result, no library functions that allocate or free memory are reentrant. This includes functions that allocate space to store a result.
The best way to avoid having to allocate memory in a handler is to allocate, in advance, the space that signal handlers will use. The best way to avoid freeing memory in a handler is to flag or record the objects to be freed and have the program check from time to time whether anything is waiting to be freed. But this must be done with care, because placing an object on a chain is not atomic, and if it is interrupted by another signal handler that does the same thing, you could "lose" one of the objects. Streams used from a handler call for similar reasoning: if you know that the program cannot possibly use the stream that the handler uses at a time when signals can arrive, you are safe, and there is no problem if the program uses some other stream.
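A minimal sketch of both ideas follows, under stated assumptions: the slot array, the 64-byte objects, and all names are hypothetical. Memory is allocated before the handler is installed, and instead of linking objects onto a chain (the non-atomic step warned about above), the handler only stores 1 into a per-slot volatile sig_atomic_t flag; the main program frees flagged objects when it calls reap().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>

#define NSLOTS 4                                    /* hypothetical pool size */

static char *objects[NSLOTS];                       /* malloc'd by main code only */
static volatile sig_atomic_t free_requested[NSLOTS];
static volatile sig_atomic_t slot_to_release = 1;   /* which slot the handler releases */
static char handler_scratch[32];                    /* preallocated for the handler */

static void on_usr1(int signum)
{
    static const char note[] = "release requested";
    size_t i;

    (void)signum;
    /* Use preallocated storage instead of calling malloc() */
    for (i = 0; i < sizeof note; i++)
        handler_scratch[i] = note[i];
    /* Instead of calling free(), only record the request */
    free_requested[slot_to_release] = 1;
}

/* Hypothetical setup: allocates the objects, then installs the handler.
   Returns 0 on success. */
int setup_objects(void)
{
    struct sigaction sa;

    for (int i = 0; i < NSLOTS; i++)
        if ((objects[i] = malloc(64)) == NULL)
            return -1;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Called from the main program from time to time; returns how many objects
   it freed. free() only ever runs here, outside the handler. */
int reap(void)
{
    int freed = 0;

    for (int i = 0; i < NSLOTS; i++) {
        if (free_requested[i] && objects[i] != NULL) {
            free_requested[i] = 0;
            free(objects[i]);
            objects[i] = NULL;
            freed++;
        }
    }
    return freed;
}
```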
Practice 4
To write bug-free code, take care when handling process-wide global variables such as errno and h_errno. Consider the following code:
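/* fd is an open file descriptor; needs <errno.h>, <stdio.h>, <stdlib.h>, <unistd.h> */
if (close(fd) < 0) {
    fprintf(stderr, "Error in close, errno: %d\n", errno);
    exit(EXIT_FAILURE);
}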
Listing 7. Risky use of errno
Suppose a signal arrives in the very small gap between the moment the close system call sets errno and the moment the program reads errno in the fprintf call. If the signal handler calls anything that also sets errno, it can overwrite the value close stored, and the program behaves unexpectedly, for example by reporting the wrong error.
Saving and restoring the value of errno in the signal handler, as follows, resolves the problem:
#include <errno.h>

void signalHandler(int signo)
{
    int errno_saved = errno;   /* save errno on entry */
    /* Let the signal handler complete its job */
    //...
    //...
    errno = errno_saved;       /* restore errno before returning */
}
Listing 8. Saving and restoring the value of errno
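To see why Listing 8 matters, this sketch installs either a careless or a careful handler; both call close(-1), which fails and sets errno to EBADF. The main code sets errno to ENOENT (standing in for the value close(fd) left behind) and checks what survives the signal. It assumes POSIX sigaction, uses raise() for a synchronous signal, and assumes raise() leaves errno alone when it succeeds, which common implementations such as glibc do, although the standard does not require it. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical handler that forgets Practice 4 */
static void careless_handler(int signo)
{
    (void)signo;
    close(-1);                     /* fails and sets errno to EBADF */
}

/* Hypothetical handler that follows Listing 8 */
static void careful_handler(int signo)
{
    int errno_saved = errno;       /* save on entry */
    (void)signo;
    close(-1);
    errno = errno_saved;           /* restore before returning */
}

/* Hypothetical helper: sets errno to ENOENT, raises SIGUSR1 with the chosen
   handler installed, and returns errno as the main code sees it afterward. */
int errno_after_signal(int careful)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = careful ? careful_handler : careless_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    errno = ENOENT;                /* stands in for the value close(fd) left behind */
    raise(SIGUSR1);
    return errno;
}
```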
Practice 5
If a signal is generated and handled while the underlying function is in the middle of a critical section, the function can become non-reentrant. By using signal sets and a signal mask, you can protect a critical region of code from a specific set of signals, as follows:
- Save the current signal mask.
- Block the unwanted signals.
- Let the critical section of code complete its job.
- Finally, restore the saved signal mask.
Here is an outline of this practice:
sigset_t newmask, oldmask, zeromask;
...
/* Register the signal handler */
signal(SIGALRM, sig_handler);
/* Initialize the signal sets */
sigemptyset(&newmask);
sigemptyset(&zeromask);
/* Add the signal to the set */
sigaddset(&newmask, SIGALRM);
/* Block SIGALRM and save the current signal mask in 'oldmask' */
sigprocmask(SIG_BLOCK, &newmask, &oldmask);

/* The protected code goes here
   ...
   ...
*/

/* Now allow all signals and pause */
sigsuspend(&zeromask);
/* Restore the original signal mask */
sigprocmask(SIG_SETMASK, &oldmask, NULL);
/* Continue with other parts of the code */
Listing 9. Using signal sets and signal masks
Skipping sigsuspend(&zeromask); can cause a problem when the program needs to wait for the signal. If it instead unblocked signals with sigprocmask and then called pause, there would be a gap of some clock cycles between the two calls, and a signal arriving in that window would be handled before pause started; pause could then wait indefinitely for a signal that has already come and gone. The sigsuspend call closes this window by resetting the signal mask and putting the process to sleep in a single atomic operation. If you are sure that a signal generated in this window won't have any adverse effects, you can skip sigsuspend and go directly to restoring the signal mask.
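Listing 9 is an outline; a self-contained sketch of the same sequence follows. It assumes POSIX, installs the handler with sigaction instead of signal (whose semantics differ between systems), and calls raise() inside the critical section so that a SIGALRM is known to be pending when sigsuspend is reached. alarm_seen and protected_wait are hypothetical names. Testing the flag while SIGALRM is still blocked, and calling sigsuspend only while it is unset, is what avoids the lost-wakeup window described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t alarm_seen = 0;   /* hypothetical flag */

static void sig_handler(int signo)
{
    (void)signo;
    alarm_seen = 1;
}

/* Hypothetical helper: blocks SIGALRM around a critical section, then waits
   for it with sigsuspend. Returns 1 once the handler has run, -1 on error. */
int protected_wait(void)
{
    sigset_t newmask, oldmask, zeromask;
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sig_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    sigemptyset(&newmask);
    sigemptyset(&zeromask);
    sigaddset(&newmask, SIGALRM);

    /* Block SIGALRM and save the current mask */
    if (sigprocmask(SIG_BLOCK, &newmask, &oldmask) != 0)
        return -1;

    /* Critical section: a SIGALRM arriving here stays pending */
    raise(SIGALRM);

    /* Atomically unblock everything and sleep; the pending SIGALRM is
       delivered at once, and sigsuspend returns after the handler runs
       with SIGALRM blocked again */
    while (!alarm_seen)
        sigsuspend(&zeromask);

    /* Restore the original mask */
    sigprocmask(SIG_SETMASK, &oldmask, NULL);
    return alarm_seen;
}
```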
Dealing with reentrancy at the compiler level
I would like to propose a model for dealing with reentrant functions at the compiler level. A new keyword, reentrant, could be introduced into the high-level language, and functions could be given a reentrant specifier to ensure that they are reentrant, like so:
reentrant int foo();
This specifier instructs the compiler to give special treatment to that particular function. The compiler can store it in its symbol table and use it during the intermediate code generation phase. Doing so requires some design changes in the compiler's front end. A function declared with the reentrant specifier must follow these guidelines:
- Does not hold static data over successive calls
- Protects global data by making a local copy of it
- Must not call non-reentrant functions
- Does not return a reference to static data, and all data is provided by the caller of the function
Guideline 1 can be enforced through type checking, by issuing an error message if there is any static storage declaration in the function. This can be done during the semantic analysis phase of compilation.
Guideline 2, protection of global data, can be ensured in two ways. The simple way is to issue an error message if the function modifies global data. A more sophisticated technique is to generate intermediate code in such a way that the global data doesn't get mangled. An approach similar to Practice 4, above, can be implemented at the compiler level: on entering the function, the compiler can save the global data to be manipulated under a compiler-generated temporary name, then restore the data on exiting the function. Storing data under compiler-generated temporary names is routine for a compiler.
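To make the save-and-restore idea concrete, here is a hand-written C sketch of what a compiler could conceivably emit for a function marked reentrant that modifies a global. The reentrant keyword does not exist in C, and the global scale_factor, the function scaled, and the temporary's name are all hypothetical; only the shape of the generated code matters.

```c
/* Source the programmer might write under the proposed (hypothetical) model:
 *
 *     reentrant int scaled(int x) { scale_factor *= 2; return x * scale_factor; }
 */

int scale_factor = 3;      /* hypothetical global touched by the function */

/* One possible lowering: copy the global into a compiler-generated temporary
   on entry and write it back before every return. */
int scaled(int x)
{
    int tmp_saved_scale_factor = scale_factor;   /* "compiler-generated" name */
    int result;

    scale_factor *= 2;
    result = x * scale_factor;

    scale_factor = tmp_saved_scale_factor;       /* restore on exit */
    return result;
}
```

Note that a signal arriving between the doubling and the restore would still see the modified global, so this lowering keeps the function's changes from outliving the call rather than making overlapping calls fully independent.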
Ensuring guideline 3 requires the compiler to know in advance which functions are reentrant, including those in the libraries the application uses. This additional information about each function can be stored in the symbol table.
Finally, guideline 4 is already guaranteed by guideline 1: a function cannot return a reference to static data if it has none.
This proposed model would make it easier for programmers to follow the guidelines for reentrant functions, and code built with it would be protected against unintentional reentrancy bugs.
Resources
You can read or download IEEE Std 1003.1 from unix.org, a Web site of The Open Group (registration is required to view or download the document).
Starting with Synchronization is not the enemy (developerWorks, July 2001), this series of three articles covers issues of threading and concurrency when programming in the Java™ language.
PowerPC developers will appreciate the insights presented in Save your code from meltdown using PowerPC atomic instructions (developerWorks, November 2004); it describes techniques for safe concurrent programming in PowerPC assembly language.
Good background for UNIX programmers includes UNIX Network Programming by W. Richard Stevens and The Design of the UNIX Operating System by Maurice J. Bach.
Find more resources for Linux developers in the developerWorks Linux zone.
Get involved in the developerWorks community by participating in developerWorks blogs.
Browse for books on these and other technical topics.