epoll Usage Explained (The Essentials) - Boblim - 博客园
Source: https://www.cnblogs.com/fnlingnzb-learner/p/5835573.html
epoll - I/O event notification facility
In Linux network programming, select was for a long time the tool used for event notification. Newer Linux kernels provide a mechanism to replace it: epoll.
Compared with select, epoll's greatest advantage is that its efficiency does not drop as the number of watched fds grows. The select implementation in the kernel works by polling, and the more fds it has to poll, the longer it takes. Moreover, the header linux/posix_types.h contains this declaration:
#define __FD_SETSIZE 1024
which means select can watch at most 1024 fds at the same time. You can, of course, raise this limit by editing the header and recompiling the kernel, but that does not address the root problem.
The epoll interface is very simple; it consists of just three functions:
1. int epoll_create(int size);
Creates an epoll handle. size tells the kernel roughly how many fds will be watched; it differs from the first argument of select(), which is the highest watched fd plus one. Note that once the epoll handle has been created, it occupies an fd of its own: on Linux you can see it under /proc/<pid>/fd/. So when you are finished with epoll you must call close(), or fds may eventually be exhausted.
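As a minimal sketch (assuming only the usual C headers), creating and releasing an epoll handle looks like this:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/epoll.h>

int main(void)
{
    int epfd = epoll_create(256);   /* size is only a hint; ignored since Linux 2.6.8 */
    if (epfd < 0) {
        perror("epoll_create");
        exit(1);
    }
    /* ... register fds and run the event loop here ... */
    close(epfd);                    /* the epoll handle is itself an fd and must be closed */
    return 0;
}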
2. int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
epoll's event registration function. Unlike select(), which tells the kernel what to watch for at wait time, epoll registers the event types of interest up front, here. The first argument is the value returned by epoll_create(); the second is the operation, expressed with one of three macros:
EPOLL_CTL_ADD: register a new fd with epfd;
EPOLL_CTL_MOD: modify the watched events of an already registered fd;
EPOLL_CTL_DEL: remove an fd from epfd.
The third argument is the fd to watch, and the fourth tells the kernel which events to watch for; struct epoll_event is defined as follows:
typedef union epoll_data {
    void *ptr;
    int fd;
    __uint32_t u32;
    __uint64_t u64;
} epoll_data_t;

struct epoll_event {
    __uint32_t events;   /* Epoll events */
    epoll_data_t data;   /* User data variable */
};
The events field is a bitwise OR of the following macros:
EPOLLIN: the fd is readable (this includes a normal close by the peer SOCKET);
EPOLLOUT: the fd is writable;
EPOLLPRI: the fd has urgent data to read (i.e., out-of-band data has arrived);
EPOLLERR: an error occurred on the fd;
EPOLLHUP: the fd was hung up;
EPOLLET: put the fd into edge-triggered (Edge Triggered) mode, as opposed to the default level-triggered (Level Triggered) mode;
EPOLLONESHOT: watch for only one event; after it has been delivered, if you still want to watch this socket you must add it to the epoll queue again.
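Putting epoll_ctl() and struct epoll_event together, registering a socket for readable events might look like the following sketch; the helper name watch_readable and the already-created sockfd are assumptions for illustration:

#include <stdio.h>
#include <sys/epoll.h>

/* Register sockfd with epfd for level-triggered readable events.
   Returns 0 on success, -1 on failure. */
int watch_readable(int epfd, int sockfd)
{
    struct epoll_event ev;
    ev.events = EPOLLIN;        /* OR in EPOLLET here for edge-triggered mode */
    ev.data.fd = sockfd;        /* data is a union: store either the fd or a pointer */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, sockfd, &ev) < 0) {
        perror("epoll_ctl: EPOLL_CTL_ADD");
        return -1;
    }
    return 0;
}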
3. int epoll_wait(int epfd, struct epoll_event * events, int maxevents, int timeout);
Waits for events, similar to a select() call. The events parameter receives the set of ready events from the kernel, and maxevents tells the kernel how many entries the events array can hold (it must be greater than zero; note that since Linux 2.6.8 the size argument of epoll_create() is ignored, so maxevents is not bounded by it). The timeout is in milliseconds: 0 returns immediately, and -1 blocks indefinitely until an event arrives. The function returns the number of events that need handling; a return value of 0 means the call timed out.
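A sketch of handling epoll_wait()'s return value; the helper name wait_once and the MAX_EVENTS capacity are illustrative assumptions:

#include <errno.h>
#include <stdio.h>
#include <sys/epoll.h>

#define MAX_EVENTS 20

/* One iteration of an event loop: returns the number of ready events,
   0 on timeout, and retries transparently if a signal interrupts the wait. */
int wait_once(int epfd, struct epoll_event *events)
{
    int nfds;
    do {
        nfds = epoll_wait(epfd, events, MAX_EVENTS, 1000); /* wait up to 1000 ms */
    } while (nfds < 0 && errno == EINTR);                  /* interrupted: retry */
    if (nfds < 0)
        perror("epoll_wait");
    return nfds;   /* 0 means the 1000 ms timeout expired with nothing ready */
}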
4. The two working modes, ET and LT:
The following conclusion can be drawn:
In ET mode you are notified only when the state changes, and "there is still unprocessed data in the buffer" does not count as a state change. In other words, with ET mode you must keep reading/writing until the call fails (with EAGAIN). Many people wonder why, with ET mode, they receive part of the data and then never get notified again; this is usually the reason. LT mode, by contrast, keeps notifying you as long as unprocessed data remains.
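For example, under ET a readable notification should be drained roughly like this; the helper name drain is an illustrative assumption, and the fd is assumed to be nonblocking, as ET requires:

#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Drain all input currently available on a nonblocking fd.
   Returns 0 once the buffer is empty, -1 on EOF or error. */
int drain(int sockfd)
{
    char buf[4096];
    for (;;) {
        ssize_t n = read(sockfd, buf, sizeof(buf));
        if (n > 0) {
            /* ... process the n bytes just read ... */
        } else if (n == 0) {
            return -1;                  /* peer closed the connection */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return 0;                   /* drained: safe to wait for the next edge */
        } else {
            perror("read");
            return -1;
        }
    }
}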
So how exactly do you use epoll? It is actually very simple.
By including a single header, #include <sys/epoll.h>, and calling a few simple APIs, you can greatly increase the number of clients your network server can support.
First create an epoll handle with epoll_create(int size), where size is a hint for the number of handles your epoll will watch. The function returns a new epoll handle, and all subsequent operations go through this handle. When you are done, remember to close() the epoll handle you created.
Then, inside your network main loop, call epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout) on every iteration to query all the network interfaces and find out which can be read and which can be written. The basic call is:
nfds = epoll_wait(kdpfd, events, maxevents, -1);
Here kdpfd is the handle created by epoll_create, and events is a pointer to an array of epoll_event; when epoll_wait succeeds, the array holds all the ready read/write events. maxevents is the number of entries the events array can hold. The last parameter is epoll_wait's timeout: 0 means return immediately, -1 means wait until an event occurs, and any positive integer means wait at most that many milliseconds and return even if no event has occurred. Generally, if the network main loop runs in its own thread, you can pass -1, which keeps it efficient; if it shares a thread with the main program logic, you can pass 0 to keep the main loop responsive.
After epoll_wait returns, loop over all the returned events.
Almost every epoll program uses the following skeleton:
for ( ; ; )
{
    nfds = epoll_wait(epfd, events, 20, 500);
    for (i = 0; i < nfds; ++i)
    {
        if (events[i].data.fd == listenfd) // a new connection
        {
            connfd = accept(listenfd, (sockaddr *)&clientaddr, &clilen); // accept the connection
            ev.data.fd = connfd;
            ev.events = EPOLLIN | EPOLLET;
            epoll_ctl(epfd, EPOLL_CTL_ADD, connfd, &ev); // add the new fd to epoll's watch queue
        }
        else if (events[i].events & EPOLLIN) // data received: read the socket
        {
            sockfd = events[i].data.fd;
            n = read(sockfd, line, MAXLINE); // read
            ev.data.ptr = md; // md is a user-defined structure that carries the data
            ev.events = EPOLLOUT | EPOLLET;
            epoll_ctl(epfd, EPOLL_CTL_MOD, sockfd, &ev); // change the flags and send the data on a later iteration: the essence of asynchronous processing
        }
        else if (events[i].events & EPOLLOUT) // data waiting to be sent: write the socket
        {
            struct myepoll_data *md = (struct myepoll_data *)events[i].data.ptr; // fetch the data
            sockfd = md->fd;
            send(sockfd, md->ptr, strlen((char *)md->ptr), 0); // send the data
            ev.data.fd = sockfd;
            ev.events = EPOLLIN | EPOLLET;
            epoll_ctl(epfd, EPOLL_CTL_MOD, sockfd, &ev); // change the flags and receive data on a later iteration
        }
        else
        {
            // other handling
        }
    }
}
A complete server-side example follows:
#include <iostream>
#include <sys/socket.h>
#include <sys/epoll.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <errno.h>

using namespace std;

#define MAXLINE 5
#define OPEN_MAX 100
#define LISTENQ 20
#define SERV_PORT 5000
#define INFTIM 1000

void setnonblocking(int sock)
{
    int opts;
    opts = fcntl(sock, F_GETFL);
    if (opts < 0)
    {
        perror("fcntl(sock,GETFL)");
        exit(1);
    }
    opts = opts | O_NONBLOCK;
    if (fcntl(sock, F_SETFL, opts) < 0)
    {
        perror("fcntl(sock,SETFL,opts)");
        exit(1);
    }
}

int main(int argc, char* argv[])
{
    int i, maxi, listenfd, connfd, sockfd, epfd, nfds, portnumber;
    ssize_t n;
    char line[MAXLINE];
    socklen_t clilen;

    if ( 2 == argc )
    {
        if ( (portnumber = atoi(argv[1])) < 0 )
        {
            fprintf(stderr, "Usage: %s portnumber\a\n", argv[0]);
            return 1;
        }
    }
    else
    {
        fprintf(stderr, "Usage: %s portnumber\a\n", argv[0]);
        return 1;
    }

    // Declare epoll_event variables: ev registers events; the array receives the events to handle
    struct epoll_event ev, events[20];

    // Create an epoll-specific file descriptor for handling accept
    epfd = epoll_create(256);

    struct sockaddr_in clientaddr;
    struct sockaddr_in serveraddr;
    listenfd = socket(AF_INET, SOCK_STREAM, 0);

    // Put the socket into nonblocking mode
    //setnonblocking(listenfd);

    // Set the file descriptor associated with the event to handle
    ev.data.fd = listenfd;
    // Set the event types to handle
    ev.events = EPOLLIN | EPOLLET;
    //ev.events = EPOLLIN;

    // Register the epoll event
    epoll_ctl(epfd, EPOLL_CTL_ADD, listenfd, &ev);

    bzero(&serveraddr, sizeof(serveraddr));
    serveraddr.sin_family = AF_INET;
    char *local_addr = "127.0.0.1";
    inet_aton(local_addr, &(serveraddr.sin_addr));
    serveraddr.sin_port = htons(portnumber);
    bind(listenfd, (sockaddr *)&serveraddr, sizeof(serveraddr));
    listen(listenfd, LISTENQ);

    maxi = 0;
    for ( ; ; )
    {
        // Wait for epoll events to occur
        nfds = epoll_wait(epfd, events, 20, 500);

        // Handle every event that occurred
        for (i = 0; i < nfds; ++i)
        {
            if (events[i].data.fd == listenfd) // a new client connected to the listening SOCKET port: establish the connection
            {
                clilen = sizeof(clientaddr);
                connfd = accept(listenfd, (sockaddr *)&clientaddr, &clilen);
                if (connfd < 0)
                {
                    perror("connfd<0");
                    exit(1);
                }
                //setnonblocking(connfd);

                char *str = inet_ntoa(clientaddr.sin_addr);
                cout << "accept a connection from " << str << endl;

                // Set the file descriptor for the read operation
                ev.data.fd = connfd;
                // Set the read event to register
                ev.events = EPOLLIN | EPOLLET;
                //ev.events = EPOLLIN;

                // Register ev
                epoll_ctl(epfd, EPOLL_CTL_ADD, connfd, &ev);
            }
            else if (events[i].events & EPOLLIN) // an already-connected client sent data: read it in
            {
                cout << "EPOLLIN" << endl;
                if ( (sockfd = events[i].data.fd) < 0)
                    continue;
                if ( (n = read(sockfd, line, MAXLINE - 1)) < 0)
                {
                    if (errno == ECONNRESET)
                    {
                        close(sockfd);
                        events[i].data.fd = -1;
                    }
                    else
                        std::cout << "readline error" << std::endl;
                    continue; // nothing was read: do not touch line[]
                }
                else if (n == 0)
                {
                    close(sockfd);
                    events[i].data.fd = -1;
                    continue; // the peer closed the connection
                }
                line[n] = '\0';
                cout << "read " << line << endl;

                // Set the file descriptor for the write operation
                ev.data.fd = sockfd;
                // Set the write event to register
                ev.events = EPOLLOUT | EPOLLET;
                // Change the event to handle on sockfd to EPOLLOUT
                epoll_ctl(epfd, EPOLL_CTL_MOD, sockfd, &ev);
            }
            else if (events[i].events & EPOLLOUT) // there is data to send
            {
                sockfd = events[i].data.fd;
                write(sockfd, line, n);

                // Set the file descriptor for the read operation
                ev.data.fd = sockfd;
                // Set the read event to register
                ev.events = EPOLLIN | EPOLLET;
                // Change the event to handle on sockfd back to EPOLLIN
                epoll_ctl(epfd, EPOLL_CTL_MOD, sockfd, &ev);
            }
        }
    }
    return 0;
}
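To try the example out, compile it with a C++ compiler (for instance g++ server.cpp -o server), start it with ./server 5000, and connect with any TCP client. Note that the program binds to 127.0.0.1, so only local connections are accepted, and with MAXLINE set to 5 it reads and echoes at most four bytes at a time.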
EPOLL(7) Linux Programmer's Manual EPOLL(7)
NAME
epoll - I/O event notification facility
SYNOPSIS
#include <sys/epoll.h>
DESCRIPTION
The epoll API performs a similar task to poll(2): monitoring multiple
file descriptors to see if I/O is possible on any of them. The epoll
API can be used either as an edge-triggered or a level-triggered
interface and scales well to large numbers of watched file
descriptors. The following system calls are provided to create and
manage an epoll instance:

* epoll_create(2) creates a new epoll instance and returns a file
  descriptor referring to that instance. (The more recent
  epoll_create1(2) extends the functionality of epoll_create(2).)

* Interest in particular file descriptors is then registered via
  epoll_ctl(2). The set of file descriptors currently registered on
  an epoll instance is sometimes called an epoll set.

* epoll_wait(2) waits for I/O events, blocking the calling thread if
  no events are currently available.

Level-triggered and edge-triggered
The epoll event distribution interface is able to behave both as
edge-triggered (ET) and as level-triggered (LT). The difference
between the two mechanisms can be described as follows. Suppose that
this scenario happens:

1. The file descriptor that represents the read side of a pipe (rfd)
   is registered on the epoll instance.

2. A pipe writer writes 2 kB of data on the write side of the pipe.

3. A call to epoll_wait(2) is done that will return rfd as a ready
   file descriptor.

4. The pipe reader reads 1 kB of data from rfd.

5. A call to epoll_wait(2) is done.

If the rfd file descriptor has been added to the epoll interface
using the EPOLLET (edge-triggered) flag, the call to epoll_wait(2)
done in step 5 will probably hang despite the available data still
present in the file input buffer; meanwhile the remote peer might be
expecting a response based on the data it already sent. The reason
for this is that edge-triggered mode delivers events only when
changes occur on the monitored file descriptor. So, in step 5 the
caller might end up waiting for some data that is already present
inside the input buffer. In the above example, an event on rfd will
be generated because of the write done in 2 and the event is consumed
in 3. Since the read operation done in 4 does not consume the whole
buffer data, the call to epoll_wait(2) done in step 5 might block
indefinitely.

An application that employs the EPOLLET flag should use nonblocking
file descriptors to avoid having a blocking read or write starve a
task that is handling multiple file descriptors. The suggested way
to use epoll as an edge-triggered (EPOLLET) interface is as follows:

i  with nonblocking file descriptors; and

ii by waiting for an event only after read(2) or write(2)
   return EAGAIN.

By contrast, when used as a level-triggered interface (the default,
when EPOLLET is not specified), epoll is simply a faster poll(2), and
can be used wherever the latter is used since it shares the same
semantics.

Since even with edge-triggered epoll, multiple events can be
generated upon receipt of multiple chunks of data, the caller has the
option to specify the EPOLLONESHOT flag, to tell epoll to disable the
associated file descriptor after the receipt of an event with
epoll_wait(2). When the EPOLLONESHOT flag is specified, it is the
caller's responsibility to rearm the file descriptor using
epoll_ctl(2) with EPOLL_CTL_MOD.
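For illustration, arming and rearming a one-shot registration might look like this sketch (epfd and sockfd assumed from context):

struct epoll_event ev;
ev.events = EPOLLIN | EPOLLONESHOT;  /* deliver one readable event, then disarm */
ev.data.fd = sockfd;                 /* sockfd: an assumed, already-connected socket */
epoll_ctl(epfd, EPOLL_CTL_ADD, sockfd, &ev);

/* ... epoll_wait(2) reports the event; after handling it, the fd
   stays disarmed until explicitly rearmed with EPOLL_CTL_MOD: */
ev.events = EPOLLIN | EPOLLONESHOT;
ev.data.fd = sockfd;
epoll_ctl(epfd, EPOLL_CTL_MOD, sockfd, &ev);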
Interaction with autosleep
If the system is in autosleep mode via /sys/power/autosleep and an
event happens which wakes the device from sleep, the device driver
will keep the device awake only until that event is queued. To keep
the device awake until the event has been processed, it is necessary
to use the epoll_ctl(2) EPOLLWAKEUP flag.

When the EPOLLWAKEUP flag is set in the events field for a struct
epoll_event, the system will be kept awake from the moment the event
is queued, through the epoll_wait(2) call which returns the event
until the subsequent epoll_wait(2) call. If the event should keep
the system awake beyond that time, then a separate wake_lock should
be taken before the second epoll_wait(2) call.

/proc interfaces
The following interfaces can be used to limit the amount of kernel
memory consumed by epoll:

/proc/sys/fs/epoll/max_user_watches (since Linux 2.6.28)
This specifies a limit on the total number of file descriptors
that a user can register across all epoll instances on the
system. The limit is per real user ID. Each registered file
descriptor costs roughly 90 bytes on a 32-bit kernel, and
roughly 160 bytes on a 64-bit kernel. Currently, the default
value for max_user_watches is 1/25 (4%) of the available low
memory, divided by the registration cost in bytes.

Example for suggested usage
While the usage of epoll when employed as a level-triggered interface
does have the same semantics as poll(2), the edge-triggered usage
requires more clarification to avoid stalls in the application event
loop. In this example, listener is a nonblocking socket on which
listen(2) has been called. The function do_use_fd() uses the new
ready file descriptor until EAGAIN is returned by either read(2) or
write(2). An event-driven state machine application should, after
having received EAGAIN, record its current state so that at the next
call to do_use_fd() it will continue to read(2) or write(2) from
where it stopped before.

#define MAX_EVENTS 10

struct epoll_event ev, events[MAX_EVENTS];
int listen_sock, conn_sock, nfds, epollfd;

/* Code to set up listening socket, 'listen_sock',
   (socket(), bind(), listen()) omitted */

epollfd = epoll_create1(0);
if (epollfd == -1) {
    perror("epoll_create1");
    exit(EXIT_FAILURE);
}

ev.events = EPOLLIN;
ev.data.fd = listen_sock;
if (epoll_ctl(epollfd, EPOLL_CTL_ADD, listen_sock, &ev) == -1) {
    perror("epoll_ctl: listen_sock");
    exit(EXIT_FAILURE);
}

for (;;) {
    nfds = epoll_wait(epollfd, events, MAX_EVENTS, -1);
    if (nfds == -1) {
        perror("epoll_wait");
        exit(EXIT_FAILURE);
    }

    for (n = 0; n < nfds; ++n) {
        if (events[n].data.fd == listen_sock) {
            conn_sock = accept(listen_sock,
                               (struct sockaddr *) &addr, &addrlen);
            if (conn_sock == -1) {
                perror("accept");
                exit(EXIT_FAILURE);
            }
            setnonblocking(conn_sock);
            ev.events = EPOLLIN | EPOLLET;
            ev.data.fd = conn_sock;
            if (epoll_ctl(epollfd, EPOLL_CTL_ADD, conn_sock,
                          &ev) == -1) {
                perror("epoll_ctl: conn_sock");
                exit(EXIT_FAILURE);
            }
        } else {
            do_use_fd(events[n].data.fd);
        }
    }
}

When used as an edge-triggered interface, for performance reasons, it
is possible to add the file descriptor inside the epoll interface
(EPOLL_CTL_ADD) once by specifying (EPOLLIN|EPOLLOUT). This allows
you to avoid continuously switching between EPOLLIN and EPOLLOUT
calling epoll_ctl(2) with EPOLL_CTL_MOD.
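For illustration, such a one-time registration for both directions might look like this sketch (epfd and fd assumed from context):

struct epoll_event ev;
ev.events = EPOLLIN | EPOLLOUT | EPOLLET;  /* one ADD covers both directions */
ev.data.fd = fd;                           /* fd: an assumed connected socket */
epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
/* afterwards, simply react to whichever of EPOLLIN/EPOLLOUT
   epoll_wait(2) reports; no EPOLL_CTL_MOD switching is needed */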
Questions and answers

Q0 What is the key used to distinguish the file descriptors
registered in an epoll set?

A0 The key is the combination of the file descriptor number and the
open file description (also known as an "open file handle", the
kernel's internal representation of an open file).

Q1 What happens if you register the same file descriptor on an epoll
instance twice?

A1 You will probably get EEXIST. However, it is possible to add a
duplicate (dup(2), dup2(2), fcntl(2) F_DUPFD) file descriptor to
the same epoll instance. This can be a useful technique for
filtering events, if the duplicate file descriptors are registered
with different events masks.

Q2 Can two epoll instances wait for the same file descriptor? If
so, are events reported to both epoll file descriptors?

A2 Yes, and events would be reported to both. However, careful
programming may be needed to do this correctly.

Q3 Is the epoll file descriptor itself poll/epoll/selectable?

A3 Yes. If an epoll file descriptor has events waiting, then it
will indicate as being readable.

Q4 What happens if one attempts to put an epoll file descriptor into
its own file descriptor set?

A4 The epoll_ctl(2) call fails (EINVAL). However, you can add an
epoll file descriptor inside another epoll file descriptor set.

Q5 Can I send an epoll file descriptor over a UNIX domain socket to
another process?

A5 Yes, but it does not make sense to do this, since the receiving
process would not have copies of the file descriptors in the
epoll set.

Q6 Will closing a file descriptor cause it to be removed from all
epoll sets automatically?

A6 Yes, but be aware of the following point. A file descriptor is a
reference to an open file description (see open(2)). Whenever a
file descriptor is duplicated via dup(2), dup2(2), fcntl(2)
F_DUPFD, or fork(2), a new file descriptor referring to the same
open file description is created. An open file description
continues to exist until all file descriptors referring to it have
been closed. A file descriptor is removed from an epoll set only
after all the file descriptors referring to the underlying open
file description have been closed (or before if the file
descriptor is explicitly removed using epoll_ctl(2) EPOLL_CTL_DEL).
This means that even after a file descriptor that is part of an
epoll set has been closed, events may be reported for that file
descriptor if other file descriptors referring to the same
underlying file description remain open.
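In practice, when duplicates may exist, an explicit deregistration before close(2) avoids stray events; a sketch (epfd and fd assumed from context):

/* Remove fd from the epoll set before closing it, so that a surviving
   duplicate cannot keep generating events for it.  (Kernels before
   2.6.9 required a non-NULL event pointer for EPOLL_CTL_DEL.) */
epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
close(fd);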
Q7 If more than one event occurs between epoll_wait(2) calls, are
they combined or reported separately?

A7 They will be combined.

Q8 Does an operation on a file descriptor affect the already
collected but not yet reported events?

A8 You can do two operations on an existing file descriptor. Remove
would be meaningless for this case. Modify will reread available
I/O.

Q9 Do I need to continuously read/write a file descriptor until
EAGAIN when using the EPOLLET flag (edge-triggered behavior)?

A9 Receiving an event from epoll_wait(2) should suggest to you that
such file descriptor is ready for the requested I/O operation.
You must consider it ready until the next (nonblocking)
read/write yields EAGAIN. When and how you will use the file
descriptor is entirely up to you.

For packet/token-oriented files (e.g., datagram socket, terminal
in canonical mode), the only way to detect the end of the
read/write I/O space is to continue to read/write until EAGAIN.

For stream-oriented files (e.g., pipe, FIFO, stream socket), the
condition that the read/write I/O space is exhausted can also be
detected by checking the amount of data read from / written to
the target file descriptor. For example, if you call read(2) by
asking to read a certain amount of data and read(2) returns a
lower number of bytes, you can be sure of having exhausted the
read I/O space for the file descriptor. The same is true when
writing using write(2). (Avoid this latter technique if you
cannot guarantee that the monitored file descriptor always refers to
a stream-oriented file.)

Possible pitfalls and ways to avoid them
o Starvation (edge-triggered)

If there is a large amount of I/O space, it is possible that by
trying to drain it the other files will not get processed, causing
starvation. (This problem is not specific to epoll.)

The solution is to maintain a ready list and mark the file descriptor
as ready in its associated data structure, thereby allowing the
application to remember which files need to be processed but still
round robin amongst all the ready files. This also supports ignoring
subsequent events you receive for file descriptors that are already
ready.

o If using an event cache...

If you use an event cache or store all the file descriptors returned
from epoll_wait(2), then make sure to provide a way to mark its
closure dynamically (i.e., caused by a previous event's processing).
Suppose you receive 100 events from epoll_wait(2), and in event #47 a
condition causes event #13 to be closed. If you remove the structure
and close(2) the file descriptor for event #13, then your event cache
might still say there are events waiting for that file descriptor
causing confusion.

One solution for this is to call, during the processing of event 47,
epoll_ctl(EPOLL_CTL_DEL) to delete file descriptor 13 and close(2),
then mark its associated data structure as removed and link it to a
cleanup list. If you find another event for file descriptor 13 in
your batch processing, you will discover the file descriptor had been
previously removed and there will be no confusion.
VERSIONS
The epoll API was introduced in Linux kernel 2.5.44. Support was
added to glibc in version 2.3.2.
CONFORMING TO
The epoll API is Linux-specific. Some other systems provide similar
mechanisms, for example, FreeBSD has kqueue, and Solaris has
/dev/poll.
NOTES
The set of file descriptors that is being monitored via an epoll file
descriptor can be viewed via the entry for the epoll file descriptor
in the process's /proc/[pid]/fdinfo directory. See proc(5) for
further details.

The kcmp(2) KCMP_EPOLL_TFD operation can be used to test whether a
file descriptor is present in an epoll instance.
SEE ALSO
epoll_create(2), epoll_create1(2), epoll_ctl(2), epoll_wait(2),
poll(2), select(2)
COLOPHON
This page is part of release 4.16 of the Linux man-pages project. A
description of the project, information about reporting bugs, and the
latest version of this page, can be found at
https://www.kernel.org/doc/man-pages/.

Linux                            2017-09-15                         EPOLL(7)