Author: gfree.wind@gmail.com
Blog: blog.focus-linux.net linuxfocus.blog.chinaunix.net
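Weibo: weibo.com/glinuxer
QQ tech group: 4367710
The copyleft of this article belongs to gfree.wind@gmail.com. It is released under the GPL and may be freely copied and reposted. When reposting, please keep the document intact and credit the original author and the original link. Any commercial use is strictly prohibited.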
========================================================================================================
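The previous article described how epoll adds itself to the wait queue of every monitored descriptor, which is only the first step. As also mentioned last time, epoll is really a blocking operation too; it just watches several file descriptors at once. Now let's look at the implementation of epoll_wait->ep_poll.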
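Before going into the kernel, it helps to see that blocking behavior from the user side. The following is a minimal userspace sketch, assuming a Linux system, that uses the standard epoll API (epoll_create1, epoll_ctl, epoll_wait). The helper name watch_two_pipes and the two-pipe setup are illustrative only. One epoll instance, itself a file descriptor, watches the read ends of two pipes, and epoll_wait reports only the pipe that actually has data.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: watch the read ends of two pipes with one epoll
 * instance, make only pipe #1 readable, and return what epoll_wait reports.
 * *which receives the index (0 or 1) stored in the first reported event. */
static int watch_two_pipes(int *which)
{
    int pipes[2][2] = { { -1, -1 }, { -1, -1 } };
    int epfd = -1, ready = -1;
    struct epoll_event events[2];

    assert(which != NULL);
    if (pipe(pipes[0]) < 0 || pipe(pipes[1]) < 0)
        goto out;

    /* The epoll instance is itself a file descriptor. */
    epfd = epoll_create1(0);
    if (epfd < 0)
        goto out;

    for (int i = 0; i < 2; i++) {
        /* data.u32 remembers which pipe this is. */
        struct epoll_event ev = { .events = EPOLLIN, .data.u32 = (unsigned)i };
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipes[i][0], &ev) < 0)
            goto out;
    }

    /* Make only pipe #1 readable. */
    if (write(pipes[1][1], "x", 1) != 1)
        goto out;

    /* Would sleep for up to 1000 ms if no watched fd were ready. */
    ready = epoll_wait(epfd, events, 2, 1000);
    if (ready > 0)
        *which = (int)events[0].data.u32;

out:
    if (epfd >= 0)
        close(epfd);
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            if (pipes[i][j] >= 0)
                close(pipes[i][j]);
    return ready;
}
```

If neither pipe had data, epoll_wait would put the caller to sleep until the timeout; that sleep is what ep_poll implements inside the kernel.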
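Since epoll blocks, it needs a wait queue. It cannot simply reuse the wait queues of the monitored file descriptors, though. epoll is itself implemented as a virtual filesystem, and the value returned by epoll_create is also a file descriptor (in Unix, everything is a file, after all), so each epoll instance has its own wait queue, ep->wq.
The waiting part of ep_poll therefore looks like this: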
/* wake function = default_wake_function, task to wake = current */
init_waitqueue_entry(&wait, current);
/* sleep on epoll's own wait queue ep->wq, as an exclusive waiter */
__add_wait_queue_exclusive(&ep->wq, &wait);

for (;;) {
    /*
     * We don't want to sleep if the ep_poll_callback() sends us
     * a wakeup in between. That's why we set the task state
     * to TASK_INTERRUPTIBLE before doing the checks.
     */
    set_current_state(TASK_INTERRUPTIBLE);
    if (ep_events_available(ep) || timed_out)
        break;
    if (signal_pending(current)) {
        res = -EINTR;
        break;
    }

    /* drop the lock and sleep until woken or until the timeout expires */
    spin_unlock_irqrestore(&ep->lock, flags);
    if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS))
        timed_out = 1;

    spin_lock_irqsave(&ep->lock, flags);
}
__remove_wait_queue(&ep->wq, &wait);
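The loop above follows the classic "check the condition, then sleep" pattern. Because the task state is set to TASK_INTERRUPTIBLE before the checks, a wakeup from ep_poll_callback that arrives between the check and schedule_hrtimeout_range puts the task back into the running state, so the sleep returns at once instead of the wakeup being lost. As a rough userspace analogy (not kernel code), the sketch below uses a pthread mutex in place of ep->lock and a condition variable in place of ep->wq; pthread_cond_timedwait releases the mutex and sleeps atomically, which gives the same no-lost-wakeup guarantee. The names fake_ep, fake_ep_post and fake_ep_wait are hypothetical, signal handling (the -EINTR branch) is left out, and older glibc versions need -pthread at link time.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for struct eventpoll: a ready counter protected by
 * a mutex (like ep->lock), with a condition variable (like ep->wq). */
struct fake_ep {
    pthread_mutex_t lock;
    pthread_cond_t  wq;
    int             nready;
};

static void fake_ep_init(struct fake_ep *ep)
{
    pthread_mutex_init(&ep->lock, NULL);
    pthread_cond_init(&ep->wq, NULL);
    ep->nready = 0;
}

/* Producer side, in the role of ep_poll_callback: record an event and
 * wake one waiter (epoll waiters are exclusive). */
static void fake_ep_post(struct fake_ep *ep)
{
    pthread_mutex_lock(&ep->lock);
    ep->nready++;
    pthread_cond_signal(&ep->wq);
    pthread_mutex_unlock(&ep->lock);
}

/* Consumer side, shaped like the ep_poll loop. Returns the number of
 * events consumed, or 0 if the timeout expired with nothing ready. */
static int fake_ep_wait(struct fake_ep *ep, long timeout_ms)
{
    struct timespec deadline;
    int timed_out = 0, res;

    assert(timeout_ms >= 0);
    /* Condition variables use CLOCK_REALTIME by default. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&ep->lock);
    for (;;) {
        /* Like: if (ep_events_available(ep) || timed_out) break; */
        if (ep->nready > 0 || timed_out)
            break;
        /* Unlocks, sleeps and relocks atomically: a post() issued after
         * the check above cannot slip through unnoticed. */
        if (pthread_cond_timedwait(&ep->wq, &ep->lock, &deadline) == ETIMEDOUT)
            timed_out = 1;
    }
    res = ep->nready;
    ep->nready = 0;
    pthread_mutex_unlock(&ep->lock);
    return res;
}
```

A waiter with nothing posted returns 0 after the timeout; once fake_ep_post has been called, fake_ep_wait returns immediately with the pending count.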
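This raises the question of how a process sleeping on ep->wq gets woken when one of the monitored descriptors becomes ready. To answer it, we need to go back to ep_ptable_queue_proc (if you don't remember this function, see the previous article). It calls init_waitqueue_func_entry(&pwq->wait, ep_poll_callback), which sets the callback of the wait-queue entry that epoll adds to the monitored descriptor's wait queue to ep_poll_callback. Compare that with init_waitqueue_entry, which ep_poll calls above: it sets the entry's callback to default_wake_function.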
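So when a monitored descriptor performs a wakeup, for example when a socket receives data and the chain sk_data_ready->sock_def_readable->wake_up_interruptible_sync_poll->... runs, the callback of each wait-queue entry is eventually invoked. For epoll, that callback is ep_poll_callback.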
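The difference between the two initializers comes down to a function pointer stored in each wait-queue entry. The following simplified userspace model makes that concrete; all names are hypothetical and it is not the kernel's wait_queue_t. A blocking reader's entry, whose callback makes a task runnable like default_wake_function, and an epoll-style entry, whose callback only marks an item ready, sit on the same queue, and waking the queue simply calls each entry's function. The real __wake_up_common also stops after a given number of exclusive waiters, and the real ep_poll_callback does more than mark an item; this sketch omits both.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified wait queue: each entry carries its own
 * wake-up function, as kernel wait-queue entries do. */
struct wq_entry;
typedef int (*wake_fn)(struct wq_entry *entry, void *key);

struct wq_entry {
    wake_fn          func;  /* called when the queue is woken */
    void            *priv;  /* a task for blocking waiters, an item for epoll */
    struct wq_entry *next;
};

struct wq_head {
    struct wq_entry *first;
};

struct task { int runnable; };      /* stand-in for a sleeping process */
struct ready_item { int ready; };   /* stand-in for an epitem */

/* In the role of default_wake_function: make the waiting task runnable. */
static int wake_task(struct wq_entry *e, void *key)
{
    (void)key;
    ((struct task *)e->priv)->runnable = 1;
    return 1;
}

/* In the role of ep_poll_callback, reduced to queuing the item as ready
 * instead of waking a task directly. */
static int mark_ready(struct wq_entry *e, void *key)
{
    (void)key;
    ((struct ready_item *)e->priv)->ready = 1;
    return 1;
}

/* Like init_waitqueue_entry / init_waitqueue_func_entry: pick the callback. */
static void wq_entry_init(struct wq_entry *e, wake_fn func, void *priv)
{
    e->func = func;
    e->priv = priv;
    e->next = NULL;
}

static void wq_add(struct wq_head *head, struct wq_entry *e)
{
    e->next = head->first;
    head->first = e;
}

/* A much simplified wake_up: invoke every entry's callback.
 * Returns how many callbacks ran. */
static int wq_wake_up(struct wq_head *head, void *key)
{
    int called = 0;
    for (struct wq_entry *e = head->first; e != NULL; e = e->next) {
        assert(e->func != NULL);
        e->func(e, key);
        called++;
    }
    return called;
}
```

The socket never needs to know who is waiting: it wakes its queue, and each waiter's own callback decides what "being woken" means.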
static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *key)
{
    int pwake = 0;
    unsigned long flags;
    struct epitem *epi = ep_item_from_wait(wait);
    struct eventpoll *ep = epi->ep;

    spin_lock_irqsave(&ep->lock, flags);

    /*
     * If the event mask does not contain any poll(2) event, we consider the
     * descriptor to be disabled. This condition is likely the effect of the
     * EPOLLONESHOT bit that disables the descriptor when an event is received,
     * until the next EPOLL_CTL_MOD will be issued.
     */
    if (!(epi->event.events & ~EP_PRIVATE_BITS))
        goto out_unlock;

    /*
     * Check the events coming with the callback. At this stage, not
     * every device reports the events in the "key" parameter of the
     * callback. We need to be able to handle both cases here, hence the
     * test for "key" != NULL before the event match test.
     */
    if (key && !((unsigned long) key & epi->event.events))
        goto out_unlock;

    /*
     * If we are transferring events to userspace, we can hold no locks
     * (because we're accessing user memory, and because of linux f_op->poll()
     * semantics). All the events that happen during that period of time are
     * chained in ep->ovflist and requeued later on.
     */
    if (unlikely(ep->ovflist != EP_UNACTIVE_PTR)) {
        if (epi->next == EP_UNACTIVE_PTR) {
            epi->next = ep->ovflist;
            ep->ovflist = epi;
        }
        goto out_unlock;
    }

    /* If this file is already in the ready list we exit soon */
    if (!ep_is_linked(&epi->rdllink))
        list_add_tail(&epi->rdllink, &ep->rdllist);

    /*
     * Wake up ( if active ) both the eventpoll wait list and the ->poll()
     * wait list.
     */
    if (waitqueue_active(&ep->wq))
        wake_up_locked(&ep->wq);
    if (waitqueue_active(&ep->poll_wait))
        pwake++;

out_unlock:
    spin_unlock_irqrestore(&ep->lock, flags);

    /* We have to call this outside the lock */
    if (pwake)
        ep_poll_safewake(&ep->poll_wait);

    return 1;
}
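The crucial lines in ep_poll_callback are the following two. If a process is sleeping in ep_poll on epoll's own wait queue ep->wq, wake_up_locked wakes it. The entry that ep_poll put on ep->wq was initialized with init_waitqueue_entry, so its callback is default_wake_function, which simply makes the sleeping process runnable again. Back in the ep_poll loop, the process checks ep_events_available(ep) again and, finding events, leaves the loop:
if (waitqueue_active(&ep->wq))
    wake_up_locked(&ep->wq);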
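The decision order of ep_poll_callback can be exercised on its own with a small userspace model. In the sketch below, sim_ep and sim_item are hypothetical stand-ins for struct eventpoll and struct epitem, a plain singly linked list replaces list_head, a transferring flag stands for ep->ovflist != EP_UNACTIVE_PTR, and a counter records each place where wake_up_locked would be called. PRIVATE_BITS is modeled on the kernel's EP_PRIVATE_BITS using the userspace EPOLLONESHOT and EPOLLET constants; locking and the poll_wait path are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/epoll.h>

/* Modeled on the kernel's EP_PRIVATE_BITS: control flags, not poll events. */
#define PRIVATE_BITS ((unsigned)(EPOLLONESHOT | EPOLLET))

/* Hypothetical stand-in for struct epitem. */
struct sim_item {
    unsigned         events;     /* interest mask, like epi->event.events */
    bool             on_rdlist;  /* like ep_is_linked(&epi->rdllink) */
    bool             on_ovflist; /* like epi->next != EP_UNACTIVE_PTR */
    struct sim_item *rd_next;
    struct sim_item *ovf_next;
};

/* Hypothetical stand-in for struct eventpoll. */
struct sim_ep {
    struct sim_item *rd_head, *rd_tail; /* FIFO ready list, like ep->rdllist */
    struct sim_item *ovflist;           /* items parked during a transfer */
    bool             transferring;      /* like ep->ovflist != EP_UNACTIVE_PTR */
    int              waiters;           /* like waitqueue_active(&ep->wq) */
    int              wakeups;           /* times wake_up_locked() would run */
};

/* Same decision order as ep_poll_callback. key == 0 models a device that
 * does not report which events fired. Always returns 1, like the kernel. */
static int sim_poll_callback(struct sim_ep *ep, struct sim_item *it, unsigned key)
{
    assert(ep != NULL && it != NULL);
    /* Only private bits left (e.g. after EPOLLONESHOT fired): disabled. */
    if (!(it->events & ~PRIVATE_BITS))
        return 1;
    /* The device reported its events and none of them is watched. */
    if (key && !(key & it->events))
        return 1;
    /* Events are being copied to userspace: park the item on ovflist. */
    if (ep->transferring) {
        if (!it->on_ovflist) {
            it->ovf_next = ep->ovflist;
            ep->ovflist = it;
            it->on_ovflist = true;
        }
        return 1;
    }
    /* Append to the ready list only once. */
    if (!it->on_rdlist) {
        it->rd_next = NULL;
        if (ep->rd_tail)
            ep->rd_tail->rd_next = it;
        else
            ep->rd_head = it;
        ep->rd_tail = it;
        it->on_rdlist = true;
    }
    /* Wake a process sleeping in ep_poll, if there is one. */
    if (ep->waiters > 0)
        ep->wakeups++;
    return 1;
}

static int sim_rdlist_len(const struct sim_ep *ep)
{
    int n = 0;
    for (const struct sim_item *it = ep->rd_head; it != NULL; it = it->rd_next)
        n++;
    return n;
}
```

Signaling the same item twice leaves a single entry on the ready list but still counts two wakeups. That matches the kernel code, where the wake_up_locked check sits outside the ep_is_linked test.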
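These two articles have more or less laid out how epoll monitors multiple descriptors and how it gets notified. On the monitoring side, one piece is still missing: epoll's internal structures, that is, how it stores each descriptor and how it maintains the associated information. There are already plenty of articles on that online, though. Perhaps I will write another two articles on this topic later.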