How epoll monitors multiple descriptors and gets notified (2)

Author: gfree.wind@gmail.com
Blog: blog.focus-linux.net   linuxfocus.blog.chinaunix.net

Weibo: weibo.com/glinuxer
QQ tech group: 4367710
 
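The copyleft of this article belongs to gfree.wind@gmail.com. It is released under the GPL and may be freely copied and reposted. When reposting, please keep the document intact and credit the original author and the original link. Any commercial use is strictly prohibited.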

========================================================================================================

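The previous article described how epoll hooks itself onto the wait queue of every monitored descriptor, but that is only the first step. As also mentioned there, epoll_wait is itself a blocking operation; it just happens to monitor many file descriptors at once. Let's now look at the implementation of epoll_wait->ep_poll.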

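Since epoll_wait blocks, it necessarily needs a wait queue. It cannot use the wait queues of the monitored file descriptors for this, though. An epoll instance is itself a file that lives in a kernel-internal virtual file system, and the return value of epoll_create is a file descriptor too, so it can carry a wait queue of its own. On Unix, everything is a file, after all.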
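To make the "an epoll instance is just another file descriptor" point concrete, here is a minimal user-space sketch, assuming a Linux system with epoll_create1. It hands the epoll descriptor to poll(2) and checks that it only becomes readable once a watched pipe has data. The function name epoll_fd_is_pollable is made up for this illustration.

```c
#include <assert.h>
#include <poll.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Returns 1 if poll() reports the epoll fd as readable only after one of
 * the descriptors it watches becomes ready, 0 if not, -1 on a setup error.
 */
int epoll_fd_is_pollable(void)
{
    int pipefd[2];
    int epfd, before, after;
    int result = -1;
    struct epoll_event ev;
    struct pollfd pfd;

    if (pipe(pipefd) < 0)
        return -1;

    /* The epoll instance is handed back as an ordinary file descriptor. */
    epfd = epoll_create1(0);
    if (epfd < 0)
        goto close_pipe;

    ev.events = EPOLLIN;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) < 0)
        goto close_epoll;

    /* Nothing has been written yet, so the epoll fd is not readable. */
    pfd.fd = epfd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    before = poll(&pfd, 1, 0);

    /* Make the watched pipe readable, then poll the epoll fd again. */
    if (write(pipefd[1], "x", 1) != 1)
        goto close_epoll;
    after = poll(&pfd, 1, 0);

    result = (before == 0 && after == 1 && (pfd.revents & POLLIN)) ? 1 : 0;

close_epoll:
    close(epfd);
close_pipe:
    close(pipefd[0]);
    close(pipefd[1]);
    return result;
}
```

Calling epoll_fd_is_pollable() from a test program should return 1 on Linux. The fact that something else can wait on the epoll descriptor is also why the kernel code further down keeps a second queue, ep->poll_wait, next to ep->wq.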

The relevant part of ep_poll looks like this:

        init_waitqueue_entry(&wait, current);
        __add_wait_queue_exclusive(&ep->wq, &wait);

        for (;;) {
            /*
             * We don't want to sleep if the ep_poll_callback() sends us
             * a wakeup in between. That's why we set the task state
             * to TASK_INTERRUPTIBLE before doing the checks.
             */
            set_current_state(TASK_INTERRUPTIBLE);
            if (ep_events_available(ep) || timed_out)
                break;
            if (signal_pending(current)) {
                res = -EINTR;
                break;
            }

            spin_unlock_irqrestore(&ep->lock, flags);
            if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS))
                timed_out = 1;

            spin_lock_irqsave(&ep->lock, flags);
        }
        __remove_wait_queue(&ep->wq, &wait);
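Here epoll_wait adds the current process to epoll's own wait queue. That raises a question: the previous article said epoll had already added a wait queue entry to the wait queue of every monitored descriptor, and now there is a second wait queue that belongs to epoll itself. Why?
To answer this we need to jump back to ep_ptable_queue_proc (if you don't remember this function, please reread the previous article). It calls init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);, which sets the callback of the wait queue entry that epoll places on a monitored descriptor to ep_poll_callback. By contrast, the init_waitqueue_entry function that ep_poll calls above sets the entry's callback to default_wake_function. So the entries on the monitored descriptors do not wake a process directly; they only run ep_poll_callback, while the entry on epoll's own queue is the one that wakes the sleeping process.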
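The difference between the two callbacks can be sketched with a small user-space model. None of the types below are the kernel's: every model_* name is hypothetical. The model also simplifies one detail: it stores the owner in a plain pointer, whereas the kernel's ep_poll_callback recovers its epitem from the wait entry via ep_item_from_wait.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of a wait queue; not the kernel's types. */
struct model_wq_entry;
typedef int (*model_wq_func)(struct model_wq_entry *entry);

struct model_wq_entry {
    model_wq_func func;            /* callback invoked on wake-up */
    void *owner;                   /* task or epoll instance, per callback */
    struct model_wq_entry *next;
};

struct model_wq_head {
    struct model_wq_entry *first;
};

struct model_task  { int runnable; };
struct model_epoll { int ready_items; struct model_wq_head wq; };

void model_wq_add(struct model_wq_head *head, struct model_wq_entry *entry)
{
    entry->next = head->first;
    head->first = entry;
}

/* A wake-up just walks the queue and runs every entry's callback. */
void model_wake_up(struct model_wq_head *head)
{
    for (struct model_wq_entry *e = head->first; e != NULL; e = e->next)
        e->func(e);
}

/* Stands in for default_wake_function: make the sleeping task runnable. */
int model_default_wake(struct model_wq_entry *entry)
{
    struct model_task *task = entry->owner;
    task->runnable = 1;
    return 1;
}

/* Stands in for ep_poll_callback: record readiness, wake epoll's own queue. */
int model_ep_callback(struct model_wq_entry *entry)
{
    struct model_epoll *ep = entry->owner;
    ep->ready_items++;
    model_wake_up(&ep->wq);
    return 1;
}

/* One socket-like queue, one epoll instance, one sleeping task. */
int model_demo(void)
{
    struct model_task task = { .runnable = 0 };
    struct model_epoll ep = { .ready_items = 0, .wq = { NULL } };
    struct model_wq_head socket_wq = { NULL };

    /* Step 1 (previous article): hook epoll onto the socket's queue. */
    struct model_wq_entry on_socket = { model_ep_callback, &ep, NULL };
    model_wq_add(&socket_wq, &on_socket);

    /* Step 2 (ep_poll): put the task on epoll's own queue. */
    struct model_wq_entry on_epoll = { model_default_wake, &task, NULL };
    model_wq_add(&ep.wq, &on_epoll);

    assert(task.runnable == 0);

    /* Data arrives: the socket wakes its queue, which reaches the task. */
    model_wake_up(&socket_wq);

    return ep.ready_items == 1 && task.runnable == 1;
}
```

In this model the socket's queue never touches the task directly. It only runs the epoll callback, which records readiness and then wakes whoever sleeps on the epoll instance's own queue.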

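So when a monitored file descriptor performs a wake-up, for example when a socket receives data, the call chain sk_data_ready->sock_def_readable->wake_up_interruptible_sync_poll->.... eventually runs the callback of each wait queue entry. For epoll, that callback is ep_poll_callback.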

static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *key)
{
    int pwake = 0;
    unsigned long flags;
    struct epitem *epi = ep_item_from_wait(wait);
    struct eventpoll *ep = epi->ep;

    spin_lock_irqsave(&ep->lock, flags);

    /*
     * If the event mask does not contain any poll(2) event, we consider the
     * descriptor to be disabled. This condition is likely the effect of the
     * EPOLLONESHOT bit that disables the descriptor when an event is received,
     * until the next EPOLL_CTL_MOD will be issued.
     */
    if (!(epi->event.events & ~EP_PRIVATE_BITS))
        goto out_unlock;

    /*
     * Check the events coming with the callback. At this stage, not
     * every device reports the events in the "key" parameter of the
     * callback. We need to be able to handle both cases here, hence the
     * test for "key" != NULL before the event match test.
     */
    if (key && !((unsigned long) key & epi->event.events))
        goto out_unlock;

    /*
     * If we are transferring events to userspace, we can hold no locks
     * (because we're accessing user memory, and because of linux f_op->poll()
     * semantics). All the events that happen during that period of time are
     * chained in ep->ovflist and requeued later on.
     */
    if (unlikely(ep->ovflist != EP_UNACTIVE_PTR)) {
        if (epi->next == EP_UNACTIVE_PTR) {
            epi->next = ep->ovflist;
            ep->ovflist = epi;
        }
        goto out_unlock;
    }

    /* If this file is already in the ready list we exit soon */
    if (!ep_is_linked(&epi->rdllink))
        list_add_tail(&epi->rdllink, &ep->rdllist);

    /*
     * Wake up ( if active ) both the eventpoll wait list and the ->poll()
     * wait list.
     */
    if (waitqueue_active(&ep->wq))
        wake_up_locked(&ep->wq);
    if (waitqueue_active(&ep->poll_wait))
        pwake++;

out_unlock:
    spin_unlock_irqrestore(&ep->lock, flags);

    /* We have to call this outside the lock */
    if (pwake)
        ep_poll_safewake(&ep->poll_wait);

    return 1;
}
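The two early exits at the top of ep_poll_callback are easy to try out in isolation. The sketch below is a user-space model, not kernel code. The function model_event_passes and the macro MODEL_PRIVATE_BITS are hypothetical names. It assumes that EP_PRIVATE_BITS covers EPOLLONESHOT and EPOLLET, as it did in kernels of this era; later kernels add more bits.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>

/* Assumption: mirrors EP_PRIVATE_BITS as defined in kernels of this era. */
#define MODEL_PRIVATE_BITS ((uint32_t)(EPOLLONESHOT | EPOLLET))

/*
 * Returns 1 if a wake-up carrying the event bits in `key` should queue an
 * item registered with the mask `registered`, 0 if the callback would bail
 * out. key == 0 models a device that does not report events in the callback.
 */
int model_event_passes(uint32_t registered, uint32_t key)
{
    /* No poll(2) event bits left: the item is disabled (EPOLLONESHOT fired). */
    if (!(registered & ~MODEL_PRIVATE_BITS))
        return 0;

    /* The wake-up names specific events and none of them was requested. */
    if (key && !(key & registered))
        return 0;

    return 1;
}
```

For example, model_event_passes(EPOLLIN, EPOLLOUT) returns 0 because the reported event was never requested, while model_event_passes(EPOLLIN, 0) returns 1 because a wake-up without a key cannot be filtered at this stage.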
The comments in this function are quite clear, so the purpose of each line is easy to see. Note in particular:

    if (waitqueue_active(&ep->wq))
        wake_up_locked(&ep->wq);
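These two lines check whether any entry is waiting on epoll's own wait queue and, if so, wake it up. From the epoll user's point of view, if the user-space process is blocked in epoll_wait, ep->wq is guaranteed to be non-empty, so the process is woken up here, that is, moved onto the run queue.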
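Seen from user space, the whole chain looks like the following minimal sketch, assuming Linux. A parent process blocks in epoll_wait while a child writes to a watched pipe a little later. The function name blocked_wait_is_woken is made up for this example, and the 100 ms delay is only a heuristic to let the parent block first, not a guarantee.

```c
#include <assert.h>
#include <poll.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Blocks in epoll_wait() until a child process writes to a watched pipe.
 * Returns 1 if exactly that readable event was reported, 0 if epoll_wait()
 * timed out or reported something else, -1 on a setup error.
 */
int blocked_wait_is_woken(void)
{
    int pipefd[2];
    int epfd, n;
    int result = -1;
    pid_t child;
    struct epoll_event ev, out;

    if (pipe(pipefd) < 0)
        return -1;

    epfd = epoll_create1(0);
    if (epfd < 0)
        goto close_pipe;

    ev.events = EPOLLIN;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) < 0)
        goto close_epoll;

    child = fork();
    if (child < 0)
        goto close_epoll;

    if (child == 0) {
        /* Child: give the parent time to block, then make the pipe readable. */
        poll(NULL, 0, 100);
        _exit(write(pipefd[1], "x", 1) == 1 ? 0 : 1);
    }

    /* Parent: sleeps on the epoll instance until the child's write arrives. */
    n = epoll_wait(epfd, &out, 1, 5000);
    waitpid(child, NULL, 0);

    result = (n == 1 && out.data.fd == pipefd[0] &&
              (out.events & EPOLLIN)) ? 1 : 0;

close_epoll:
    close(epfd);
close_pipe:
    close(pipefd[0]);
    close(pipefd[1]);
    return result;
}
```

The child's write makes the pipe wake its own wait queue. The epoll entry hooked onto that queue then runs its callback, and the parent, which sleeps on the epoll instance rather than on the pipe, returns from epoll_wait with one event.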

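These two articles have largely sorted out how epoll monitors multiple descriptors and how it gets notified. On the monitoring side, what is still missing is epoll's internal data structures: how it stores the descriptors, how it maintains their state, and so on. There are already plenty of articles about that online, though. Perhaps I will write another two articles on that topic later.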

