The "Affinity broken due to vector space exhaustion" problem

Unexpected messages in dmesg:

 kernel: irq 632: Affinity broken due to vector space exhaustion.
 kernel: irq 633: Affinity broken due to vector space exhaustion.

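This message does not mean that an IRQ number could not be allocated. The IRQ number was allocated,
but when the interrupt routing was configured, the affinity that actually took effect did not match the requested one. The code is: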

commit 743dac494d61d991967ebcfab92e4f80dc7583b3
Author: Neil Horman <nhorman@tuxdriver.com>
Date:   Thu Aug 22 10:34:21 2019 -0400

    x86/apic/vector: Warn when vector space exhaustion breaks affinity

    On x86, CPUs are limited in the number of interrupts they can have affined
    to them as they only support 256 interrupt vectors per CPU. 32 vectors are
    reserved for the CPU and the kernel reserves another 22 for internal
    purposes. That leaves 202 vectors for assignement to devices.

    When an interrupt is set up or the affinity is changed by the kernel or the
    administrator, the vector assignment code attempts to honor the requested
    affinity mask. If the vector space on the CPUs in that affinity mask is
    exhausted the code falls back to a wider set of CPUs and assigns a vector
    on a CPU outside of the requested affinity mask silently.

    While the effective affinity is reflected in the corresponding
    /proc/irq/$N/effective_affinity* files the silent breakage of the requested
    affinity can lead to unexpected behaviour for administrators.

    Add a pr_warn() when this happens so that adminstrators get at least
    informed about it in the syslog.

    [ tglx: Massaged changelog and made the pr_warn() more informative ]

    Reported-by: djuran@redhat.com
    Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: djuran@redhat.com
    Link: https://lkml.kernel.org/r/20190822143421.9535-1-nhorman@tuxdriver.com

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index fdacb864c3dd..2c5676b0a6e7 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -398,6 +398,17 @@ static int activate_reserved(struct irq_data *irqd)
                if (!irqd_can_reserve(irqd))
                        apicd->can_reserve = false;
        }
+
+       /*
+        * Check to ensure that the effective affinity mask is a subset
+        * the user supplied affinity mask, and warn the user if it is not
+        */
+       if (!cpumask_subset(irq_data_get_effective_affinity_mask(irqd),
+                           irq_data_get_affinity_mask(irqd))) {
+               pr_warn("irq %u: Affinity broken due to vector space exhaustion.\n",
+                       irqd->irq);
+       }
+
        return ret;
 }

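The author explains the cause clearly: on x86, each CPU can only accept a limited number of interrupts. On CentOS 7 we often ran into interrupt routing failures with no message printed at all,
so this warning was added to the kernel to address that. CentOS 8.3 has also backported the warning.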
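
On kernels that lack the warning, the same condition can be checked from user space. The following is a minimal sketch in Python, assuming the kernel exposes both /proc/irq/$N/smp_affinity and /proc/irq/$N/effective_affinity as comma-separated hexadecimal masks. The function names are made up for this example; the subset test mirrors the cpumask_subset() check in the patch above.

```python
from pathlib import Path


def parse_cpumask(text: str) -> int:
    """Turn a /proc/irq mask such as '00000001,00000000' into an integer bitmap."""
    return int(text.strip().replace(",", ""), 16)


def affinity_broken(requested: str, effective: str) -> bool:
    """Return True when the effective mask has a CPU outside the requested mask."""
    req = parse_cpumask(requested)
    eff = parse_cpumask(effective)
    # Same idea as !cpumask_subset(effective, requested) in the kernel patch.
    return (eff & ~req) != 0


def check_irq(irq: int, proc_root: str = "/proc/irq") -> bool:
    """Read both masks for one IRQ and report whether its affinity is broken.

    Assumption: effective_affinity exists on this kernel; it is missing on
    kernels built without effective affinity mask support.
    """
    base = Path(proc_root) / str(irq)
    requested = (base / "smp_affinity").read_text()
    effective = (base / "effective_affinity").read_text()
    return affinity_broken(requested, effective)
```

For example, check_irq(632) would return True for the first IRQ in the dmesg output above if its effective CPU is still outside the requested mask.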

When this happens, CPU 0 is usually the hardest-hit core in our environment, so interrupts should be kept off CPU 0 wherever possible.
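
One way to keep an interrupt off CPU 0 is to write a mask without bit 0 to /proc/irq/$N/smp_affinity. The sketch below builds such a mask in the comma-separated 32-bit hexadecimal format that file uses. The helper names are hypothetical, the write needs root, and the kernel may reject it for some interrupts (kernel-managed ones, for instance). It can also still fall back to another CPU if the vector space of every CPU in the new mask is exhausted.

```python
from pathlib import Path


def mask_excluding(cpu_count: int, excluded=(0,)) -> str:
    """Build an smp_affinity mask covering CPUs 0..cpu_count-1 minus `excluded`."""
    bitmap = 0
    for cpu in range(cpu_count):
        if cpu not in excluded:
            bitmap |= 1 << cpu
    if bitmap == 0:
        raise ValueError("mask would be empty")

    # Split the hex string into 32-bit (8 hex digit) groups, starting from the right.
    digits = f"{bitmap:x}"
    groups = []
    while digits:
        groups.append(digits[-8:])
        digits = digits[:-8]
    return ",".join(reversed(groups))


def set_irq_affinity(irq: int, mask: str, proc_root: str = "/proc/irq") -> None:
    """Write the mask for one IRQ. Needs root; raises OSError if the kernel refuses."""
    (Path(proc_root) / str(irq) / "smp_affinity").write_text(mask)
```

On a 40-CPU machine, mask_excluding(40) yields "ff,fffffffe", which could then be passed to set_irq_affinity(632, "ff,fffffffe"). Reading effective_affinity afterwards shows where the interrupt actually landed.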
