cache line

2023-11-16 09:39:40

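Checking the cache associativity

Look at the corresponding directories under /sys/devices/system/cpu/.

For example, to see how many sets cpu0's L1 cache has (index0 is usually the L1 data cache):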

$ cat /sys/devices/system/cpu/cpu0/cache/index0/number_of_sets
64
To see how many lines (ways) each set of cpu0's L1 cache has:

$ cat /sys/devices/system/cpu/cpu0/cache/index0/ways_of_associativity
8
3. Checking the cache line size

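Take the L1 cache of my own machine's CPU as an example. cpu0's L1 cache is 32 KB (reported in index0/size), with 64 sets of 8 ways each. Each way (one cache line) is therefore cache_line = 32*1024/(64*8) = 64 bytes. The cache line size can also be read directly: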

$ cat /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size

[root@localhost demo]#  cat /proc/cpuinfo | grep cache
[root@localhost demo]# cat /sys/devices/system/cpu/cpu0/cache/index0/number_of_sets
256
[root@localhost demo]# cat /sys/devices/system/cpu/cpu0/cache/index0/ways_of_associativity
4
[root@localhost demo]# cat /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size
64

root@zj-x86:~# cat /sys/devices/system/cpu/cpu0/cache/index0/number_of_sets
64
root@zj-x86:~# cat /sys/devices/system/cpu/cpu0/cache/index0/ways_of_associativity
8
root@zj-x86:~# cat /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size
64
root@zj-x86:~# cat /proc/cpuinfo | grep cache | more
cache size      : 25344 KB
cache_alignment : 64
cache size      : 25344 KB
cache_alignment : 64
cache size      : 25344 KB
cache_alignment : 64
cache size      : 25344 KB
cache_alignment : 64

struct rte_ring {
    /*
     * Note: this field kept the RTE_MEMZONE_NAMESIZE size due to ABI
     * compatibility requirements, it could be changed to RTE_RING_NAMESIZE
     * next time the ABI changes
     */
    char name[RTE_MEMZONE_NAMESIZE] __rte_cache_aligned; /**< Name of the ring. */
    int flags;               /**< Flags supplied at creation. */
    const struct rte_memzone *memzone;
            /**< Memzone, if any, containing the rte_ring */
    uint32_t size;           /**< Size of ring. */
    uint32_t mask;           /**< Mask (size-1) of ring. */
    uint32_t capacity;       /**< Usable size of ring */
 
    char pad0 __rte_cache_aligned; /**< empty cache line */
 
    /** Ring producer status. */
    struct rte_ring_headtail prod __rte_cache_aligned;
    char pad1 __rte_cache_aligned; /**< empty cache line */
 
    /** Ring consumer status. */
    struct rte_ring_headtail cons __rte_cache_aligned;
    char pad2 __rte_cache_aligned; /**< empty cache line */
};
 
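struct rte_ring mainly holds a producer (prod), a consumer (cons), and the ring's capacity, that is, how many objects the ring can hold.
Both struct rte_ring and struct rte_ring_headtail are cache-line aligned, and the empty pad0/pad1/pad2 lines add further separation. As a result, the producer and consumer cursors never share a cache line. This avoids false sharing. Otherwise a write by the producer core would invalidate the line the consumer core is using and cause needless cache misses.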

struct rte_ring_headtail {
    volatile uint32_t head;  /**< Prod/consumer head. */
    volatile uint32_t tail;  /**< Prod/consumer tail. */
    uint32_t single;         /**< True if single prod/cons */
};
rte_ring_headtail holds head and tail, the two cursors of the ring buffer. It also holds single, which marks whether there is a single producer/consumer or several.

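Cacheline optimization

Principle

The CPU tracks whether cached data is valid per cache line, not per memory-bus width. This can lead to false sharing, which lowers the CPU's cache hit rate. A common cause of false sharing is frequently accessed data that is not aligned to the cache line size.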

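The cache is divided into cache lines, as shown in Figure 1. When false sharing occurs, readHighFreq is fetched again from memory, even though it has not been modified and is already in the cache.

Figure 1 Division of the cache into cache lines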

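For example, the following code defines two variables that land in the same cache line, so the cache loads them together:

int readHighFreq, writeHighFreq;

Here readHighFreq is read frequently and writeHighFreq is written frequently. When one CPU core modifies writeHighFreq, the entire cache line holding it is marked invalid in the other cores' caches. That marks readHighFreq invalid as well. readHighFreq itself was never modified. Even so, a core that accesses it must reload it from memory (or from a shared lower-level cache). This is false sharing, and it degrades performance.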

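Kunpeng 920 and x86 have different cache line sizes. A program tuned on x86 may therefore run slower on Kunpeng 920, and the data alignment used in the application code has to be changed. The cache line of the x86 L3 cache is 64 bytes, while that of Kunpeng 920 is 128 bytes.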

How to fix

Modify the application code so that frequently read or written data is aligned to the cache line size. Possible approaches:
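1. Align dynamically allocated memory:
  int posix_memalign(void **memptr, size_t alignment, size_t size);
  
  On success, posix_memalign returns 0 and stores in *memptr a pointer to size bytes of dynamically allocated memory. The start address is a multiple of alignment, which must be a power of two and a multiple of sizeof(void *). Release the memory with free().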
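2. Pad the variable (in practice inside a struct, since the compiler does not guarantee how separate variables are laid out):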
  int writeHighFreq;
  
  char pad[CACHE_LINE_SIZE - sizeof(int)];
  
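  Here CACHE_LINE_SIZE is the server's cache line size. pad is never used. It fills the rest of the line after writeHighFreq, so that the two together are exactly one cache line. For writeHighFreq to get a line of its own, it must also start on a cache line boundary.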
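Some open-source projects already define a cache line macro, and changing the macro's value is then enough. For example, Impala uses the CACHE_LINE_SIZE macro for the cache line size of the target platform.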

码农公寓
