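I never had an intuitive feel for how the CPU cache affects program performance. This test uses the time command and valgrind's cachegrind tool to make that effect visible, which in turn helps when optimizing programs.
1. Classic test code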
cache1.c
#include <stdio.h>
#define MAXROW 8000
#define MAXCOL 8000
int main(int argc, char **argv)
{
    int i, j;
    /* static: the array (about 256 MB) is too large for the stack */
    static int x[MAXROW][MAXCOL];
    printf("starting\n");
    /* row by row: the inner loop writes adjacent ints in memory */
    for (i = 0; i < MAXROW; i++) {
        for (j = 0; j < MAXCOL; j++) {
            x[i][j] = i * j;
        }
    }
    printf("complete\n");
    return 0;
}
cache2.c
#include <stdio.h>
#define MAXROW 8000
#define MAXCOL 8000
int main(int argc, char **argv)
{
    int i, j;
    /* static: the array (about 256 MB) is too large for the stack */
    static int x[MAXROW][MAXCOL];
    printf("starting\n");
    /* column by column: each inner-loop write lands one full row further on */
    for (j = 0; j < MAXCOL; j++) {
        for (i = 0; i < MAXROW; i++) {
            x[i][j] = i * j;
        }
    }
    printf("complete\n");
    return 0;
}
2. Experimental results
[root@192 test]# gcc cache1.c -o cache1
[root@192 test]# time ./cache1
starting
complete
real 0m0.359s
user 0m0.186s
sys 0m0.126s
[root@192 test]# gcc cache2.c -o cache2
[root@192 test]# time ./cache2
starting
complete
real 0m2.033s
user 0m1.899s
sys 0m0.116s
[root@192 test]#
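As the source shows, cache2.c differs from cache1.c only in the order of the two loops, yet the run times are far apart: cache1 finishes in 359 ms, while cache2 takes about 2 seconds, roughly 5.7 times longer. Next, look at how each program uses the CPU cache.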
[root@192 test]# valgrind --tool=cachegrind ./cache1
==6269== Cachegrind, a cache and branch-prediction profiler
==6269== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote et al.
==6269== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==6269== Command: ./cache1
==6269==
--6269-- warning: L3 cache found, using its data for the LL simulation.
starting
complete
==6269==
==6269== I refs: 768,153,385
==6269== I1 misses: 758
==6269== LLi misses: 754
==6269== I1 miss rate: 0.00%
==6269== LLi miss rate: 0.00%
==6269==
==6269== D refs: 448,071,036 (384,049,725 rd + 64,021,311 wr)
==6269== D1 misses: 4,001,857 ( 1,306 rd + 4,000,551 wr)
==6269== LLd misses: 4,001,663 ( 1,134 rd + 4,000,529 wr)
==6269== D1 miss rate: 0.9% ( 0.0% + 6.2% )
==6269== LLd miss rate: 0.9% ( 0.0% + 6.2% )
==6269==
==6269== LL refs: 4,002,615 ( 2,064 rd + 4,000,551 wr)
==6269== LL misses: 4,002,417 ( 1,888 rd + 4,000,529 wr)
==6269== LL miss rate: 0.3% ( 0.0% + 6.2% )
[root@192 test]# valgrind --tool=cachegrind ./cache2
==6270== Cachegrind, a cache and branch-prediction profiler
==6270== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote et al.
==6270== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==6270== Command: ./cache2
==6270==
--6270-- warning: L3 cache found, using its data for the LL simulation.
starting
complete
==6270==
==6270== I refs: 768,153,385
==6270== I1 misses: 758
==6270== LLi misses: 754
==6270== I1 miss rate: 0.00%
==6270== LLi miss rate: 0.00%
==6270==
==6270== D refs: 448,071,036 (384,049,725 rd + 64,021,311 wr)
==6270== D1 misses: 64,001,856 ( 1,306 rd + 64,000,550 wr)
==6270== LLd misses: 4,009,662 ( 1,134 rd + 4,008,528 wr)
==6270== D1 miss rate: 14.3% ( 0.0% + 100.0% )
==6270== LLd miss rate: 0.9% ( 0.0% + 6.3% )
==6270==
==6270== LL refs: 64,002,614 ( 2,064 rd + 64,000,550 wr)
==6270== LL misses: 4,010,416 ( 1,888 rd + 4,008,528 wr)
==6270== LL miss rate: 0.3% ( 0.0% + 6.3% )
[root@192 test]#
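The miss rates tell the story: the D1 miss rate is 0.9% for cache1 and 14.3% for cache2, and the D1 write miss rate is 6.2% versus 100.0%. cache1 writes adjacent ints, so with 4-byte ints and 64-byte cache lines (typical values, not shown in the output above) only one write in 16 misses, which matches the 6.2%. cache2 jumps a full row (32,000 bytes) on every write, so every write misses D1. The LL miss counts are nearly identical, about 4 million in both runs, so the extra time in cache2 comes from D1 misses that have to be served by the lower cache levels. This makes it clear how strongly the CPU cache miss rate affects program performance.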