Experiment: how the CPU cache miss rate affects program performance

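I never had an intuitive sense of how much the CPU cache affects program performance, so I ran a test using the time command and Valgrind's Cachegrind tool. The results make the effect of the CPU cache tangible, which in turn helps when optimizing programs.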

1. Classic test code

cache1.c

    #include <stdio.h>

    #define MAXROW 8000
    #define MAXCOL 8000

    int main(int argc, char **argv)
    {
        int i, j;
        /* 8000 x 8000 ints, about 256 MB; static keeps it off the stack */
        static int x[MAXROW][MAXCOL];

        printf("starting\n");
        /* Row-major order: the inner j loop writes consecutive addresses */
        for (i = 0; i < MAXROW; i++) {
            for (j = 0; j < MAXCOL; j++) {
                x[i][j] = i * j;
            }
        }
        printf("complete\n");
        return 0;
    }

cache2.c

    #include <stdio.h>

    #define MAXROW 8000
    #define MAXCOL 8000

    int main(int argc, char **argv)
    {
        int i, j;
        /* 8000 x 8000 ints, about 256 MB; static keeps it off the stack */
        static int x[MAXROW][MAXCOL];

        printf("starting\n");
        /* Column-major order: each inner i step jumps MAXCOL * sizeof(int) bytes */
        for (j = 0; j < MAXCOL; j++) {
            for (i = 0; i < MAXROW; i++) {
                x[i][j] = i * j;
            }
        }
        printf("complete\n");
        return 0;
    }

2. Results

[root@192 test]# gcc cache1.c -o cache1
[root@192 test]# time ./cache1 
starting
complete

real	0m0.359s
user	0m0.186s
sys	0m0.126s
[root@192 test]# gcc cache2.c -o cache2
[root@192 test]# time ./cache2 
starting
complete

real	0m2.033s
user	0m1.899s
sys	0m0.116s
[root@192 test]#

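As the source shows, cache2.c differs from cache1.c only in the order of the two for loops, yet the run times are worlds apart: cache1 finishes in 0.359 s, while cache2 needs about 2 s. The reason is memory layout: C stores a two-dimensional array row by row, so in cache1 the inner j loop writes consecutive addresses, whereas in cache2 the inner i loop jumps MAXCOL * sizeof(int) = 32,000 bytes on every iteration. Next, let's look at how the two programs use the CPU cache.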

[root@192 test]# valgrind --tool=cachegrind ./cache1
==6269== Cachegrind, a cache and branch-prediction profiler
==6269== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote et al.
==6269== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==6269== Command: ./cache1
==6269== 
--6269-- warning: L3 cache found, using its data for the LL simulation.
starting
complete
==6269== 
==6269== I   refs:      768,153,385
==6269== I1  misses:            758
==6269== LLi misses:            754
==6269== I1  miss rate:        0.00%
==6269== LLi miss rate:        0.00%
==6269== 
==6269== D   refs:      448,071,036  (384,049,725 rd   + 64,021,311 wr)
==6269== D1  misses:      4,001,857  (      1,306 rd   +  4,000,551 wr)
==6269== LLd misses:      4,001,663  (      1,134 rd   +  4,000,529 wr)
==6269== D1  miss rate:         0.9% (        0.0%     +        6.2%  )
==6269== LLd miss rate:         0.9% (        0.0%     +        6.2%  )
==6269== 
==6269== LL refs:         4,002,615  (      2,064 rd   +  4,000,551 wr)
==6269== LL misses:       4,002,417  (      1,888 rd   +  4,000,529 wr)
==6269== LL miss rate:          0.3% (        0.0%     +        6.2%  )
[root@192 test]# valgrind --tool=cachegrind ./cache2
==6270== Cachegrind, a cache and branch-prediction profiler
==6270== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote et al.
==6270== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==6270== Command: ./cache2
==6270== 
--6270-- warning: L3 cache found, using its data for the LL simulation.
starting
complete
==6270== 
==6270== I   refs:      768,153,385
==6270== I1  misses:            758
==6270== LLi misses:            754
==6270== I1  miss rate:        0.00%
==6270== LLi miss rate:        0.00%
==6270== 
==6270== D   refs:      448,071,036  (384,049,725 rd   + 64,021,311 wr)
==6270== D1  misses:     64,001,856  (      1,306 rd   + 64,000,550 wr)
==6270== LLd misses:      4,009,662  (      1,134 rd   +  4,008,528 wr)
==6270== D1  miss rate:        14.3% (        0.0%     +      100.0%  )
==6270== LLd miss rate:         0.9% (        0.0%     +        6.3%  )
==6270== 
==6270== LL refs:        64,002,614  (      2,064 rd   + 64,000,550 wr)
==6270== LL misses:       4,010,416  (      1,888 rd   +  4,008,528 wr)
==6270== LL miss rate:          0.3% (        0.0%     +        6.3%  )
[root@192 test]#

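The miss rates make the difference clear: cache1's D1 miss rate is 0.9%, while cache2's is 14.3%. Looking at writes alone, the D1 write miss rate jumps from 6.2% to 100.0%, meaning essentially every store to x in cache2 misses the L1 data cache. The LL miss counts, on the other hand, are nearly identical for both programs (about 4.0 million), so in Cachegrind's simplified two-level model the roughly 60 million extra D1 misses are all served from the last-level cache. This gives a concrete feel for how strongly the CPU cache miss rate can affect program performance.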
