DiDi Cloud A100 40GB + TensorFlow 1.15.2 + Ubuntu 18.04 Performance Test

Today I got access to the internal beta of the A100 on DiDi Cloud and ran the TensorFlow benchmarks on it. The results are recorded below.

 

Runtime environment

 

Platform: DiDi Cloud

OS: Ubuntu 18.04

GPU: A100-SXM4-40GB

Python version: 3.6

TensorFlow version: 1.15.2 (NVIDIA build)


 

System environment:

[Screenshot: system environment]

 

Test method

The tests use the TensorFlow benchmarks suite (tf_cnn_benchmarks):

https://github.com/tensorflow/benchmarks
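All runs in this post follow the same pattern, so the command lines can be generated instead of typed by hand. The following is a minimal sketch, assuming it is launched from the directory that contains tf_cnn_benchmarks.py. The `RUNS` table and the `build_command` and `all_commands` helpers are names made up for this illustration; the flags and model/batch-size pairs are the ones used in this post.

```python
# (model, batch_size) pairs taken from the runs in this post.
RUNS = [
    ("resnet50_v1.5", 64),
    ("resnet50", 64),
    ("alexnet", 512),
    ("inception3", 64),
    ("vgg16", 64),
    ("googlenet", 128),
    ("resnet152", 32),
]


def build_command(model, batch_size, use_fp16=False, num_gpus=1):
    """Return the argv list for one tf_cnn_benchmarks.py run (hypothetical helper)."""
    cmd = [
        "python",
        "tf_cnn_benchmarks.py",
        "--num_gpus={}".format(num_gpus),
        "--batch_size={}".format(batch_size),
        "--model={}".format(model),
    ]
    if use_fp16:
        cmd.append("--use_fp16")
    return cmd


def all_commands():
    """For every model, the FP32 command followed by the FP16 command."""
    return [
        build_command(model, batch_size, use_fp16)
        for model, batch_size in RUNS
        for use_fp16 in (False, True)
    ]


if __name__ == "__main__":
    for cmd in all_commands():
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to actually execute the run; printing them first makes it easy to confirm they match the commands shown in the sections below.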

 

ResNet50 v1.5 BS64

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50_v1.5
Step    Img/sec total_loss
1       images/sec: 602.4 +/- 0.0 (jitter = 0.0)        7.847
10      images/sec: 606.8 +/- 1.2 (jitter = 5.4)        8.053
20      images/sec: 606.3 +/- 0.8 (jitter = 4.4)        8.102
30      images/sec: 605.8 +/- 0.8 (jitter = 3.8)        8.117
40      images/sec: 606.2 +/- 0.7 (jitter = 3.8)        7.893
50      images/sec: 606.1 +/- 0.5 (jitter = 3.0)        7.919
60      images/sec: 606.2 +/- 0.5 (jitter = 2.9)        8.104
70      images/sec: 606.6 +/- 0.5 (jitter = 2.9)        7.985
80      images/sec: 606.6 +/- 0.4 (jitter = 2.8)        7.805
90      images/sec: 606.6 +/- 0.4 (jitter = 2.8)        7.973
100     images/sec: 606.7 +/- 0.4 (jitter = 2.8)        7.644
----------------------------------------------------------------
total images/sec: 606.23
----------------------------------------------------------------
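Logs in this format are easy to turn into numbers for later comparison. The parser below is a sketch written against the output shown in this post, not part of tf_cnn_benchmarks itself; `parse_log` and the dictionary keys are illustrative names, and the `+/-` column is stored as `plus_minus` without interpreting it.

```python
import re

# Matches per-step lines such as:
#   10      images/sec: 606.8 +/- 1.2 (jitter = 5.4)        8.053
STEP_RE = re.compile(
    r"^\s*(\d+)\s+images/sec:\s+([\d.]+)\s+\+/-\s+([\d.]+)\s+"
    r"\(jitter\s*=\s*([\d.]+)\)\s+(\S+)\s*$"
)
# Matches the summary line: "total images/sec: 606.23"
TOTAL_RE = re.compile(r"^\s*total images/sec:\s+([\d.]+)\s*$")


def parse_log(text):
    """Return (list of per-step dicts, total images/sec or None)."""
    steps, total = [], None
    for line in text.splitlines():
        match = STEP_RE.match(line)
        if match:
            steps.append({
                "step": int(match.group(1)),
                "images_per_sec": float(match.group(2)),
                "plus_minus": float(match.group(3)),
                "jitter": float(match.group(4)),
                # float() also accepts "nan", which appears in one run below.
                "total_loss": float(match.group(5)),
            })
            continue
        match = TOTAL_RE.match(line)
        if match:
            total = float(match.group(1))
    return steps, total


# Excerpt copied from the resnet50_v1.5 FP32 run above.
SAMPLE = """\
Step    Img/sec total_loss
1       images/sec: 602.4 +/- 0.0 (jitter = 0.0)        7.847
10      images/sec: 606.8 +/- 1.2 (jitter = 5.4)        8.053
----------------------------------------------------------------
total images/sec: 606.23
----------------------------------------------------------------
"""

if __name__ == "__main__":
    parsed_steps, parsed_total = parse_log(SAMPLE)
    print(len(parsed_steps), parsed_total)
```

The header and separator lines match neither pattern, so they are skipped without special handling.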

 

--use_fp16

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50_v1.5 --use_fp16

 

Step    Img/sec total_loss
1 images/sec: 1327.1 +/- 0.0 (jitter = 0.0) 7.972
10 images/sec: 1321.2 +/- 5.7 (jitter = 27.6) 7.885
20 images/sec: 1323.5 +/- 4.4 (jitter = 25.9) 8.073
30 images/sec: 1323.6 +/- 3.7 (jitter = 27.3) 7.934
40 images/sec: 1322.1 +/- 3.3 (jitter = 32.9) 8.102
50 images/sec: 1321.4 +/- 3.0 (jitter = 27.7) 7.876
60 images/sec: 1322.2 +/- 2.8 (jitter = 32.3) 7.883
70 images/sec: 1322.3 +/- 2.5 (jitter = 32.6) 7.962
80 images/sec: 1324.0 +/- 2.4 (jitter = 32.2) 8.049
90 images/sec: 1324.2 +/- 2.2 (jitter = 31.2) 7.909
100 images/sec: 1325.1 +/- 2.1 (jitter = 29.6) 7.874
----------------------------------------------------------------
total images/sec: 1322.76
----------------------------------------------------------------

 

 

 

ResNet50 BS64

 

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50
Step    Img/sec total_loss
1 images/sec: 653.5 +/- 0.0 (jitter = 0.0) 8.219
10 images/sec: 646.2 +/- 2.0 (jitter = 6.0) 7.879
20 images/sec: 646.1 +/- 1.4 (jitter = 7.2) 7.909
30 images/sec: 646.0 +/- 1.2 (jitter = 6.0) 7.820
40 images/sec: 646.2 +/- 1.0 (jitter = 6.3) 8.006
50 images/sec: 646.0 +/- 1.0 (jitter = 8.6) 7.769
60 images/sec: 646.0 +/- 0.9 (jitter = 8.6) 8.114
70 images/sec: 645.7 +/- 0.9 (jitter = 9.5) 7.811
80 images/sec: 645.8 +/- 0.8 (jitter = 9.5) 7.979
90 images/sec: 645.8 +/- 0.8 (jitter = 8.0) 8.095
100 images/sec: 645.8 +/- 0.7 (jitter = 6.4) 8.038
----------------------------------------------------------------
total images/sec: 645.26
----------------------------------------------------------------

 

--use_fp16

 

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50 --use_fp16
Step    Img/sec total_loss
1 images/sec: 1300.1 +/- 0.0 (jitter = 0.0) 8.101
10 images/sec: 1310.1 +/- 7.5 (jitter = 7.4) 7.758
20 images/sec: 1309.7 +/- 8.0 (jitter = 42.3) 7.912
30 images/sec: 1315.0 +/- 5.9 (jitter = 32.1) 7.776
40 images/sec: 1315.5 +/- 4.7 (jitter = 28.2) 7.918
50 images/sec: 1317.5 +/- 3.9 (jitter = 27.7) 7.895
60 images/sec: 1316.5 +/- 3.4 (jitter = 18.6) 7.711
70 images/sec: 1317.3 +/- 3.1 (jitter = 16.1) 8.008
80 images/sec: 1316.9 +/- 2.8 (jitter = 11.4) 7.777
90 images/sec: 1317.7 +/- 2.6 (jitter = 11.8) 7.808
100 images/sec: 1317.1 +/- 2.4 (jitter = 9.9) 8.036
----------------------------------------------------------------
total images/sec: 1315.11
----------------------------------------------------------------

 

 

AlexNet BS512

 

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=512 --model=alexnet
Step    Img/sec total_loss
1 images/sec: 8294.2 +/- 0.0 (jitter = 0.0) nan
10 images/sec: 8290.2 +/- 1.6 (jitter = 5.3) nan
20 images/sec: 8290.6 +/- 1.0 (jitter = 3.7) nan
30 images/sec: 8290.8 +/- 0.7 (jitter = 2.8) nan
40 images/sec: 8291.3 +/- 0.6 (jitter = 2.7) nan
50 images/sec: 8289.8 +/- 1.4 (jitter = 2.9) nan
60 images/sec: 8290.2 +/- 1.2 (jitter = 2.9) nan
70 images/sec: 8290.4 +/- 1.3 (jitter = 3.6) nan
80 images/sec: 8291.1 +/- 1.1 (jitter = 3.5) nan
90 images/sec: 8291.9 +/- 1.0 (jitter = 4.4) nan
100 images/sec: 8291.9 +/- 1.1 (jitter = 5.2) nan
----------------------------------------------------------------
total images/sec: 8282.46
----------------------------------------------------------------
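Every step of the FP32 AlexNet run above reports a total_loss of nan, even though the throughput figures are still printed. When results are collected automatically, a small check like the one below can flag such runs. This is only a sketch: `has_nonfinite_loss` is a hypothetical helper, and the loss strings are copied from the logs in this post.

```python
import math


def has_nonfinite_loss(loss_fields):
    """Return True if any loss string parses to NaN or infinity."""
    return any(not math.isfinite(float(field)) for field in loss_fields)


# First three total_loss values of each AlexNet run, as printed in the logs.
alexnet_fp32_losses = ["nan", "nan", "nan"]
alexnet_fp16_losses = ["7.250", "7.251", "7.251"]

if __name__ == "__main__":
    print("FP32 flagged:", has_nonfinite_loss(alexnet_fp32_losses))
    print("FP16 flagged:", has_nonfinite_loss(alexnet_fp16_losses))
```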

--use_fp16

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=512 --model=alexnet --use_fp16
Step    Img/sec total_loss
1 images/sec: 10618.6 +/- 0.0 (jitter = 0.0) 7.250
10 images/sec: 10607.7 +/- 4.4 (jitter = 16.3) 7.251
20 images/sec: 10602.5 +/- 3.0 (jitter = 13.1) 7.251
30 images/sec: 10604.1 +/- 2.3 (jitter = 11.2) 7.251
40 images/sec: 10601.0 +/- 2.5 (jitter = 13.4) 7.251
50 images/sec: 10601.7 +/- 2.5 (jitter = 13.8) 7.251
60 images/sec: 10603.0 +/- 2.2 (jitter = 14.0) 7.250
70 images/sec: 10605.1 +/- 2.1 (jitter = 12.5) 7.251
80 images/sec: 10605.4 +/- 1.9 (jitter = 12.2) 7.251
90 images/sec: 10605.4 +/- 1.7 (jitter = 12.1) 7.251
100 images/sec: 10605.8 +/- 1.7 (jitter = 12.3) 7.251
----------------------------------------------------------------
total images/sec: 10587.67
----------------------------------------------------------------

 

Inception v3 BS64

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=inception3
Step    Img/sec total_loss
1 images/sec: 436.8 +/- 0.0 (jitter = 0.0) 7.276
10 images/sec: 437.9 +/- 1.2 (jitter = 0.8) 7.337
20 images/sec: 437.8 +/- 1.0 (jitter = 2.2) 7.269
30 images/sec: 437.9 +/- 0.8 (jitter = 2.2) 7.422
40 images/sec: 437.9 +/- 0.6 (jitter = 3.5) 7.299
50 images/sec: 438.6 +/- 0.6 (jitter = 4.1) 7.277
60 images/sec: 439.2 +/- 0.5 (jitter = 3.7) 7.363
70 images/sec: 439.5 +/- 0.5 (jitter = 4.8) 7.347
80 images/sec: 440.3 +/- 0.5 (jitter = 5.3) 7.410
90 images/sec: 440.3 +/- 0.5 (jitter = 5.2) 7.325
100 images/sec: 440.3 +/- 0.4 (jitter = 5.0) 7.346
----------------------------------------------------------------
total images/sec: 440.01
----------------------------------------------------------------

 

--use_fp16

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=inception3 --use_fp16
Step    Img/sec total_loss
1 images/sec: 901.5 +/- 0.0 (jitter = 0.0) 7.305
10 images/sec: 945.5 +/- 7.0 (jitter = 5.0) 7.354
20 images/sec: 945.6 +/- 4.9 (jitter = 7.1) 7.330
30 images/sec: 945.3 +/- 3.9 (jitter = 6.9) 7.382
40 images/sec: 946.3 +/- 3.2 (jitter = 7.3) 7.278
50 images/sec: 946.6 +/- 2.8 (jitter = 7.5) 7.373
60 images/sec: 946.3 +/- 2.5 (jitter = 7.6) 7.299
70 images/sec: 946.8 +/- 2.3 (jitter = 7.5) 7.323
80 images/sec: 946.5 +/- 2.1 (jitter = 7.6) 7.317
90 images/sec: 946.6 +/- 2.0 (jitter = 7.6) 7.357
100 images/sec: 947.2 +/- 1.8 (jitter = 7.3) 7.327
----------------------------------------------------------------
total images/sec: 946.03
----------------------------------------------------------------

 

VGG16 BS64

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=vgg16
Step    Img/sec total_loss
1 images/sec: 442.1 +/- 0.0 (jitter = 0.0) 7.321
10 images/sec: 442.4 +/- 0.1 (jitter = 0.4) 7.315
20 images/sec: 442.4 +/- 0.1 (jitter = 0.3) 7.269
30 images/sec: 442.4 +/- 0.0 (jitter = 0.2) 7.271
40 images/sec: 442.4 +/- 0.0 (jitter = 0.2) 7.282
50 images/sec: 442.4 +/- 0.0 (jitter = 0.2) 7.291
60 images/sec: 442.4 +/- 0.0 (jitter = 0.2) 7.250
70 images/sec: 442.4 +/- 0.1 (jitter = 0.2) 7.278
80 images/sec: 442.4 +/- 0.0 (jitter = 0.2) 7.274
90 images/sec: 442.4 +/- 0.0 (jitter = 0.2) 7.286
100 images/sec: 442.4 +/- 0.0 (jitter = 0.2) 7.283
----------------------------------------------------------------
total images/sec: 442.20
----------------------------------------------------------------

 

--use_fp16

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=vgg16 --use_fp16
Step    Img/sec total_loss
1 images/sec: 687.4 +/- 0.0 (jitter = 0.0) 7.279
10 images/sec: 688.2 +/- 0.2 (jitter = 0.5) 7.255
20 images/sec: 688.0 +/- 0.1 (jitter = 0.5) 7.283
30 images/sec: 688.0 +/- 0.1 (jitter = 0.7) 7.254
40 images/sec: 687.9 +/- 0.1 (jitter = 0.7) 7.283
50 images/sec: 687.8 +/- 0.1 (jitter = 0.7) 7.249
60 images/sec: 687.7 +/- 0.1 (jitter = 0.8) 7.294
70 images/sec: 687.6 +/- 0.1 (jitter = 0.9) 7.278
80 images/sec: 687.6 +/- 0.1 (jitter = 0.9) 7.268
90 images/sec: 687.7 +/- 0.1 (jitter = 0.9) 7.264
100 images/sec: 687.6 +/- 0.1 (jitter = 0.9) 7.268
----------------------------------------------------------------
total images/sec: 687.07
----------------------------------------------------------------

 

GoogLeNet BS128

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=128 --model=googlenet
Step    Img/sec total_loss
1 images/sec: 1577.4 +/- 0.0 (jitter = 0.0) 7.104
10 images/sec: 1565.9 +/- 4.1 (jitter = 12.5) 7.105
20 images/sec: 1561.7 +/- 3.1 (jitter = 20.4) 7.094
30 images/sec: 1562.3 +/- 2.5 (jitter = 15.1) 7.087
40 images/sec: 1561.5 +/- 2.2 (jitter = 16.1) 7.067
50 images/sec: 1561.6 +/- 2.0 (jitter = 15.6) 7.091
60 images/sec: 1561.5 +/- 1.8 (jitter = 15.7) 7.049
70 images/sec: 1560.3 +/- 1.9 (jitter = 15.3) 7.074
80 images/sec: 1558.8 +/- 1.9 (jitter = 17.2) 7.077
90 images/sec: 1558.2 +/- 1.8 (jitter = 17.2) 7.079
100 images/sec: 1557.5 +/- 1.8 (jitter = 17.6) 7.066
----------------------------------------------------------------
total images/sec: 1556.06
----------------------------------------------------------------

 

--use_fp16

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=128 --model=googlenet --use_fp16
Step    Img/sec total_loss
1 images/sec: 2690.1 +/- 0.0 (jitter = 0.0) 7.173
10 images/sec: 2675.3 +/- 13.9 (jitter = 35.5) 7.068
20 images/sec: 2682.4 +/- 9.9 (jitter = 55.4) 7.086
30 images/sec: 2686.6 +/- 8.3 (jitter = 36.6) 7.075
40 images/sec: 2687.8 +/- 6.9 (jitter = 30.6) 7.084
50 images/sec: 2686.7 +/- 6.0 (jitter = 36.4) 7.076
60 images/sec: 2687.5 +/- 5.4 (jitter = 36.4) 7.075
70 images/sec: 2681.0 +/- 6.8 (jitter = 41.6) 7.075
80 images/sec: 2683.2 +/- 6.1 (jitter = 34.0) 7.065
90 images/sec: 2684.1 +/- 5.6 (jitter = 35.6) 7.092
100 images/sec: 2683.9 +/- 5.2 (jitter = 36.1) 7.052
----------------------------------------------------------------
total images/sec: 2680.27
----------------------------------------------------------------

 

ResNet152 BS32

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=32 --model=resnet152
Step    Img/sec total_loss
1 images/sec: 225.6 +/- 0.0 (jitter = 0.0) 9.060
10 images/sec: 228.3 +/- 1.0 (jitter = 2.0) 8.594
20 images/sec: 228.3 +/- 0.6 (jitter = 2.0) 8.635
30 images/sec: 228.2 +/- 0.5 (jitter = 2.5) 8.719
40 images/sec: 227.9 +/- 0.5 (jitter = 2.8) 8.599
50 images/sec: 228.1 +/- 0.5 (jitter = 2.9) 8.791
60 images/sec: 228.3 +/- 0.4 (jitter = 3.6) 8.668
70 images/sec: 228.3 +/- 0.4 (jitter = 3.3) 9.072
80 images/sec: 228.3 +/- 0.4 (jitter = 3.5) 8.874
90 images/sec: 228.4 +/- 0.3 (jitter = 3.7) 9.030
100 images/sec: 228.4 +/- 0.3 (jitter = 3.7) 8.839
----------------------------------------------------------------
total images/sec: 228.29
----------------------------------------------------------------

 

--use_fp16

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=32 --model=resnet152 --use_fp16
Step    Img/sec total_loss
1 images/sec: 392.9 +/- 0.0 (jitter = 0.0) 9.147
10 images/sec: 397.9 +/- 2.8 (jitter = 6.0) 9.000
20 images/sec: 399.0 +/- 2.1 (jitter = 8.6) 8.842
30 images/sec: 393.7 +/- 2.9 (jitter = 14.7) 8.813
40 images/sec: 394.4 +/- 2.3 (jitter = 15.2) 8.984
50 images/sec: 394.9 +/- 2.0 (jitter = 13.9) 8.647
60 images/sec: 395.7 +/- 1.8 (jitter = 13.9) 8.838
70 images/sec: 396.5 +/- 1.6 (jitter = 15.3) 8.941
80 images/sec: 395.9 +/- 1.4 (jitter = 13.4) 8.913
90 images/sec: 396.2 +/- 1.3 (jitter = 14.1) 8.807
100 images/sec: 395.7 +/- 1.3 (jitter = 14.5) 8.729
----------------------------------------------------------------
total images/sec: 395.34
----------------------------------------------------------------
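To compare the FP32 and FP16 runs at a glance, the `total images/sec` figures from each section can be collected into one table and the speedup computed as FP16 divided by FP32. The sketch below uses only the totals reported above; `TOTALS`, `fp16_speedup`, and `summary_rows` are names chosen for this illustration.

```python
# total images/sec copied from the runs above: model -> (FP32, FP16)
TOTALS = {
    "resnet50_v1.5": (606.23, 1322.76),
    "resnet50": (645.26, 1315.11),
    "alexnet": (8282.46, 10587.67),
    "inception3": (440.01, 946.03),
    "vgg16": (442.20, 687.07),
    "googlenet": (1556.06, 2680.27),
    "resnet152": (228.29, 395.34),
}


def fp16_speedup(fp32, fp16):
    """Throughput ratio of the --use_fp16 run to the default run."""
    return fp16 / fp32


def summary_rows(totals):
    """Return (model, fp32, fp16, speedup) tuples, one per model."""
    return [
        (model, fp32, fp16, fp16_speedup(fp32, fp16))
        for model, (fp32, fp16) in totals.items()
    ]


if __name__ == "__main__":
    print("{:<15}{:>10}{:>10}{:>9}".format("model", "FP32", "FP16", "speedup"))
    for model, fp32, fp16, ratio in summary_rows(TOTALS):
        print("{:<15}{:>10.2f}{:>10.2f}{:>8.2f}x".format(model, fp32, fp16, ratio))
```

On these numbers the FP16 speedup ranges from about 1.28x (AlexNet) to about 2.18x (ResNet50 v1.5).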

 

Performance comparison

For a performance comparison of the A100, V100, and 2080 Ti, see:

 

https://www.tonyisstark.com/383.html

