阿里云容器服务网络性能测试

背景

我们正在将测试环境逐步从mesos框架迁移到阿里云的容器服务,在此过程中测试了四种不同的服务之间相互访问的模式的网络性能。本文阐述了该性能测试的方法,数据和结论。

测试目标

服务容器之间的交互存在四种不同的访问方式:

docker link

docker提供的link方式,可以在一个容器中访问另一个容器。 测试host为app-link。

hostname

在编排服务时,为每一个服务指定一个hostname,在其他服务中可以使用hostname访问对应的服务。测试host为app。

服务发现

容器服务基于HAProxy实现的一套服务间HTTP访问和负载均衡的机制。测试host为app.local。

SLB

传统的负载均衡服务,有HTTP和TCP两种监听模式。测试中HTTP SLB host为192.168.1.10,TCP SLB host为192.168.1.40。

分别通过这四种网络访问目标机器上的http服务接口/ok并测试延迟(latency)和吞吐量(throughput)。

测试工具

  • httping:测试延迟
  • ab(apache benchmarking tool):测试吞吐量

ubuntu 下安装:

  apt-get install httping
  apt-get install apache2-utils

延迟测试:

  1. link:
  httping -c100 -i 0.01 -g 'http://app-link/ok'
  ...
  connected to 172.18.1.3:80 (121 bytes), seq=96 time=0.45 ms
  connected to 172.18.1.3:80 (121 bytes), seq=97 time=0.46 ms
  connected to 172.18.1.3:80 (121 bytes), seq=98 time=0.42 ms
  connected to 172.18.1.3:80 (121 bytes), seq=99 time=1.63 ms
  --- http://app-link/ok ping statistics ---
  100 connects, 100 ok, 0.00% failed, time 1050ms
  round-trip min/avg/max = 0.4/0.5/1.6 ms
  1. hostname
  httping -c100 -i 0.01 -g 'http://app/ok'
  ...
  connected to 172.18.1.3:80 (121 bytes), seq=96 time=10.80 ms
  connected to 172.18.1.3:80 (121 bytes), seq=97 time=0.43 ms
  connected to 172.18.1.3:80 (121 bytes), seq=98 time=0.44 ms
  connected to 172.18.1.3:80 (121 bytes), seq=99 time=0.46 ms
  --- http://app/ok ping statistics ---
  100 connects, 100 ok, 0.00% failed, time 1073ms
  round-trip min/avg/max = 0.4/0.7/10.8 ms
  1. 服务发现
  httping -c100 -i 0.01 -g 'http://app.local/ok'
  ...
  connected to 172.18.1.2:80 (219 bytes), seq=96 time=0.69 ms
  connected to 172.18.1.2:80 (219 bytes), seq=97 time=0.67 ms
  connected to 172.18.1.2:80 (219 bytes), seq=98 time=0.74 ms
  connected to 172.18.1.2:80 (219 bytes), seq=99 time=0.65 ms
  --- http://app.local/ok ping statistics ---
  100 connects, 100 ok, 0.00% failed, time 1090ms
  round-trip min/avg/max = 0.6/0.9/6.0 ms
  1. HTTP SLB
  httping -c100 -i 0.01 -g 'http://192.168.1.10/ok'
  ...
  connected to 192.168.1.10:80 (140 bytes), seq=96 time=1.19 ms
  connected to 192.168.1.10:80 (140 bytes), seq=97 time=1.08 ms
  connected to 192.168.1.10:80 (140 bytes), seq=98 time=1.15 ms
  connected to 192.168.1.10:80 (140 bytes), seq=99 time=1.30 ms
  --- http://192.168.1.10/ok ping statistics ---
  100 connects, 100 ok, 0.00% failed, time 1123ms
  round-trip min/avg/max = 1.0/1.2/2.9 ms
  1. TCP SLB
  httping -c100 -i 0.01 -g 'http://192.168.1.40/ok'
  ...
  connected to 192.168.1.40:80 (121 bytes), seq=96 time=1.18 ms
  connected to 192.168.1.40:80 (121 bytes), seq=97 time=1.25 ms
  connected to 192.168.1.40:80 (121 bytes), seq=98 time=1.06 ms
  connected to 192.168.1.40:80 (121 bytes), seq=99 time=1.34 ms
  --- http://192.168.1.40/ok ping statistics ---
  100 connects, 100 ok, 0.00% failed, time 1137ms
  round-trip min/avg/max = 1.0/1.3/2.7 ms

测试结果

测试了100次HEAD请求,平均时延如下表:

访问方式 延时(ms)
docker link 0.5
hostname 0.7
服务发现 0.9
HTTP SLB 1.2
TCP SLB 1.3

吞吐量测试:

  1. link
  ab -lkc 10000 -n 10000 'http://app-link/ok'
  Concurrency Level:      10000
  Time taken for tests:   0.864 seconds
  Complete requests:      10000
  Failed requests:        0
  Keep-Alive requests:    10000
  Total transferred:      2020000 bytes
  HTML transferred:       610000 bytes
  Requests per second:    11571.74 [#/sec](mean)
  Time per request:       864.174 [ms](mean)
  Time per request:       0.086 [ms](mean, across all concurrent requests)
  Transfer rate:          2282.71 [Kbytes/sec] received
  1. hostname
  ab -lkc 10000 -n 10000 'http://app/ok'
  Concurrency Level:      10000
  Time taken for tests:   1.055 seconds
  Complete requests:      10000
  Failed requests:        0
  Keep-Alive requests:    10000
  Total transferred:      2020000 bytes
  HTML transferred:       610000 bytes
  Requests per second:    9476.49 [#/sec](mean)
  Time per request:       1055.243 [ms](mean)
  Time per request:       0.106 [ms](mean, across all concurrent requests)
  Transfer rate:          1869.39 [Kbytes/sec] received
  1. 服务发现
  ab -lkc 10000 -n 10000 'http://app.local/ok'
  Concurrency Level:      10000
  Time taken for tests:   4.276 seconds
  Complete requests:      10000
  Failed requests:        0
  Keep-Alive requests:    10000
  Total transferred:      3000000 bytes
  HTML transferred:       610000 bytes
  Requests per second:    2338.60 [#/sec](mean)
  Time per request:       4276.066 [ms](mean)
  Time per request:       0.428 [ms](mean, across all concurrent requests)
  Transfer rate:          685.14 [Kbytes/sec] received
  1. HTTP SLB
  ab -lkc 10000 -n 10000 'http://192.168.1.10/ok'
  Concurrency Level:      10000
  Time taken for tests:   6.308 seconds
  Complete requests:      10000
  Failed requests:        0
  Non-2xx responses:      580
  Keep-Alive requests:    10000
  Total transferred:      2141800 bytes
  HTML transferred:       732380 bytes
  Requests per second:    1585.41 [#/sec](mean)
  Time per request:       6307.517 [ms](mean)
  Time per request:       0.631 [ms](mean, across all concurrent requests)
  Transfer rate:          331.60 [Kbytes/sec] received

同时10000个并发请求,有580个请求SLB nginx返回了504错误,以下为response。

  <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
  <html>
  <head><title>504 Gateway Time-out</title></head>
  <body bgcolor="white">
  <h1>504 Gateway Time-out</h1>
  <p>The gateway did not receive a timely response from the upstream server or application.</body>
  </html>

  WARNING: Response code not 2xx (504)
  1. TCP SLB
  ab -lkc 10000 -n 10000 'http://192.168.1.40/ok'
  Concurrency Level:      10000
  Time taken for tests:   1.891 seconds
  Complete requests:      10000
  Failed requests:        0
  Keep-Alive requests:    10000
  Total transferred:      2020000 bytes
  HTML transferred:       610000 bytes
  Requests per second:    5287.14 [#/sec](mean)
  Time per request:       1891.383 [ms](mean)
  Time per request:       0.189 [ms](mean, across all concurrent requests)
  Transfer rate:          1042.97 [Kbytes/sec] received

测试结果

并发请求10000次,每秒处理的请求数如下表:

访问方式 吞吐量(RPS)
docker link 11571.74
hostname 9476.49
服务发现 2338.60
HTTP SLB 1585.41
TCP SLB 5287.14

结论:

  1. 从数据上看,使用docker link机制访问服务,无论是延迟和吞吐量都是最好的,hostname方式其次
  2. 同是 HTTP 模式的服务发现和HTTP SLB,性能最差
  3. HTTP SLB的并发性能似乎并不理想,10000个请求有5.8%的请求返回了504 Gateway错误

问题:

虽然docker link和hostname网络性能最佳,但不清楚其负载能力如何。测试中我们发现hostname方式是具有负载能力的,不过在官方帮助文档中,hostname访问方式被放在『不具备负载均衡能力的访问方式』中,而且被描述为『能做到一定的负载均衡的作用』。可见阿里云并没有强调其负载能力。如果在生产环境中使用,负载均衡能力也是相当重要的一个指标。

最后,这两个问题还需要向阿里云进一步确认:

  1. link和hostname模式的负载能力到底如何?
  2. 对于具备负载衡量能力、HTTP模式,内网使用这三个要求,是否有推荐使用的网络模式?
上一篇:【转载】Python的包管理工具(进化关系)


下一篇:【转载】Python包管理工具pip与easy_install