Load testing TF Serving with wrk

Start the TF Serving service

# Start the container (assumes the tensorflow/serving repo is cloned
# in the current directory; -v needs an absolute host path)
docker run -t --rm -p 8501:8501 \
-v "$(pwd)/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_cpu:/models/half_plus_two" \
-e MODEL_NAME=half_plus_two \
tensorflow/serving &
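
Since the container is backgrounded with &, it is worth confirming it is actually running before sending requests (assuming a default Docker setup):

# List running containers started from the tensorflow/serving image
docker ps --filter ancestor=tensorflow/serving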

# Verify: half_plus_two computes y = x/2 + 2, so this should return
# {"predictions": [2.6, 3.0, 4.5]}
curl -d '{"instances": [1.2, 2.0, 5.0]}' \
-X POST http://localhost:8501/v1/models/half_plus_two:predict
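
If the predict call fails, TF Serving's model status endpoint (a GET on the model URL without :predict) shows whether the model actually loaded:

# A healthy reply contains "state": "AVAILABLE" under model_version_status
curl http://localhost:8501/v1/models/half_plus_two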

Install wrk

mac:
	brew install wrk
linux:
	git clone https://github.com/wg/wrk.git
	cd wrk
	make
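
On Linux, make leaves the wrk binary in the repo root; a common follow-up (assuming you want it on your $PATH) is:

# Copy the freshly built binary onto the PATH and confirm it runs
sudo cp wrk /usr/local/bin/
wrk --version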

Edit test.lua

-- Request template: every connection POSTs the same JSON body
-- used in the curl check above
wrk.method = "POST"
wrk.headers["Content-Type"] = "application/json"
wrk.body = '{"instances": [1.2, 2.0, 5.0]}'
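
If an identical body on every request is too uniform for your model, wrk's request() hook can build a fresh body per call; a sketch (test_random.lua is a hypothetical file name):

-- test_random.lua: build a new JSON body per request
wrk.method = "POST"
wrk.headers["Content-Type"] = "application/json"

function request()
  local x = math.random() * 10
  local body = string.format('{"instances": [%.2f, 2.0, 5.0]}', x)
  -- nil arguments fall back to the values set on the wrk table above
  return wrk.format(nil, nil, nil, body)
end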

Run the load test

wrk -t8 -c200 -d20s --script=test.lua --latency http://localhost:8501/v1/models/half_plus_two:predict
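
Here -t8 is 8 threads, -c200 keeps 200 connections open, -d20s runs for 20 seconds, and --latency prints the percentile table. To find where throughput saturates, one option is to sweep the connection count:

# Throughput that stops rising as -c grows indicates saturation
for c in 50 100 200 400; do
  wrk -t8 -c"$c" -d20s --script=test.lua --latency \
    http://localhost:8501/v1/models/half_plus_two:predict
done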

# Results
Running 20s test @ http://localhost:8501/v1/models/half_plus_two:predict
  8 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    49.89ms   31.43ms 322.27ms   94.01%
    Req/Sec   550.19    145.19   790.00     70.58%
  Latency Distribution
     50%   41.94ms
     75%   49.79ms
     90%   64.09ms
     99%  215.99ms
  86347 requests in 20.09s, 15.48MB read
  Non-2xx or 3xx responses: 86347
Requests/sec:   4297.20
Transfer/sec:    788.94KB
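
One caveat in this output: "Non-2xx or 3xx responses: 86347" equals the total request count, so every response in this run was an error, and the latency/throughput figures measure error responses rather than successful predictions. Replaying the benchmark request once makes the status code visible:

# -i prints the HTTP status line and headers along with the body
curl -i -d '{"instances": [1.2, 2.0, 5.0]}' \
-X POST http://localhost:8501/v1/models/half_plus_two:predict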