TensorFlow Serving
# Start the serving image (run from the directory containing the cloned
# tensorflow/serving repo; note that ${PATH} is the shell's executable
# search path and must not be used here)
docker run -t --rm -p 8501:8501 \
-v "$(pwd)/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_cpu:/models/half_plus_two" \
-e MODEL_NAME=half_plus_two \
tensorflow/serving &
# Verify with a single request
curl -d '{"instances": [1.2, 2.0, 5.0]}' \
-X POST http://localhost:8501/v1/models/half_plus_two:predict
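As a sanity check on the response, half_plus_two is assumed (from its name; not verified here) to compute y = x/2 + 2 for each instance, so the inputs above should come back as predictions 2.6, 3.0, and 4.5:

```shell
# Compute the values half_plus_two is expected to return (y = x/2 + 2);
# the model semantics are an assumption based on its name.
for x in 1.2 2.0 5.0; do
  awk -v x="$x" 'BEGIN { printf "%.1f\n", x / 2 + 2 }'
done
# prints 2.6, 3.0, 4.5 -- compare against the "predictions" field of the curl response
```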
Install wrk
macOS:
brew install wrk
Linux (build from source):
git clone https://github.com/wg/wrk.git
cd wrk && make
Edit test.lua (the request script passed to wrk):
wrk.method = "POST"
wrk.headers["Content-Type"] = "application/json"
wrk.body = '{"instances": [1.2, 2.0, 5.0]}'
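The three script lines above can also be written out in one step with a heredoc (just a convenience; the file name is whatever you later pass to --script):

```shell
# Write the wrk request script in one step; the quoted 'EOF' prevents
# the shell from expanding anything inside the body.
cat > test.lua <<'EOF'
wrk.method = "POST"
wrk.headers["Content-Type"] = "application/json"
wrk.body = '{"instances": [1.2, 2.0, 5.0]}'
EOF
```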
Run the benchmark (8 threads, 200 connections, for 20 s, printing latency percentiles):
wrk -t8 -c200 -d20s --script=test.lua --latency http://localhost:8501/v1/models/half_plus_two:predict
# Results
Running 20s test @ http://localhost:8501/v1/models/half_plus_two:predict
  8 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    49.89ms   31.43ms 322.27ms   94.01%
    Req/Sec    550.19    145.19   790.00    70.58%
  Latency Distribution
     50%   41.94ms
     75%   49.79ms
     90%   64.09ms
     99%  215.99ms
  86347 requests in 20.09s, 15.48MB read
  Non-2xx or 3xx responses: 86347
Requests/sec:   4297.20
Transfer/sec:    788.94KB

Note that every one of the 86347 responses was non-2xx/3xx: this run measured error responses, not successful predictions, so recheck the endpoint and request body before trusting the latency and throughput figures above.
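As a quick consistency check on the report, total requests divided by the wall-clock time should land near the reported Requests/sec (wrk divides by the exact sub-second duration internally, so the figures differ slightly):

```shell
# 86347 requests over the reported 20.09 s
awk 'BEGIN { printf "%.2f\n", 86347 / 20.09 }'
# prints 4298.01 -- close to the reported 4297.20
```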