对于Redis服务,通常我们推荐用户使用长连接来访问Redis,由于短连接每次需要新建链接所以短连接在tcp协议层面性能就比长连接低效,但是由于某些用户在连接池失效的时候还是会建立大量的短连接或者用户由于客户端限制还是只能使用短连接来访问Redis,而原生的Redis在频繁建立短连接的时候有一定性能缺陷,我们在云上就碰到用户短连接的性能问题。
1. 问题
通过历史监控我们可以发现用户在频繁使用短连接的时候Redis的cpu使用率有显著的上升
2. 排查
通过扁鹊查看但是Redis的cpu运行情况如下
从扁鹊我们可以看到Redis在freeClient的时候会频繁调用listSearchKey,并且该函数占用了百分30左右的调用量,如果我们可以优化降低该调用,短连接性能将得到具体提升。
3. 优化
通过以上分析我们可以知道Redis在释放链接的时候频繁调用了listSearchKey,通过查看Redis关闭客户端源码如下
void freeClient(redisClient *c) {
listNode *ln;
/* If this is marked as current client unset it */
if (server.current_client == c) server.current_client = NULL;
/* If it is our master that's beging disconnected we should make sure
* to cache the state to try a partial resynchronization later.
*
* Note that before doing this we make sure that the client is not in
* some unexpected state, by checking its flags. */
if (server.master && c->flags & REDIS_MASTER) {
redisLog(REDIS_WARNING,"Connection with master lost.");
if (!(c->flags & (REDIS_CLOSE_AFTER_REPLY|
REDIS_CLOSE_ASAP|
REDIS_BLOCKED| REDIS_UNBLOCKED)))
{
replicationCacheMaster(c);
return;
}
}
/* Log link disconnection with slave */
if ((c->flags & REDIS_SLAVE) && !(c->flags & REDIS_MONITOR)) {
redisLog(REDIS_WARNING,"Connection with slave %s lost.",
replicationGetSlaveName(c));
}
/* Free the query buffer */
sdsfree(c->querybuf);
c->querybuf = NULL;
/* Deallocate structures used to block on blocking ops. */
if (c->flags & REDIS_BLOCKED)
unblockClientWaitingData(c);
dictRelease(c->bpop.keys);
freeClientArgv(c);
/* Remove from the list of clients */
if (c->fd != -1) {
ln = listSearchKey(server.clients,c);
redisAssert(ln != NULL);
listDelNode(server.clients,ln);
}
/* When client was just unblocked because of a blocking operation,
* remove it from the list of unblocked clients. */
if (c->flags & REDIS_UNBLOCKED) {
ln = listSearchKey(server.unblocked_clients,c);
redisAssert(ln != NULL);
listDelNode(server.unblocked_clients,ln);
}
...
...
...
/* Release other dynamically allocated client structure fields,
* and finally release the client structure itself. */
if (c->name) decrRefCount(c->name);
zfree(c->argv);
freeClientMultiState(c);
sdsfree(c->peerid);
if (c->pause_event > 0) aeDeleteTimeEvent(server.el, c->pause_event);
zfree(c);
}
从源码我们可以看到Redis在释放链接的时候遍历server.clients查找到对应的redisClient对象然后调用listDelNode把该redisClient对象从server.clients删除,代码如下:
/* Remove from the list of clients */
if (c->fd != -1) {
ln = listSearchKey(server.clients,c);
redisAssert(ln != NULL);
listDelNode(server.clients,ln);
}
查看server.clients为List结构,而redis定义的List为双端链表,我们可以在createClient的时候将redisClient的指针地址保留再freeClient的时候直接删除对应的listNode即可,无需再次遍历server.clients,代码优化如下:
3.1 createClient修改
redisClient *createClient(int fd) {
redisClient *c = zmalloc(sizeof(redisClient));
/* passing -1 as fd it is possible to create a non connected client.
* This is useful since all the Redis commands needs to be executed
* in the context of a client. When commands are executed in other
* contexts (for instance a Lua script) we need a non connected client. */
if (fd != -1) {
anetNonBlock(NULL,fd);
anetEnableTcpNoDelay(NULL,fd);
if (server.tcpkeepalive)
anetKeepAlive(NULL,fd,server.tcpkeepalive);
if (aeCreateFileEvent(server.el,fd,AE_READABLE,
readQueryFromClient, c) == AE_ERR)
{
close(fd);
zfree(c);
return NULL;
}
}
...
if (fd != -1) {
c->client_list_node = listAddNodeTailReturnNode(server.clients,c);
}
return c;
}
3.2 freeClient修改
freeClient修改如下:
/* Remove from the list of clients */
if (c->fd != -1) {
if (c->client_list_node != NULL) listDelNode(server.clients,c->client_list_node);
}
3.3 优化结果
在同一台物理机上启动优化前后的Redis,分别进行压测,压测命令如下
redis-benchmark -h host -p port -k 0 -t get -n 100000 -c 8000
其中-k 代表使用短连接进行测试
原生Redis-server结果:
99.74% <= 963 milliseconds
99.78% <= 964 milliseconds
99.84% <= 965 milliseconds
99.90% <= 966 milliseconds
99.92% <= 967 milliseconds
99.94% <= 968 milliseconds
99.99% <= 969 milliseconds
100.00% <= 969 milliseconds
10065.42 requests per second
优化后Redis-server结果
99.69% <= 422 milliseconds
99.72% <= 423 milliseconds
99.80% <= 424 milliseconds
99.82% <= 425 milliseconds
99.86% <= 426 milliseconds
99.89% <= 427 milliseconds
99.94% <= 428 milliseconds
99.96% <= 429 milliseconds
99.97% <= 430 milliseconds
100.00% <= 431 milliseconds
13823.61 requests per second
我们可以看到优化之后的Redis-server性能在短连接多的场景下提升了百分30%以上。
4. 结束
云数据库Redis版(ApsaraDB for Redis)是一种稳定可靠、性能卓越、可弹性伸缩的数据库服务。基于飞天分布式系统和全SSD盘高性能存储,支持主备版和集群版两套高可用架构。提供了全套的容灾切换、故障迁移、在线扩容、性能优化的数据库解决方案。欢迎各位购买使用:云数据库 Redis 版