报错描述
项目在本地运行时,没发现异常,待部署服务器之后,检查日志文件有多个redis超时报错如下:
顺藤摸瓜可见,在调用redis工具类的scanRedis方法后,间接调用了hscan方法,出现socket读超时,先来看一下redis工具类的代码:
public class RedisUtils {
private JedisPool pool;
private String spiderUUID;
private String coordinatesHashKey;
private ScanParams scanParams = new ScanParams().match("*").count(100);
private static Logger logger = Logger.getLogger(RedisUtils.class);
public RedisUtils(String host, int port, String password, String spiderUUID) {
this(new JedisPool(new JedisPoolConfig(), host, port, 2000, password));
this.spiderUUID = spiderUUID;
coordinatesHashKey = this.spiderUUID + ":coordinates";
}
public RedisUtils(JedisPool pool) {
this.pool = pool;
}
public void resetDownloadSuccessFlag(){
Jedis jedis = null;
try {
jedis = this.pool.getResource();
Map<String, String> result = jedis.hgetAll(coordinatesHashKey);
Set<String> keySet = result.keySet();
Iterator iterator = keySet.iterator();
while (iterator.hasNext()){
String key = (String) iterator.next();
String value = jedis.get(this.spiderUUID + ":" + key);
if (!"1".equals(value)){
jedis.set(this.spiderUUID + ":" + key, String.valueOf(1));
}
}
}finally {
if (jedis!=null){
jedis.close();
}
}
}
public Boolean isPanoIDExists(String panoId) {
Jedis jedis = null;
Boolean exists;
try {
jedis = this.pool.getResource();
exists = jedis.exists(this.spiderUUID + ":" + panoId);
return exists;
}finally {
if (jedis!=null){
jedis.close();
}
}
}
public boolean removeKeys(){
Jedis jedis = this.pool.getResource();
try {
Set<String> keys = jedis.keys(this.spiderUUID + ":*" );
if(keys != null && !keys.isEmpty()) {
logger.info("redis has stored " + keys.size() + " keys, now ready to remove them all!");
String[] array = new String[keys.size()];
jedis.del(keys.toArray(array));
}
return true;
}catch (Exception e){
e.printStackTrace();
}finally {
if (jedis!=null){
jedis.close();
}
}
return true;
}
public boolean panoIdPush(String panoId) {
Jedis jedis = this.pool.getResource();
try {
long num = jedis.setnx(this.spiderUUID + ":" + panoId, String.valueOf(1));
return num==1;
} finally {
if (jedis!=null){
jedis.close();
}
}
}
public boolean coordinatesPush(Coordinates coordinates) {
Jedis jedis = this.pool.getResource();
String panoId = coordinates.getPanoId();
try {
jedis.setnx(this.spiderUUID + ":" + panoId, String.valueOf(1));
return 1==jedis.hsetnx(coordinatesHashKey, panoId, JSON.toJSONString(coordinates));
} finally {
if (jedis!=null){
jedis.close();
}
}
}
public boolean particularCoordinatesPush(ParticularCoordinates coordinates) {
Jedis jedis = this.pool.getResource();
String panoId = coordinates.getPanoId();
try {
jedis.setnx(this.spiderUUID + ":" + panoId, String.valueOf(1));
long num = jedis.hsetnx(coordinatesHashKey, panoId, JSON.toJSONString(coordinates));
return num==1;
} finally {
if (jedis!=null){
jedis.close();
}
}
}
public ScanResult<Map.Entry<String, String>> scanRedis(int cursor){
ScanResult<Map.Entry<String, String>> scanResults = null;
Jedis jedis = this.pool.getResource();
long timeBefore = System.currentTimeMillis();
try{
scanResults = jedis.hscan(coordinatesHashKey, String.valueOf(cursor), scanParams);
}catch (Exception e){
e.printStackTrace();
}finally {
if (jedis != null) {
jedis.close();
}
}
return scanResults;
}
public boolean downloadSucceed(String panoId) {
Jedis jedis = this.pool.getResource();
long timeBefore = System.currentTimeMillis();
int num=0;
try {
jedis.incr(this.spiderUUID + ":" + panoId);
num = Integer.parseInt(jedis.get(this.spiderUUID + ":" + panoId));
}
catch (JedisConnectionException e1){
e1.printStackTrace();
}finally {
if (jedis!=null){
jedis.close();
}
}
return num == 2;
}
}
分析
首先回顾redisPool生成客户端的流程逻辑图
上述代码构造方法中将jedisPool的超时时间设置了2秒,检查源码发现默认就是2秒。一番百度谷歌之后发现,问题原因可能有如下几种:(1) 读写超时设置的过短。(2) 有慢查询或者Redis发生阻塞。(3) 网络不稳定。而网友的解决之道无非是:1、提高超时时间(将jedispool超时时间设为20秒、甚至100秒);2、分析redis执行命令,避免执行过于耗时的运算;3、使用jedis新版本,最好2.9.0版本以上,并在使用完毕后采用jedis.close()方法解决归还已破坏的连接问题。
但本项目中jedis正好是2.9.0版本,笔者也将hcsan遍历数量100条降到了每次返回50条,在测试过程中还简化了业务流程,一个线程只进行一次Redis操作(scan命令),且操作过程不存在锁竞争,同时也把超时时间设大,一度设置到100秒仍然出现上述异常。
redis基于内存操作,同时又是单线程运算,一般的命令执行都是毫秒、微妙级,通过查看运维工具Grafana显示hscan的平均执行时间也是47us左右,且服务器端没有拒绝连接的情况,redis运维监测图分别如下所示:
因此,设置超时等待100秒之后还是出现以上问题不免大大的疑惑,心想公司网络不可能断网超过100秒吧。
继续Google发现,这个异常是客户端缓冲区异常,产生这个问题可能有以下三个原因:
(1) 常见原因:多个线程使用一个Jedis连接。正常的情况是一个线程使用一个Jedis连接,可以使用JedisPool管理Jedis连接,实现线程安全,避免出现这种情况。例如下面代码就是两个线程共用了一个Jedis连接:
new Thread(new Runnable() {
public void run() {
for (int i = 0; i < 100; i++) {
jedis.get("hello");
}
}
}).start();
new Thread(new Runnable() {
public void run() {
for (int i = 0; i < 100; i++) {
jedis.hget("haskey", "f");
}
}
}).start();
(2) 客户端缓冲区满了
Redis有三种客户端缓冲区:
1. 普通客户端缓冲区(normal):用于接受普通的命令,例如get、set、mset、hgetall、zrange等。
2. slave客户端缓冲区(slave):用于同步master节点的写命令,完成复制。
3. 发布订阅缓冲区(pubsub):pubsub不是普通的命令,因此有单独的缓冲区。
(3) 长时间闲置连接会被服务端主动断开,可以查询timeout配置的设置以及自身连接池配置确定是否需要做空闲检测。
解决
此时看到这里依然感觉是云里雾里,但又看到了 这篇文章 和自己报错一样,文中指出底层的socket超时是由网络异常造成,解决方法是:重置连接状态,重置计数器。让我想到了以前netty框架中有个ByteBuf也是解决了buffer游标重置的问题。
文中介绍了以下两个方法:
jedis.disconnect(); //直接断开当前连接,将连接从连接池中踢出去。
jedis.resetPipelinedCount(); // 直接重置计数器
好了,在工具类方法中客户端归还时加上了jedis.disconnect()方法后,部署运行发现,redis.clients.util.RedisInputStream.ensureFill异常没有了,但随之而来的又是一个新异常:redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
异常出现在JedisPool中,该异常明眼一看就是连接数量不够,等待超时后报错产生。那就先看看连接池JedisPool源码,可以看到JedisPool的构造方法中有个JedisPoolConfig配置器,该配置器继承了apache.commons.pool包中的GenericObjectPoolConfig对象:
public static final int DEFAULT_MAX_TOTAL = 8;
public static final int DEFAULT_MAX_IDLE = 8;
public static final int DEFAULT_MIN_IDLE = 0;
private int maxTotal = 8;
private int maxIdle = 8;
private int minIdle = 0;
源码一看便知,最大连接数量也不过是8,而本项目核心线程数量设置为10,肯定不够。故扩大连接数量和空闲数量即可。
再观察一下GenericObjectPoolConfig类,该类继承自抽象类BaseObjectPoolConfig,继续点进BaseObjectPoolConfig去可以看到有如下源码:
//显示部分源码
private boolean testOnCreate = false;
private boolean testOnBorrow = false;
private boolean testOnReturn = false;
private boolean testWhileIdle = false;
public boolean getTestOnCreate() {
return this.testOnCreate;
}
public void setTestOnCreate(boolean testOnCreate) {
this.testOnCreate = testOnCreate;
}
public boolean getTestOnBorrow() {
return this.testOnBorrow;
}
public void setTestOnBorrow(boolean testOnBorrow) {
this.testOnBorrow = testOnBorrow;
}
public boolean getTestOnReturn() {
return this.testOnReturn;
}
public void setTestOnReturn(boolean testOnReturn) {
this.testOnReturn = testOnReturn;
}
public boolean getTestWhileIdle() {
return this.testWhileIdle;
}
public void setTestWhileIdle(boolean testWhileIdle) {
this.testWhileIdle = testWhileIdle;
}
百度了一下,这正是检测网络是否异常的方法,且构造连接池时默认为空,若设为true,则每次向jedisPool连接池中借还客户端之前会对网络进行ping检测。该方法与 jedis.disconnect()具有相似作用,但执行更优雅。因此,尝试在redis工具类中加以设置看看能否保证连接可靠。
改进后的代码:
public class RedisUtils {
private JedisPool pool;
private String spiderUUID;
private String coordinatesHashKey;
private static JedisPoolConfig jedisPoolConfig;
private static final int POOL_MAX_IDLE = 16;
private static final int POOL_MAX_TOTAL = 36;
private static final long POOL_MAX_WAIT = 50000;
private ScanParams scanParams = new ScanParams().match("*").count(50);
private static Logger logger = Logger.getLogger(RedisUtils.class);
static {
jedisPoolConfig = new JedisPoolConfig();
jedisPoolConfig.setTestOnBorrow(true);
jedisPoolConfig.setTestOnReturn(true);
jedisPoolConfig.setMaxIdle(POOL_MAX_IDLE);
jedisPoolConfig.setMaxTotal(POOL_MAX_TOTAL);
jedisPoolConfig.setMaxWaitMillis(POOL_MAX_WAIT);
}
public RedisUtils(String host, int port, String password, String spiderUUID) {
this(new JedisPool(jedisPoolConfig, host, port, 30000, password));
this.spiderUUID = spiderUUID;
coordinatesHashKey = this.spiderUUID + ":coordinates";
}
public RedisUtils(JedisPool pool) {
this.pool = pool;
}
public void resetDownloadSuccessFlag(){
Jedis jedis = null;
try {
jedis = this.pool.getResource();
Map<String, String> result = jedis.hgetAll(coordinatesHashKey);
Set<String> keySet = result.keySet();
Iterator iterator = keySet.iterator();
while (iterator.hasNext()){
String key = (String) iterator.next();
String value = jedis.get(this.spiderUUID + ":" + key);
if (!"1".equals(value)){
jedis.set(this.spiderUUID + ":" + key, String.valueOf(1));
}
}
}finally {
if (jedis!=null){
jedis.getClient().resetPipelinedCount();
jedis.close();
}
}
}
public Boolean isPanoIDExists(String panoId) {
Jedis jedis = null;
Boolean exists;
try {
jedis = this.pool.getResource();
exists = jedis.exists(this.spiderUUID + ":" + panoId);
return exists;
}finally {
if (jedis!=null){
jedis.close();
}
}
}
public boolean removeKeys(){
Jedis jedis = this.pool.getResource();
try {
Set<String> keys = jedis.keys(this.spiderUUID + ":*" );
if(keys != null && !keys.isEmpty()) {
logger.info("redis has stored " + keys.size() + " keys, now ready to remove them all!");
String[] array = new String[keys.size()];
jedis.del(keys.toArray(array));
}
return true;
}catch (Exception e){
e.printStackTrace();
}finally {
if (jedis!=null){
jedis.close();
}
}
return true;
}
public boolean coordinatesPush(Coordinates coordinates) {
Jedis jedis = this.pool.getResource();
String panoId = coordinates.getPanoId();
try {
jedis.setnx(this.spiderUUID + ":" + panoId, String.valueOf(1));
return 1==jedis.hsetnx(coordinatesHashKey, panoId, JSON.toJSONString(coordinates));
} finally {
if (jedis!=null){
jedis.close();
}
}
}
public boolean particularCoordinatesPush(ParticularCoordinates coordinates) {
Jedis jedis = this.pool.getResource();
String panoId = coordinates.getPanoId();
try {
jedis.setnx(this.spiderUUID + ":" + panoId, String.valueOf(1));
long num = jedis.hsetnx(coordinatesHashKey, panoId, JSON.toJSONString(coordinates));
return num==1;
} finally {
if (jedis!=null){
jedis.close();
}
}
}
public ScanResult<Map.Entry<String, String>> scanRedis(int cursor){
ScanResult<Map.Entry<String, String>> scanResults = null;
Jedis jedis = this.pool.getResource();
long timeBefore = System.currentTimeMillis();
try{
scanResults = jedis.hscan(coordinatesHashKey, String.valueOf(cursor), scanParams);
}catch (Exception e){
long timeAfter = System.currentTimeMillis();
long executeTime = timeAfter-timeBefore;
e.printStackTrace();
logger.info("*****************************************************************");
logger.info(Thread.currentThread().getName()+" execute hscan operation time cost : "+executeTime+"ms and current cursor is " + cursor);
logger.info("*****************************************************************");
}finally {
if (jedis != null) {
jedis.close();
}
}
return scanResults;
}
public boolean downloadSucceed(String panoId) {
Jedis jedis = this.pool.getResource();
long timeBefore = System.currentTimeMillis();
int num=0;
try {
jedis.incr(this.spiderUUID + ":" + panoId);
num = Integer.parseInt(jedis.get(this.spiderUUID + ":" + panoId));
}
catch (JedisConnectionException e1){
long timeAfter = System.currentTimeMillis();
long executeTime = timeAfter-timeBefore;
e1.printStackTrace();
logger.info("********************************************************************");
logger.info(Thread.currentThread().getName()+" execute incr and get operation time cost : "+executeTime+"ms!");
logger.info("********************************************************************");
}finally {
if (jedis!=null){
jedis.close();
}
}
return num == 2;
}
}
再次部署运行,无异常抛出,问题得到解决!
参考
https://tech.antfin.com/docs/2/71967
https://www.jishuwen.com/d/2VHU/zh-tw#tuit
https://blog.zongwei.li/2016/10/28/resolve-jedis-read-time-exception/