Production environment: occasional "Could not get a resource from the pool"
Force-King opened this issue · comments
The errors are as follows:
2019-10-14 at 13:05:26.633 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out
2019-10-14 at 13:05:27.387 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisException: Could not get a resource from the pool
at io.codis.jodis.RoundRobinJedisPool.getResource(RoundRobinJedisPool.java:218) ~[jodis-0.5.1.jar:?]
2019-10-14 at 13:05:27.350 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisException: Could not get a resource from the pool
at io.codis.jodis.RoundRobinJedisPool.getResource(RoundRobinJedisPool.java:218) ~[jodis-0.5.1.jar:?]
2019-10-14 at 13:05:27.304 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisException: Could not get a resource from the pool
at io.codis.jodis.RoundRobinJedisPool.getResource(RoundRobinJedisPool.java:218) ~[jodis-0.5.1.jar:?]
2019-10-14 at 13:05:27.199 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisException: Could not get a resource from the pool
at io.codis.jodis.RoundRobinJedisPool.getResource(RoundRobinJedisPool.java:218) ~[jodis-0.5.1.jar:?]
2019-10-14 at 13:05:27.219 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out
2019-10-14 at 13:05:27.161 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
at io.codis.jodis.RoundRobinJedisPool.getResource(RoundRobinJedisPool.java:218) ~[jodis-0.5.1.jar:?]
2019-10-14 at 13:05:27.092 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisException: Could not get a resource from the pool
at io.codis.jodis.RoundRobinJedisPool.getResource(RoundRobinJedisPool.java:218) ~[jodis-0.5.1.jar:?]
2019-10-14 at 13:05:27.071 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
at io.codis.jodis.RoundRobinJedisPool.getResource(RoundRobinJedisPool.java:218) ~[jodis-0.5.1.jar:?]
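Additional information: after running for a while in production, timeouts were reported. The failing nodes showed heavy swap activity; after swap was disabled, the errors stopped.
After running for a while longer, the errors above now appear again occasionally, and connections cannot be obtained. I checked the codis, proxy, and zk logs and found no abnormal entries.
Can anyone help explain this?
Codis client connection code: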
@Bean
public JedisResourcePool getPool() {
    JedisPoolConfig poolConfig = new JedisPoolConfig();
    poolConfig.setMaxIdle(max_idle);
    poolConfig.setMaxTotal(max_active);
    poolConfig.setTestOnBorrow(true);
    poolConfig.setTestOnReturn(true);
    poolConfig.setMaxWaitMillis(max_wait);
    poolConfig.setBlockWhenExhausted(false);
    JedisResourcePool pool = RoundRobinJedisPool.create().poolConfig(poolConfig)
            .curatorClient(zkAddr, timeout).zkProxyDir(zkProxyDir).build();
    return pool;
}
Codis operation class:
@Autowired
private JedisResourcePool jedisPool;

/**
 * Gets a cached value.
 *
 * @param key the cache key
 * @return the cached value, or null if the lookup fails
 */
public String get(String key) {
    // try-with-resources hands the connection back to the pool on exit
    try (Jedis jedis = jedisPool.getResource()) {
        return jedis.get(key);
    } catch (Exception e) {
        logger.error("codis get exception, key ={}. Exception:", key, e);
        return null;
    }
}
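A note on the pool configuration above: in Apache Commons Pool 2, which backs JedisPoolConfig, the max wait only applies while blockWhenExhausted is true. With setBlockWhenExhausted(false), a borrow from an exhausted pool fails immediately instead of waiting up to max_wait, which may be one reason short bursts surface as errors. The sketch below only models the two modes with a plain Semaphore; it is not the real pool, and the names BorrowModes, borrowFailFast, and borrowWithWait are hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Toy model of a bounded pool: each permit stands for one pooled connection.
class BorrowModes {
    private final Semaphore permits;

    BorrowModes(int maxTotal) {
        this.permits = new Semaphore(maxTotal);
    }

    // Models blockWhenExhausted=false: give up at once when nothing is free.
    boolean borrowFailFast() {
        return permits.tryAcquire();
    }

    // Models blockWhenExhausted=true with a max wait: wait up to maxWaitMillis.
    boolean borrowWithWait(long maxWaitMillis) {
        try {
            return permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Returns one connection to the pool.
    void giveBack() {
        permits.release();
    }

    public static void main(String[] args) throws InterruptedException {
        BorrowModes pool = new BorrowModes(1);
        pool.borrowFailFast(); // take the only connection

        // Another thread returns the connection after a short delay.
        Thread holder = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            pool.giveBack();
        });
        holder.start();

        System.out.println("fail-fast borrow: " + pool.borrowFailFast());
        System.out.println("bounded-wait borrow: " + pool.borrowWithWait(2000));
        holder.join();
    }
}
```

In this model the fail-fast borrow is refused while the connection is still held, whereas the bounded-wait borrow can pick it up once it is returned. Whether blocking is the better choice depends on how much latency the caller can tolerate.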
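I'm running into this as well.
It looks fine under low concurrency. In production, 10 machines write to codis with fairly smooth traffic, and that has run for 2 years without problems.
We recently launched a query endpoint that peaks at 4k QPS. It is guaranteed to go down by the following day and does not recover on its own.
The endpoint logs report Could not get a resource from the pool,
but the TCP connection count is far below the configured maximum.
@Force-King I ran some tests, and it appears to be related to multithreading. Running an infinite loop on a single thread works fine.
With multiple threads, after the threads end, the connections in the pool remain in the ALLOCATED state and never return to IDLE.
I then wrapped the getResource method in synchronized, which did not fix it either.
Next I plan to read through the source implementation.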
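To illustrate how a pool can report exhaustion while the TCP connection count stays low, here is a minimal bookkeeping model. It is not a reproduction of the actual jedis defect, whose details are not given in this thread; it only assumes that some return path closes the socket without telling the pool, so the slot stays allocated. The class LeakyPoolModel, its methods, and the 1-in-100 failure rate are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy pool that only does the bookkeeping: it counts slots handed out.
class LeakyPoolModel {
    private final int maxTotal;
    private final AtomicInteger allocated = new AtomicInteger();

    LeakyPoolModel(int maxTotal) {
        this.maxTotal = maxTotal;
    }

    // Returns true if a slot was granted, false if the pool believes it is exhausted.
    boolean borrow() {
        while (true) {
            int current = allocated.get();
            if (current >= maxTotal) {
                return false;
            }
            if (allocated.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    // Correct return path: the slot goes back to the pool.
    void giveBack() {
        allocated.decrementAndGet();
    }

    // Faulty return path (assumed): the socket is closed, the pool is never told.
    void closeWithoutReturning() {
        // Intentionally empty: the slot stays marked as allocated.
    }

    int allocatedCount() {
        return allocated.get();
    }

    public static void main(String[] args) {
        LeakyPoolModel pool = new LeakyPoolModel(8);
        int served = 0;
        while (pool.borrow()) {
            served++;
            if (served % 100 == 0) {
                pool.closeWithoutReturning(); // assumed rare faulty return
            } else {
                pool.giveBack();
            }
        }
        System.out.println("borrows served before exhaustion: " + served);
        System.out.println("slots still marked allocated: " + pool.allocatedCount());
    }
}
```

Once every slot has leaked, the model refuses all further borrows even though no connection is actually in use, which matches the symptom of a pool that cannot recover on its own.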
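@Force-King I traced the code last night and found that this is a jedis bug, and it is already fixed in newer jedis versions. Pinning the latest jedis dependency resolves it.
@Apache9 This issue can be closed.
Test code attached: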
// Requires: org.junit.Test, java.util.concurrent.CountDownLatch,
// java.util.concurrent.atomic.AtomicLong, redis.clients.jedis.Jedis
@Test
public void poolTest() throws InterruptedException {
    RedisFactory factory = new RedisFactory();
    // count = 5 > 4 worker threads, so the main thread waits forever; convenient for testing
    CountDownLatch latch = new CountDownLatch(5);
    AtomicLong curr = new AtomicLong(0); // tracks the rate of acquiring and releasing connections
    AtomicLong prev = new AtomicLong(0);
    // Start 4 threads that acquire connections in an infinite loop to expose the problem
    for (int i = 0; i < 4; i++) {
        new Thread() {
            @Override
            public void run() {
                try {
                    while (true) {
                        try (Jedis jedis = factory.getRedisClient()) {
                            curr.incrementAndGet();
                        }
                    }
                } catch (Exception e) { // on exception, leave the loop and end the thread
                    System.err.println("can not get conn, loop out: ");
                    e.printStackTrace();
                } finally {
                    System.out.println("runner count down");
                    latch.countDown();
                }
            }
        }.start();
    }
    // Start 1 thread that acquires a connection periodically, to test whether
    // the pool recovers by itself after a failure
    new Thread() {
        @Override
        public void run() {
            while (true) { // keep acquiring connections; print a message on exception
                try (Jedis jedis = factory.getRedisClient()) {
                    Thread.sleep(1000L);
                    long rate = curr.incrementAndGet() - prev.longValue();
                    prev.set(curr.longValue());
                    System.out.println("curr conn: " + jedis + ", rate: " + rate);
                } catch (Exception e) {
                    System.err.println("can not get conn: " + e.getMessage());
                }
            }
        }
    }.start();
    latch.await();
    System.out.println(factory);
}
jedis-2.9.3.jar already resolves this problem.
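After pinning a newer jedis, it can be worth confirming which jar is actually loaded at runtime, since an older jedis may still arrive as a transitive dependency (for example through jodis). The following is a small sketch using only JDK reflection; the class name JarLocator and the method locate are hypothetical, and the printed location depends entirely on the application's classpath.

```java
import java.security.CodeSource;

// Reports where a class was loaded from, e.g. the path of the jedis jar.
class JarLocator {
    static String locate(String className) {
        try {
            // Load without initializing, so static initializers do not run.
            Class<?> type = Class.forName(className, false, JarLocator.class.getClassLoader());
            CodeSource source = type.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return "no code source (likely a JDK class)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "class not found: " + className;
        }
    }

    public static void main(String[] args) {
        // redis.clients.jedis.Jedis is the class used in the code above.
        System.out.println(locate("redis.clients.jedis.Jedis"));
    }
}
```

If the printed path still points at an older jedis jar, the pinned version is not the one being picked up and the dependency tree needs another look.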