CodisLabs / jodis

A java client for codis based on Jedis and Curator

Production environment: occasional "Could not get a resource from the pool" errors

Force-King opened this issue

The errors are as follows:

2019-10-14 at 13:05:26.633 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out
2019-10-14 at 13:05:27.387 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisException: Could not get a resource from the pool
	at io.codis.jodis.RoundRobinJedisPool.getResource(RoundRobinJedisPool.java:218) ~[jodis-0.5.1.jar:?]
2019-10-14 at 13:05:27.350 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisException: Could not get a resource from the pool
	at io.codis.jodis.RoundRobinJedisPool.getResource(RoundRobinJedisPool.java:218) ~[jodis-0.5.1.jar:?]
2019-10-14 at 13:05:27.304 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisException: Could not get a resource from the pool
	at io.codis.jodis.RoundRobinJedisPool.getResource(RoundRobinJedisPool.java:218) ~[jodis-0.5.1.jar:?]
2019-10-14 at 13:05:27.199 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisException: Could not get a resource from the pool
	at io.codis.jodis.RoundRobinJedisPool.getResource(RoundRobinJedisPool.java:218) ~[jodis-0.5.1.jar:?]
2019-10-14 at 13:05:27.219 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out
2019-10-14 at 13:05:27.161 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
	at io.codis.jodis.RoundRobinJedisPool.getResource(RoundRobinJedisPool.java:218) ~[jodis-0.5.1.jar:?]
2019-10-14 at 13:05:27.092 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisException: Could not get a resource from the pool
	at io.codis.jodis.RoundRobinJedisPool.getResource(RoundRobinJedisPool.java:218) ~[jodis-0.5.1.jar:?]
2019-10-14 at 13:05:27.071 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
	at io.codis.jodis.RoundRobinJedisPool.getResource(RoundRobinJedisPool.java:218) ~[jodis-0.5.1.jar:?]

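Additional information: in production, timeouts were reported after the service had been running for a while. The nodes reporting errors showed heavy swap activity; after we disabled swap, the errors went away.
After running for a while longer, the errors above now appear again occasionally and connections cannot be obtained. We checked the Codis, proxy, and ZooKeeper logs and found no abnormal entries.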

Could anyone help explain this?

Codis client connection code:

@Bean
public JedisResourcePool getPool() {
    JedisPoolConfig poolConfig = new JedisPoolConfig();
    poolConfig.setMaxIdle(max_idle);
    poolConfig.setMaxTotal(max_active);
    poolConfig.setTestOnBorrow(true);
    poolConfig.setTestOnReturn(true);
    poolConfig.setMaxWaitMillis(max_wait);
    poolConfig.setBlockWhenExhausted(false);

    JedisResourcePool pool = RoundRobinJedisPool.create().poolConfig(poolConfig)
            .curatorClient(zkAddr, timeout).zkProxyDir(zkProxyDir).build();
    return pool;
}
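
A side note on the configuration above: in commons-pool2, which JedisPoolConfig configures, the max-wait setting only applies when blockWhenExhausted is true. With it set to false, a borrow from an exhausted pool fails immediately instead of waiting. The sketch below illustrates the two modes with a plain java.util.concurrent.Semaphore as a stand-in for the pool; it does not use Jedis or commons-pool2, and BorrowModes and its method names are hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class BorrowModes {

    /** Non-blocking borrow: returns false at once when no slot is free. */
    public static boolean borrowNonBlocking(Semaphore slots) {
        return slots.tryAcquire();
    }

    /** Blocking borrow: waits up to maxWaitMillis for a slot to be returned. */
    public static boolean borrowBlocking(Semaphore slots, long maxWaitMillis) {
        try {
            return slots.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    public static void main(String[] args) {
        Semaphore slots = new Semaphore(1); // stand-in for a pool with room for one connection

        System.out.println("first borrow:    " + borrowNonBlocking(slots));
        System.out.println("second borrow:   " + borrowNonBlocking(slots));

        // Another thread returns the slot after 100 ms; a blocking borrow can wait for it.
        Thread releaser = new Thread(() -> {
            try {
                Thread.sleep(100L);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            slots.release();
        });
        releaser.start();
        System.out.println("blocking borrow: " + borrowBlocking(slots, 2000L));
    }
}
```

Under a traffic spike, the non-blocking mode turns a brief shortage of connections into immediate errors, so it may be worth checking whether that is the intended behavior before ruling it out as a contributor.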

Codis operation class:

@Autowired
private JedisResourcePool jedisPool;

/**
 * Get a cached value.
 *
 * @param key the cache key
 * @return the cached value, or null if the lookup throws
 */
public String get(String key) {
    try (Jedis jedis = jedisPool.getResource()) {
        return jedis.get(key);
    } catch (Exception e) {
        logger.error("codis get exception, key ={}. Exception:", key, e);
        return null;
    }
}

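I ran into this as well.
It seems to work fine under low concurrency. In production, 10 machines write to Codis with fairly smooth traffic, and that has run for 2 years without problems.
We recently launched a query endpoint that peaks at 4k QPS. It invariably goes down by the next day and does not recover on its own.
The endpoint's log reports "Could not get a resource from the pool".
However, the connection count observed at the TCP level is far below the configured maximum.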

@etansens Did you find the root cause? Would adding machines solve it? We are at 2K QPS and already hit this error.

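@Force-King I ran some tests, and it appears to be related to multithreading. A single thread running in an infinite loop has no problem.
With multiple threads, after the threads end, the connections in the pool remain in the ALLOCATED state and cannot return to IDLE.
I then wrapped the getResource method in synchronized, but that did not solve it either.
Next I plan to read the source implementation closely.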
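
The symptom described in this thread (pooled objects stuck as allocated, and the pool reporting exhaustion while the TCP connection count stays well below the maximum) can be illustrated with a minimal sketch. It is not Jedis or commons-pool2 code and does not reproduce the actual bug; TinyPool and runUntilExhausted are hypothetical names, and the pool only counts idle versus allocated objects.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;

public class LeakyPoolDemo {

    /** Hypothetical fixed-size pool that tracks idle and allocated objects. */
    public static final class TinyPool {
        private final Deque<Object> idle = new ArrayDeque<>();
        private int allocated = 0;

        public TinyPool(int maxTotal) {
            for (int i = 0; i < maxTotal; i++) {
                idle.push(new Object());
            }
        }

        /** Hands out an idle object, or fails immediately if none is left. */
        public synchronized Object borrow() {
            Object o = idle.poll();
            if (o == null) {
                throw new NoSuchElementException("Pool exhausted");
            }
            allocated++;
            return o;
        }

        /** Moves an object from allocated back to idle. */
        public synchronized void giveBack(Object o) {
            allocated--;
            idle.push(o);
        }

        public synchronized int allocatedCount() { return allocated; }

        public synchronized int idleCount() { return idle.size(); }
    }

    /**
     * Borrows up to `rounds` times and skips the return on every `leakEvery`-th
     * borrow. Returns the round at which borrowing failed, or -1 if it never did.
     */
    public static int runUntilExhausted(TinyPool pool, int rounds, int leakEvery) {
        for (int i = 1; i <= rounds; i++) {
            Object o;
            try {
                o = pool.borrow();
            } catch (NoSuchElementException e) {
                return i;
            }
            if (i % leakEvery != 0) {
                pool.giveBack(o); // the normal path returns the object
            }
            // otherwise the object is never returned and stays allocated
        }
        return -1;
    }

    public static void main(String[] args) {
        TinyPool pool = new TinyPool(4);
        int failedAt = runUntilExhausted(pool, 1000, 10);
        System.out.println("borrow failed at round " + failedAt
                + ", allocated=" + pool.allocatedCount()
                + ", idle=" + pool.idleCount());
    }
}
```

Even an occasional missed return is enough: once every slot is counted as allocated, each later borrow fails no matter how few real connections are open, and nothing brings the pool back without a restart.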

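@Force-King I stepped through the code last night and found that this is a Jedis bug, which newer Jedis versions have already fixed. Specifying the latest Jedis version as a dependency solves it.
@Apache9 This issue can be closed.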

Test code attached:

@Test
public void poolTest() throws InterruptedException {
    RedisFactory factory = new RedisFactory();
    // count = 5 > 4 worker threads, so the main thread waits forever; convenient for testing
    CountDownLatch latch = new CountDownLatch(5);
    AtomicLong curr = new AtomicLong(0); // records how fast connections are acquired and released
    AtomicLong prev = new AtomicLong(0);
    for (int i = 0; i < 4; i++) { // start 4 threads that acquire connections in an infinite loop to expose the problem
        new Thread() {
            @Override
            public void run() {
                try {
                    while (true) {
                        try (Jedis jedis = factory.getRedisClient()) {
                            curr.incrementAndGet();
                        }
                    }
                } catch (Exception e) { // on exception, leave the loop and end the thread
                    System.err.println("can not get conn, loop out: ");
                    e.printStackTrace();
                } finally {
                    System.out.println("runner count down");
                    latch.countDown();
                }
            }
        }.start();
    }
    new Thread() { // start 1 thread that acquires a connection periodically, to test whether the pool recovers after an error
        @Override
        public void run() {
            while (true) { // keep acquiring connections; print a message on exception
                try (Jedis jedis = factory.getRedisClient()) {
                    Thread.sleep(1000L);
                    long rate = curr.incrementAndGet() - prev.longValue();
                    prev.set(curr.longValue());
                    System.out.println("curr conn: " + jedis + ", rate: " + rate);
                } catch (Exception e) {
                    System.err.println("can not get conn: " + e.getMessage());
                }
            }
        }
    }.start();
    latch.await();
    System.out.println(factory);
}

@etansens I am currently using Jedis 2.9.0. So switching to the latest version, 3.1.0, fixes it?

jedis-2.9.3.jar already fixes this problem.