wg / lettuce

Scalable Java Redis client

ClientTest hang with redis 2.4.4

hejin opened this issue · comments

commented

'mvn test' hangs while running com.lambdaworks.redis.ClientTest.
Environment: x86-64 CentOS 5.6 with JDK 1.6.0_29.

Any clues?

Hi,

I'm afraid I can't reproduce that locally. Would you please try the following command and let me know which tests fail?

mvn -Dsurefire.timeout=2 -Dtest=ClientTest test

Thanks!

commented

With the options above, mvn aborts quickly, but the Java thread keeps hanging and no report is left in the surefire-reports directory.
When I run the test case directly without the timeout setting, the redis server log shows the following messages repeatedly:

[12387] 15 Dec 12:19:07 - 2 clients connected (0 slaves), 734608 bytes in use
[12387] 15 Dec 12:19:12 - 2 clients connected (0 slaves), 734608 bytes in use
[12387] 15 Dec 12:19:17 - 2 clients connected (0 slaves), 734608 bytes in use
[12387] 15 Dec 12:19:22 - 2 clients connected (0 slaves), 734608 bytes in use
[12387] 15 Dec 12:19:28 - 2 clients connected (0 slaves), 734608 bytes in use
...

2 seconds should be enough time to complete the tests, although if the tests or redis are running on a very slow machine you might increase the surefire.timeout value. The surefire plugin should print out the test that failed due to a timeout, do you see that output?

Another idea might be to connect with redis-cli prior to running the tests, run "client list" which should just show your redis-cli connection, then run the tests and when they hang run "client list" again and paste the line that shows the other (non-redis-cli) client. Can you try that?

commented
  1. The machine running redis-server is fast enough: x86-64/DuoCore with 2GB memory.
  2. There is no output besides the following:

Running com.lambdaworks.redis.ClientTest
Process 1323925483933 is killed.
[INFO] ------------------------------------------------------------------------
[ERROR] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] There are test failures.

Please refer to /home/hejin/labs/basics/redis/lettuce/target/surefire-reports for the individual test results.
[INFO] ------------------------------------------------------------------------
[INFO] For more information, run Maven with the -e switch
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 5 seconds
[INFO] Finished at: Thu Dec 15 13:04:46 CST 2011
[INFO] Final Memory: 18M/45M
[INFO] ------------------------------------------------------------------------
[root@game-devel-host /home/hejin/labs/basics/redis/lettuce]# ls target/surefire

Here is the redis client output while the test is running (with the timeout setting, the Java thread keeps running even after the test framework aborts):

redis-client client list
addr=127.0.0.1:43789 fd=5 idle=0 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=client
addr=127.0.0.1:43791 fd=6 idle=27 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=flushall
addr=127.0.0.1:43792 fd=7 idle=27 flags=N db=0 sub=0 psub=0 qbuf=0 obl=0 oll=0 events=r cmd=NULL

Interesting, that CLIENT LIST output probably means the blocking test hasn't executed any redis commands other than the usual before-test FLUSHALL.
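To make that reading of the CLIENT LIST output concrete, here is a minimal sketch that splits one of the lines above into its key=value fields and prints the address, last command, and idle time (which Redis reports in seconds). The class and method names are hypothetical and only for illustration; they are not part of lettuce.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ClientListLine {
    /** Splits one CLIENT LIST line of space-separated key=value pairs into a map. */
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : line.trim().split("\\s+")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                // Everything before the first '=' is the key, the rest is the value.
                fields.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Line copied from the CLIENT LIST output posted above.
        String line = "addr=127.0.0.1:43791 fd=6 idle=27 flags=N db=0 sub=0 psub=0 "
                + "qbuf=0 obl=0 oll=0 events=r cmd=flushall";
        Map<String, String> f = parse(line);
        System.out.println(f.get("addr") + " last ran " + f.get("cmd")
                + ", idle " + f.get("idle") + "s");
    }
}
```

A connection whose last command is still flushall and whose idle counter keeps growing has not sent anything since the before-test flush, which is what the output above shows.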

Only a few of the tests in ClientTest don't have a timeout, could you try adding a "timeout = 10" to the ones that don't? I'm suspecting this may be due to a DNS issue when looking up "invalid" which is the host name the connect failure tests use.

Also, run mvn with the following command line to get output in target/surefire-reports

mvn -Dtest.redirectTestOutputToFile=true -Dtest=ClientTest test

commented

The output after adding timeout=10 to all test cases without a timeout setting:


T E S T S

Running com.lambdaworks.redis.ClientTest
Tests run: 7, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 0.393 sec <<< FAILURE!
reconnect(com.lambdaworks.redis.ClientTest) Time elapsed: 0.063 sec <<< ERROR!
java.lang.Exception: test timed out after 10 milliseconds
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:281)
at com.lambdaworks.redis.protocol.Command.await(Command.java:124)
at com.lambdaworks.redis.RedisConnection.getOutput(RedisConnection.java:1014)
at com.lambdaworks.redis.RedisConnection.get(RedisConnection.java:212)
at com.lambdaworks.redis.ClientTest.reconnect(ClientTest.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28)
connectFailure(com.lambdaworks.redis.ClientTest) Time elapsed: 0.027 sec <<< ERROR!
java.lang.Exception: test timed out after 10 milliseconds
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:867)
at java.net.InetAddress.getAddressFromNameService(InetAddress.java:1246)
at java.net.InetAddress.getAllByName0(InetAddress.java:1197)
at java.net.InetAddress.getAllByName(InetAddress.java:1128)
at java.net.InetAddress.getAllByName(InetAddress.java:1064)
at java.net.InetAddress.getByName(InetAddress.java:1014)
at java.net.InetSocketAddress.<init>(InetSocketAddress.java:142)
at com.lambdaworks.redis.RedisClient.<init>(RedisClient.java:56)
at com.lambdaworks.redis.RedisClient.<init>(RedisClient.java:42)
at com.lambdaworks.redis.ClientTest.connectFailure(ClientTest.java:52)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28)

connectPubSubFailure(com.lambdaworks.redis.ClientTest) Time elapsed: 0.024 sec <<< ERROR!
java.lang.Exception: test timed out after 10 milliseconds
at sun.misc.Unsafe.setMemory(Native Method)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:122)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:305)
at org.jboss.netty.channel.socket.nio.SocketSendBufferPool$Preallocation.<init>(SocketSendBufferPool.java:156)
at org.jboss.netty.channel.socket.nio.SocketSendBufferPool.<init>(SocketSendBufferPool.java:43)
at org.jboss.netty.channel.socket.nio.NioWorker.<init>(NioWorker.java:84)
at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.<init>(NioClientSocketPipelineSink.java:84)
at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.<init>(NioClientSocketChannelFactory.java:162)
at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.<init>(NioClientSocketChannelFactory.java:108)
at com.lambdaworks.redis.RedisClient.<init>(RedisClient.java:54)
at com.lambdaworks.redis.RedisClient.<init>(RedisClient.java:42)
at com.lambdaworks.redis.ClientTest.connectPubSubFailure(ClientTest.java:60)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28)

Results :

Tests in error:
reconnect(com.lambdaworks.redis.ClientTest): test timed out after 10 milliseconds
connectFailure(com.lambdaworks.redis.ClientTest): test timed out after 10 milliseconds
connectPubSubFailure(com.lambdaworks.redis.ClientTest): test timed out after 10 milliseconds

Tests run: 7, Failures: 0, Errors: 3, Skipped: 0

Any clues?

Great! Looks like we're getting somewhere, now can you try running "telnet invalid" and see if that hangs too? While you're at it remove the timeout from reconnect, since it's probably just these two failure tests that are blocking.
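Since the connectFailure stack trace above stops inside InetAddress.getByName, the same lookup can also be timed from inside the JVM, which may resolve names differently than telnet does. The following is a rough sketch only: the class name, the helper runWithLimit, and the 10,000 ms limit are hypothetical choices for illustration, and it assumes Java 8 or later for the lambda syntax (it would not compile on the JDK 1.6 mentioned earlier).

```java
import java.net.InetAddress;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class LookupCheck {
    /** Runs the task on a daemon thread and reports how it ended within limitMillis. */
    static String runWithLimit(Callable<?> task, long limitMillis) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true); // a lookup stuck in native code must not keep the JVM alive
            return t;
        });
        Future<?> result = pool.submit(task);
        try {
            result.get(limitMillis, TimeUnit.MILLISECONDS);
            return "completed";
        } catch (ExecutionException e) {
            // The task itself threw, e.g. UnknownHostException for an unresolvable name.
            return "failed: " + e.getCause().getClass().getSimpleName();
        } catch (TimeoutException e) {
            return "timed out";
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        // "invalid" is the host name the connect failure tests use.
        String outcome = runWithLimit(() -> InetAddress.getByName("invalid"), 10000);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(outcome + " after " + elapsedMs + " ms");
    }
}
```

If the lookup fails almost immediately with an UnknownHostException, name resolution inside the JVM is probably not what blocks; a long delay or a timeout would point back at the resolver configuration.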

commented

"telnet invalid" fails quickly and does not hang.

So if you run "telnet invalid" you immediately get an error similar to "nodename nor servname provided, or not known"? If so let's try increasing the timeout for the connectFailure and connectPubSubFailure tests to 10,000 and see what the output is.

commented

yes.
done, the result is the same.

Could you please post the output here? The stack traces should be different.

commented

Hmm, it seems you put too harsh a limit on the return strings of the negative test cases.
With the timeout set to 100, there are 3 failing cases;
with 1000, there are 2;
with 10000, there is still 1, because the return string of the negative test case doesn't match the expected one.

Well, the exception message is fine because it should be generated by RedisClient. The root of the problem is that the test cases are blocking on your machine, when they should immediately complete because the client can't connect to a host named "invalid". The timeout shouldn't be necessary, and those tests shouldn't fail due to a timeout.

I'm closing this issue because it appears to be environment-specific and not a bug in the tests. Please send me a private message if you're interested in further investigation. Thanks!