dtests sometimes fail with "unable to connect to scylla-jmx"
bhalevy opened this issue
See scylladb/scylla-ccm#223 (comment)
Still seeing this, e.g. https://jenkins.scylladb.com/view/master/job/scylla-master/job/next/1678/testReport/junit/bootstrap_test/TestBootstrap/start_stop_test/
Scylla version 359b32fb63e2c5f88ff855e535b647984e2fe623
Traceback (most recent call last):
  File "/usr/lib64/python3.7/unittest/case.py", line 60, in testPartExecutor
    yield
  File "/usr/lib64/python3.7/unittest/case.py", line 645, in run
    testMethod()
  File "/jenkins/workspace/scylla-master/next/scylla-dtest/bootstrap_test.py", line 53, in start_stop_test
    cluster.start(wait_for_binary_proto=True, wait_other_notice=True)
  File "/jenkins/workspace/scylla-master/next/scylla-ccm/ccmlib/scylla_cluster.py", line 137, in start
    started = self.start_nodes(**args)
  File "/jenkins/workspace/scylla-master/next/scylla-ccm/ccmlib/scylla_cluster.py", line 109, in start_nodes
    profile_options=profile_options, no_wait=no_wait)
  File "/jenkins/workspace/scylla-master/next/scylla-ccm/ccmlib/scylla_node.py", line 516, in start
    raise NodeError(e_msg, scylla_process)
ccmlib.node.NodeError: Error starting node node1: unable to connect to scylla-jmx port 127.0.89.1:7189
The dtest log at https://jenkins.scylladb.com/view/master/job/scylla-master/job/next/1678/artifact/logs-release.2/dtest.log indicates that 2 processes were killed.
Since the test starts only 1 node, these should be scylla and scylla-jmx:
2020-03-03 15:44:01,849 169 dtest DEBUG | bootstrap_test.py:TestBootstrap.start_stop_test - cluster ccm directory: /jenkins/workspace/scylla-master/next/scylla/.dtest/dtest-ny5nvwt0
2020-03-03 15:44:01,850 169 dtest DEBUG | bootstrap_test.py:TestBootstrap.start_stop_test - Starting Scylla cluster from directory /jenkins/workspace/scylla-master/next/scylla-dtest/../scylla/build/release/
2020-03-03 15:44:01,853 169 dtest DEBUG | bootstrap_test.py:TestBootstrap.start_stop_test - Allocated cluster ID 89: /jenkins/workspace/scylla-master/next/scylla/.dtest/dtest-ny5nvwt0
2020-03-03 15:44:01,860 169 dtest DEBUG | bootstrap_test.py:TestBootstrap.start_stop_test - configuring skip_wait_for_gossip_to_settle=0 for single_node test
2020-03-03 15:44:01,861 169 dtest DEBUG | bootstrap_test.py:TestBootstrap.start_stop_test - populating cluster with one node
2020-03-03 15:44:15,809 169 dtest DEBUG | bootstrap_test.py:TestBootstrap.start_stop_test - starting cluster
2020-03-03 15:44:45,900 169 dtest DEBUG | bootstrap_test.py:TestBootstrap.start_stop_test - Test failed with errors: [(<bootstrap_test.TestBootstrap testMethod=start_stop_test>, (<class 'ccmlib.node.NodeError'>, NodeError('Error starting node node1: unable to connect to scylla-jmx port 127.0.89.1:7189'), <traceback object at 0x7f208c536690>))]
2020-03-03 15:44:45,905 169 dtest DEBUG | bootstrap_test.py:TestBootstrap.start_stop_test - removing ccm cluster test at: /jenkins/workspace/scylla-master/next/scylla/.dtest/dtest-ny5nvwt0
2020-03-03 15:44:46,981 169 dtest DEBUG | bootstrap_test.py:TestBootstrap.start_stop_test - proc 182 killed - cluster 127.0.89.
2020-03-03 15:44:46,982 169 dtest DEBUG | bootstrap_test.py:TestBootstrap.start_stop_test - proc 184 killed - cluster 127.0.89.
2020-03-03 15:44:46,982 169 dtest DEBUG | bootstrap_test.py:TestBootstrap.start_stop_test - Freeing cluster ID 89: link /jenkins/workspace/scylla-master/next/scylla/.dtest/89
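So it seems like the scylla-jmx process is up (it had to be killed during teardown) but unresponsive: the connection to 127.0.89.1:7189 failed about 30 seconds after the cluster start began.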
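To tell "process is running" apart from "port accepts connections", a standalone probe can help when reproducing this by hand. The following is a minimal sketch, not the check that scylla-ccm performs; the helper name `wait_for_port` and its `timeout` and `interval` defaults are assumptions made for illustration. Note that a successful TCP connect only shows that something is listening, not that JMX itself answers requests.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to (host, port) succeeds.

    Returns False if no connection succeeded before `timeout` seconds
    elapsed. Hypothetical helper, not part of ccmlib.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt is itself bounded by `interval` seconds.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: retry until the deadline.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For the failure above, `wait_for_port("127.0.89.1", 7189)` could be run while the scylla-jmx process is still alive to see whether the port ever starts accepting connections.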
As I wrote in scylladb/scylla-ccm#223 (comment), I saw this today:
Using config file: /jenkins/workspace/scylla-master/byo/dtest-byo/scylla/.dtest/dtest-3ngmni08/test/node1/conf/scylla.yaml
library initialization failed - unable to allocate file descriptor table - out of memory
@bhalevy The "unable to allocate file descriptor table" error is an artifact of the node running out of memory. You ran the test on thor, so unfortunately it's a pretty common scenario...
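If memory pressure on the build machine is the suspect, one way to confirm it is to record the available memory right before a node is started. The sketch below reads the `MemAvailable` field from Linux's `/proc/meminfo`; the function names and the 2 GiB threshold are hypothetical and not taken from dtest or ccm.

```python
def mem_available_kib(meminfo_text):
    """Return the MemAvailable value (in kiB) from /proc/meminfo-style text.

    Returns None if the field is missing (for example on very old kernels).
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Line format: "MemAvailable:    2048000 kB"
            return int(line.split()[1])
    return None


def enough_memory(meminfo_text, required_kib):
    """True if MemAvailable is present and at least `required_kib`."""
    available = mem_available_kib(meminfo_text)
    return available is not None and available >= required_kib


if __name__ == "__main__":
    # Assumed threshold for illustration only: 2 GiB.
    with open("/proc/meminfo") as f:
        text = f.read()
    print("MemAvailable (kiB):", mem_available_kib(text))
    print("enough for a node start:", enough_memory(text, 2 * 1024 * 1024))
```

Logging this value next to the "starting cluster" message would show whether the failing runs coincide with low available memory on the host.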