Ensuring bosh-dns query for tservers is working with drivers, peers table replicating, etc.
aegershman opened this issue
some client-side logging that a trial user is seeing:
```
2020-03-12T08:02:26.635-05:00 [APP/PROC/WEB/0] [OUT] [main] WARN com.datastax.driver.core.Cluster:2323 - You listed q-s0.tserver.dev-services-network.yugabyte-instance-f19c1f496b63.bosh/10.156.74.12:9042 in your contact points, but it wasn't found in the control host's system.peers at startup
2020-03-12T08:02:26.636-05:00 [APP/PROC/WEB/0] [OUT] [main] WARN com.datastax.driver.core.Cluster:2323 - You listed q-s0.tserver.dev-services-network.yugabyte-instance-f19c1f496b63.bosh/10.156.73.22:9042 in your contact points, but it wasn't found in the control host's system.peers at startup
2020-03-12T08:02:26.636-05:00 [APP/PROC/WEB/0] [OUT] [main] WARN com.datastax.driver.core.Cluster:2323 - You listed q-s0.tserver.dev-services-network.yugabyte-instance-f19c1f496b63.bosh/10.156.73.136:9042 in your contact points, but it wasn't found in the control host's system.peers at startup
2020-03-12T08:02:26.636-05:00 [APP/PROC/WEB/0] [OUT] [main] WARN com.datastax.driver.core.Cluster:2323 - You listed q-s0.tserver.dev-services-network.yugabyte-instance-f19c1f496b63.bosh/10.156.73.24:9042 in your contact points, but it wasn't found in the control host's system.peers at startup
2020-03-12T08:02:26.637-05:00 [APP/PROC/WEB/0] [OUT] [main] WARN com.datastax.driver.core.Cluster:2323 - You listed q-s0.tserver.dev-services-network.yugabyte-instance-f19c1f496b63.bosh/10.156.74.14:9042 in your contact points, but it wasn't found in the control host's system.peers at startup
...
2020-03-12T07:55:19.689-05:00 [APP/PROC/WEB/0] [OUT] [cluster1-worker-41] WARN com.datastax.driver.core.ControlConnection:559 - No row found for host /10.156.73.24 in q-s0.tserver.dev-services-network.yugabyte-instance-f19c1f496b63.bosh/10.156.73.137:9042's peers system table. /10.156.73.24 will be ignored.
2020-03-12T13:20:19.698-05:00 [APP/PROC/WEB/0] [OUT] [cluster1-worker-567] INFO com.datastax.driver.core.Cluster:2327 - Cassandra host /10.156.74.12:9042 removed
```
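The warnings above come down to a set-membership check: at startup the driver compares each contact point's resolved IP against the rows of the control host's `system.peers` table, and any contact point without a matching row is warned about and later ignored (the control host itself lives in `system.local`, not `system.peers`). A minimal sketch of that check, using the IPs from the logs above -- illustrative only, not the driver's actual code:

```python
def missing_from_peers(contact_points, peers_rows):
    """Return contact-point IPs that have no matching row in system.peers."""
    peer_ips = {row["peer"] for row in peers_rows}
    return sorted(ip for ip in contact_points if ip not in peer_ips)

# The five tserver IPs resolved from the bosh-dns contact point (see WARN lines above)
contact_points = ["10.156.74.12", "10.156.73.22", "10.156.73.136",
                  "10.156.73.24", "10.156.74.14"]

# If the control host's peers table has no usable rows for them (as at this
# startup), every single contact point gets flagged:
print(missing_from_peers(contact_points, []))
```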
see:
- https://github.com/yugabyte/yugabyte-db/issues/856
- https://github.com/yugabyte/yugabyte-db/issues/285
- https://github.com/yugabyte/yugabyte-db/issues/2390
- https://docs.datastax.com/en/developer/java-driver/3.7/manual/address_resolution/
- https://docs.datastax.com/en/developer/java-driver/4.5/
- https://stackoverflow.com/questions/54282262/how-does-one-add-a-node-or-nodes-to-an-existing-yugabyte-db-ce-cluster/54314712
From the linked GitHub issue:

> The `system.local` query reveals that the `broadcast_address` column doesn't have the desired IP address.
>
> It looks like the default value of `--rpc_bind_addresses` (0.0.0.0) makes YB non-deterministic when there are multiple addresses it could bind, and it is picking an IPv6 address; this could be causing the warning you are seeing in the client logs. For your case, where the IP you want to use is 192.168.0.100, could you update your yb-master & yb-tserver conf files to also pass:
>
> `--rpc_bind_addresses 192.168.0.100`
>
> Note: the `rpc_bind_addresses` value is also used as the identity of the entity (yb-tserver/yb-master), so it is better to make this change with a clean-slate, fresh install of the environment. You would want to wipe out your old data directory; presumably that's OK since this is a test deployment.
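The non-determinism described in that advice can be sketched: with a wildcard bind address, the identity the server advertises has to be picked from whatever local addresses happen to enumerate, and nothing pins the choice. A toy illustration -- the selection rule here is an assumption for illustration, not YugabyteDB's actual logic:

```python
WILDCARDS = {"0.0.0.0", "::"}

def advertised_address(bind_addr, local_addrs):
    """Pick the address a server would advertise to peers and clients.

    If bind_addr is explicit, that is the identity. If it is a wildcard,
    the choice falls to whichever local address enumerates first -- which
    may be an IPv6 address, and may differ across restarts.
    """
    if bind_addr not in WILDCARDS:
        return bind_addr
    return local_addrs[0]  # enumeration order is not guaranteed

# An explicit bind address pins the identity:
print(advertised_address("192.168.0.100", ["fe80::1", "192.168.0.100"]))
# A wildcard bind lets whatever enumerates first win (here, IPv6 link-local):
print(advertised_address("0.0.0.0", ["fe80::1", "192.168.0.100"]))
```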
Here's what I'm seeing when querying `select * from system.local`:
```
tserver/40dd8fc9-b2da-4517-a066-649e777f45ad:~# source /var/vcap/packages/python-*/bosh/runtime.env
tserver/40dd8fc9-b2da-4517-a066-649e777f45ad:~# /var/vcap/packages/yugabyte/bin/cqlsh --cqlshrc /var/vcap/jobs/yb-tserver/config/cqlshrc
Connected to local cluster at q-m88141n3s0.q-g86407.bosh:9042.
[cqlsh 5.0.1 | Cassandra 3.9-SNAPSHOT | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cassandra@cqlsh> select * from system.local
             ...
             ... ;
 key   | bootstrapped | broadcast_address | cluster_name  | cql_version | data_center | gossip_generation | host_id                              | listen_address | native_protocol_version | partitioner                                 | rack       | release_version | rpc_address  | schema_version                       | thrift_version | tokens                  | truncated_at
-------+--------------+-------------------+---------------+-------------+-------------+-------------------+--------------------------------------+----------------+-------------------------+---------------------------------------------+------------+-----------------+--------------+--------------------------------------+----------------+-------------------------+--------------
 local | COMPLETED    | 10.156.89.36      | local cluster | 3.4.2       | us-west-2   | 0                 | b04fb57f-1a27-738d-4040-f9e27dd3a688 | 10.156.89.36   | 4                       | org.apache.cassandra.dht.Murmur3Partitioner | us-west-2a | 3.9-SNAPSHOT    | 10.156.89.36 | 00000000-0000-0000-0000-000000000000 | 20.1.0         | {'6148820866244280320'} | null
```
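Given the advice above, this `system.local` row can be spot-checked: `broadcast_address`, `listen_address`, and `rpc_address` should all agree and none should be a wildcard. A small sketch of that check against the row above (a hypothetical helper, not a YugabyteDB tool):

```python
def local_row_ok(row):
    """True if the node advertises one concrete, consistent address."""
    addrs = {row["broadcast_address"], row["listen_address"], row["rpc_address"]}
    return len(addrs) == 1 and addrs.isdisjoint({"0.0.0.0", "::"})

# The row from the cqlsh output above: all three address columns agree.
row = {"broadcast_address": "10.156.89.36",
       "listen_address": "10.156.89.36",
       "rpc_address": "10.156.89.36"}
print(local_row_ok(row))
```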
The peers of a sandbox cluster:

```
cassandra@cqlsh> select * from system.peers;
 peer          | data_center | host_id                              | preferred_ip  | rack       | release_version | rpc_address   | schema_version                       | tokens
---------------+-------------+--------------------------------------+---------------+------------+-----------------+---------------+--------------------------------------+--------------------------
 10.156.90.8   | us-west-2   | 81c53aae-a11a-f8b9-2a4a-f3fbd8d89251 | 10.156.90.8   | us-west-2c | null            | 10.156.90.8   | 00000000-0000-0000-0000-000000000000 | {'0'}
 10.156.89.139 | us-west-2   | db60487a-ee7b-95a2-7c40-bca5c6d79ca3 | 10.156.89.139 | us-west-2b | null            | 10.156.89.139 | 00000000-0000-0000-0000-000000000000 | {'-6149102341220990976'}
(2 rows)
```
And the peers of the cluster in question:

```
cassandra@cqlsh> select * from system.peers;
 peer          | data_center | host_id                              | preferred_ip  | rack       | release_version | rpc_address   | schema_version                       | tokens
---------------+-------------+--------------------------------------+---------------+------------+-----------------+---------------+--------------------------------------+--------------------------
 10.156.73.137 | us-west-2   | 7df4c54b-e5a5-c6aa-ed4b-c21d968afd23 | 10.156.73.137 | us-west-2b | null            | 10.156.73.137 | 00000000-0000-0000-0000-000000000000 | {'0'}
 10.156.73.24  | us-west-2   | 7cf6f9a6-3665-e88d-6d4a-466280323a73 | 10.156.73.24  | us-west-2a | null            | 10.156.73.24  | 00000000-0000-0000-0000-000000000000 | {'3074269695633784832'}
 10.156.73.136 | us-west-2   | 94a8abff-c4b2-4686-834f-613e6b61dc9e | 10.156.73.136 | us-west-2b | null            | 10.156.73.136 | 00000000-0000-0000-0000-000000000000 | {'6148539391267569664'}
 10.156.73.22  | us-west-2   | 3a2e396d-611b-5bbd-4a43-a226fa67a8b8 | 10.156.73.22  | us-west-2a | null            | 10.156.73.22  | 00000000-0000-0000-0000-000000000000 | {'9222809086901354496'}
 10.156.74.12  | us-west-2   | 7ae41c41-eb5d-608b-094f-5192dc288ef7 | 10.156.74.12  | us-west-2c | null            | 10.156.74.12  | 00000000-0000-0000-0000-000000000000 | {'-3075395595540627456'}
(5 rows)
```
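Cross-checking the five contact points from the client logs against this peers table localizes the mismatch: a plain set difference shows which contact points lack a peers row and which peers were never listed as a contact point (and note that whichever node you query is expected to be absent from its own `system.peers`):

```python
# Contact-point IPs from the client WARN lines vs. peers rows from this cqlsh output
contact_points = {"10.156.74.12", "10.156.73.22", "10.156.73.136",
                  "10.156.73.24", "10.156.74.14"}
peers = {"10.156.73.137", "10.156.73.24", "10.156.73.136",
         "10.156.73.22", "10.156.74.12"}

print(sorted(contact_points - peers))  # -> ['10.156.74.14']: contact point with no peers row
print(sorted(peers - contact_points))  # -> ['10.156.73.137']: peer not in the contact list
```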
Not actually a bug. This is fine.