etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system

Home Page: https://etcd.io

test: TestIssue2746

xiang90 opened this issue

=== RUN   TestIssue2746
--- FAIL: TestIssue2746 (1.67s)
    cluster_test.go:360: #1: watch on http://127.0.0.1:20114 error: client: etcd cluster is unavailable or misconfigured

Not able to reproduce... Will try more...

Still reproducible (in less than 1% of runs) with the latest version (d32113a) on my machine (Xeon E3, 4 cores).

@AkihiroSuda

Can you type-assert that error to *client.ClusterError and print out its detail? (https://github.com/coreos/etcd/blob/master/client/cluster_error.go#L19-L33)
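Something like this would do (a minimal sketch, not a final patch; err, membs, and t are whatever the test already has in scope):

// Sketch: surface the per-endpoint errors hidden behind the generic
// "etcd cluster is unavailable or misconfigured" message by asserting
// the error to *client.ClusterError and printing its Detail().
if err != nil {
	if cerr, ok := err.(*client.ClusterError); ok {
		t.Fatalf("create on %s error: %v (detail: %s)", membs[0].URL(), err, cerr.Detail())
	}
	t.Fatalf("create on %s error: %v", membs[0].URL(), err)
}

Using the comma-ok form keeps the test from panicking if the error happens not to be a ClusterError.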

I got this ClusterError.

--- FAIL: TestIssue2746 (6.36s)
        cluster_test.go:351: create on http://127.0.0.1:20950 error: client: etcd cluster is unavailable or misconfigured(detail: error #0: read tcp 127.0.0.1:49676->127.0.0.1:20950: i/o timeout

Note that this error is raised from a slightly different point than the original one (the Create call at cluster_test.go:351 rather than the watch at cluster_test.go:360).

diff --git a/integration/cluster_test.go b/integration/cluster_test.go
index 4d7e9e0..c1be43d 100644
--- a/integration/cluster_test.go
+++ b/integration/cluster_test.go
@@ -347,7 +347,8 @@ func clusterMustProgress(t *testing.T, membs []*member) {
        key := fmt.Sprintf("foo%d", rand.Int())
        resp, err := kapi.Create(ctx, "/"+key, "bar")
        if err != nil {
-               t.Fatalf("create on %s error: %v", membs[0].URL(), err)
+               cerr := err.(*client.ClusterError)
+               t.Fatalf("create on %s error: %v(detail: %s)", membs[0].URL(), err, cerr.Detail())
        }
        cancel()

@@ -357,7 +358,9 @@ func clusterMustProgress(t *testing.T, membs []*member) {
                mkapi := client.NewKeysAPI(mcc)
                mctx, mcancel := context.WithTimeout(context.Background(), requestTimeout)
                if _, err := mkapi.Watcher(key, &client.WatcherOptions{AfterIndex: resp.Node.ModifiedIndex - 1}).Next(mctx); err != nil {
-                       t.Fatalf("#%d: watch on %s error: %v", i, u, err)
+                       cerr := err.(*client.ClusterError)
+                       t.Fatalf("#%d: watch on %s error: %v(detail: %s)", i, u, err, cerr.Detail())
+
                }
                mcancel()
        }

@heyitsanthony Can you take this over? I cannot reproduce this on my local machine :(. Thanks!

ETCD_ELECTION_TIMEOUT_TICKS wasn't set on Semaphore (unlike Travis), so a new election was being triggered, which caused the lost leader to drop messages. I tried to reproduce with the election timeout ticks set to 600 and it seemed to work OK. Updated Semaphore and marking this as closed.
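For reference, a rough sketch of how that kind of env override can be read by a test harness (the helper name is illustrative, not the actual integration code; only the ETCD_ELECTION_TIMEOUT_TICKS variable comes from the comment above):

package integration

import (
	"os"
	"strconv"
)

// electionTicksFromEnv is an illustrative helper: it returns the value of
// ETCD_ELECTION_TIMEOUT_TICKS when it is set to a valid integer, and falls
// back to the given default otherwise (e.g. when the CI config omits it).
func electionTicksFromEnv(def int) int {
	if v := os.Getenv("ETCD_ELECTION_TIMEOUT_TICKS"); v != "" {
		if n, err := strconv.Atoi(v); err == nil {
			return n
		}
	}
	return def
}

With something like that in place, exporting ETCD_ELECTION_TIMEOUT_TICKS=600 in the Semaphore build settings (as already done for Travis) keeps a slow CI host from triggering spurious elections during the test.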