NATSNoRespondersException after upgrading to 1.0.5 (from 1.0.4) when watching large KV buckets
jlumsden-mts opened this issue
Defect
- [*] Included a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve)
Versions of NATS.Client and nats-server: NATS.Client 1.0.5 (works in 1.0.4) with nats-server 2.9.19
OS/Container environment: Windows
Steps or code to reproduce the issue:
Extend TestKeyValue.cs with:
[Fact]
public void TestWatchManyKeys()
{
    const int NUM_MESSAGES = 1000;
    Context.RunInJsServer(c =>
    {
        // get the kv management context
        IKeyValueManagement kvm = c.CreateKeyValueManagementContext();

        // create the bucket
        kvm.Create(KeyValueConfiguration.Builder()
            .WithName(BUCKET)
            .WithMaxHistoryPerKey(10)
            .WithStorageType(StorageType.Memory)
            .Build());

        IKeyValue kvContext = c.CreateKeyValueContext(BUCKET);
        for (int i = 0; i < NUM_MESSAGES; i++)
        {
            kvContext.Put(i.ToString(), i.ToString());
        }

        TestKeyValueWatcher watcher = new TestKeyValueWatcher(true);
        var sub = kvContext.Watch(">", watcher, watcher.WatchOptions);

        int count = 0;
        while (watcher.EndOfDataReceived == 0 && count < 100)
        {
            Thread.Sleep(10);
            count++;
        }

        Assert.True(watcher.EndOfDataReceived > 0);
        sub.Unsubscribe();
    });
}
Expected result:
Test should pass in 1.0.4 and 1.0.5
Actual result:
Test passes in 1.0.4
Test fails in 1.0.5 and later: Watch method call throws NATSNoRespondersException
Initially I thought this was caused by my real app watching multiple buckets, but it is reproducible by adding lots of keys to a single bucket. I'm assuming some kind of timeout occurs when it takes too long to reach end of data. If you reduce NUM_MESSAGES to 100, the test passes. I don't think I have more than 100 keys in my real app, but the values are much larger than in this example, so it appears to depend on message size as well as key count.
I narrowed it down to this commit (the test passes before it and fails after it):
ea5f4b29e2e24791188c7a11fc6ea11b3cfb5f5e
No responders are available for the request.
at NATS.Client.Connection.RequestSyncImpl(String subject, MsgHeader headers, Byte[] data, Int32 offset, Nullable`1 count, Int32 timeout) in C:\nats\nats.net\src\NATS.Client\Connection.cs:line 2961
at NATS.Client.Connection.Request(String subject, Byte[] data, Int32 timeout) in C:\nats\nats.net\src\NATS.Client\Connection.cs:line 3048
at NATS.Client.JetStream.JetStreamBase.RequestResponseRequired(String subject, Byte[] bytes, Int32 timeout) in C:\nats\nats.net\src\NATS.Client\JetStream\JetStreamBase.cs:line 164
at NATS.Client.JetStream.JetStreamBase.GetConsumerInfoInternal(String streamName, String consumer) in C:\nats\nats.net\src\NATS.Client\JetStream\JetStreamBase.cs:line 65
at NATS.Client.JetStream.JetStream.LookupConsumerInfo(String lookupStream, String lookupConsumer) in C:\nats\nats.net\src\NATS.Client\JetStream\JetStream.cs:line 449
at NATS.Client.JetStream.JetStreamPushAsyncSubscription.GetConsumerInformation() in C:\nats\nats.net\src\NATS.Client\JetStream\JetStreamPushAsyncSubscription.cs:line 45
at NATS.Client.KeyValue.KeyValueWatchSubscription..ctor(KeyValue kv, String keyPattern, IKeyValueWatcher watcher, KeyValueWatchOption[] watchOptions) in C:\nats\nats.net\src\NATS.Client\KeyValue\KeyValueWatchSubscription.cs:line 83
at NATS.Client.KeyValue.KeyValue.Watch(String key, IKeyValueWatcher watcher, KeyValueWatchOption[] watchOptions) in C:\nats\nats.net\src\NATS.Client\KeyValue\KeyValue.cs:line 156
at IntegrationTests.TestKeyValue.<>c.<TestWatchManyKeys>b__30_0(IConnection c) in C:\nats\nats.net\src\Tests\IntegrationTests\TestKeyValue.cs:line 1259
at IntegrationTests.SuiteContext.RunInJsServer(TestServerInfo testServerInfo, Action`1 test) in C:\nats\nats.net\src\Tests\IntegrationTests\TestSuite.cs:line 124
at IntegrationTests.KeyValueSuiteContext.RunInJsServer(Action`1 test) in C:\nats\nats.net\src\Tests\IntegrationTests\TestSuite.cs:line 423
at IntegrationTests.TestKeyValue.TestWatchManyKeys() in C:\nats\nats.net\src\Tests\IntegrationTests\TestKeyValue.cs:line 1238
I think I figured it out. PR coming soon.
@jlumsden-mts Thank you for taking the time to document this. I flat out missed something. It's fixed now in #795.
No problem @scottf, thanks for sorting it out so quickly.