Excluded provider does not appear in /providers after a successful announce if head has not changed

Question

Excluded provider does not appear in /providers after a successful announce if head has not changed

masih opened this issue 10 months ago · comments

Masih H. Derkani commented 10 months ago

Observed in prod on inga. Provider ID, using HTTP publisher,12D3KooWDCXxiSsLi1NT9tsiyimwV6YstQkrjTjD2hAkz2KRVAGG was excluded due to timeout which was a result of bad port number:

{"level":"error","ts":"2023-07-31T23:14:32.243Z","logger":"indexer/ingest","caller":"ingest/ingest.go:883","msg":"Failed to auto-sync with publisher","publisher":"12D3KooWDCXxiSsLi1NT9tsiyimwV6YstQkrjTjD2hAkz2KRVAGG","addr":"/ip4/203.217.18.114/tcp/3737/http/httpath/_ipni%2F","err":"cannot query head for sync: fetch request failed: Get \"http://203.217.18.114:3737/_ipni/head\": GET http://203.217.18.114:3737/_ipni/head giving up after 5 attempt(s): Get \"http://203.217.18.114:3737/_ipni/head\": context deadline exceeded (Client.Timeout exceeded while awaiting headers). Possibly incorrect topic configured"}

Then a successful announcement was made by the provider :

{"level":"info","ts":"2023-08-02T00:15:59.952Z","logger":"announce","caller":"announce/receiver.go:316","msg":"Handling direct announce","peer":"12D3KooWDCXxiSsLi1NT9tsiyimwV6YstQkrjTjD2hAkz2KRVAGG"}

This did not trigger a sync because the stop node in selector matched the sync node:

{"level":"info","ts":"2023-08-02T00:20:51.372Z","logger":"dagsync","caller":"dagsync/subscriber.go:812","msg":"cid to sync to is the stop node. Nothing to do","cid":"baguqeeraxstl6rsun6vgozbkoyy5gl67ebwvpjxaejgviiugdovrj3gol4yq","peer":"12D3KooWDCXxiSsLi1NT9tsiyimwV6YstQkrjTjD2hAkz2KRVAGG"}

Provider remains excluded from the providers list and does not appear at /providers.

The logs do not show the address inside the announcement.

Change the announce logs to also include the announced address so that we can confirm the later announcement had the correct address.
check that successful connection to a provider re includes it.
On receiving announcements for an excluded provider, contact it regardless of sync node and if the connection is established re include it.

Rod Vagg · Answer 1 · Wed Aug 02 2023 17:05:43 GMT+0800 (China Standard Time)

My matching log for that last log entry:

Aug 02 00:20:50 fletcher frisbii[3574021]: 2023-08-02T10:20:50.725+1000        INFO        frisbii        frisbii/main.go:128        Announcing to indexer as /dns/ipfs.va.gg/tcp/3747/http

That log entry's multiaddr string is the same one that's passed into Engine, from here through to here where we pass it to httpsync.NewPublisherWithoutServer(). So it shouldn't be caching an older address with the wrong port on my end (I think).

Masih H. Derkani · Answer 2 · Wed Aug 02 2023 17:06:33 GMT+0800 (China Standard Time)

Has there been any new advertisements since the announcement was made?

Andrew Gillis · Answer 3 · Thu Aug 03 2023 00:02:26 GMT+0800 (China Standard Time)

Adding addresses to announcement logs is OK. However...

The reason a provider is excluded is that there has not been a new advertisement for some amount of time. If the publisher address is incorrect, and then is corrected by a later announcement, then newer ads will be seen and the provider will no longer be excluded. If the address is corrected and there are still no new ads, being able to connect to the provider using the correct address does not change the fact that no new ads are available and the provider should still be excluded.

So, I don't think we want to do this:

contact it regardless of sync node and if the connection is established re include it.

If the indexer has already seen all the ads there, then connecting to and even fetching the already-seen ad will not constitute any new ad ingestion and should not re-include the provider.

Masih H. Derkani · Answer 4 · Thu Aug 03 2023 16:23:21 GMT+0800 (China Standard Time)

I see, that makes sense thanks Andrew.

@rvagg can I ask you to confirm if there was any new ads published with the right address?

Rod Vagg · Answer 5 · Fri Aug 04 2023 11:35:30 GMT+0800 (China Standard Time)

Yeah, I've been restarting my Frisbii a couple of times a day hoping that it'll somehow prod ipni into action. See above log.

That address should be fine, see a fetch with lassie:

I do have these logs though, after this issue was opened I started getting pings from IPNI from the head and one for the advert itself yesterday. The log line at the top of this screenshot is when my normal traffic dried up completely, 2023-08-01T11:05:08+10:00 was apparently when ipni stopped telling lassie clients that they could fetch wikipedia from me (context deadline exceeded is expected, client was fetching more than I've configured my Frisbii to be allowed to serve - I have a time & byte limit set). The log line after this screenshot was the fetch I performed above with lassie.

Rod Vagg · Answer 6 · Tue Aug 08 2023 13:40:00 GMT+0800 (China Standard Time)

I'm still just getting the /_ipni/head pings every 1-2 hours and nothing more. Today I tried changing the order of the CIDs I want to advertise so I got a different advert CID; instead of baguqeeraxstl6rsun6vgozbkoyy5gl67ebwvpjxaejgviiugdovrj3gol4yq as per the logs above, it's now baguqeera5ydfpew5afc7zoqkowtndf3blikejt5cq24qtcpukngqqpu5pn5q; but that doesn't seem to have had an effect.

btw I thought the user-agent was fixed with ipni/go-libipni#60 but perhaps storetheindex has a different http client?

Rod Vagg · Answer 7 · Thu Aug 10 2023 18:16:26 GMT+0800 (China Standard Time)

K, I figured this out and I think it was all me. Frisbii had a bug, which I guess was introduced at some point during updating my engine fork where the advertisements were giving the / path rather than the /_ipni/ httpath that Frisbii uses. So the persistent /_ipni/head pings I've been getting are probably from much older advertisements? I ended up having to burn through 2 more peer IDs to make it work properly in the end, it seems that storetheindex has a bit too much of a memory of where the addresses should be. At one stage when I figured this out, I was getting the /_ipni/head and then the advert CID request, /_ipni/baf..., but the next request for the chain was going to /baf... which is a bit weird but I'm assuming some cached addr for my peer ID.

Fixed, I hope, in ipld/frisbii#13, my Frisbii is back online serving data again at least!