matrix-org / matrix-hookshot

A bridge between Matrix and multiple project management services, such as GitHub, GitLab and JIRA.

Home Page: https://matrix-org.github.io/matrix-hookshot/


The RSS bot started skipping entries again

bkil opened this issue

I only noticed it by accident, because I knew that something should have appeared as a message. Could you perhaps set up some kind of automated monitoring for this?

This is a continuation of #778.

If you would like to investigate a specific case, I noticed it about 13 hours ago for entries on the following feed:

The bot also failed to post new entries from here:

https://gitlab.com/bkil/secuchart/-/commits/master.atom

The Matrix room was recently upgraded, if that is of any relevance. Before upgrading the old room, I made sure to unfollow the feed and remove the integration, and afterwards I re-added the integration and the feed to the new room:

https://matrix.to/#/#secuchart:matrix.org

This has still been happening over the past few days, including today. We have temporarily added another RSS bot to the room for comparison, so the skipped entries are easier to spot.

Hey, we do monitor some sample feeds for issues, but of course that doesn't give an overall picture. Do you have the URLs of the feeds that are failing? Even if we miss an item, we would retry it on the next fetch unless we had already marked it as handled.

Yes, that one sometimes skips entries. Today it was this one:
https://bkil-bot.github.io/osm-rss-juggler/osm-notes.xml

And specifically these two entries got lost 6 hours ago:
bkil-bot/osm-rss-juggler@c341e29#diff-7b943466d8c0bf9f148be2a11b68c00cb75f48091a82cd67636ff05ad3f0cd06R6

Could it be that it tries to submit the feed entry to the given Matrix room, but depending on load the Matrix homeserver API NACKs it, while Hookshot marks the notification as delivered and clears it from its queue?

Yes, this is my current theory. We clear the item from the queue at the point where it is parsed from the feed, but we don't retry if it fails to be delivered. Essentially, it is a victim of the lack of #725.
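The fix implied here is to swap the order: a GUID should count as handled only after the message has actually been delivered. Below is a minimal TypeScript sketch of that ordering. It is not hookshot's actual code. `FeedEntry`, `SendFn` and `processFetch` are hypothetical names, and the assumed delivery callback reports synchronously whether the homeserver accepted the message. Any entry whose send fails, for example with an `ESOCKETTIMEDOUT`, stays out of the seen set and is picked up again on the next fetch.

```typescript
// Hypothetical shape of a parsed feed item; real feed parsers expose more fields.
interface FeedEntry {
  guid: string;
  title: string;
}

// Hypothetical delivery callback: returns true if the homeserver accepted the
// message, false (or throws) if it did not.
type SendFn = (entry: FeedEntry) => boolean;

/**
 * Process one fetch worth of entries. A GUID is added to `seenGuids` only
 * after its message was delivered, so failed deliveries remain eligible for
 * retry on the next fetch. Returns the GUIDs that failed this time.
 */
function processFetch(entries: FeedEntry[], seenGuids: Set<string>, send: SendFn): string[] {
  const failed: string[] = [];
  for (const entry of entries) {
    if (seenGuids.has(entry.guid)) {
      continue; // already delivered on an earlier fetch
    }
    let delivered = false;
    try {
      delivered = send(entry);
    } catch {
      delivered = false; // e.g. a socket timeout while talking to the homeserver
    }
    if (delivered) {
      seenGuids.add(entry.guid);
    } else {
      failed.push(entry.guid);
    }
  }
  return failed;
}
```

With this ordering, a transient failure costs a delay of one polling interval rather than a lost entry. The trade-off is a possible duplicate if the homeserver accepted the event but the response was lost. In practice, Matrix transaction IDs on the send endpoint exist to make such retries idempotent.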

A few other entries were missing from the same feed a few days ago. Another one we caught some days ago was a missing new-commit notification from a low-activity gitlab.com repo. We have also missed github.com releases in the past.

Would it be possible to log such failures so the suspected root cause could be confirmed to be the same, and not caused by another issue?

Ah yes:

2024-02-04 12:33:42.182  WARN 12:33:42:181 [Bridge] Connection FeedConnection https://bkil-bot.github.io/osm-rss-juggler/osm-notes.xml failed to handle feed.entry: Error: ESOCKETTIMEDOUT
2024-02-04 11:28:20.753  WARN 11:28:20:753 [Bridge] Connection FeedConnection https://bkil-bot.github.io/osm-rss-juggler/osm-notes.xml failed to handle feed.entry: Error: ESOCKETTIMEDOUT
2024-02-03 06:32:37.183  WARN 06:32:37:183 [Bridge] Connection FeedConnection https://bkil-bot.github.io/osm-rss-juggler/osm-notes.xml failed to handle feed.entry: Error: ESOCKETTIMEDOUT

You know the source of the entries better than I do, but it looks like a different problem, as if it failed to ingest the feed itself.

This is definitely Matrix-side. The "Connection FeedConnection ... failed to handle feed.entry" error means we have already managed to parse the feed but failed to handle a feed.entry event, which is what the parser emits. In other words, we can narrow the problem down to a failure between the bridge and the homeserver.

We already have metrics on this, and on integrations.ems.host it fails a small percentage of the time. I expect #891 will improve this.

@bkil I've deployed some experimental patches to the integrations server. Could you let me know whether this gets better or worse? I'd like to know if we're still seeing issues.

EDIT: A glance at the logs suggests the main culprits have been dealt with.

Many of us noticed that the RSS bot started reposting old feed entries (possibly ones it had skipped previously). Is this an expected side effect?

@bkil Yep, there was a mistaken cache eviction. That should be fixed now. Can you let me know if either that happens again OR there is a missed feed?

@Half-Shot As far as I can see, this feed https://github.com/iSoron/uhabits/releases.atom has not updated yet. The last post in the room is 2.1.3 (Aug 28, '23), and 2.2.0 never appeared.

@mahdi1234 I think you won't see previously skipped entries retroactively. Please watch out for skipped new entries from this point in time forward.

Well, my comment was based on "OR there is a missed feed". To me it looks like, in all the rooms I've seen, it takes the current full feed and reposts it. But I might be wrong :)

Yes, for absolute clarity the issue is sadly that hookshot stores a feed as checked, and then fails to send the message to Matrix. It won't recover those events. This should now be rare.

@Half-Shot Could you please check the logs again for a lost entry?
It was under this path:
https://bkil-bot.github.io/osm-rss-juggler/wiki.xml

It was added by this git commit 6 hours ago:
bkil-bot/osm-rss-juggler@4cf05c2

Because this feed rarely changes (and was empty for some time up to this point), the other RSS bot also just noticed it 2 hours ago.

Hm, it claims to have handled one feed item from that feed, but it might have been hit by #806? When we connect a feed for the first time, we store all the guids that exist in cache and then handle any new ones from that point.

Did you subscribe to the feed when it was still entirely empty?

EDIT: From reading the code, this is likely tangled up with something like #806. The first non-empty read from a feed is ignored. So if your feed has no items when you subscribe and you then add one item, that item is still ignored. Subsequent reads should be okay. Since it's a bit subtle, I'm tracking it as #893.
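The behaviour described here arises when "first fetch" is inferred from an empty GUID cache instead of being recorded explicitly. Below is a minimal TypeScript sketch that contrasts the two approaches. It is a hypothetical reconstruction of the behaviour described above, not hookshot's code. `FeedState`, `newItemsBuggy` and `newItemsFixed` are illustrative names. The "fixed" variant assumes an explicit `initialised` flag that is set on the very first fetch, even if that fetch returned no items.

```typescript
// Hypothetical per-feed state; hookshot's real storage differs.
interface FeedState {
  seenGuids: Set<string>;
  initialised: boolean; // true once the baseline fetch has happened
}

/**
 * Buggy variant: the baseline is taken whenever the cache is empty. A feed
 * that was empty at subscription time therefore has its first real items
 * recorded as "already seen" and never posted.
 */
function newItemsBuggy(state: FeedState, guids: string[]): string[] {
  if (state.seenGuids.size === 0) {
    guids.forEach((g) => state.seenGuids.add(g)); // seed baseline, post nothing
    return [];
  }
  const fresh = guids.filter((g) => !state.seenGuids.has(g));
  fresh.forEach((g) => state.seenGuids.add(g));
  return fresh;
}

/**
 * Fixed variant: the baseline is taken exactly once, on the first fetch,
 * even if that fetch returned no items.
 */
function newItemsFixed(state: FeedState, guids: string[]): string[] {
  if (!state.initialised) {
    state.initialised = true;
    guids.forEach((g) => state.seenGuids.add(g)); // items present at subscribe time
    return [];
  }
  const fresh = guids.filter((g) => !state.seenGuids.has(g));
  fresh.forEach((g) => state.seenGuids.add(g));
  return fresh;
}
```

In both variants, items that already exist when the feed is connected are deliberately not posted. The two differ only in whether an empty first fetch still counts as the baseline.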

No, we have been subscribed to this feed for months, and the bot had most recently forwarded a post from it on the 6th of February.
bkil-bot/osm-rss-juggler@d151dd8

It's just that due to the dynamic nature of such a wiki feed, it can go blank for days, and then get non-blank for a day or two at a time as long as the edit horizon contains hits to show. So this might be a different variant or special recurring case of #893.

New information: one entry from each of two feeds got lost. Both were published by the following commit 2.5 hours ago:

bkil-bot/osm-rss-juggler@758e71e

Feed URLs and GUIDs:

https://bkil-bot.github.io/osm-rss-juggler/osm-notes.xml
GUID https://www.openstreetmap.org/note/4057931#c8972088

https://bkil-bot.github.io/osm-rss-juggler/mastodon-openstreetmap.xml
GUID https://en.osm.town/@openstreetmap/111919104056767098

All five added entries from this feed got lost 22 hours ago:

bkil-bot/osm-rss-juggler@bb3aa92#diff-7b943466d8c0bf9f148be2a11b68c00cb75f48091a82cd67636ff05ad3f0cd06R6

Feed URL:

GUIDs:

  • https://www.openstreetmap.org/note/4127077#c9010944
  • https://www.openstreetmap.org/note/4127075#c9010941
  • https://www.openstreetmap.org/note/4127074#c9010938
  • https://www.openstreetmap.org/note/4127069#c9010924
  • https://www.openstreetmap.org/note/4127068#c9010921

Could you perhaps have a look?

The following entry was lost 1 hour ago:

bkil-bot/osm-rss-juggler@5f70b45#diff-ec44ff024548ccff9e78ec7ebb5a61b78b6eef11b15ad0ad2ba020a46f666163R6

Feed URL:

https://bkil-bot.github.io/osm-rss-juggler/changeset-discuss.xml

GUID:

https://www.openstreetmap.org/changeset/147962763#comment_1

The following entry was lost 1 hour ago:

bkil-bot/osm-rss-juggler@6ae88ae#diff-7b943466d8c0bf9f148be2a11b68c00cb75f48091a82cd67636ff05ad3f0cd06R6

Feed URL:

https://bkil-bot.github.io/osm-rss-juggler/osm-notes.xml

GUID:

https://www.openstreetmap.org/note/4048599#c9028313

Note that, because of its instability, most people I know who can debug such issues have stopped using hookshot over the past months and replaced it with other solutions. So from now on I will probably be the only one contributing reports.

Another report. These feeds have a new entry every Sunday morning.

The 4 entries of February 18th were all posted. The 8 entries of February 25th and March 3rd were all skipped. The bot reports "Successful fetch" and posts no error in the room, even at times when the feed should have yielded a new entry.

https://www.weeklyosm.eu/en/feed
https://www.weeklyosm.eu/fr/feed
https://www.weeklyosm.eu/es/feed
https://weeklyosm.eu/it/feed

Another report, just to let you know this is still an issue.

I posted a new toot to Mastodon. Feed URL: https://some.hacklab.fi/@tampere.rss

The feed status says "Successful fetch", but nothing was posted to the room.

It is still working very unreliably. I posted two toots with the #sslnr hashtag. The first was never relayed to Matrix, but the second one was. Feed URL: https://sauna.social/tags/sslnr.rss

Has there been any progress on the issue, or should we migrate to for example t2bot's RSS bot?

I can confirm the same issue. The last entry I got from https://xkcd.com/rss.xml was from April 26th, although there have been at least five more posts since then. Not only that, the bot reports several successful checks today, but no posts.