t2bot / matrix-media-repo

Highly configurable multi-domain media repository for Matrix.

Home Page: https://docs.t2bot.io/matrix-media-repo

Host media export failed due to "context deadline exceeded"

jaywink opened this issue

An export for host media ran for a while and then ended up failing with:

time="2023-12-18 12:13:52.195 Z" level=error msg="Error during export: Get \"https://redacted.s3.dualstack.eu-north-1.amazonaws.com/long-random-id-looking-thing-here\": context deadline exceeded" internal_flag=1 task_id=4

Possibly an error downloading from the S3 bucket?

Unfortunately the export is not marked as failed in any way. Polling the task shows the following information:

{
  "task_id": 4,
  "task_name": "export_data",
  "params": {
    "export_id": "export-id-redacted",
    "include_s3_urls": false,
    "server_name": "domain-redacted"
  },
  "start_ts": 1702901122746,
  "end_ts": 1702901632195,
  "is_finished": true
}

It's not possible for a script to know whether the export finished successfully or not.
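To illustrate the gap, here is a minimal sketch of what a polling script can conclude from the current response shape (`task_outcome` is a hypothetical helper, not part of matrix-media-repo):

```python
import json

def task_outcome(task: dict) -> str:
    """Interpret a matrix-media-repo task record from the admin task API.

    With the current response shape there is no status/error field, so
    "finished" is the strongest conclusion a script can draw: a task that
    errored out looks identical to one that completed successfully.
    """
    if not task.get("is_finished"):
        return "running"
    # No "status" or "error" field exists in the response today, so a
    # successful export cannot be distinguished from a failed one.
    return "finished (success unknown)"

# The task record from this issue, verbatim:
task = json.loads("""
{
  "task_id": 4,
  "task_name": "export_data",
  "params": {
    "export_id": "export-id-redacted",
    "include_s3_urls": false,
    "server_name": "domain-redacted"
  },
  "start_ts": 1702901122746,
  "end_ts": 1702901632195,
  "is_finished": true
}
""")
print(task_outcome(task))  # finished (success unknown)
```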

Idea: maybe add a status field with values like success, error, etc., or some other field to indicate whether the export finished successfully?
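A hypothetical response shape for that idea, purely illustrative (the status and error fields do not exist in the current API):

```json
{
  "task_id": 4,
  "task_name": "export_data",
  "is_finished": true,
  "status": "error",
  "error": "context deadline exceeded"
}
```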

The logs from a retry with debug logging have a somewhat confusing ordering (shown newest first) 🤔

time="2023-12-18 13:48:15.045 Z" level=info msg="Task 'export_data' completed" internal_flag=1 task_id=6
time="2023-12-18 13:48:15.042 Z" level=error msg="Error during export: context deadline exceeded" internal_flag=1 task_id=6
time="2023-12-18 13:48:14.985 Z" level=debug msg="Writing tar file to gzip container: export-manifest.tar" internal_flag=1 task_id=6 v2archive-entity=domain.tld v2archive-id=export-id
time="2023-12-18 13:48:13.532 Z" level=debug msg="Writing tar file to gzip container: export-part-3.tar" internal_flag=1 task_id=6 v2archive-entity=domain.tld v2archive-id=export-id
time="2023-12-18 13:48:13.532 Z" level=debug msg="Finishing export archive" internal_flag=1 task_id=6
time="2023-12-18 13:47:13.530 Z" level=debug msg="Getting whole cached object for bc4663ed5d156254cb2443c7b665...." internal_flag=1 task_id=6
time="2023-12-18 13:47:13.526 Z" level=debug msg="Downloading mxc://domain.tld/745a3df95f1332106053c3..." internal_flag=1 task_id=6
time="2023-12-18 13:47:13.144 Z" level=debug msg="Getting whole cached object for cdbe2ac9fa0e3c9f8e9f7c664b18e46ca..." internal_flag=1 task_id=6
time="2023-12-18 13:47:13.140 Z" level=debug msg="Downloading mxc://domain.tld/3e17e65440100d8be78c5..." internal_flag=1 task_id=6
time="2023-12-18 13:47:12.644 Z" level=debug msg="Getting whole cached object for 0e9f5108a779ba5dd8fdf7e32adcf18..." internal_flag=1 task_id=6

Given that both export tasks produced identical tar files, it's probably a single file that it crashes on?

[image attachment: exported archive files]

This is approx 10% of the media usage for this host (reported by mediarepo admin API).

Fixed by #508