WordPress / openverse

Openverse is a search engine for openly-licensed media. This monorepo includes all application code.

Home Page: https://openverse.org

The `add_license_url` DAG keeps timing out

krysal opened this issue

Description

This DAG keeps timing out for unknown reasons when the number of rows to modify is relatively high (>500k). By contrast, the `batched_update` DAG was verified to handle this kind of update for loads of millions of rows. It was tested by backfilling the license (by-nc-sa, 2.0), and it updated 11,090,909 records successfully.

However, repeated runs have shown licenses reappearing in the group of rows missing the field, so there could be ingestion flows that are not filling in this data, or some other problem (#4318). I'd like to update the `add_license_url` DAG to use `batched_update` and automate this process until we make sure all rows are complete.
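
For illustration only, here is a minimal sketch of the general batching idea: snapshot the rows that need the fix once, then update them in small committed chunks so no single statement runs long enough to time out. It is not the actual `batched_update` implementation. It uses an in-memory-friendly SQLite connection, and the `image` table, the `license_url` column, the `rows_to_update` temp table, and the function name `backfill_license_url` are all hypothetical names chosen for the example.

```python
import sqlite3


def backfill_license_url(conn, license_, version, license_url, batch_size=10_000):
    """Fill in license_url for matching rows, committing one batch at a time.

    Hypothetical schema: image(id, license, license_version, license_url).
    Returns the number of rows updated.
    """
    cur = conn.cursor()

    # Step 1: snapshot the ids of the rows to fix, numbered 1..n by row_id,
    # so that each batch is a cheap range lookup instead of a full scan.
    cur.execute("DROP TABLE IF EXISTS rows_to_update")
    cur.execute(
        "CREATE TEMP TABLE rows_to_update ("
        "row_id INTEGER PRIMARY KEY, image_id INTEGER)"
    )
    cur.execute(
        "INSERT INTO rows_to_update (image_id) "
        "SELECT id FROM image "
        "WHERE license = ? AND license_version = ? AND license_url IS NULL "
        "ORDER BY id",
        (license_, version),
    )
    total = cur.execute("SELECT COUNT(*) FROM rows_to_update").fetchone()[0]

    # Step 2: update one row_id range per iteration and commit each batch,
    # so an interrupted run keeps the work already done.
    updated = 0
    for start in range(0, total, batch_size):
        cur.execute(
            "UPDATE image SET license_url = ? "
            "WHERE id IN (SELECT image_id FROM rows_to_update "
            "WHERE row_id > ? AND row_id <= ?)",
            (license_url, start, start + batch_size),
        )
        updated += cur.rowcount
        conn.commit()

    cur.execute("DROP TABLE rows_to_update")
    return updated
```

Because the selection filters on a missing `license_url`, a sketch like this is safe to run repeatedly: rows already filled in are skipped, which suits a scheduled run that keeps going until no incomplete rows remain.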

Additional context

Related to #3885 and #4318.