WordPress / openverse

Openverse is a search engine for openly-licensed media. This monorepo includes all application code.

Home Page: https://openverse.org

Drop `ORDER BY` clause from copy step of image data refresh when adding a limit

krysal opened this issue

Description

The recent image data refresh in the dev environment failed at the copy step. We realized that turning on a limit on the number of rows copied (done in WordPress/openverse-infrastructure#908) also applies an `ORDER BY` clause, which is prohibitively expensive for a table with this many rows (over 700 million).

We still want a subset of the production data in dev, but we don't need it to be pseudo-random, so we can drop this piece of the clause generation and let the query apply only a limit:

    # The audioset view does not have identifiers associated with it
    if upstream_table != "audioset_view":
        select_insert += d(
            """
            ORDER BY identifier"""
        )
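As a rough illustration of the proposed change, the clause generation could look something like the sketch below once the `ORDER BY` piece is gone. The function name `build_select_insert` and its `columns` and `limit` parameters are hypothetical, invented for this example; only `select_insert`, `upstream_table`, and the `d` alias (assumed here to be `textwrap.dedent`) come from the snippet above. This is not the actual ingestion server code.

```python
from textwrap import dedent as d  # assumption: `d` in the snippet above is textwrap.dedent
from typing import Optional


def build_select_insert(
    upstream_table: str, columns: str, limit: Optional[int] = None
) -> str:
    """Build a SELECT for the copy step (hypothetical helper, for illustration only)."""
    select_insert = d(
        f"""
        SELECT {columns}
        FROM {upstream_table}"""
    )
    if limit is not None:
        # No ORDER BY is appended: the rows returned are whichever ones the
        # database yields first, which avoids sorting the whole table.
        select_insert += d(
            f"""
            LIMIT {int(limit)}"""
        )
    return select_insert.strip()


if __name__ == "__main__":
    print(build_select_insert("image", "identifier, url", limit=100000))
```

Without an `ORDER BY`, the subset copied is arbitrary and may differ between runs, which is acceptable here since the issue states the subset does not need to be pseudo-random.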

Additional context

Related to #736, WordPress/openverse-api#474 (original PR adding the clause) and #3912 (because it's necessary for testing in staging).