open-contracting / deploy

Deployment configuration and scripts

Home Page:https://ocdsdeploy.readthedocs.io/en/latest/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Implement retention policy for Kingfisher Process v2

jpmckinney opened this issue · comments

Old code:

# Delete collections that ended over a year ago, while retaining one set of collections per source from over a year ago.
cd {{ directory }}; . .ve/bin/activate; psql -h localhost kingfisher_process kingfisher_process -q -c '\t' -c "SELECT id FROM collection c WHERE c.deleted_at IS NULL AND c.store_end_at < date_trunc('day', NOW() - interval '1 year') AND EXISTS(SELECT 1 FROM collection d WHERE d.source_id = c.source_id AND d.transform_type = '' AND d.id > c.id AND d.deleted_at IS NULL AND d.store_end_at < date_trunc('day', NOW() - interval '1 year')) ORDER BY id DESC" | xargs -I{} python ocdskingfisher-process-cli --quiet delete-collection {}:
  cron.present:
    - identifier: KINGFISHER_PROCESS_STALE_COLLECTIONS
    - user: {{ entry.user }}
    - daymonth: 1
    - hour: 3
    - minute: 0
    - require:
      - virtualenv: {{ directory }}/.ve
      - postgres_database: kingfisher_process

# Delete collections that never ended and started over 2 months ago.
cd {{ directory }}; . .ve/bin/activate; psql -h localhost kingfisher_process kingfisher_process -q -c '\t' -c "SELECT id FROM collection WHERE deleted_at IS NULL AND store_start_at < date_trunc('day', NOW() - interval '2 month') AND store_end_at IS NULL ORDER BY id DESC" | xargs -I{} python ocdskingfisher-process-cli --quiet delete-collection {}:
  cron.present:
    - identifier: KINGFISHER_PROCESS_UNFINISHED_COLLECTIONS
    - user: {{ entry.user }}
    - daymonth: 1
    - hour: 3
    - minute: 15
    - require:
      - virtualenv: {{ directory }}/.ve
      - postgres_database: kingfisher_process

# Delete collections that ended over 2 months ago and have no data.
cd {{ directory }}; . .ve/bin/activate; psql -h localhost kingfisher_process kingfisher_process -q -c '\t' -c "SELECT id FROM collection WHERE deleted_at IS NULL AND store_end_at < date_trunc('day', NOW() - interval '2 month') AND COALESCE(NULLIF(cached_releases_count, 0), NULLIF(cached_records_count, 0), cached_compiled_releases_count) = 0 ORDER BY id DESC" | xargs -I{} python ocdskingfisher-process-cli --quiet delete-collection {}:
  cron.present:
    - identifier: KINGFISHER_PROCESS_EMPTY_COLLECTIONS
    - user: {{ entry.user }}
    - daymonth: 1
    - hour: 3
    - minute: 30
    - require:
      - virtualenv: {{ directory }}/.ve
      - postgres_database: kingfisher_process