sanger / unified_warehouse

DPL-096 index on the “results” column of the lighthouse_sample table in MLWH

rl15 opened this issue

User story
As a report writer (Matt F) I would like an index on the “results” column of the lighthouse_sample table in MLWH. This would speed up a number of the KPI queries that are specifically looking for the positive samples from the lighthouse table.

Who are the primary contacts for this story
Matt F (ICT)
Tom W

Acceptance criteria
To be considered successful, the solution must:

  • Add an index to the “results” column of the lighthouse_sample table
  • Check the impact on inserts
  • Consider whether this needs to be raised with MLWH notices

Additional context
There are 40+ million samples, but only around 3 million positives. The “results” column currently has no index, so queries that filter on it run very slowly.
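
To illustrate why the index should help, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for MLWH (which is MySQL, so the real planner and syntax details will differ). The table is cut down to two columns, the index name `index_lighthouse_sample_on_results` and the 'Positive'/'Negative' values are assumptions for illustration, and roughly 1 row in 13 is marked positive to mirror the 3 million in 40 million ratio above.

```python
import sqlite3


def plan_for_positive_query(with_index: bool) -> str:
    """Return the query plan SQLite picks for a 'positives only' lookup."""
    conn = sqlite3.connect(":memory:")
    # Cut-down, hypothetical version of lighthouse_sample.
    conn.execute(
        "CREATE TABLE lighthouse_sample (id INTEGER PRIMARY KEY, results TEXT)"
    )
    # About 1 in 13 rows positive (assumed values, not real data).
    rows = [("Positive" if i % 13 == 0 else "Negative",) for i in range(10_000)]
    conn.executemany("INSERT INTO lighthouse_sample (results) VALUES (?)", rows)
    if with_index:
        # Hypothetical index name.
        conn.execute(
            "CREATE INDEX index_lighthouse_sample_on_results "
            "ON lighthouse_sample (results)"
        )
    plan = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT id FROM lighthouse_sample WHERE results = 'Positive'"
    ).fetchall()
    conn.close()
    # The fourth column of each plan row is the human-readable detail.
    return " ".join(row[3] for row in plan)


if __name__ == "__main__":
    print("without index:", plan_for_positive_query(with_index=False))
    print("with index:   ", plan_for_positive_query(with_index=True))
```

Without the index the plan is a full table scan. With it, the plan becomes a search on the index. On MLWH itself the equivalent check would be running `EXPLAIN` on one of the KPI queries before and after the index is added.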

When we added two indexes last week, Crawler's performance appeared to be unchanged. Crawler writes quite a lot of data to this database, but it is also bound mostly by network conditions, as it downloads data from the SFTP. I don't think we have any better metric of how indexes alter insert performance, though! I did find this article (https://use-the-index-luke.com/sql/dml/insert), which indicates that each index adds 14ms or so to each insert operation. This does seem somewhat significant, but it is still only a fraction of the slowdown Crawler might see. Since the table is unqueryable in a reasonable time frame without the index, I think the cost to the inserts has to be accepted regardless.
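
If we wanted a rough local metric for the insert cost, something like the sketch below could be a starting point. It again uses in-memory SQLite rather than MySQL, so the absolute numbers will not transfer to MLWH; only the relative difference between runs is of interest. The `sample_ref` column and the `idx_*` index names are hypothetical.

```python
import sqlite3
import time


def time_inserts(index_columns, n_rows=20_000):
    """Bulk-insert n_rows with the given columns indexed.

    Returns (elapsed_seconds, row_count).
    """
    conn = sqlite3.connect(":memory:")
    # Cut-down, hypothetical table; sample_ref is an illustrative column.
    conn.execute(
        "CREATE TABLE lighthouse_sample ("
        "id INTEGER PRIMARY KEY, sample_ref TEXT, results TEXT)"
    )
    for col in index_columns:
        conn.execute(f"CREATE INDEX idx_{col} ON lighthouse_sample ({col})")
    rows = [
        (f"sample-{i}", "Positive" if i % 13 == 0 else "Negative")
        for i in range(n_rows)
    ]
    start = time.perf_counter()
    conn.executemany(
        "INSERT INTO lighthouse_sample (sample_ref, results) VALUES (?, ?)",
        rows,
    )
    conn.commit()
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM lighthouse_sample").fetchone()[0]
    conn.close()
    return elapsed, count


if __name__ == "__main__":
    for cols in ([], ["results"], ["results", "sample_ref"]):
        elapsed, count = time_inserts(cols)
        print(f"{len(cols)} index(es): {count} rows in {elapsed:.3f}s")
```

Timings from a single run are noisy, so it would be worth repeating each case a few times before drawing any conclusion.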