Calculation of case sequencing QC slows down /orders
islean opened this issue · comments
Description
The calculations done by the `QualityControllerService` are expensive, meaning that usage of the methods in the web services, e.g. CiGRID, adds quite a lot of latency. As an example found in #3198, calculating the number of cases failing sequencing QC within orders roughly doubles the response time when rendering the orders page. Since this data is useful, we should think of a better way to calculate this.
Suggested solution
Easiest would probably be to save this data on case and sample level in status db. Would require refinement on when these values should be set. At the end of demultiplexing?
This can be closed when
Finding cases which fail sequencing qc can be done quickly.
I think the calculation is fine as long as we store the result.
We want to store it.
- Should not be repeated in the `/orders` endpoint.
- Done in a weird place: each time we check whether a case is ready to be started.
Conclusion
- Extract separate service or update existing one and poll with CLI command to perform sequencing QC
- Use a simple database filter for the stored sequencing QC result when checking which cases an analysis can be started for.
- Same for the `/orders` endpoint, just use a simple database filter.
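
As a rough illustration of what "a simple database filter" could look like once the result is stored, here is a minimal sketch using an in-memory SQLite database. The table name `cases`, the columns `order_id` and `sequencing_qc_status`, and the function name are all assumptions made for this example; they are not taken from the actual status db schema.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cases ("
    "internal_id TEXT PRIMARY KEY, order_id INTEGER, sequencing_qc_status TEXT)"
)
conn.executemany(
    "INSERT INTO cases VALUES (?, ?, ?)",
    [
        ("case_a", 1, "passed"),
        ("case_b", 1, "failed"),
        ("case_c", 2, "failed"),
        ("case_d", 2, "pending"),
        ("case_e", 2, "failed"),
    ],
)


def count_failed_cases_per_order(connection: sqlite3.Connection) -> dict[int, int]:
    """Count cases whose stored sequencing QC result is 'failed', grouped by order."""
    rows = connection.execute(
        "SELECT order_id, COUNT(*) FROM cases "
        "WHERE sequencing_qc_status = ? GROUP BY order_id",
        ("failed",),
    ).fetchall()
    # Each row is an (order_id, count) tuple, so it maps directly to a dict.
    return dict(rows)


failed_per_order = count_failed_cases_per_order(conn)
```

The point of the sketch is that the endpoint only reads a stored value, so no QC calculation happens while the orders page is rendered.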
Questions
Q: How should the logic be triggered?
A: Regularly run CLI command which picks up any cases for which sequencing QC has not been done.
Q: which model should store the sequencing QC result for the case?
A: the `Sample` model <- this does absolutely not make sense. We can store the sample QC on the samples, the case QC belongs on the case model.
Q: how should we handle externally sequenced samples?
A: always pass? Check code or ask @karlnyr.
@seallard
Externally sequenced samples would always pass as they do not have a requirement of target reads.
![image](https://private-user-images.githubusercontent.com/45558267/333162980-bfb89e61-46a9-408a-9606-bf2f900a663a.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAxNzUyOTEsIm5iZiI6MTcyMDE3NDk5MSwicGF0aCI6Ii80NTU1ODI2Ny8zMzMxNjI5ODAtYmZiODllNjEtNDZhOS00MDhhLTk2MDYtYmYyZjkwMGE2NjNhLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MDUlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzA1VDEwMjMxMVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWUzZDA4MDY5MDJiNTAwMjQyZDVhOTEzNjhmZTRmMWEzMGZjMDJkY2YxNDcxNjRiZWJiMjAxOWNkYWNhYmEzNTYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.GD9LMvm_XXTozd96eYV0cjSBU5lxdisu_sSWJ_l3Mo8)
The logic for picking up externally sequenced samples would then be the above filter for passing samples, combined with the fact that when we run `cg add external` we set the case status to `analyse`, so the automation would pick it up.
Given that the sequencing QC is calculated per case (and can pass for individual samples but not on the case level) and the analyses are started per case (given that the sequencing QC passes), we should store the overall case sequencing QC on the case. We can also store the sample QC on the samples.
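
To make the sample level versus case level distinction concrete, here is a minimal sketch. The `Sample` dataclass, its fields, and the rule that a case passes only when every one of its samples passes are assumptions for illustration; the real QC rules may differ. The only rule taken from this thread is that externally sequenced samples always pass because they have no target reads requirement.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical fields, not the actual status db model.
    name: str
    reads: int
    expected_reads: int
    is_external: bool = False


def sample_passes_sequencing_qc(sample: Sample) -> bool:
    """Sample-level QC: external samples have no target reads and always pass."""
    if sample.is_external:
        return True
    return sample.reads >= sample.expected_reads


def case_passes_sequencing_qc(samples: list[Sample]) -> bool:
    """Case-level QC. Assumption: every sample in a non-empty case must pass."""
    return bool(samples) and all(sample_passes_sequencing_qc(s) for s in samples)


# One sample passes on its own, but the case as a whole does not.
samples = [
    Sample("child", reads=900, expected_reads=800),
    Sample("mother", reads=500, expected_reads=800),
]
external_samples = [Sample("external", reads=0, expected_reads=0, is_external=True)]

case_result = case_passes_sequencing_qc(samples)
external_result = case_passes_sequencing_qc(external_samples)
```

Here `child` passes individually while the case fails, which is why the overall result belongs on the case rather than on any single sample.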
Implementation plan
- Add `sequencing_qc_status` to case model with enum `pending`, `passed`, `failed` (default is `pending`).
- Add new CLI command which picks up cases for which the qc needs to be done and updates the `sequencing_qc_status`.
- Refactor old logic to decouple the sequencing qc from the other logic and filter based on the stored `sequencing_qc_status`.
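
The plan above could be sketched roughly as follows. The field name `sequencing_qc_status` and its three values come from the plan; everything else (the `Case` dataclass, the function names, and passing the QC check in as a callable) is hypothetical and only meant to show the shape of the change, not the actual models or CLI.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class SequencingQCStatus(str, Enum):
    PENDING = "pending"
    PASSED = "passed"
    FAILED = "failed"


@dataclass
class Case:
    # Hypothetical stand-in for the case model; the default status is pending.
    internal_id: str
    sequencing_qc_status: SequencingQCStatus = SequencingQCStatus.PENDING


def update_pending_cases(
    cases: list[Case], passes_qc: Callable[[Case], bool]
) -> list[Case]:
    """What the CLI command could do: run QC for pending cases and store the result."""
    updated = []
    for case in cases:
        if case.sequencing_qc_status is not SequencingQCStatus.PENDING:
            continue  # QC result is already stored, do not recalculate.
        case.sequencing_qc_status = (
            SequencingQCStatus.PASSED if passes_qc(case) else SequencingQCStatus.FAILED
        )
        updated.append(case)
    return updated


def cases_passing_sequencing_qc(cases: list[Case]) -> list[Case]:
    """Decoupled check: filter on the stored status only."""
    return [c for c in cases if c.sequencing_qc_status is SequencingQCStatus.PASSED]


cases = [Case("case_a"), Case("case_b"), Case("case_c", SequencingQCStatus.FAILED)]
# Placeholder QC check for the example: only case_a passes.
updated = update_pending_cases(cases, lambda case: case.internal_id == "case_a")
startable = cases_passing_sequencing_qc(cases)
```

With this split, the expensive calculation runs once per case in the regularly run command, and both the analysis start logic and the `/orders` endpoint only read the stored value.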
This is very straightforward to implement, we just need to have a meeting to get everyone on the same page and to ensure the required PRs do not get blocked.
Rename field on case to `ready_for_analysis`.