DPL-437-2 Billing report: Create stored procedure
harrietc52 opened this issue
User story
As a developer, I would like to turn the billing query into a stored procedure on the MLWH database, so it can be used by Tableau to show the billing data
Who are the primary contacts for this story
@harrietc52
Who is the nominated tester for UAT
e.g. John S (don't include surnames in public repos)
Acceptance criteria
To be considered successful the solution must allow:
- Named billing_report_stored_proc
- Accept two values
  - 1st param: ‘from’ e.g. '2022-07-26 00:00:00'
  - 2nd param: ‘to’ e.g. '2022-08-24 23:59:59'
- Can be called by e.g.
  CALL billing_report_stored_proc('2022-07-26 00:00:00', '2022-08-24 23:59:59')
- Keep Matt Francis informed; Matt F is creating a Tableau view which calls this stored proc
- Ensure this stored procedure is accessible for Matt, from Tableau (we are not tracking the Tableau report creation)
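For the accessibility criterion, access from Tableau usually comes down to the account Tableau connects with holding EXECUTE on the procedure. A minimal sketch, assuming a hypothetical schema name `mlwh` and a hypothetical account `tableau_ro` (neither name is confirmed in this issue):

```sql
-- Assumption: `mlwh` and 'tableau_ro'@'%' are placeholder names,
-- not the real schema or the real Tableau account.
GRANT EXECUTE ON PROCEDURE mlwh.billing_report_stored_proc TO 'tableau_ro'@'%';

-- Confirm the grant is in place.
SHOW GRANTS FOR 'tableau_ro'@'%';
```

Because MySQL procedures default to SQL SECURITY DEFINER, the calling account should not need SELECT on the underlying tables, only EXECUTE on the procedure, provided the definer account can read them.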
Dependencies
This story is blocked by the following dependencies:
References
This story has a non-blocking relationship with:
- This is a spin-out story from #386
Additional context
This has already been created for testing in Training (MLWH prod_data), but is waiting for final improvements to the query. So this story's scope is updating the existing stored proc once the query is finished.
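MySQL's ALTER PROCEDURE can change a routine's characteristics but not its parameters or body, so updating an existing proc means dropping and re-creating it. A sketch of that pattern, with a placeholder body standing in for the finished query:

```sql
-- Remove the previous version if there is one (no error if it is absent).
DROP PROCEDURE IF EXISTS billing_report_stored_proc;

DELIMITER //
CREATE PROCEDURE billing_report_stored_proc (IN from_date DATETIME, IN to_date DATETIME)
BEGIN
    -- Placeholder body (assumption): the finished billing query goes here.
    SELECT from_date AS report_from, to_date AS report_to;
END
//
DELIMITER ;
```

As far as I know, EXECUTE privileges granted on the procedure itself are revoked when it is dropped, so it is worth re-checking any grants for the Tableau account after re-creating it.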
/***************************************************************************************************
Create Date: 2022-11-11
Author: Harriet Craven
Description: Stored Procedure for Automating the Billing report
Used By: Richard Rance, via Tableau
Parameter(s): from_date (DATETIME)
to_date (DATETIME)
Usage: CALL billing_report_stored_proc('2022-06-25 00:00:00', '2022-07-25 23:59:59');
Additional notes: Date parameters are used to get runs for given timeframe (usually a financial month)
****************************************************************************************************/
-- Change delimiter to //
delimiter //
-- Create Stored Procedure
CREATE PROCEDURE billing_report_stored_proc (IN from_date DATETIME, IN to_date DATETIME)
BEGIN
    -- Outer query
    -- This grouping calculates the `total` amount of lanes occupied by samples in a given "group" (see end of query)
    SELECT
        iseq_run_lane_metrics.instrument_model AS platform
        , iseq_flowcell.cost_code AS project_cost_code
        , study.name AS study_name
        , IF(iseq_run_lane_metrics.qc_seq = 1, 'passed', IF(iseq_run_lane_metrics.qc_seq = '0', 'failed', iseq_run_lane_metrics.qc_seq))
            AS qc_outcome
        , IF(iseq_run.rp__sbs_consumable_version = '1', 'v1', IF(iseq_run.rp__sbs_consumable_version = '3', 'v1.5', iseq_run.rp__sbs_consumable_version))
            AS 'v1/1.5'
        , IF(iseq_run.rp__workflow_type = 'NovaSeqXp', 'XP', IF(iseq_run.rp__workflow_type = 'NovaSeqStandard', 'No XP', iseq_run.rp__workflow_type))
            AS xp
        , iseq_run.rp__flow_cell_mode AS sp
        , iseq_run.rp__read1_number_of_cycles AS read1
        , iseq_run.rp__read2_number_of_cycles AS read2
        , SUM(lanes.proportion_of_lane_per_sample) AS total
    FROM
        iseq_run
    INNER JOIN
        (
            -- Inner query 1
            -- There can be multiple QC complete run events,
            -- this query finds all "QC complete" runs within a given timeframe.
            -- Group by run ID.
            -- If there are more than 1 "QC complete" events for a given run ID,
            -- select only the first completed run (based on min `date`)
            SELECT
                id_run
                , MIN(date) AS qc_complete_date
            FROM
                iseq_run_status
            INNER JOIN
                iseq_run_status_dict
                ON iseq_run_status_dict.id_run_status_dict = iseq_run_status.id_run_status_dict
            WHERE
                iseq_run_status_dict.description = 'qc complete'
                AND iseq_run_status.date >= from_date
                AND iseq_run_status.date <= to_date
            GROUP BY
                iseq_run_status.id_run
        )
        AS qc_complete
        ON qc_complete.id_run = iseq_run.id_run
    INNER JOIN
        iseq_product_metrics
        ON iseq_run.id_run = iseq_product_metrics.id_run
    INNER JOIN
        iseq_flowcell
        ON iseq_product_metrics.id_iseq_flowcell_tmp = iseq_flowcell.id_iseq_flowcell_tmp
    INNER JOIN
        study
        ON iseq_flowcell.id_study_tmp = study.id_study_tmp
    INNER JOIN
        iseq_run_lane_metrics
        ON iseq_product_metrics.id_run = iseq_run_lane_metrics.id_run
        AND iseq_product_metrics.position = iseq_run_lane_metrics.position
    INNER JOIN
        (
            -- Inner query 2
            -- Group samples by lane ID
            -- Count the number of samples (excluding controls) in a lane
            -- Assuming equal distribution, calculate the proportion of lane occupied per sample (1 / number of samples)
            -- Append this information to the sample, joining on lane ID
            SELECT
                samples.*
                -- Kept numeric so the outer SUM() adds numbers, not formatted strings
                , 1 / COUNT(*) AS proportion_of_lane_per_sample
            FROM
                (
                    -- Inner query 3
                    -- Get the samples for the specific runs
                    -- Excluding controls
                    SELECT
                        iseq_flowcell.entity_id_lims AS lane_id
                        , iseq_flowcell.cost_code AS project_cost_code
                        , study.name
                    FROM
                        iseq_run
                    INNER JOIN
                        (
                            -- Inner query 4
                            -- (Duplication of Inner query 1)
                            SELECT
                                id_run
                                , MIN(date) AS qc_complete_date
                            FROM
                                iseq_run_status
                            INNER JOIN
                                iseq_run_status_dict
                                ON iseq_run_status_dict.id_run_status_dict = iseq_run_status.id_run_status_dict
                            WHERE
                                iseq_run_status_dict.description = 'qc complete'
                                AND date >= from_date
                                AND date <= to_date
                            GROUP BY
                                id_run
                        )
                        AS qc_complete
                        ON qc_complete.id_run = iseq_run.id_run
                    INNER JOIN
                        iseq_product_metrics
                        ON iseq_run.id_run = iseq_product_metrics.id_run
                    INNER JOIN
                        iseq_run_lane_metrics
                        ON iseq_product_metrics.id_run = iseq_run_lane_metrics.id_run
                        AND iseq_product_metrics.position = iseq_run_lane_metrics.position
                    INNER JOIN
                        iseq_flowcell
                        ON iseq_product_metrics.id_iseq_flowcell_tmp = iseq_flowcell.id_iseq_flowcell_tmp
                    INNER JOIN
                        study
                        ON iseq_flowcell.id_study_tmp = study.id_study_tmp
                    WHERE
                        study.name NOT IN ('Heron PhiX', 'Illumina Controls')
                )
                AS samples
            GROUP BY
                samples.lane_id
        )
        AS lanes
        ON lanes.lane_id = iseq_flowcell.entity_id_lims
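    WHERE
        study.name NOT IN ('Heron PhiX', 'Illumina Controls') -- Alternative: WHERE iseq_flowcell.cost_code IS NOT NULL
    -- Note: study_name, 'v1/1.5', read1 and read2 are selected but not grouped;
    -- this assumes ONLY_FULL_GROUP_BY is not enabled in sql_mode.
    GROUP BY
        study.id_study_lims
        , project_cost_code
        , platform
        , qc_outcome
        , iseq_run.rp__workflow_type
        , iseq_run.rp__flow_cell_mode
    ;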
END
//
-- Change delimiter back to ;
delimiter ;
-- Call Stored Procedure
-- October (118 without controls)
-- date >= '2022-10-01 00:00:00'
-- date <= '2022-10-24 23:59:59'
CALL billing_report_stored_proc('2022-10-01 00:00:00', '2022-10-24 23:59:59');
-- September (195 without controls)
-- date >= '2022-08-25 00:00:00'
-- date <= '2022-09-30 23:59:59'
CALL billing_report_stored_proc('2022-08-25 00:00:00', '2022-09-30 23:59:59');
-- August (155 without controls)
-- from = '2022-07-26 00:00:00'
-- to = '2022-08-24 23:59:59'
CALL billing_report_stored_proc('2022-07-26 00:00:00', '2022-08-24 23:59:59');
-- July (146 without controls)
-- from = '2022-06-25 00:00:00'
-- to = '2022-07-25 23:59:59'
CALL billing_report_stored_proc('2022-06-25 00:00:00', '2022-07-25 23:59:59');
-- Drop Stored Procedure
DROP PROCEDURE billing_report_stored_proc;
OK, looks good. I was thinking about how to do it with a view instead, but I don't know how I could make a view accept the input dates and apply them to the subqueries, which is something the stored procedure seems to solve without any problems.
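To illustrate the point about views: a MySQL view definition cannot refer to parameters or user-defined variables, so the date filter would have to be applied outside the view, after the MIN(date) aggregation rather than before it. A sketch, using a hypothetical view name `qc_complete_first` that is not part of MLWH:

```sql
-- Hypothetical view, for illustration only.
CREATE VIEW qc_complete_first AS
SELECT
    iseq_run_status.id_run
    , MIN(iseq_run_status.date) AS qc_complete_date
FROM
    iseq_run_status
INNER JOIN
    iseq_run_status_dict
    ON iseq_run_status_dict.id_run_status_dict = iseq_run_status.id_run_status_dict
WHERE
    iseq_run_status_dict.description = 'qc complete'
GROUP BY
    iseq_run_status.id_run;

-- The dates can only be applied to the already-aggregated value here.
SELECT
    id_run
    , qc_complete_date
FROM
    qc_complete_first
WHERE
    qc_complete_date >= '2022-07-26 00:00:00'
    AND qc_complete_date <= '2022-08-24 23:59:59';
```

This is not equivalent to the procedure. A run with one 'qc complete' event before the window and another inside it is returned by the procedure (MIN is taken over events inside the window) but dropped by the view query (MIN is taken over all events, then filtered).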