DataJunction / dj

A metrics platform.

Home Page: http://datajunction.io

Audit ORM-generated queries

shangyian opened this issue

We should audit the queries produced by the ORM, as they aren't always the most efficient way to query the database. The following is a list of areas where we aren't loading ORM objects efficiently:

  • get_dimensions: The get_dimensions function makes numerous calls to the database, at least one per iteration of its for loop, when the entire function could likely be collapsed into a single recursive CTE:
WITH RECURSIVE dimensions_graph AS (
    SELECT
        initial_node.id AS path_start,
        c.name AS col_name,
        dimension_node.id AS path_end,
        initial_node.name || '.' || c.name || ',' || dimension_node.name AS join_path,
        dimension_node.name AS node_name,
        dimension_rev.id AS node_revision_id,
        dimension_rev.display_name AS node_display_name
    FROM noderevision initial_node
    JOIN nodecolumns nc ON initial_node.id = nc.node_id
    JOIN column c ON nc.column_id = c.id
    JOIN node dimension_node ON c.dimension_id = dimension_node.id
    JOIN noderevision dimension_rev
        ON dimension_rev.version = dimension_node.current_version
        AND dimension_rev.node_id = dimension_node.id
    WHERE initial_node.id = 77  -- example node revision ID

    UNION ALL

    SELECT
        ng.path_start,
        c.name AS col_name,
        next_node.id AS path_end,
        ng.join_path || '.' || c.name || ',' || next_node.name AS join_path,
        next_node.name AS node_name,
        next_rev.id AS node_revision_id,
        next_rev.display_name AS node_display_name
    FROM dimensions_graph ng
    JOIN node current_node ON ng.path_end = current_node.id
    JOIN noderevision current_rev ON current_rev.version = current_node.current_version AND current_rev.node_id = current_node.id
    JOIN nodecolumns current_columns ON current_columns.node_id = current_rev.id
    JOIN column c ON current_columns.column_id = c.id
    JOIN node next_node ON c.dimension_id = next_node.id
    JOIN noderevision next_rev ON next_rev.version = next_node.current_version AND next_rev.node_id=next_node.id
),
dims AS (
    SELECT * FROM dimensions_graph
)
SELECT 
    dims.node_name, 
    dims.node_revision_id, 
    dims.node_display_name,
    c.name,
    dims.join_path
FROM dims 
JOIN nodecolumns nc ON nc.node_id = dims.node_revision_id
JOIN column c ON nc.column_id = c.id;
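
To make the trade-off concrete, here is a minimal sketch comparing a per-node query loop with a single recursive CTE. It runs against an in-memory SQLite database and uses a deliberately simplified, hypothetical schema: the `node` and `dimension_link` tables and the function names are assumptions for illustration, not DJ's actual models or the real get_dimensions code. The join_path format mirrors the query above.

```python
import sqlite3

# Hypothetical, simplified schema (NOT DJ's real tables): each row in
# dimension_link says "column `column_name` on node `node_id` points at
# dimension node `dimension_id`".
SCHEMA = """
CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE dimension_link (
    node_id INTEGER NOT NULL REFERENCES node(id),
    column_name TEXT NOT NULL,
    dimension_id INTEGER NOT NULL REFERENCES node(id)
);
"""

RECURSIVE_QUERY = """
WITH RECURSIVE dimensions_graph(path_end, node_name, join_path) AS (
    -- Anchor member: dimensions linked directly from the starting node.
    SELECT d.id, d.name, n.name || '.' || l.column_name || ',' || d.name
    FROM node n
    JOIN dimension_link l ON l.node_id = n.id
    JOIN node d ON d.id = l.dimension_id
    WHERE n.id = ?

    UNION ALL

    -- Recursive member: follow links out of each dimension found so far.
    SELECT d.id, d.name, g.join_path || '.' || l.column_name || ',' || d.name
    FROM dimensions_graph g
    JOIN dimension_link l ON l.node_id = g.path_end
    JOIN node d ON d.id = l.dimension_id
)
SELECT node_name, join_path FROM dimensions_graph ORDER BY join_path
"""


def build_demo_db():
    """Create a tiny acyclic graph: orders -> users -> countries."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO node VALUES (?, ?)",
        [(1, "orders"), (2, "users"), (3, "countries")],
    )
    conn.executemany(
        "INSERT INTO dimension_link VALUES (?, ?, ?)",
        [(1, "user_id", 2), (2, "country_id", 3)],
    )
    return conn


def dimensions_looped(conn, node_id):
    """Walk the graph with one query per visited node; also count queries."""
    name = conn.execute(
        "SELECT name FROM node WHERE id = ?", (node_id,)
    ).fetchone()[0]
    results, queue, queries = [], [(node_id, name)], 1
    while queue:
        current_id, path = queue.pop()
        rows = conn.execute(
            "SELECT l.column_name, d.id, d.name FROM dimension_link l "
            "JOIN node d ON d.id = l.dimension_id WHERE l.node_id = ?",
            (current_id,),
        ).fetchall()
        queries += 1
        for column_name, dim_id, dim_name in rows:
            new_path = f"{path}.{column_name},{dim_name}"
            results.append((dim_name, new_path))
            queue.append((dim_id, new_path))
    return sorted(results, key=lambda row: row[1]), queries


def dimensions_recursive(conn, node_id):
    """Fetch the same rows in a single round trip via the recursive CTE."""
    return conn.execute(RECURSIVE_QUERY, (node_id,)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    looped, query_count = dimensions_looped(conn, 1)
    print(f"loop issued {query_count} queries:", looped)
    print("recursive CTE issued 1 query:", dimensions_recursive(conn, 1))
```

The loop's query count grows with the number of nodes visited, while the CTE stays at one statement. Note that neither the sketch nor the query above guards against cycles in the dimension graph, so a real implementation would probably need a depth limit or a visited-path check.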

TBD