tomjaguarpaw / haskell-opaleye

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

How to aggregate on selected fields or joining table fields?

stevemao opened this issue · comments

The example of aggregation in the docs works well

aggregateWidgets = aggregate (pWidget Widget { style    = groupBy
                                              , color    = groupBy
                                              , location = count
                                              , quantity = arrayAgg
                                              , radius   = avg })
                              (selectTable widgetTable)

I want to extend the example to select some fields in the widgetTable, or joining other tables and select fields on those tables too. For simplicity, I'm selecting the original 5 fields just to demonstrate the problem.

aggregateWidgets = do
    widget <- selectTable widgetTable

    aggregate (pWidget Widget { style    = groupBy
                                              , color    = groupBy
                                              , location = count
                                              , quantity = arrayAgg
                                              , radius   = avg })
                              (return widget)

This does not aggregate properly anymore. It seems that return isn't the right choice here. What's the right way to do this?

Hmm, I'm not sure what you mean. Could you give an example with sample data perhaps? That is, write out some example data for the widget table, and the result that you want from your aggregation operation?

yes, let me give more details here.

The table is the same as the example in the docs

-- Sequence and defined type
CREATE SEQUENCE IF NOT EXISTS "widgetTable_id_seq";

-- Table Definition
CREATE TABLE "public"."widgetTable" (
    "id" int4 NOT NULL DEFAULT nextval('"widgetTable_id_seq"'::regclass),
    "style" text NOT NULL,
    "color" text NOT NULL,
    "location" text NOT NULL,
    "quantity" int4 NOT NULL,
    "radius" float8 NOT NULL,
    PRIMARY KEY ("id")
);

And I added some sample data

INSERT INTO "public"."widgetTable" ("id", "style", "color", "location", "quantity", "radius") VALUES
('1', 'style', 'red', 'Syd', '2', '5'),
('2', 'style', 'red', 'Melb', '1', '10');

If I copy the aggregation code from the example

aggregateWidgets :: Select (Widget (Field SqlText) (Field SqlText) (Field SqlInt8)
                                   (Field (SqlArray SqlInt4)) (Field SqlFloat8))
aggregateWidgets = do
    aggregate (pWidget Widget { style    = groupBy
                                              , color    = groupBy
                                              , location = count
                                              , quantity = arrayAgg
                                              , radius   = avg })
                              (selectTable widgetTable)

it generates the right SQL

SELECT
"result0_2" as "result1_3",
"result1_2" as "result2_3",
"result2_2" as "result3_3",
"result3_2" as "result4_3",
"result4_2" as "result5_3"
FROM (SELECT
      *
      FROM (SELECT
            "inner0_2" as "result0_2",
            "inner1_2" as "result1_2",
            COUNT("inner2_2") as "result2_2",
            ARRAY_AGG("inner3_2") as "result3_2",
            AVG("inner4_2") as "result4_2"
            FROM (SELECT
                  "style0_1" as "inner0_2",
                  "color1_1" as "inner1_2",
                  "location2_1" as "inner2_2",
                  "quantity3_1" as "inner3_2",
                  "radius4_1" as "inner4_2",
                  *
                  FROM (SELECT
                        *
                        FROM (SELECT
                              "style" as "style0_1",
                              "color" as "color1_1",
                              "location" as "location2_1",
                              "quantity" as "quantity3_1",
                              "radius" as "radius4_1"
                              FROM "widgetTable" as "T1") as "T1") as "T1") as "T1"
            GROUP BY "inner0_2",
                     "inner1_2") as "T1") as "T1";

And it returns the expected result

style	red	2	{2,1}	7.5

However, if I want to only select 3 of the fields, or if I want to also select a field that's from another table with a join, I write something like this

aggregateWidgets :: Select (Widget (Field SqlText) (Field SqlText) (Field SqlInt8)
                                   (Field (SqlArray SqlInt4)) (Field SqlFloat8))
aggregateWidgets = do
    widget <- selectTable widgetTable
    aggregate (pWidget Widget { style    = groupBy
                                              , color    = groupBy
                                              , location = count
                                              , quantity = arrayAgg
                                              , radius   = avg })
                              (return (Widget (style widget) (color widget) (location widget) (quantity widget) (radius widget)))

Note I'm still selecting the same fields just for simplicity. I would create another Widget type that excludes the fields I don't want or include the fields from other tables.

This doesn't generate the right SQL

SELECT
"result0_2" as "result1_3",
"result1_2" as "result2_3",
"result2_2" as "result3_3",
"result3_2" as "result4_3",
"result4_2" as "result5_3"
FROM (SELECT
      *
      FROM (SELECT
            "style" as "style0_1",
            "color" as "color1_1",
            "location" as "location2_1",
            "quantity" as "quantity3_1",
            "radius" as "radius4_1"
            FROM "widgetTable" as "T1") as "T1",
           LATERAL
           (SELECT
            "inner0_2" as "result0_2",
            "inner1_2" as "result1_2",
            COUNT("inner2_2") as "result2_2",
            ARRAY_AGG("inner3_2") as "result3_2",
            AVG("inner4_2") as "result4_2"
            FROM (SELECT
                  "style0_1" as "inner0_2",
                  "color1_1" as "inner1_2",
                  "location2_1" as "inner2_2",
                  "quantity3_1" as "inner3_2",
                  "radius4_1" as "inner4_2",
                  *
                  FROM (SELECT
                        0) as "T1") as "T1"
            GROUP BY "inner0_2",
                     "inner1_2") as "T2") as "T1";

And returns wrong result

style	red	1	{2}	5
style	red	1	{1}	10

What is the right way to achieve this?

I'm still not entirely sure what you're trying to do, but do you want to project the columns after the aggregation has happened?

> aggregateWidgetsFewer :: Select (Field SqlText, Field SqlText, Field SqlInt8)
> aggregateWidgetsFewer = do
> w <- aggregate (pWidget Widget { style = groupBy
> , color = groupBy
> , location = count
> , quantity = sum
> , radius = avg })
> (selectTable widgetTable)
>
> return (style w, color w, location w)

You can also project the columns before if you like, but note, the select from the table has to be inside the aggregation.

> aggregateWidgetsFewerBefore :: Select (Field SqlText, Field SqlText, Field SqlInt8)
> aggregateWidgetsFewerBefore = do
> aggregate (p3 (groupBy, groupBy, count))
> (do w <- selectTable widgetTable
> return (style w, color w, location w))

One rule of thumb to get to the correct solution is that the Select monad behaves like the list monad. That might help you understand where to place selects for the desired outcome.

Thanks! I think the example of "project the columns before" would work. Let me try it tomorrow.

Hi @tomjaguarpaw , it seems to work with fields from other tables too

I slightly modified the example.

personAndBirthday ::
   Select (Field (SqlArray SqlText), Field SqlInt4, Field (SqlArray SqlText), Field SqlDate)
personAndBirthday = do
   aggregate (p4 (arrayAgg, groupBy, arrayAgg, groupBy))
            $ do
               (name, age, address) <- personSelect
               birthday             <- birthdaySelect

               where_ $ name .== bdName birthday
               pure (name, age, address, bdDay birthday)

This works and I think it's worth to add it to the docs.

the select from the table has to be inside the aggregation

I missed the inner Select completely.

One rule of thumb to get to the correct solution is that the Select monad behaves like the list monad.

I still don't understand what it means to have an inner select. Perhaps we can compare 2 examples

Example 1. This is working as expected

personAndBirthday ::
   Select (Field (SqlArray SqlText), Field SqlInt4, Field (SqlArray SqlText), Field SqlDate)
personAndBirthday = do
   aggregate (p4 (arrayAgg, groupBy, arrayAgg, groupBy))
            $ do
               (name, age, address) <- personSelect
               birthday             <- birthdaySelect

               where_ $ name .== bdName birthday
               pure (name, age, address, bdDay birthday)

Example 2. I cannot see a use case. But what does it mean when we move the select out side of the aggregation?

personAndBirthday ::
   Select (Field (SqlArray SqlText), Field SqlInt4, Field (SqlArray SqlText), Field SqlDate)
personAndBirthday = do
   (name, age, address) <- personSelect
   birthday             <- birthdaySelect

   where_ $ name .== bdName birthday
   
   aggregate (p4 (arrayAgg, groupBy, arrayAgg, groupBy))
            $ do
               pure (name, age, address, bdDay birthday)

I think this is how I understand it. Monad carries the context. When we do aggregate, we need to carry the context (where_ and all the Selects) to it so that it can generate the correct aggregation script.

It's simpler than that. Think of Select as the list monad. Then you will have the intuition you need to understand where the select needs to go. For example

{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE MultiParamTypeClasses #-}

module Example where

import Opaleye

import Control.Monad (guard)
import Database.Postgres.Temp as Temp
import Database.PostgreSQL.Simple as PGS

listSum :: [Int] -> [Int]
listSum = pure . Prelude.sum

listInner :: [Int]
listInner = do
  x <- [1, 2, 3, 4, 5]
  guard (x <= 3)
  listSum (pure x)

listOuter :: [Int]
listOuter = do
  listSum $ do
    x <- [1, 2, 3, 4, 5]
    guard (x <= 3)
    pure x

opaleyeInner :: Select (Field SqlInt4)
opaleyeInner = do
  x <- values [1, 2, 3, 4, 5]
  where_ (x .<= 3)
  aggregate Opaleye.sum (pure x)

opaleyeOuter :: Select (Field SqlInt4)
opaleyeOuter = do
  aggregate Opaleye.sum $ do
    x <- values [1, 2, 3, 4, 5]
    where_ (x .<= 3)
    pure x

runExamples :: IO ()
runExamples = do
  _ <- Temp.with $ \db -> do
    conn <- PGS.connectPostgreSQL (Temp.toConnectionString db)
    putStrLn $ "listInner: " ++ show listInner
    putStrLn $ "listOuter: " ++ show listOuter
    putStr "opaleyeInner: "
    print =<< runSelectI conn opaleyeInner
    putStr "opaleyeInner: "
    print =<< runSelectI conn opaleyeOuter

  pure ()
> runExamples 
listInner: [1,2,3]
listOuter: [6]
opaleyeInner: [1,2,3]
opaleyeInner: [6]

And to put it a bit more directly, pure creates a single row, just like pure in the list monad creates a singleton list. If you do the select outside the aggregate and you have just pure inside the aggregate then you are aggregating over a single row, which is probably not what you want!

Thanks for the detailed explanation!