How to aggregate on selected fields or joining table fields?
stevemao opened this issue · comments
The example of aggregation in the docs works well
aggregateWidgets = aggregate (pWidget Widget { style = groupBy
, color = groupBy
, location = count
, quantity = arrayAgg
, radius = avg })
(selectTable widgetTable)
I want to extend the example to select some fields in the widgetTable, or joining other tables and select fields on those tables too. For simplicity, I'm selecting the original 5 fields just to demonstrate the problem.
aggregateWidgets = do
widget <- selectTable widgetTable
aggregate (pWidget Widget { style = groupBy
, color = groupBy
, location = count
, quantity = arrayAgg
, radius = avg })
(return widget)
This does not aggregate properly anymore. It seems that return
isn't the right choice here. What's the right way to do this?
Hmm, I'm not sure what you mean. Could you give an example with sample data perhaps? That is, write out some example data for the widget table, and the result that you want from your aggregation operation?
yes, let me give more details here.
The table is the same as the example in the docs
-- Sequence and defined type
CREATE SEQUENCE IF NOT EXISTS "widgetTable_id_seq";
-- Table Definition
CREATE TABLE "public"."widgetTable" (
"id" int4 NOT NULL DEFAULT nextval('"widgetTable_id_seq"'::regclass),
"style" text NOT NULL,
"color" text NOT NULL,
"location" text NOT NULL,
"quantity" int4 NOT NULL,
"radius" float8 NOT NULL,
PRIMARY KEY ("id")
);
And I added some sample data
INSERT INTO "public"."widgetTable" ("id", "style", "color", "location", "quantity", "radius") VALUES
('1', 'style', 'red', 'Syd', '2', '5'),
('2', 'style', 'red', 'Melb', '1', '10');
If I copy the aggregation code from the example
aggregateWidgets :: Select (Widget (Field SqlText) (Field SqlText) (Field SqlInt8)
(Field (SqlArray SqlInt4)) (Field SqlFloat8))
aggregateWidgets = do
aggregate (pWidget Widget { style = groupBy
, color = groupBy
, location = count
, quantity = arrayAgg
, radius = avg })
(selectTable widgetTable)
it generates the right SQL
SELECT
"result0_2" as "result1_3",
"result1_2" as "result2_3",
"result2_2" as "result3_3",
"result3_2" as "result4_3",
"result4_2" as "result5_3"
FROM (SELECT
*
FROM (SELECT
"inner0_2" as "result0_2",
"inner1_2" as "result1_2",
COUNT("inner2_2") as "result2_2",
ARRAY_AGG("inner3_2") as "result3_2",
AVG("inner4_2") as "result4_2"
FROM (SELECT
"style0_1" as "inner0_2",
"color1_1" as "inner1_2",
"location2_1" as "inner2_2",
"quantity3_1" as "inner3_2",
"radius4_1" as "inner4_2",
*
FROM (SELECT
*
FROM (SELECT
"style" as "style0_1",
"color" as "color1_1",
"location" as "location2_1",
"quantity" as "quantity3_1",
"radius" as "radius4_1"
FROM "widgetTable" as "T1") as "T1") as "T1") as "T1"
GROUP BY "inner0_2",
"inner1_2") as "T1") as "T1";
And it returns the expected result
style red 2 {2,1} 7.5
However, if I want to only select 3 of the fields, or if I want to also select a field that's from another table with a join, I write something like this
aggregateWidgets :: Select (Widget (Field SqlText) (Field SqlText) (Field SqlInt8)
(Field (SqlArray SqlInt4)) (Field SqlFloat8))
aggregateWidgets = do
widget <- selectTable widgetTable
aggregate (pWidget Widget { style = groupBy
, color = groupBy
, location = count
, quantity = arrayAgg
, radius = avg })
(return (Widget (style widget) (color widget) (location widget) (quantity widget) (radius widget)))
Note I'm still selecting the same fields just for simplicity. I would create another Widget
type that excludes the fields I don't want or include the fields from other tables.
This doesn't generate the right SQL
SELECT
"result0_2" as "result1_3",
"result1_2" as "result2_3",
"result2_2" as "result3_3",
"result3_2" as "result4_3",
"result4_2" as "result5_3"
FROM (SELECT
*
FROM (SELECT
"style" as "style0_1",
"color" as "color1_1",
"location" as "location2_1",
"quantity" as "quantity3_1",
"radius" as "radius4_1"
FROM "widgetTable" as "T1") as "T1",
LATERAL
(SELECT
"inner0_2" as "result0_2",
"inner1_2" as "result1_2",
COUNT("inner2_2") as "result2_2",
ARRAY_AGG("inner3_2") as "result3_2",
AVG("inner4_2") as "result4_2"
FROM (SELECT
"style0_1" as "inner0_2",
"color1_1" as "inner1_2",
"location2_1" as "inner2_2",
"quantity3_1" as "inner3_2",
"radius4_1" as "inner4_2",
*
FROM (SELECT
0) as "T1") as "T1"
GROUP BY "inner0_2",
"inner1_2") as "T2") as "T1";
And returns wrong result
style red 1 {2} 5
style red 1 {1} 10
What is the right way to achieve this?
I'm still not entirely sure what you're trying to do, but do you want to project the columns after the aggregation has happened?
haskell-opaleye/Doc/Tutorial/TutorialBasic.lhs
Lines 603 to 612 in 5fc848b
You can also project the columns before if you like, but note, the select
from the table has to be inside the aggregation.
haskell-opaleye/Doc/Tutorial/TutorialBasic.lhs
Lines 614 to 618 in 5fc848b
One rule of thumb to get to the correct solution is that the Select
monad behaves like the list monad. That might help you understand where to place select
s for the desired outcome.
Thanks! I think the example of "project the columns before" would work. Let me try it tomorrow.
Hi @tomjaguarpaw , it seems to work with fields from other tables too
I slightly modified the example.
personAndBirthday ::
Select (Field (SqlArray SqlText), Field SqlInt4, Field (SqlArray SqlText), Field SqlDate)
personAndBirthday = do
aggregate (p4 (arrayAgg, groupBy, arrayAgg, groupBy))
$ do
(name, age, address) <- personSelect
birthday <- birthdaySelect
where_ $ name .== bdName birthday
pure (name, age, address, bdDay birthday)
This works and I think it's worth to add it to the docs.
the select from the table has to be inside the aggregation
I missed the inner Select completely.
One rule of thumb to get to the correct solution is that the Select monad behaves like the list monad.
I still don't understand what it means to have an inner select. Perhaps we can compare 2 examples
Example 1. This is working as expected
personAndBirthday ::
Select (Field (SqlArray SqlText), Field SqlInt4, Field (SqlArray SqlText), Field SqlDate)
personAndBirthday = do
aggregate (p4 (arrayAgg, groupBy, arrayAgg, groupBy))
$ do
(name, age, address) <- personSelect
birthday <- birthdaySelect
where_ $ name .== bdName birthday
pure (name, age, address, bdDay birthday)
Example 2. I cannot see a use case. But what does it mean when we move the select out side of the aggregation?
personAndBirthday ::
Select (Field (SqlArray SqlText), Field SqlInt4, Field (SqlArray SqlText), Field SqlDate)
personAndBirthday = do
(name, age, address) <- personSelect
birthday <- birthdaySelect
where_ $ name .== bdName birthday
aggregate (p4 (arrayAgg, groupBy, arrayAgg, groupBy))
$ do
pure (name, age, address, bdDay birthday)
I think this is how I understand it. Monad carries the context. When we do aggregate, we need to carry the context (where_
and all the Select
s) to it so that it can generate the correct aggregation script.
It's simpler than that. Think of Select
as the list monad. Then you will have the intuition you need to understand where the select
needs to go. For example
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE MultiParamTypeClasses #-}
module Example where
import Opaleye
import Control.Monad (guard)
import Database.Postgres.Temp as Temp
import Database.PostgreSQL.Simple as PGS
listSum :: [Int] -> [Int]
listSum = pure . Prelude.sum
listInner :: [Int]
listInner = do
x <- [1, 2, 3, 4, 5]
guard (x <= 3)
listSum (pure x)
listOuter :: [Int]
listOuter = do
listSum $ do
x <- [1, 2, 3, 4, 5]
guard (x <= 3)
pure x
opaleyeInner :: Select (Field SqlInt4)
opaleyeInner = do
x <- values [1, 2, 3, 4, 5]
where_ (x .<= 3)
aggregate Opaleye.sum (pure x)
opaleyeOuter :: Select (Field SqlInt4)
opaleyeOuter = do
aggregate Opaleye.sum $ do
x <- values [1, 2, 3, 4, 5]
where_ (x .<= 3)
pure x
runExamples :: IO ()
runExamples = do
_ <- Temp.with $ \db -> do
conn <- PGS.connectPostgreSQL (Temp.toConnectionString db)
putStrLn $ "listInner: " ++ show listInner
putStrLn $ "listOuter: " ++ show listOuter
putStr "opaleyeInner: "
print =<< runSelectI conn opaleyeInner
putStr "opaleyeInner: "
print =<< runSelectI conn opaleyeOuter
pure ()
> runExamples
listInner: [1,2,3]
listOuter: [6]
opaleyeInner: [1,2,3]
opaleyeInner: [6]
And to put it a bit more directly, pure
creates a single row, just like pure
in the list monad creates a singleton list. If you do the select
outside the aggregate and you have just pure
inside the aggregate then you are aggregating over a single row, which is probably not what you want!
Thanks for the detailed explanation!