tomjaguarpaw / haskell-opaleye

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

How to fold a query into an SqlArray if it contains multiple fields?

abigailalice opened this issue · comments

I'm trying to return some data from my database representing a one-to-many relation, and am trying to do it in a single query, but don't see how to accomplish it. There is arrayAgg :: Aggregator (Field a) (Field (SqlArray a)), but the type seems to restrict me from returning more than a single field in each element of the array. In Postgresql you can do things like select array_agg((x,y)) from sometable;, so I don't think this should be a limitation, but I'm not sure if the more first-class-ish tuples that can be put into arrays are actually supported. Unless there's some combinator that can do things like Field_ n a -> Field_ m b -> Field (Field_ n a, Field_ m b) I'm not sure how I might do this.

For the moment I just have a for loop with a separate query for every element retrieved from the first table, but this is obviously not ideal if there's a way around it. Ultimately I want the returned type from the query to be something like [(a, [b])].

Opaleye doesn't support SQL-level tuples (anonymous records) yet, so you can't do quite that, but you can aggregate into two separate arrays and then zip them on the Haskell side. That is, return something like [(a, [b], [c])] and then zip the bs and cs to get a [(b, c)]. Does that make sense?

That seems simple enough, though would the type of arrayAgg restrict that from working with nullable fields? I think I might be able to work around that if it's a limitation, but I'm curious if there's a way to do that.

Hmm, yes, but that seems like a bug in arrayAgg. Can you try

arrayAgg_ :: Aggregator (F.Field_ n a) (F.Field (T.SqlArray_ n a))
arrayAgg_ = Opaleye.Internal.Aggregate.makeAggr HPQ.AggrArr

and let me know if that works OK? If so I can add it to Opaleye.

Should I need a DefaultFromFields (SqlArray_ 'Nullable sqlType) [Maybe haskellType] instance? Trying to test it in my own code said it was a missing instance, and I do see there's a Default (Inferrable FromField) (SqlArray_ Nullable sqlType) [Maybe hsType] instance that looks pretty close, but I don't quite get how they relate to each other.

Thanks for reporting all of these omissions regarding nullable fields. It looks like I made a lot of oversights when converting from Column to Field!

You are correct that the instance has been incorrectly omitted. Please try adding it back locally with

instance (Typeable b, DefaultFromField a b) =>
         DefaultFromField (T.SqlArray_ Nullable a) [Maybe b] where
  defaultFromField = fromFieldArrayNullable defaultFromField

I will add this instance in a bugfix release too.

The instance issue should be fixed in https://hackage.haskell.org/package/opaleye-0.9.4.0. Let me know if this doesn't solve the problem for you.

So I tried using arrayAgg_ as you suggested and it looks like things are working, though my use of it wasn't super extensive. The workaround with the zipping instead of a runSelect inside of a loop also seemed to work well; I assume the DBMS would rewrite the shared part of the aggregations into a single aggregation, but regardless it sped things from a couple of seconds to close to instantaneous.

Nice! I should add arrayAgg_ to Opaleye then.

So if this can be accomplished with the product-profunctors interface instead of lenses it might be worthwhile adding an example to the documentation for arrayAgg_ mentioning the zipping idiom, but even if not I thought mentioning it might be helpful if anyone ends up with the same issue. But I ended up needing to do a large number of such aggregations, and found the easiest way to accomplish them was doing something like

aggregateOf
    :: ((Select s -> Select (Field (SqlArray_ n a))) -> w -> m t
    -> Lens.Getting (Field_ n a) s (Field_ n a)
    -> w
    -> m t
aggregateOf l l' = l (aggregate arrayAgg_ . fmap (Lens.view l'))

...
unzippedRows <- pure SomeRow
    { someColumn1 = selectSomeRow
    , someColumn2 = selectSomeRow
    }
    >>= aggregateOf someColumn1Lens someColumn1Lens
    >>= aggregateOf someColumn2Lens someColumn2Lens

where selectSomeRow :: Select (SomeRow (Field SqlCol1Field) (Field SqlCol2Field)). The basic idea is that the SomeRow datatype contains the select statement to retrieve itself in each field, and aggregateOf projects out the individual field, aggregates it, and then traverses out the select, so it now contains the list of that row.

It still makes me a touch nervous to trust postgresql to share the work duplicated in each aggregation, but I don't think there's any way to avoid that with the zipping idiom regardless of whether it's abstracted like this or not. I don't know if that concern is really warranted given how thoroughly DBMSes are optimized.