ankane / groupdate

The simplest way to group temporal data

group_by_duration doesn't work for an ActiveRecord::Relation

jeffblake opened this issue

Hi @ankane,

Thanks for your work on group_by_duration. I was the first commenter 6 years ago on the feature request :) #23

I spun up the branch, and it appears the method signatures don't match up when going through an ActiveRecord::Relation (i.e., enumerable.rb vs. query_methods.rb):

    User.limit(3).group_by_duration(10.minutes)
    # => Arel::Visitors::UnsupportedVisitError (Unsupported argument type: Hash. Construct an Arel node instead.)

    User.limit(3).group_by_duration(10.minutes, :created_at)
    # => ArgumentError (wrong number of arguments (given 2, expected 1))

Thanks!

I originally took a stab at this in SQL, and came to very much appreciate your work on building complete series, integrating with Active Record relations, and handling time zones!

    # Bucket scanned_at into 10-minute (600-second) intervals by flooring the
    # epoch; bind_param holds the placeholders for the bound action values.
    query = <<-SQL.squish
      SELECT COUNT(*) AS count,
        to_timestamp(floor(extract('epoch' from scanned_at) / 600) * 600)
          AT TIME ZONE 'UTC' AS interval_alias
      FROM scans
      WHERE event_id = $1 AND action IN (#{bind_param.join(",")})
      GROUP BY interval_alias
    SQL

    binds = [
      ActiveRecord::Relation::QueryAttribute.new("event_id", event_id, ActiveRecord::Type::Integer.new)
    ]

    ActiveRecord::Base.connection.exec_query(query, "SQL", binds, prepare: true).to_a
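For comparison, the branch's group_by_duration collapses all of that into something like the following (a sketch; the Scan model and actions array stand in for the bound values above):

    Scan.where(event_id: event_id, action: actions)
        .group_by_duration(10.minutes, :scanned_at)
        .count

with the complete series and time zone handling included, which the raw SQL doesn't do.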

Hey @jeffblake, thanks for the report. Just pushed a fix to that branch.

It's fixed, thank you. Would love to see this in v5.

Some minor things I noticed:

  • If passing in a format, e.g., "%l:%M%P", with data that spans multiple days (i.e., key_format is not unique), the count for the last occurrence of that time, say 4:10pm, overwrites the previous occurrences (see the sketch after this list). This could be expected behavior, but it wasn't clear initially
  • Is it possible to strip outliers out of the series? I think that would be a useful option to pass in
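To illustrate the first point, here's a minimal standalone sketch (not groupdate's actual code) of how two timestamps from different days collapse to the same "%l:%M%P" key, with the later value silently replacing the earlier one:

    counts = {
      Time.utc(2021, 3, 1, 16, 10) => 3,  # day 1, 4:10pm
      Time.utc(2021, 3, 2, 16, 10) => 5   # day 2, 4:10pm
    }
    counts.transform_keys { |t| t.strftime("%l:%M%P") }
    # => {" 4:10pm" => 5}  (day 1's count of 3 is gone)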

I have some other performance ideas that I may take a stab at:

  • add # frozen_string_literal: true comments to save a few allocations
  • prefer Time.iso8601 instead of Time.parse for performance
  • key_format in series_builder is significantly slower when passing in a format: line 215's time_zone.parse("2014-03-02 00:00:00") runs every time (a memoized version is sketched below)
    I only took a quick peek, but I think there are more opportunities to shave down allocations.
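A rough sketch of that memoization (the method and instance variable names are hypothetical, not groupdate's actual internals):

    # Parse the reference timestamp once per time zone instead of once per key.
    def reference_day_start(time_zone)
      @reference_day_start ||= {}
      @reference_day_start[time_zone.name] ||= time_zone.parse("2014-03-02 00:00:00")
    end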

Hey @jeffblake, thanks for the feedback/ideas.

  • Since aggregations return a hash, non-unique keys will be overwritten. I don't think there's a way around that.
  • For outliers, check out Anomaly or Trend.
  • For performance, we can memoize key_format to speed things up (nice find!). I'm not sure the other optimizations will make a big difference.

Sounds good, the memoization will be the best bang for the buck.

Thanks for the tips on the outliers, and yes, that makes sense about the aggregation.

I'll go ahead and close this; I appreciate the quick fix.

Just FYI, I decided to go a different direction to group by 10-minute intervals. See #23 (comment).