group_by_duration doesn't work for an ActiveRecord::Relation
jeffblake opened this issue · comments
Hi @ankane,

Thanks for your work on `group_by_duration`. I was the first commenter 6 years ago on the feature request :) #23

I spun up the branch real quick, and it appears the method signatures don't match up when going through an ActiveRecord::Relation, i.e., enumerable.rb vs query_methods.rb:
```ruby
User.limit(3).group_by_duration(10.minutes)
# => Arel::Visitors::UnsupportedVisitError (Unsupported argument type: Hash. Construct an Arel node instead.)

User.limit(3).group_by_duration(10.minutes, :created_at)
# => ArgumentError (wrong number of arguments (given 2, expected 1))
```
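The second error is a plain Ruby arity mismatch. A minimal sketch (hypothetical module, not groupdate's actual code) of what happens when the relation-side method only accepts the duration while the caller also passes a column:

```ruby
# Hypothetical stand-in for the relation-side method, which takes
# only the duration; the enumerable version also accepts a column.
module RelationMethods
  def group_by_duration(duration)
    duration # stand-in for building the grouping SQL
  end
end

class FakeRelation
  include RelationMethods
end

relation = FakeRelation.new
relation.group_by_duration(600) # fine: one argument

begin
  relation.group_by_duration(600, :created_at)
rescue ArgumentError => e
  puts e.message # wrong number of arguments (given 2, expected 1)
end
```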
Thanks!
I originally took a stab at this in SQL, and came to appreciate very much your work on building complete series, integration with active record relations, and time zones!
```ruby
query = <<-SQL.squish
  SELECT COUNT(*) count,
         to_timestamp(floor((extract('epoch' from scanned_at) / 600)) * 600)
           AT TIME ZONE 'UTC' AS interval_alias
  FROM scans
  WHERE event_id = $1 AND action IN (#{bind_param.join(',')})
  GROUP BY interval_alias
SQL

binds = [
  ActiveRecord::Relation::QueryAttribute.new("event_id", event_id, ActiveRecord::Type::Integer.new)
]

ActiveRecord::Base.connection.exec_query(query, 'SQL', binds, prepare: true).to_a
```
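The core of that SQL is the epoch flooring: `floor(epoch / 600) * 600` snaps each timestamp down to its 10-minute boundary. A pure-Ruby sketch of the same arithmetic (the function name is mine, just for illustration):

```ruby
# Snap a time down to its 10-minute (600-second) bucket boundary,
# mirroring floor(extract('epoch' ...) / 600) * 600 in the SQL above.
def bucket_10min(time)
  Time.at((time.to_i / 600) * 600).utc
end

bucket_10min(Time.utc(2024, 3, 2, 16, 17, 42)) # => 2024-03-02 16:10:00 UTC
```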
Hey @jeffblake, thanks for the report. Just pushed a fix to that branch.
It's fixed, thank you. Would love to see this in v5.
Some minor things I noticed:

- If passing in a `format`, e.g., `"%l:%M%P"`, with data that spans multiple days (i.e., `key_format` is not unique), the count of the last appearance of that time, say 4:10pm, overwrites the previous occurrences. This could be expected behavior, but it wasn't clear initially.
- Is it possible to strip out outliers of the series? I think that would be a useful option to pass in.
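A small sketch of the overwrite behavior: formatting two different days with `"%l:%M%P"` yields the same key, so the later count replaces the earlier one when the pairs are collapsed into a hash (illustrative data, not groupdate's internals):

```ruby
# Counts keyed by full timestamp: same wall-clock time on two days.
counts = {
  Time.utc(2024, 3, 1, 16, 10) => 5, # day 1, 4:10pm
  Time.utc(2024, 3, 2, 16, 10) => 7, # day 2, 4:10pm
}

# Re-keying by the formatted string collapses both days onto one key,
# so the day-1 count of 5 is silently lost.
formatted = counts.map { |time, count| [time.strftime("%l:%M%P"), count] }.to_h
# formatted has a single entry whose value is the day-2 count
```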
I have some other performance ideas that I may take a stab at
- add `# frozen_string_literal: true` comments to save a few allocations
- prefer `Time.iso8601` instead of `Time.parse` for performance
- `key_format` in series_builder is significantly slower when passing in a `format`; line 215 does `time_zone.parse("2014-03-02 00:00:00")` every time
I only took a quick peek, but I think there are more opportunities to shave down allocations.
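A hedged sketch of the memoization idea (class and method names are hypothetical, not groupdate's actual internals): parse the fixed reference timestamp once and reuse it, instead of re-parsing on every series step.

```ruby
require "time"

class SeriesBuilder
  # Memoize the parsed reference time so Time.parse runs only once
  # per builder instance rather than on every call.
  def reference_time
    @reference_time ||= Time.parse("2014-03-02 00:00:00 UTC")
  end
end

builder = SeriesBuilder.new
builder.reference_time.equal?(builder.reference_time) # same object, parsed once
```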
Hey @jeffblake, thanks for the feedback/ideas.
Sounds good, the memoization will be the best bang for the buck.

Thanks for the tips on the outliers, and yes, makes sense about the aggregation.
I'll go ahead and close this, appreciate the quick fix
Just FYI, I decided to go a different direction to group by 10-minute intervals. See #23 (comment).