18F / analytics-reporter

Lightweight analytics reporting and publishing tool for Digital Analytics Program's Google Analytics 360 data.

Home Page: https://analytics.usa.gov/

Sessions Today GA4 data is not returning SUM as expected

scottqueen-bixal opened this issue

Background

We have noticed that requests with metric: sessions and date range today -> today do not return the same SUM values as the same requests made through the UA client.

User Story

As a visitor to https://analytics.usa.gov/, I should find a bar graph populated with session data

Acceptance Criteria

  • report returns sessions ~MIL values per hour
  • report provides value objects that include current day -> up to current hour -> through 11am (0-23)

Sampling level is automatic for the today and top-pages-7-days reports. We can’t adjust sampling precision, but we still get a metadata response when sampling has been performed. In some cases sampling is done while processing is still in progress, so the data is not yet golden (!Golden). When this occurs, GA4 buckets values into the “(other)” key. For reports like today, with date range “today” -> “today”, this has a significant impact.

By updating the report date range to start from “yesterday”, we get a much more accurate SUM value, but we need to handle some data clean-up during FE rendering.
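
The clean-up mentioned above could look roughly like the following. This is only a sketch: the row shape (`date`, `hour`, `visits` as strings) and the sample numbers are assumptions made for illustration, not the confirmed format of the published report. It drops yesterday's rows, keeps a per-hour total for the requested day, and sets the “(other)” bucket aside instead of charting it.

```python
from collections import defaultdict

OTHER = "(other)"  # bucket key named in this issue


def hourly_sessions(rows, day):
    """Return ({hour: sessions} for `day`, total sessions bucketed as "(other)").

    Assumes each row is a dict with string fields "date", "hour" and "visits";
    that shape is hypothetical and may not match the real report.
    """
    per_hour = defaultdict(int)
    other_total = 0
    for row in rows:
        sessions = int(row["visits"])
        if OTHER in (row.get("date"), row.get("hour")):
            other_total += sessions  # keep bucketed values out of the chart
            continue
        if row["date"] != day:
            continue  # row belongs to yesterday, not the day being charted
        per_hour[int(row["hour"])] += sessions
    return dict(per_hour), other_total


# Made-up sample data, only to show the call.
rows = [
    {"date": "2024-01-17", "hour": "23", "visits": "900"},
    {"date": "2024-01-18", "hour": "00", "visits": "1200"},
    {"date": "2024-01-18", "hour": "01", "visits": "1100"},
    {"date": "(other)", "hour": "(other)", "visits": "5000"},
]
per_hour, other = hourly_sessions(rows, "2024-01-18")
print(per_hour, other)
```

Returning the “(other)” total separately lets the FE decide whether to hide it or surface it as a data-quality hint.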

We moved to using the last-48-hours.json report to provide values for the sessions bar chart.

https://analytics-develop.app.cloud.gov/data/live/last-48-hours.json

This report includes yesterday's values, but it is run during realtime.sh, so we get updated values every ~15min.

This data is inconsistent.

At some point during the 24-hour cycle in which the cron runs, our last-48-hours report also reverts to automatic sampling, similar to the today report. You can see the break in values clearly on the chart: these two high bars are the (other) bucket.

~9:30am

Image

It eventually recovers; today it normalized at ~10:30am EST.

Image
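
One way to keep those (other) bars off the chart would be to check how much of a report sits in the “(other)” bucket before rendering it. The sketch below assumes rows shaped like `{"hour": ..., "visits": ...}` and uses an arbitrary 5% threshold; both are illustrative assumptions, not values taken from the reporter.

```python
OTHER = "(other)"  # bucket key named in this issue


def other_share(rows):
    """Fraction of all sessions that fall in the "(other)" bucket (0.0 if empty)."""
    total = sum(int(r["visits"]) for r in rows)
    if total == 0:
        return 0.0
    bucketed = sum(int(r["visits"]) for r in rows if r.get("hour") == OTHER)
    return bucketed / total


def looks_bucketed(rows, threshold=0.05):
    """True when the "(other)" share exceeds `threshold` (an arbitrary cut-off)."""
    return other_share(rows) > threshold


# Made-up sample data, only to show the call.
sample = [
    {"hour": "08", "visits": "1000"},
    {"hour": "09", "visits": "1000"},
    {"hour": "(other)", "visits": "2000"},
]
print(other_share(sample), looks_bucketed(sample))
```

A caller could, for example, keep showing the last report that passed this check until a fresh one passes again.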

Some additional conversation on this issue is here: https://gsa-tts.slack.com/archives/C05S1B327MH/p1705595516913689?thread_ts=1705595482.488289&cid=C05S1B327MH

This user puts it well, https://stackoverflow.com/a/76550342, and recommends avoiding the use of 'today' in report requests.
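
For reference, a request body with a configurable date range might be built as below. This is a minimal sketch: the field names follow the GA4 Data API `runReport` JSON shape (`dateRanges`, `dimensions`, `metrics`), but the helper itself and the choice of the `date` and `hour` dimensions are assumptions, not the reporter's actual configuration.

```python
def sessions_request_body(start="yesterday", end="today"):
    """Build a runReport-style body for hourly sessions.

    `start` and `end` accept GA4 date strings such as "yesterday", "today",
    "7daysAgo" or "YYYY-MM-DD". This helper is hypothetical.
    """
    return {
        "dateRanges": [{"startDate": start, "endDate": end}],
        "dimensions": [{"name": "date"}, {"name": "hour"}],
        "metrics": [{"name": "sessions"}],
    }


# Range described earlier in this issue: start from yesterday.
body = sessions_request_body()
# Leaving today out entirely, as the linked answer recommends.
body_without_today = sessions_request_body(end="yesterday")
print(body["dateRanges"], body_without_today["dateRanges"])
```

Ending the range at "yesterday" means no bars for the current day, so it trades freshness for stable totals.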