Add June 2022 data to the almanac dataset
rviscomi opened this issue
The 2022_06_01 crawl is complete and we have the raw [pages, requests, response bodies, lighthouse] tables available on BigQuery. To make previous years' queries compatible with the June crawl, we need to add it to the following tables in the almanac dataset:
- almanac.requests
- almanac.summary_response_bodies
  - desktop
  - mobile
- almanac.parsed_css and inline CSS
  - desktop inline
  - mobile inline
  - desktop external
  - mobile external
- almanac.third_parties
- almanac.green_web_foundation
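For anyone scripting the checklist above, here is a rough sketch of enumerating the raw source tables for the June crawl. It assumes the raw tables live in an `httparchive` project and follow a `<dataset>.<date>_<client>` naming pattern, and that each dataset has both a desktop and a mobile table. None of that is stated in this issue, so verify the names in BigQuery before relying on them. The `raw_table_ids` helper is hypothetical.

```python
from itertools import product

CRAWL_DATE = "2022_06_01"
CLIENTS = ("desktop", "mobile")
# Raw datasets mentioned in the issue; the exact dataset IDs are an assumption.
RAW_DATASETS = ("pages", "requests", "response_bodies", "lighthouse")


def raw_table_ids(crawl_date=CRAWL_DATE, project="httparchive"):
    """Build table IDs for every dataset/client pair of one crawl.

    The project name and the `<dataset>.<date>_<client>` pattern are
    assumptions, not something confirmed in this thread.
    """
    return [
        f"{project}.{dataset}.{crawl_date}_{client}"
        for dataset, client in product(RAW_DATASETS, CLIENTS)
    ]


if __name__ == "__main__":
    for table_id in raw_table_ids():
        print(table_id)
```

Looping over this list is one way to confirm that every expected source table exists before appending the crawl to the almanac tables.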
These are no longer required (we moved to custom metrics last year but ran them one last time to allow comparison):
To clarify, should we still generate the 2022 data for these tables for comparison, or is it not needed because the data is already in custom metrics?
Don’t plan on doing the comparison this year (it was very messy last year because the data was so different, so I wouldn’t advise anyone else to do it either!), so it's not needed this year AFAIK. Let’s make a clean break.
Sounds good, thanks. Removed them from the checklist.
@patrickhulce the latest table in lighthouse-infrastructure.third_party_web I see is dated 2022_01_01. Is it possible to get an updated version based on 2022_06_01 HTTP Archive data?
We're holding off on this because we discovered a data quality issue with the crawl.
Resuming this now that the June crawl has been successfully rerun.
Ping @patrickhulce about getting an updated lighthouse-infrastructure.third_party_web.2022_06_01 table.