pypinfo now uses more quota, no more 365-day data?
hugovk opened this issue
pypinfo now uses an updated BigQuery table to get download numbers. It's more accurate and uses less quota for most queries, but quota usage has gone up for some.
For example:
$ pypinfo --days 365 "" project
Served from cache: False
- Data processed: 87.84 GiB
+ Data processed: 1.69 TiB
- Data billed: 87.84 GiB
+ Data billed: 1.69 TiB
- Estimated cost: $0.43
+ Estimated cost: $8.45
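For reference, pypinfo's "Estimated cost" line is just bytes billed priced at BigQuery's on-demand rate, $5.00 per TiB at the time; a minimal sketch reproducing the figures above:

```python
# Reproduce pypinfo's "Estimated cost" line from "Data billed",
# assuming BigQuery's on-demand rate of $5.00 per TiB (the rate at the time).
TIB = 1024**4
GIB = 1024**3
RATE_PER_TIB = 5.00

def estimated_cost(bytes_billed: float) -> float:
    return round(bytes_billed / TIB * RATE_PER_TIB, 2)

print(estimated_cost(87.84 * GIB))  # old table: 0.43
print(estimated_cost(1.69 * TIB))   # new table: 8.45
```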
The 1st May cron successfully fetched the 30-day data:
{
"last_update": "2021-05-01 14:30:19",
"query": {
"bytes_billed": 224987709440,
"bytes_processed": 224987499284,
"cached": false,
"estimated_cost": "1.03"
},
...
That's ~225 GB, up from "bytes_billed": 50120884224 (~50 GB) on 1st April: ×4.5 bigger.
But it failed on the 365-day query:
...
File "/usr/local/lib/python3.6/dist-packages/google/cloud/_http.py", line 293, in api_request
raise exceptions.from_http_response(response)
google.api_core.exceptions.Forbidden: 403 GET https://www.googleapis.com/bigquery/v2/projects/top-pypi-packages/queries/...?maxResults=0&timeoutMs=10000: Quota exceeded: Your project exceeded quota for free query bytes scanned. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors
On 1st April the 365-day query was "bytes_billed": 951669751808
(~951 GB), so ×4.5 ≈ 4.28 TB!
The free monthly quota is 1 TB.
- 1 April was ~50 GB + ~951 GB, which must have come in just under the 1 TB limit.
- 1 May was ~225 GB + an estimated 4.28 TB...
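Spelling out those monthly totals against the free tier (treating all the rounded figures in this issue as decimal units, 1 TB = 10¹² bytes, and using the ×4.5 factor from the 30-day query as a projection):

```python
# Rough monthly totals vs the 1 TB free tier, in decimal units.
TB = 1000**4
april_total = 50_120_884_224 + 951_669_751_808       # 30-day + 365-day
may_total = 224_987_709_440 + 951_669_751_808 * 4.5  # 30-day + projected 365-day
print(round(april_total / TB, 2))  # ~1.0, right at the limit
print(round(may_total / TB, 2))    # ~4.51, far over it
```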
Option 1: Rough calculation: there's enough quota left to get 365-day data for ~724 packages. So, rounding down, perhaps it would still work for, say, 500 or 100 packages? Would that be useful?
Option 2: Alternatively, we could ditch the 365-day data altogether, and perhaps bump the 30-day data from 4,000 packages back up to, say, 5,000.
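For Option 1, the ~724 figure can be reproduced like this. This assumes the 365-day query cost scales linearly with the package count and that 4,000 (the current 30-day limit) also applies to the 365-day query; both are assumptions for this back-of-envelope only:

```python
# Back-of-envelope for Option 1: how many packages' worth of 365-day
# data fits in the quota left after the 30-day query? Assumes linear
# scaling with package count and a decimal 1 TB quota.
TB = 1000**4
remaining = TB - 224_987_709_440            # quota left after the 30-day query
per_package = 951_669_751_808 * 4.5 / 4000  # projected 365-day bytes per package
print(round(remaining / per_package))  # 724
```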
Feedback welcome!
In the meantime, I've pushed the 30-day data.