[Question] How is the number of downloads computed?
ternaus opened this issue
If I use a link like https://pypistats.org/packages/albumentations, the numbers look around 10% higher.
I am all for filtering noise from the data, just curious what exactly is filtered.
This repo uses https://github.com/ofek/pypinfo to query BigQuery over the previous 30 days:
Line 19 in 3444113
pypinfo defaults to only downloads from the pip installer:
https://pypistats.org/about queries BigQuery directly, so I think it includes all installers.
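To make the difference concrete, here is a rough sketch of the kind of BigQuery query involved. It is not the query pypinfo or pypistats.org actually sends: the helper `build_query` and its parameters are hypothetical, and the only things assumed from the public dataset are the table `bigquery-public-data.pypi.file_downloads` and its `file.project`, `details.installer.name` and `timestamp` fields. The function only builds the SQL string and does not run it.

```python
def build_query(project: str, days: int = 30, pip_only: bool = True) -> str:
    """Build an illustrative BigQuery Standard SQL string (hypothetical helper).

    The project name is interpolated directly for readability; real code
    should use query parameters instead.
    """
    conditions = [
        f"file.project = '{project}'",
        f"timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {days} DAY)",
    ]
    if pip_only:
        # Assumed equivalent of the default "pip only" behaviour.
        conditions.append("details.installer.name = 'pip'")
    where = "\n  AND ".join(conditions)
    return (
        "SELECT details.installer.name AS installer_name,\n"
        "       COUNT(*) AS download_count\n"
        "FROM `bigquery-public-data.pypi.file_downloads`\n"
        f"WHERE {where}\n"
        "GROUP BY installer_name\n"
        "ORDER BY download_count DESC"
    )


# Print both variants to compare the WHERE clauses.
print(build_query("albumentations"))
print()
print(build_query("albumentations", pip_only=False))
```

The only difference between the two printed queries is the extra installer condition, which is where the gap between the two sites would come from under this assumption.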
Thank you. Just to verify: pypinfo does not count downloads from mirrors?
It's all downloads logged by PyPI. By default, that does not include downloads from mirrors, other clients or ancient pip.
We can check with installer:
❯ pypinfo albumentations installer
Served from cache: False
Data processed: 1.05 GiB
Data billed: 1.05 GiB
Estimated cost: $0.01
| installer_name | download_count |
| -------------- | -------------- |
| pip | 1,902,285 |
Checking with all installers:
❯ pypinfo --all albumentations installer
Served from cache: False
Data processed: 1.05 GiB
Data billed: 1.05 GiB
Estimated cost: $0.01
| installer_name | download_count |
| -------------- | -------------- |
| pip | 1,902,266 |
| uv | 149,990 |
| poetry | 31,174 |
| None | 25,125 |
| requests | 8,042 |
| bandersnatch | 2,042 |
| Nexus | 1,903 |
| pdm | 1,428 |
| Browser | 1,023 |
| Bazel | 449 |
| Total | 2,123,442 |
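As a quick sanity check on the "around 10% higher" observation, the per-installer counts from the `--all` output can be summed and compared with the pip-only figure. This is just arithmetic on the numbers copied from the table above; the variable names are illustrative.

```python
# Counts copied from the `pypinfo --all albumentations installer` output.
downloads = {
    "pip": 1_902_266,
    "uv": 149_990,
    "poetry": 31_174,
    "None": 25_125,
    "requests": 8_042,
    "bandersnatch": 2_042,
    "Nexus": 1_903,
    "pdm": 1_428,
    "Browser": 1_023,
    "Bazel": 449,
}

total = sum(downloads.values())
non_pip = total - downloads["pip"]
# Extra downloads relative to the pip-only count.
extra_over_pip = non_pip / downloads["pip"]

print(f"total downloads:   {total:,}")
print(f"non-pip downloads: {non_pip:,}")
print(f"extra over pip:    {extra_over_pip:.1%}")
```

The sum matches the Total row (2,123,442), and the non-pip installers add roughly 11.6% on top of the pip count, which is in line with the gap seen on pypistats.org.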
Got it, thanks. This is very helpful. Will update the description at https://pypilb.vercel.app/