hugovk / top-pypi-packages

A regular dump of the most-downloaded packages from PyPI

Home Page: https://hugovk.github.io/top-pypi-packages

[Question] How is the number of downloads computed?

ternaus opened this issue

If I use a link like https://pypistats.org/packages/albumentations, the numbers look about 10% higher.

I am all for filtering noise from the data, just curious what exactly is filtered.

This repo uses https://github.com/ofek/pypinfo to query BigQuery over the previous 30 days:

/home/botuser/.local/bin/pypinfo --json --indent 0 --limit 8000 --days 30 "" project > top-pypi-packages-30-days.json
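
To work with the resulting file, a minimal Python sketch like the one below could map each project to its download count. It assumes the JSON has a top-level rows list whose entries carry project and download_count keys; that shape is an assumption here, so check it against an actual file first. The load_counts name and the sample data are hypothetical.

```python
import json


def load_counts(text: str) -> dict[str, int]:
    """Map project name -> download_count from pypinfo-style JSON output.

    Assumption: a top-level "rows" list whose entries have "project"
    and "download_count" keys. Other top-level keys are ignored.
    """
    data = json.loads(text)
    return {row["project"]: int(row["download_count"]) for row in data["rows"]}


# Hypothetical sample in the assumed shape (not real PyPI numbers).
sample = (
    '{"rows": ['
    '{"project": "pkg-a", "download_count": 100}, '
    '{"project": "pkg-b", "download_count": 90}'
    "]}"
)

print(load_counts(sample))
```

For the real file, reading top-pypi-packages-30-days.json and passing its text to load_counts would apply the same logic, provided the assumed shape holds.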

By default, pypinfo counts only downloads from the pip installer:

ofek/pypinfo#46 (comment)

https://pypistats.org/about queries BigQuery directly, so I think it includes all installers.

Thank you. Just to verify: does pypinfo simply not count downloads from mirrors?

The data covers all downloads logged by PyPI. By default, pypinfo's count excludes downloads from mirrors, other clients, and very old versions of pip.
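
The difference between the two counts comes down to one filter on the installer name. The plain-Python sketch below models that idea on a hypothetical list of download records; the records list and the function names are illustrative assumptions, not pypinfo code or its actual query. The default keeps only pip, while the equivalent of --all keeps every record.

```python
from collections import Counter

# Hypothetical download records: one installer name per download.
# None stands for downloads with no installer recorded.
records = ["pip", "pip", "uv", "pip", "poetry", None, "bandersnatch", "pip"]


def count_downloads(installers, all_installers=False):
    """Count downloads, keeping only pip unless all_installers is True.

    Models the idea behind pypinfo's default vs --all, not its real SQL.
    """
    if all_installers:
        return len(installers)
    return sum(1 for name in installers if name == "pip")


def by_installer(installers):
    """Per-installer breakdown, similar in spirit to grouping by installer."""
    return Counter(installers)


print(count_downloads(records))                       # 4
print(count_downloads(records, all_installers=True))  # 8
print(by_installer(records).most_common(1))           # [('pip', 4)]
```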

We can check this by breaking the results down by installer:

pypinfo albumentations installer
Served from cache: False
Data processed: 1.05 GiB
Data billed: 1.05 GiB
Estimated cost: $0.01

| installer_name | download_count |
| -------------- | -------------- |
| pip            |      1,902,285 |

Checking with all installers:

pypinfo --all albumentations installer
Served from cache: False
Data processed: 1.05 GiB
Data billed: 1.05 GiB
Estimated cost: $0.01

| installer_name | download_count |
| -------------- | -------------- |
| pip            |      1,902,266 |
| uv             |        149,990 |
| poetry         |         31,174 |
| None           |         25,125 |
| requests       |          8,042 |
| bandersnatch   |          2,042 |
| Nexus          |          1,903 |
| pdm            |          1,428 |
| Browser        |          1,023 |
| Bazel          |            449 |
| Total          |      2,123,442 |
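
The table roughly accounts for the gap described in the question. A quick check in Python, using only the numbers copied from the --all table above, shows that pip is about 89.6% of the total, so counting every installer gives a figure about 11.6% higher than the pip-only one.

```python
# albumentations download counts copied from the --all table above.
installers = {
    "pip": 1_902_266,
    "uv": 149_990,
    "poetry": 31_174,
    "None": 25_125,
    "requests": 8_042,
    "bandersnatch": 2_042,
    "Nexus": 1_903,
    "pdm": 1_428,
    "Browser": 1_023,
    "Bazel": 449,
}

total = sum(installers.values())
pip_only = installers["pip"]

print(f"total: {total:,}")                               # total: 2,123,442
print(f"pip share: {pip_only / total:.1%}")              # pip share: 89.6%
print(f"all vs pip-only: +{total / pip_only - 1:.1%}")   # all vs pip-only: +11.6%
```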

Got it, thanks. This is very helpful. Will update the description at https://pypilb.vercel.app/