hugovk / top-pypi-packages

A regular dump of the most-downloaded packages from PyPI

Home Page: https://hugovk.github.io/top-pypi-packages

Latest statistics seem way off

mayeut opened this issue

I might be missing something, but running pypinfo myself does not give me the same download stats.
It might have something to do with the pypinfo version (either a bug in the most recent release or, more probably, in the old one?).

Successfully installed pypinfo-19.0.0
MacBook-Pro-de-Matthieu:cibuildwheel Matt$ pypinfo --limit 3000 --days 30 "" project
Served from cache: False
Data processed: 218.16 GiB
Data billed: 218.16 GiB
Estimated cost: $1.07

| project                                    | download_count |
| ------------------------------------------ | -------------- |
| urllib3                                    |    155,783,183 |
| boto3                                      |    131,751,857 |
| setuptools                                 |    125,885,991 |
| six                                        |    124,285,993 |
| botocore                                   |    120,153,066 |
| requests                                   |    110,315,156 |
| idna                                       |    105,516,184 |
| certifi                                    |    104,960,324 |
| python-dateutil                            |     99,333,882 |
| chardet                                    |     99,208,776 |
| pyyaml                                     |     97,292,775 |
| s3transfer                                 |     90,342,179 |
| pip                                        |     79,046,819 |
| wheel                                      |     77,138,656 |
| rsa                                        |     74,963,708 |
| jmespath                                   |     72,535,669 |
.......
Successfully installed pypinfo-17.0.0
(.venv) MacBook-Pro-de-Matthieu:cibuildwheel Matt$ pypinfo --limit 30 --days 30 "" project
Served from cache: False
Data processed: 42.28 GiB
Data billed: 42.28 GiB
Estimated cost: $0.21

| project           | download_count |
| ----------------- | -------------- |
| urllib3           |     45,926,529 |
| setuptools        |     38,156,728 |
| boto3             |     37,597,046 |
| six               |     37,048,711 |
| botocore          |     34,455,306 |
| requests          |     32,516,057 |
| idna              |     32,016,943 |
| certifi           |     31,289,360 |
| chardet           |     29,527,031 |
| python-dateutil   |     28,628,600 |
| pyyaml            |     27,598,555 |
| s3transfer        |     25,867,805 |
| wheel             |     23,655,631 |
| pip               |     23,143,362 |
| rsa               |     21,621,049 |
| jmespath          |     20,666,007 |
| cffi              |     20,528,153 |
| pyasn1            |     20,168,386 |
| numpy             |     19,663,676 |
| jinja2            |     18,489,425 |
| markupsafe        |     18,024,126 |
| awscli            |     17,297,647 |
| pytz              |     16,884,772 |
| docutils          |     16,570,589 |
| protobuf          |     16,428,711 |
| pycparser         |     15,867,939 |
| colorama          |     15,386,740 |
| cryptography      |     15,171,788 |
| typing-extensions |     14,321,140 |
| packaging         |     14,078,408 |
| Total             |    728,596,220 |
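To put a number on the gap between the two runs, the counts can be compared project by project. This is only a minimal sketch: it copies the first three rows from each table above, and the helper names `parse_table` and `ratios` are made up for illustration rather than taken from pypinfo.

```python
def parse_table(text):
    """Parse a pypinfo-style pipe table into {project: download_count}."""
    counts = {}
    for line in text.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 2:
            continue
        name, value = cells
        digits = value.replace(",", "")
        if digits.isdigit():  # skips the header and the dashed separator row
            counts[name] = int(digits)
    return counts


def ratios(new, old):
    """Return new/old download ratio for every project present in both."""
    return {name: new[name] / old[name] for name in new if name in old}


# First three rows of the pypinfo 19.0.0 output quoted above.
NEW_19 = """
| project    | download_count |
| ---------- | -------------- |
| urllib3    |    155,783,183 |
| boto3      |    131,751,857 |
| setuptools |    125,885,991 |
"""

# The same projects from the pypinfo 17.0.0 output quoted above.
OLD_17 = """
| project    | download_count |
| ---------- | -------------- |
| urllib3    |     45,926,529 |
| setuptools |     38,156,728 |
| boto3      |     37,597,046 |
"""

new_counts = parse_table(NEW_19)
old_counts = parse_table(OLD_17)
for name, ratio in ratios(new_counts, old_counts).items():
    print(f"{name:<12} {ratio:.2f}x")
```

For these three projects the 19.0.0 counts are roughly 3.3 to 3.5 times the 17.0.0 counts, so the difference is not limited to one or two packages.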

Thanks for the report!

I can reproduce. The command run is essentially:

pypinfo --json --indent 0 --limit 4000 --days 30 "" project

Previously the server had pypinfo version 16.0.0. The latest is 19.0.0, and there have been some changes in between, including to the data queried.

https://github.com/ofek/pypinfo/#1900

Attached is a zip with today's data in table and JSON format for both 16.0.0 and 19.0.0; the numbers differ between the two versions:

data.zip

PR #18 updates the data generation script to upgrade (pip and) pypinfo prior to fetching the data each month. I've already upgraded them both directly on the server.
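For illustration, an upgrade-then-fetch step might look roughly like the sketch below. This is not the script from PR #18, which is not shown here: the function names and the `dry_run` switch are hypothetical, and only the pypinfo arguments are taken from the command quoted above. Running the fetch for real needs BigQuery credentials and incurs query costs, so the sketch only prints the commands by default.

```python
import shlex
import subprocess
import sys


def build_commands(limit=4000, days=30):
    """Build the upgrade command and the fetch command as argument lists."""
    # Upgrade pip and pypinfo in the current interpreter's environment.
    upgrade = [sys.executable, "-m", "pip", "install", "--upgrade", "pip", "pypinfo"]
    # Same arguments as the pypinfo command quoted in this thread.
    fetch = [
        "pypinfo", "--json", "--indent", "0",
        "--limit", str(limit), "--days", str(days),
        "", "project",
    ]
    return [upgrade, fetch]


def run_all(commands, dry_run=True):
    """Print each command, and execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure


# Hypothetical usage: show what would run without touching pip or BigQuery.
run_all(build_commands())
```

Upgrading first means each monthly run uses whatever pypinfo release is current, rather than the version that happened to be installed when the server was set up.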