hugovk / top-pypi-packages

A regular dump of the most-downloaded packages from PyPI

Home Page: https://hugovk.github.io/top-pypi-packages

Query for the data

ChaiBapchya opened this issue

What query is used to retrieve the data about the PyPI packages?

The data is queried with the pypinfo tool (https://github.com/ofek/pypinfo/), using these commands:

# Generate the files
/usr/local/bin/pypinfo --json --indent 0 --limit 4000 --days 30 "" project > top-pypi-packages-30-days.json
/usr/local/bin/pypinfo --json --indent 0 --limit 4000 --days 365 "" project > top-pypi-packages-365-days.json
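For anyone consuming the generated files, here is a minimal sketch of reading one. It assumes the JSON has a top-level `rows` list whose entries carry `project` and `download_count` keys; that layout is an assumption and is not stated in this thread. The helper name `top_projects` and the sample package names are hypothetical, made up for illustration.

```python
import json


def top_projects(data, n=10):
    """Return the n most-downloaded (project, download_count) pairs.

    `data` is the parsed JSON document. The "rows" / "project" /
    "download_count" layout is an assumption about the file format.
    """
    rows = data.get("rows", [])
    # Sort explicitly instead of relying on the file already being ordered.
    ranked = sorted(rows, key=lambda r: r["download_count"], reverse=True)
    return [(r["project"], r["download_count"]) for r in ranked[:n]]


# Made-up sample in the assumed layout; real files would be loaded with
# json.load(open("top-pypi-packages-30-days.json")).
sample = json.loads(
    '{"rows": ['
    '{"project": "example-a", "download_count": 10},'
    '{"project": "example-b", "download_count": 25}'
    "]}"
)

for project, count in top_projects(sample, n=2):
    print(project, count)
```

Sorting again in the helper is a defensive choice: the query already orders by `download_count`, but the sketch does not depend on that.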

Running the same commands with the --test option prints the queries:

$ pypinfo --test --json --indent 0 --limit 4000 --days 30 "" project
SELECT
  file.project as project,
  COUNT(*) as download_count,
FROM
  TABLE_DATE_RANGE(
    [the-psf:pypi.downloads],
    DATE_ADD(CURRENT_TIMESTAMP(), -31, "day"),
    DATE_ADD(CURRENT_TIMESTAMP(), -1, "day")
  )
WHERE
  details.installer.name = "pip"
GROUP BY
  project,
ORDER BY
  download_count DESC
LIMIT 4000
$ pypinfo --test --json --indent 0 --limit 4000 --days 365 "" project
SELECT
  file.project as project,
  COUNT(*) as download_count,
FROM
  TABLE_DATE_RANGE(
    [the-psf:pypi.downloads],
    DATE_ADD(CURRENT_TIMESTAMP(), -366, "day"),
    DATE_ADD(CURRENT_TIMESTAMP(), -1, "day")
  )
WHERE
  details.installer.name = "pip"
GROUP BY
  project,
ORDER BY
  download_count DESC
LIMIT 4000
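
The two queries differ only in their date bounds: `--days 30` gives offsets of -31 and -1 days, and `--days 365` gives -366 and -1. As a rough sketch of that arithmetic (not pypinfo's actual code), the hypothetical helper `query_window` below computes the same bounds in Python. It does not model how `TABLE_DATE_RANGE` selects the daily tables.

```python
from datetime import datetime, timedelta, timezone


def query_window(days, now=None):
    """Return (start, end) timestamps matching the offsets in the queries.

    The start is (days + 1) days before `now` and the end is 1 day before
    `now`, as in DATE_ADD(CURRENT_TIMESTAMP(), -31, "day") and
    DATE_ADD(CURRENT_TIMESTAMP(), -1, "day") for --days 30.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    start = now - timedelta(days=days + 1)
    end = now - timedelta(days=1)
    return start, end


start, end = query_window(30)
print(start.date(), "to", end.date())
```

The window ends one day before the current time, so the most recent day is left out of the counts.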