trickest / packages

Automated compromise detection of the world's most popular packages

Home Page: https://trickest.com

Packages

For each package registry, 5 files are generated:

  • non_existent_users.csv: Packages that point to a GitHub repository whose owner doesn't exist anymore: PyPI, npm
  • suspicious_updates.csv: Packages that have been updated on the package repository without a corresponding update to the code repository's default branch: PyPI, npm
  • broken_urls.csv: Packages that have a broken URL anywhere in their description, homepage, docs URL, bug tracker URL, etc.: PyPI, npm
  • mismatching_package_repository.csv: Packages that point to a GitHub repository whose name doesn't match the package name (This isn't always indicative of a compromised package but it helps catch malicious packages that try to impersonate legitimate ones): PyPI, npm
  • repeating_repositories.csv: Packages that point to a GitHub repository that another package also points to (This isn't always indicative of a compromised package but it helps catch malicious packages that try to impersonate legitimate ones): PyPI, npm
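
As a minimal sketch of how one of these files could be inspected locally: the column layout isn't described here, so the code only assumes that the first row is a header and reports it along with the number of data rows. The function name `summarize_report` and the file path are hypothetical.

```python
import csv


def summarize_report(path):
    """Return (header, number_of_data_rows) for one generated CSV file.

    Assumption: the first row of the file is a header row.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])           # empty list if the file is empty
        row_count = sum(1 for _ in reader)  # remaining rows are flagged packages
    return header, row_count


if __name__ == "__main__":
    # Hypothetical path to a local copy of one of the generated files.
    try:
        print(summarize_report("suspicious_updates.csv"))
    except FileNotFoundError:
        print("No local copy of suspicious_updates.csv found")
```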

How it Works

A Trickest workflow gets the initial dataset from the Top PyPI packages project (for PyPI) and the npmrank project (for npm).

Then, it performs multiple checks to find any red flags that could indicate that a package is (or can be) compromised.

Trickest Workflow

TB; DZ (Too big; didn't zoom)

  • The initial PyPI dataset is collected from the Top PyPI packages project, which contains a list of PyPI's top 5,000 most downloaded packages, updated monthly. (Thanks, @hugovk!)

  • The npm dataset is collected using the npmrank project (Thanks, @anvaka!), which collects the:

    1. Top 1,000 most depended-upon packages
    2. Top 1,000 packages with the largest number of dependencies
    3. Top 1,000 packages with the highest PageRank score
    • When merged and deduplicated, these three lists amount to ~2,500 packages across all categories.
  • The package names are passed to the extract-metadata node which collects 4 categories of info about each package:

    • The latest package release date
    • The GitHub repository connected to the package
    • The repository's latest commit date
    • The URLs that the package points to anywhere
  • This node branches off into 5 checks:

    • The package's latest release date and repository's latest commit date are compared. If a package version has been released after the latest commit date, the package is flagged.
    • GitHub usernames are extracted from the repository URLs and passed to ffuf, which queries the GitHub API to check whether any of those usernames no longer exist. (Thanks, @joohoi!)
    • The package's URLs are passed to hakcheckurl to check if any URLs are broken and could be taken over. (Thanks, @hakluke!)
    • The package's GitHub repository goes through the remaining two checks, and the package is flagged if:
      • the repository name doesn't match the package name
      • the repository has been used in another package before
  • In the end, the results of these checks are matched back to their packages and pushed to this repository.
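
The offline parts of the steps above can be sketched in a few lines of Python. This is an illustration only, not the workflow's actual implementation: the `PackageInfo` record, all function names, and the normalization choices (case-insensitive comparison, stripping a trailing `.git` or `/`) are assumptions. The username and broken-URL checks are left out because they rely on ffuf, hakcheckurl, and network access.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from urllib.parse import urlparse


@dataclass
class PackageInfo:
    """Hypothetical record holding the metadata collected per package."""
    name: str
    latest_release: datetime
    repo_url: str
    latest_commit: datetime


def merge_rankings(*rankings):
    """Merge several name lists, dropping duplicates and keeping first-seen order."""
    return list(dict.fromkeys(name for ranking in rankings for name in ranking))


def repo_owner_and_name(repo_url):
    """Split a https://github.com/<owner>/<repo> URL into (owner, repo)."""
    parts = urlparse(repo_url).path.strip("/").split("/")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"not a repository URL: {repo_url!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[:-4]
    return owner, repo


def is_suspicious_update(pkg):
    """Flag a package released after the repository's latest commit."""
    return pkg.latest_release > pkg.latest_commit


def has_mismatching_repository(pkg):
    """Flag a package whose repository name differs from the package name.

    Assumption: names are compared case-insensitively.
    """
    _, repo = repo_owner_and_name(pkg.repo_url)
    return repo.lower() != pkg.name.lower()


def repeating_repositories(packages):
    """Map each repository used by more than one package to those package names."""
    by_repo = defaultdict(list)
    for pkg in packages:
        by_repo[pkg.repo_url.lower().rstrip("/")].append(pkg.name)
    return {repo: names for repo, names in by_repo.items() if len(names) > 1}
```

The owner returned by `repo_owner_and_name` is the value that would be handed to the username-existence check, and each flagging function corresponds to one of the generated CSV files.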

Contribution

All contributions are welcome! Got an idea for another check? Know a way to make a check more accurate? Feel free to create a new ticket via GitHub issues, tweet at us @trick3st, or join the conversation on Discord.

Build your own workflows!

We believe in the value of tinkering. Sign up for a demo on trickest.com to customize this workflow to your use case, get access to many more workflows, or build your own from scratch!