ossf / scorecard

OpenSSF Scorecard - Security health metrics for Open Source

Home Page: https://scorecard.dev

Feature: New Check "Criticality and Maturity"

menocu opened this issue

New check/s: Criticality & Maturity?

For some quick context, my use case for ossf/scorecard is to quickly get some information about a repo I've never seen before, to help make a decision about its relative risk and its suitability for inclusion in projects. One class of repo that I've noticed tends to score fairly poorly is repositories like https://github.com/gkz/type-check. These are repos that I think are characterized by being fairly mature and highly depended upon; the example repo has very few recent commits, 2 contributors, and lists ~17.8 million dependents. This means it also scores fairly poorly on the contributors and commits checks.

If I were to manually review this repository, I think I would deem it acceptable for use, on the basis that ~17.8 million others have reached the same conclusion. Additionally, I have some level of assurance that if something malicious happens to this repo in the future, there is a high probability it will be noticed. I understand that the low number of commits and contributors doesn't necessarily mean this project is 'unhealthy' per se; I think it's more accurate to say it's 'done'.

I'm not sure about the specifics, and I'm not necessarily convinced it makes sense to bundle these concepts of criticality and maturity into the same check, but I think a check/s that somehow captured these judgements might be useful.

I'm not sure if Scorecard is the best tool for this. It might be a good data point if we want to migrate Scorecard to a "one-stop-shop" for various types of package metadata, but it's hard to come up with a good heuristic to distinguish an abandoned project from one that's simply mature/feature-complete.

Something that may be useful for you is another OpenSSF tool: https://github.com/ossf/criticality_score. It calculates a "criticality score" for a given repository using a bunch of heuristics.

It had a cronjob that automatically scanned ~1M repos and made their scores available via BigQuery. They disabled the cronjob a year ago for some reason, but the data is still available. In the particular case of gkz/type-check you mentioned, it had a criticality_score of 0.52, which ranked it among the 28k highest-scoring projects (i.e. top 3% of projects) at the time.
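As a quick sanity check on the "top 3%" figure, here is a minimal sketch of the arithmetic. It assumes the approximate numbers quoted above (~1M scanned repos, rank around 28k); the function name `top_percent` is only illustrative.

```python
# Figures quoted in the comment above; both are approximate.
SCANNED_REPOS = 1_000_000  # "~1M repos"
RANK = 28_000              # "among the 28k highest-scoring projects"


def top_percent(rank: int, total: int) -> float:
    """Return the share of projects at or above the given rank, in percent."""
    return 100.0 * rank / total


if __name__ == "__main__":
    # Roughly 2.8%, which rounds to the "top 3%" mentioned above.
    print(f"top {top_percent(RANK, SCANNED_REPOS):.1f}%")
```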

To help get a better idea, how would you measure project maturity, and how would you distinguish maturity from inactivity/abandonment?

> This means it also scores fairly poorly on the contributors and commits checks.

This shortcoming of the Maintained check with mature projects has been brought up before. Any heuristic is going to have edge cases when you consider millions of repos. Really, the check looks at "Activity", not whether or not the project is maintained.

> Something that may be useful for you is another OpenSSF tool: https://github.com/ossf/criticality_score. It calculates a "criticality score" for a given repository using a bunch of heuristics.

+1 to what Pedro said. Criticality Score is likely what you want?

> They disabled the cronjob a year ago for some reason

Caleb is working on getting it going again. There were some technical reasons, partially around token search quota and GCP pub/sub retention time, that limited the project's ability to finish.

I'm not necessarily convinced they need to be the same check. Maybe adding a check based on criticality_score would be enough, and users could decide for themselves if a highly depended upon project that was not being actively maintained was risky or not.

Again, I also understand that how useful this check would be depends on your perspective. If you're using Scorecard to assess a repo's security and then acting on that data to improve its security posture, there isn't much you can do to move the needle on criticality. On the other hand, I think it could be quite useful if you're using Scorecard to quickly get a picture of the security risk associated with a random third-party repo. I think, all things being equal, a repo with 30 million dependents carries less risk than a repo with 300 dependents.
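To make the idea concrete, here is a rough sketch of how a consumer might read a criticality score next to the Maintained check result, without folding them into a single number. It is not how Scorecard or criticality_score work: the thresholds, the `RepoSignals` type, and the `triage` function are all hypothetical, and the Maintained score of 0 in the usage example is an assumed value rather than a measured one. The only figure taken from this thread is the criticality_score of 0.52 for gkz/type-check.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
CRITICAL_THRESHOLD = 0.5  # criticality_score values range from 0 to 1
ACTIVE_THRESHOLD = 5      # Scorecard check scores range from 0 to 10


@dataclass
class RepoSignals:
    name: str
    criticality: float  # value reported by criticality_score
    maintained: int     # score of the Scorecard Maintained check


def triage(repo: RepoSignals) -> str:
    """Label a repo from two independent signals; the user decides the risk."""
    critical = repo.criticality >= CRITICAL_THRESHOLD
    active = repo.maintained >= ACTIVE_THRESHOLD
    if active:
        return "active"
    if critical:
        # Low activity but heavily depended upon: may simply be 'done'.
        return "inactive, highly depended upon (possibly mature)"
    # Low activity and few dependents: harder to tell apart from abandonment.
    return "inactive, low criticality (possibly abandoned)"


if __name__ == "__main__":
    # criticality 0.52 is quoted earlier in the thread; maintained=0 is assumed.
    print(triage(RepoSignals("gkz/type-check", criticality=0.52, maintained=0)))
```

Keeping the two signals separate leaves the judgment about whether an inactive but critical project is risky to the user, as suggested above.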

This issue has been marked stale because it has been open for 60 days with no activity.