pytest-dev / py

Python development support library (note: maintenance only)

ReDoS vulnerability in svnurl.py

SCH227 opened this issue · comments

Good night!

I found that this regex is vulnerable to Regular Expression Denial of Service.

PoC:

>>> from py._path.svnurl import InfoSvnCommand
>>> payl = "   2256      hpk        165 Nov 24 17:55 __init__.py" + " " * 5000
>>> InfoSvnCommand(payl)

Attack vector:

A user accessing a (possibly remote) subversion repository that provides malicious "info" data.
Or an attacker injecting 'svn ls http://...' output (less realistic).

Fix:

Use a pattern with non-overlapping groups. I can help in finding a better regex and testing if needed.
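To illustrate what "non-overlapping groups" could mean here, the following is a minimal sketch. Both patterns are hypothetical and only modeled on the listing line from the PoC; they are not copied from `svnurl.py`. The first uses a lazy `.+?` for the author field, which can also consume the spaces that the surrounding ` +` separators want to match. The second uses `\S+`, which cannot match a separator, so each character has only one possible role.

```python
import re

# Hypothetical pattern modeled on one listing line; NOT the pattern from py.
# `.+?` may also match spaces, so it overlaps with the ` +` separators around it.
OVERLAPPING = re.compile(
    r"^ *(?P<rev>\d+) +(?P<author>.+?) +(?P<size>\d+) +"
    r"(?P<date>\w+ +\d{2} +[\d:]+) +(?P<file>.*)$"
)

# Same shape, but the author field is `\S+`, which can never match a separator.
NON_OVERLAPPING = re.compile(
    r"^ *(?P<rev>\d+) +(?P<author>\S+) +(?P<size>\d+) +"
    r"(?P<date>\w+ +\d{2} +[\d:]+) +(?P<file>.*)$"
)


def parse(pattern, line):
    """Return the named fields of one listing line, or None if it does not match."""
    m = pattern.match(line)
    return m.groupdict() if m else None


line = "   2256      hpk        165 Nov 24 17:55 __init__.py"
print(parse(OVERLAPPING, line))
print(parse(NON_OVERLAPPING, line))
```

On well-formed input both variants extract the same fields. The trade-off of the stricter variant is that it would reject author names containing spaces, so whether it is acceptable depends on what the real data can contain.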

Related: #256

I doubt there's anyone using this code, so I don't think it would warrant any sort of security notification that users should bother with, but if you prepare a PR for this I'll merge & release it.

As @skialpine mentions, the advisory now triggers pip-audit and thus can fail CI runs (like this one).

Is there a fix on the horizon?

The issue is not considered critical. I'm not aware of anyone working towards a fix.

Well, congratulations to whoever it is that decided that the right path of action here is getting a CVE for...

  • Something that seems very questionable to be titled a "vulnerability" in the first place (if I can control an SVN repo/server, I might as well just Slowloris the initial info request, I suppose... though admittedly I didn't try that.)
  • For some ~18 year old code...
  • ...which is only there for historical reasons and whose use is discouraged
  • ...which, to the best of our knowledge, is not used anywhere in the wild outside of some old PyPy development scripts nobody probably uses anymore (and certainly not against random SVN servers). Note the search results seem to be copies of those PyPy scripts, as far as I can tell.
  • ...which, given the above, nobody is terribly interested in maintaining
  • ...yet you ended up generating nothing but noise for hundreds of thousands of pytest users, since pytest depends on pylib for historical reasons (it came from the PyPy project, like pylib does). Yes, not every one of the quarter of a million projects there will monitor CVEs against dependencies, but at the same time, lots of people who do (in companies and such) are probably not in that list on GitHub.

Given that it will still take some time for pytest to get fully rid of its py internals, and people seem to like getting CVEs for this project (which just happens to be very popular via pytest, but pretty much unused outside of pytest) without really making an attempt to understand the context... can we please consider:

  • Either vendoring the remaining code (py.path I assume) in pytest, and archiving pylib...
  • ...or releasing a pylib 2.0.0 which simply drops all the code that isn't used in pytest?

Sorry if I sound frustrated. But requesting a CVE without understanding the context of the project/issue you're reporting, and then generating false reports for hundreds of thousands of pytest users is doing a major disservice to both pytest/pylib maintainers and all those users.

I'd go as far as to label this cve report a supply chain attack

Pytest should keep dropping pylib as it does

The cve should be added as common false positive

I'd go as far as to label this cve report a supply chain attack

I would not, but I'd certainly add it to my "examples of behavior causing open source maintainer burnout" list, given that we're now getting the first pytest issue about this, and it almost certainly won't be the last one.

@bluetech @The-Compiler I wouldn't mind cutting a release that drops the svn wc stuff altogether

@RonnyPfannschmidt agreed, and perhaps a bit more even - I opened #288 with some overview on that.

FWIW, I've also proposed adding a note to the GitHub advisory (github/advisory-database#761) and tweeted a PSA.

GitHub have now reacted and amended the advisory:

The particular codepath in question is the regular expression at py._path.svnurl.InfoSvnCommand.lspattern and is only relevant when dealing with subversion (svn) projects. Notably the codepath is not used in the popular pytest project.

and apparently also added that information for Dependabot alerts:

I've also added the codepath to the advisory so that dependabot can more intelligently target alerts.

pip-audit maintainer here. I have no say in it, but it's a shame (in my personal opinion) that this kind of low-quality finding was assigned a CVE ID without any significant cross-checking.

I've filed a CVE rejection request with MITRE, since they're the CVE CNA in this case. If they successfully reject it, getting it removed from GHSA should also be easy.

Actually, looks like I'll be able to propose a GHSA withdrawal even if the CVE itself hasn't been retracted. I'll open a PR for that in a moment.

I've filed github/advisory-database#762 to mark the GHSA report as withdrawn.

@The-Compiler agreeable and funny #287 (comment) 😁

@woodruffw thanks for following up on the advisory! That means that pip-audit won’t pick up that CVE any longer once pypa/pip-audit#385 has merged, correct?

Once that's merged and the OSV entry is marked as withdrawn, yeah.

@The-Compiler @bluetech @woodruffw & other maintainers: don't sweat it! Even though 18 years ago, someone wrote a regex, not thinking of 2022-levels of security paranoia, your efforts are still very much appreciated and your contributions are incredibly productive.

The CVE did trigger some alarms in builds of valuable systems used by financial institutions. But that's good. It doesn't necessarily mean that pytest is suddenly unusable. It just forces downstream developers in highly secured places to stop and think. Alerts can be ignored, assessed, categorized, accepted,...

Please look at CVEs as mostly just a catalyst. Having the CVE removed is a harsh reaction, but the right one if we're absolutely sure that that particular regex is not exploitable. But try to think of it from the point of view of a security officer, with full paranoia goggles on (and no intimate knowledge of the module). The regex is present in the code. It could be executed. A hacker could penetrate, set up an SVN repo, craft some commands, take advantage of other existing vulnerabilities on the system,... You have to think worst case and then some.

In the end, I think a correctly worded CVE with a low severity score which clearly includes all the IFs, would have been the best option here.

Thanks!

Taking the freedom to quote what I wrote over at github/advisory-database#762 (comment):

but to my eye the advisory is in fact talking about a real redos vulnerability. I do understand the annoyance with what you call CVE spam, but if the advisory is valid then we want to include it in our data set.

In theory, when viewed in a vacuum: indeed.

However, even a very broad search on GitHub for the affected code yields 92 results. I've looked through them all, and from what I can tell, all the matches fall into one of these categories:

  • Copies of py using it (in svnurl(), which the search captures as well)
  • Copies of pypy (where the py library originates from historically), in its development tools directory. However, PyPy moved away from SVN in 2010, so they're probably just bitrotting. In any case, they use the repository the current file is in, or are hardcoded to run on a specific old PyPy team server (codespeak.net). If someone has control over that, they might as well just change those scripts. Even if there are certain scripts which actually run on arbitrary SVN servers, the context they are run in is unlikely to make a DoS a real problem. But given that they won't actually work anymore for their original intended purpose, the chance of anyone running them is pretty much zero.

On the other side of the coin, you have at least half a million projects depending on pylib via pytest, which got noise in their inbox.

I believe the main point of a CVE and security advisories is to make people aware of problems which have a real (even if small) chance of affecting them. An advisory which just adds noise to what I believe will be 100.0% of the receivers is just going to hurt the whole system and community.

In this particular case, and when viewed in context, the sheer amount of noise the advisory generates vs. its real usage is so high that the word "spam" is unfortunately very fitting. I can't help but think that "oh, I get a shiny CVE in a popular project" was the only motivation behind it.

Just to make sure there's no confusion: I'm not a maintainer of this project or of pytest, so my opinions are not those of the maintainers. I'm the maintainer of a separate dependency scanning project, so I have an interest in dependency feeds having a high signal-to-noise ratio. After this comment, I'll step out of this thread now (since I'm not a maintainer and I only stepped in to coordinate on the pip-audit side).

With that being said: I disagree that it's good that a CVE was filed for this behavior. The fact that one was filed and published seemingly without any review represents a series of communication and authority breakdowns; the fact that the project's own maintainers have done more investigation into exploitability potential than the original reporter seems to have is an indicator of this.

As for why that matters: not everybody is a bank, with roles dedicated to reviewing a constant deluge of security reports. In the context of more limited resources, managing security fatigue is far more important than reporting weakness classes like ReDoS, which don't manifest as exploitable vulnerabilities in the overwhelming majority of cases. When users get tired of useless reports, they disable their security tools entirely.

First of all, I did file for a CVE, but I didn't publish it. Someone else did. I find it illogical that it was you, but the GH advisory says "Credits - @The-Compiler".
The other thing I can think of is that GH did it automatically because this is described in a public issue. But I find that strange, because I have already reported other security issues to MITRE, and they were never published before I informed MITRE that they had been made public.

First of all, I did file for a CVE, but I didn't publish it. Someone else did.

Well, clearly something went wrong in that process then. But also clearly you requested a CVE with the intention of publishing it (despite maintainers trying to give you context around the issue), which still is... questionable at the very least.

I find it illogical that it was you, but the GH advisory says "Credits - @The-Compiler".

Probably due to my contribution to it clarifying the issue text.

The other thing I can think of is that GH did it automatically because this is described in a public issue. But I find that strange, because I have already reported other security issues to MITRE, and they were never published before I informed MITRE that they had been made public.

GitHub does not control MITRE CVEs. They only control their own GitHub Security Advisories. The CVE being published (which is what triggered the GHSA!) is something between you and MITRE.

Why did I file for a CVE?

  • The attack scenario and PoC described.
  • A very similar issue in this same library was issued a CVE in the past.
  • I first reported it on the security email, and was asked to open a public issue for it (this is discouraged). I wouldn't mind helping on issuing a fix, as I did in other high profile projects, but then there was no reply for weeks. Still, I would have waited several weeks more before publishing it.

I appreciate the deeper analysis provided a few days ago by the maintainers. Keep in mind that, as a security researcher, I am analyzing dozens of projects with very diverse scenarios, contexts and implications.
Still, I don't think that only the most-used vulnerable code should be fixed, but any code that could possibly be used. As someone said before, alerts can be ignored, assessed, categorized, accepted. IMO, deciding to ignore a flaw is better than not even knowing about it.
Also, a vulnerability doesn't need to be a Log4shell or an RCE to be one. That's why we have CVSS scores, which should be used together with other metrics to assess risk.

About the excessive noise generated by security alerts, indeed, that's sadly true.
Probably in the future we will be identifying vulnerable functions and whether they are called in a project's code, as this awesome project wants to implement in Go to avoid false positives.
In the meantime, this is what we have.
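A very rough version of that idea can be sketched for Python with the standard `ast` module. This is only an illustration, not how any existing scanner works: the set `SVN_NAMES` and the function `references_svn_code` are hypothetical, and the check is purely name-based, so it can report false positives and will miss dynamic access such as `getattr`.

```python
import ast

# Names treated as SVN-related entry points of py; an assumption for illustration.
SVN_NAMES = {"svnurl", "svnwc"}


def references_svn_code(source: str) -> bool:
    """Return True if the source imports or accesses a name listed in SVN_NAMES."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module:
            # e.g. `from py._path.svnurl import InfoSvnCommand`
            parts = set(node.module.split("."))
            names = {alias.name for alias in node.names}
            if (parts | names) & SVN_NAMES:
                return True
        elif isinstance(node, ast.Import):
            # e.g. `import py._path.svnurl`
            for alias in node.names:
                if set(alias.name.split(".")) & SVN_NAMES:
                    return True
        elif isinstance(node, ast.Attribute) and node.attr in SVN_NAMES:
            # e.g. `py.path.svnurl(...)`
            return True
    return False


print(references_svn_code("import py\np = py.path.local('.')"))
print(references_svn_code("from py._path.svnurl import InfoSvnCommand"))
```

A project that only uses `py.path.local` would not be flagged by this check, while one importing from `py._path.svnurl` would.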

Two quick amendments on download/notification numbers:

  • It was pointed out to me that my "half a million" number was wrong. Dependents of pytest won't get a security notification about this, unless they also pin py (which, given the intended usage of requirements.txt files, is rather common). Something like 250-270k is probably a more realistic number, assuming that almost all of the py users have it pulled in via pytest or its plugins (which I still believe is a pretty safe assumption).
  • As a result of this issue, today pytest 7.2.0 was released, which vendors the parts of py it needs. With that being slowly adopted, I'm assuming the number of projects depending on py will be much lower soon.

Hey! related to the issue closed here, there are other packages that use py such as retry although according to this, it might be a mistake and py shouldn't be a dependency for retry

is this something they need to sort or are you guys able to help with?

tox also has a dependency on py. I opened an issue in their repo to see if they can get rid of the dependency like you did in pytest.

  • As a result of this issue, today pytest 7.2.0 was released, which vendors the parts of py it needs. With that being slowly adopted, I'm assuming the number of projects depending on py will be much lower soon.

📉 Midweek py PyPI download numbers are down from ~2M/day to ~1.4M/day.

Hey @hotenov,

Just curious where does 51457 come from?

Adding 2022-42969 (CVE ID) to my pipenv check --ignore didn't help, but 51457 did 😃

Asking for future references 🙈

Hello @mfilenko, I stole it from a previous reference by KeNaCo :)
But you can take the ID from the safety report.

Is the POC in the description actually working for anyone?

>>> from py._path.svnurl import InfoSvnCommand
>>> payl = "   2256      hpk        165 Nov 24 17:55 __init__.py" + " " * 5000
>>> InfoSvnCommand(payl)

For me it executes everything in a fraction of a second.

It's conceivable that recent Python versions include performance improvements in the regex engine.
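One way to check this on a given interpreter is to time a match for growing input sizes. The harness below is a sketch under assumptions: the two patterns are hypothetical stand-ins modeled on the listing line (not the pattern from `svnurl.py`), and the test input is deliberately one that never matches, since backtracking cost typically shows up on inputs that fail to match. Absolute numbers will vary by machine and Python version.

```python
import re
import time

# Hypothetical patterns; NOT copied from py. `.+?` overlaps with ` +`, `\S+` does not.
OVERLAPPING = re.compile(
    r"^ *(?P<rev>\d+) +(?P<author>.+?) +(?P<size>\d+) +"
    r"(?P<date>\w+ +\d{2} +[\d:]+) +(?P<file>.*)$"
)
NON_OVERLAPPING = re.compile(
    r"^ *(?P<rev>\d+) +(?P<author>\S+) +(?P<size>\d+) +"
    r"(?P<date>\w+ +\d{2} +[\d:]+) +(?P<file>.*)$"
)


def time_match(pattern, text):
    """Return (matched, elapsed_seconds) for a single match attempt."""
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start


def measure(sizes=(250, 500, 1000, 2000)):
    """Time both patterns on inputs with `n` trailing spaces and no size/date/file fields."""
    rows = []
    for n in sizes:
        text = "1 a" + " " * n  # cannot match: nothing follows the author field
        _, overlapping_s = time_match(OVERLAPPING, text)
        _, non_overlapping_s = time_match(NON_OVERLAPPING, text)
        rows.append((n, overlapping_s, non_overlapping_s))
    return rows


if __name__ == "__main__":
    for n, overlapping_s, non_overlapping_s in measure():
        print(f"n={n:5d}  overlapping={overlapping_s:.4f}s  non-overlapping={non_overlapping_s:.4f}s")
```

If the first column of timings grows much faster than the second as `n` doubles, the overlapping pattern is backtracking on that interpreter; if both stay flat, the engine is handling the input without noticeable extra cost.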