google / osv.dev

Open source vulnerability DB and triage service.

Home Page: https://osv.dev

Parity mismatch between API and zip

jimshowalter opened this issue · comments

Describe the bug
In our parity test, the assert fails because what's returned from the API doesn't match what we hardcoded to expect.

Normally that would be fine: we'd just update what to expect.

But when we do that, the assert then fails because what's in the zip doesn't match the updated expectation.

To Reproduce
Get vulns for org.bouncycastle:bcprov-jdk15on:1.57 from both the API and the zip and compare.

Expected behaviour
They should match.

Screenshots
[Screenshot: api]
[Screenshot: zip]

// TODO: Simplify this once the API and zip match again:
assertTrue(
    logger
            .errors
            .get(2)
            .equals(
                "org.bouncycastle:bcprov-jdk15on:1.57:\n"
                    + "\tGHSA-4446-656p-f54g\n"
                    + "\t\tseverity: CRITICAL\n"
                    + "\t\tsummary: Deserialization of Untrusted Data in Bouncy castle\n"
                    + "\t\tCVEs:\n"
                    + "\t\t\thttps://nvd.nist.gov/vuln/detail/CVE-2018-1000613\n"
                    + "\tGHSA-6xx3-rg99-gc3p\n"
                    + "\t\tseverity: MODERATE\n"
                    + "\t\tsummary: Timing based private key exposure in Bouncy Castle\n"
                    + "\t\tCVEs:\n"
                    + "\t\t\thttps://nvd.nist.gov/vuln/detail/CVE-2020-15522\n"
                    + "\tGHSA-hr8g-6v94-x4m9\n"
                    + "\t\tseverity: MODERATE\n"
                    + "\t\tsummary: Bouncy Castle For Java LDAP injection vulnerability\n"
                    + "\t\tCVEs:\n"
                    + "\t\t\thttps://nvd.nist.gov/vuln/detail/CVE-2023-33201\n")
        || logger
            .errors
            .get(2)
            .equals(
                "org.bouncycastle:bcprov-jdk15on:1.57:\n"
                    + "\tGHSA-4446-656p-f54g\n"
                    + "\t\tseverity: CRITICAL\n"
                    + "\t\tsummary: Deserialization of Untrusted Data in Bouncy castle\n"
                    + "\t\tCVEs:\n"
                    + "\t\t\thttps://nvd.nist.gov/vuln/detail/CVE-2018-1000613\n"
                    + "\tGHSA-6xx3-rg99-gc3p\n"
                    + "\t\tseverity: MODERATE\n"
                    + "\t\tsummary: Timing based private key exposure in Bouncy Castle\n"
                    + "\t\tCVEs:\n"
                    + "\t\t\thttps://nvd.nist.gov/vuln/detail/CVE-2020-15522\n"
                    + "\tGHSA-72m5-fvvv-55m6\n"
                    + "\t\tseverity: MODERATE\n"
                    + "\t\tsummary: Observable Differences in Behavior to Error Inputs in Bouncy Castle\n"
                    + "\t\tCVEs:\n"
                    + "\t\t\thttps://nvd.nist.gov/vuln/detail/CVE-2020-26939\n"
                    + "\tGHSA-hr8g-6v94-x4m9\n"
                    + "\t\tseverity: MODERATE\n"
                    + "\t\tsummary: Bouncy Castle For Java LDAP injection vulnerability\n"
                    + "\t\tCVEs:\n"
                    + "\t\t\thttps://nvd.nist.gov/vuln/detail/CVE-2023-33201\n"
                    + "\tGHSA-wjxj-5m7g-mg7q\n"
                    + "\t\tseverity: MODERATE\n"
                    + "\t\tsummary: Bouncy Castle Denial of Service (DoS)\n"
                    + "\t\tCVEs:\n"
                    + "\t\t\thttps://nvd.nist.gov/vuln/detail/CVE-2023-33202\n"));

And today, another:

2 = "ID 'GHSA-hgjh-9rj2-g67j' from org.springframework:spring-web:6.1.2, hasVulns=false not found in vulns data"
3 = "ID 'GHSA-r978-9m6m-6gm6' from org.apache.zookeeper:zookeeper:3.8.3, hasVulns=false not found in vulns data"
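
When the two sources disagree, comparing sets of advisory IDs rather than whole formatted strings makes it easier to see which records are lagging. The following is only a minimal sketch: the class name `ParityDiff` and the helper `onlyIn` are hypothetical, and the two ID sets are copied from the two expected strings in the assert above, without assuming which of them came from the API and which from the zip.

```java
import java.util.Set;
import java.util.TreeSet;

class ParityDiff {
    /** Returns the IDs present in left but missing from right, in sorted order. */
    static Set<String> onlyIn(Set<String> left, Set<String> right) {
        Set<String> result = new TreeSet<>(left);
        result.removeAll(right);
        return result;
    }

    public static void main(String[] args) {
        // IDs from the shorter expected string in the assert above.
        Set<String> shorter = Set.of(
            "GHSA-4446-656p-f54g", "GHSA-6xx3-rg99-gc3p", "GHSA-hr8g-6v94-x4m9");
        // IDs from the longer expected string in the assert above.
        Set<String> longer = Set.of(
            "GHSA-4446-656p-f54g", "GHSA-6xx3-rg99-gc3p", "GHSA-72m5-fvvv-55m6",
            "GHSA-hr8g-6v94-x4m9", "GHSA-wjxj-5m7g-mg7q");

        System.out.println("only in longer list:  " + onlyIn(longer, shorter));
        System.out.println("only in shorter list: " + onlyIn(shorter, longer));
    }
}
```

For the bcprov-jdk15on:1.57 data above, the only IDs that differ are GHSA-72m5-fvvv-55m6 and GHSA-wjxj-5m7g-mg7q, both missing from the shorter list.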

Just to confirm, the 'Actual' in your first screenshot is from the OSV API, and the 'Actual' in the second is from the zip / bucket download?

Looking at a few of the records in the zip right now, it looks like they should match the API responses.

I suspect you might be detecting the changes on the API side before the data is finished exporting to the GCP bucket. The export job runs once an hour, so you're probably pulling the old version of a record if you download it as soon as the API response changes.

An hour lag explains it. Just wanted to make sure you were aware of it. Typically the parity-check job passes, so we don't stress about it when it fails.

Hi @jimshowalter, as I said in #1994 (comment), something very strange would have to happen for the API and the GCS-exported output to diverge over the long term. You might want to allow for repeated failures over a longer timeframe (say, 3 hours) before considering parity to be fundamentally broken.

Any single high-churn vulnerability could fail repeatedly if there was a lot of volatility, but that seems very unlikely.
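
One way to apply the suggestion above is to track when each package first mismatched and only report parity as broken once the mismatch has lasted longer than a grace window. This is a rough sketch, not anything from the OSV codebase or the reporter's test suite: the class `ParityGrace`, the method `isBroken`, and the package key are all hypothetical names, and the 3-hour window in the usage note is the figure suggested in the comment above.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

class ParityGrace {
    private final Duration grace;
    // Time at which each package was first seen mismatching (hypothetical bookkeeping).
    private final Map<String, Instant> firstMismatch = new HashMap<>();

    ParityGrace(Duration grace) {
        this.grace = grace;
    }

    /**
     * Records one parity check for a package. Returns true only when the
     * package has been mismatching continuously for at least the grace period.
     */
    boolean isBroken(String pkg, boolean matches, Instant now) {
        if (matches) {
            // A passing check resets the clock for this package.
            firstMismatch.remove(pkg);
            return false;
        }
        Instant first = firstMismatch.computeIfAbsent(pkg, k -> now);
        return Duration.between(first, now).compareTo(grace) >= 0;
    }
}
```

With `new ParityGrace(Duration.ofHours(3))`, a mismatch caused by the hourly export lag would clear on a later run and never be reported, while a mismatch that is still present three hours after it was first seen would be.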