ajv-validator / ajv

The fastest JSON schema Validator. Supports JSON Schema draft-04/06/07/2019-09/2020-12 and JSON Type Definition (RFC8927)

Home Page:https://ajv.js.org

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Improve schema compilation performance for schemas with large strings (review regular expressions)

cristianstaicu opened this issue · comments

The following regular expression used in compiling the JSON-schema is vulnerable to ReDoS:

 /if\s*\([^)]+\)\s*\{\s*\}(?!\s*else)/g

The slowdown is moderate: for 40.000 characters around 4 seconds matching time. However, I would still suggest one of the following:

  • remove the regex,
  • anchor the regex,
  • limit the number of characters that can be matched by the repetition,
  • limit the input size.

If needed, I can provide an actual example showing the slowdown.

This particular regular expression is only used during the schema compilation. The schema is not supplied by users, so this regex will only be used once (assuming you compile your schemas once). Also, the generated code never has more than one space so the actual match will never be that slow.

Could you demonstrate the actual schema that is very slow to compile because of this regular expression (i.e. when removing it would make schema compilation noticeably faster)?

function genstr(len, chr) {
    var result = "";
    for (i=0; i<=len; i++) {
        result = result + chr;
    }
    return result;
}

var Ajv = require('ajv');
var ajv = new Ajv;

var validate = ajv.compile({
    "type": "object",
    "properties": {
        "foo": { "type": 'string',
            "oneOf": [
                {"pattern": genstr(12000, "if(") +"x" +  genstr(12000, ")")  }
            ]}
    }
});

This code blocks the main Node.js event loop on my PC for 5 seconds. But if indeed the schema is never under the user's control, I agree this is not a problem. My only concern is that I am not sure if the 850 modules that depend on this module are aware of this assumption.

Thank god my schemas are not provided by users. And I was just thinking about generating a schema based on data in the JSON document (profile, component). Naughty users.

More about my use case. I have certain objects that only appear under certain profiles (enumeration) and components (enumeration and integer level). The combinatorics are such that generating all possible schemas is unlikely. It is acceptable to accept a Full profile and check the entire schema, but that won't tell you if a certain profile or component is required for a given object. Thus some objects in a full profile probably shouldn't be passed through a lesser profile. Does something like this sound possible in draft-06? Or even draft-04? I know I can do oneOf's for profiles, but components that can be removed from documents at will that may cause some objects to become invalid seem much more difficult to validate. This is the equivalent of providing different schema views of a schema, I think.

@cristianstaicu thank you

The compilation time in this particular case does not change if I remove the regex you mention (and couple of others). So it needs to be better investigated. It may be the case that there are some other regular expression used during schema compilation.

Regular expressions no longer used during schema compilation (from 6.12.3)