Improve schema compilation performance for schemas with large strings (review regular expressions)

Question

cristianstaicu opened this issue 7 years ago · comments

Cristian-Alexandru STAICU commented 7 years ago

The following regular expression used in compiling the JSON-schema is vulnerable to ReDoS:

 /if\s*\([^)]+\)\s*\{\s*\}(?!\s*else)/g

The slowdown is moderate: matching an input of 40,000 characters takes around 4 seconds. However, I would still suggest one of the following:

remove the regex,
anchor the regex,
limit the number of characters that can be matched by the repetition,
limit the input size.

If needed, I can provide an actual example showing the slowdown.
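
As a rough illustration of two of the mitigations listed above (bounding the repetition and limiting the input size), here is a minimal sketch. The helper name `removeEmptyIfs`, the bound of 200 characters, and the 100,000-character input limit are all assumptions made for the example; they are not taken from the library. Note that the bounded variant changes behavior: a condition longer than 200 characters is no longer matched.

```javascript
// Pattern quoted in the issue: /if\s*\([^)]+\)\s*\{\s*\}(?!\s*else)/g
// Hypothetical bounded variant: [^)]+ becomes [^)]{1,200}, so each match
// attempt scans at most 200 characters inside the parentheses.
var BOUNDED_EMPTY_IF = /if\s*\([^)]{1,200}\)\s*\{\s*\}(?!\s*else)/g;

// Hypothetical input-size limit (an arbitrary value chosen for this sketch).
var MAX_INPUT_LENGTH = 100000;

// Hypothetical helper: removes empty `if (...) {}` statements that are not
// followed by `else`, and skips the cleanup entirely for oversized input.
function removeEmptyIfs(code) {
  if (code.length > MAX_INPUT_LENGTH) {
    return code; // too large: leave the code untouched rather than risk a slow match
  }
  return code.replace(BOUNDED_EMPTY_IF, '');
}
```

With the bound in place, the work per starting position is capped, so the total matching time grows roughly linearly with the input length instead of quadratically.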

Evgeny Poberezkin · Answer 1 · Tue Sep 05 2017 23:58:37 GMT+0800 (China Standard Time)

This particular regular expression is only used during the schema compilation. The schema is not supplied by users, so this regex will only be used once (assuming you compile your schemas once). Also, the generated code never has more than one space so the actual match will never be that slow.
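
To make the "compile your schemas once" assumption concrete, here is a small sketch of a compile-once wrapper. `makeCachedCompiler` and the string `key` are hypothetical names introduced for this example; the only Ajv call assumed is `ajv.compile(schema)`, which returns a validation function.

```javascript
// Hypothetical helper: wraps any compile function so that each schema,
// identified by a caller-chosen key, is compiled at most once.
function makeCachedCompiler(compile) {
  var cache = new Map();
  return function getValidator(key, schema) {
    if (!cache.has(key)) {
      cache.set(key, compile(schema)); // the compilation cost is paid only here
    }
    return cache.get(key);
  };
}

// Possible usage with Ajv (assumes the `ajv` package is installed):
//   var Ajv = require('ajv');
//   var ajv = new Ajv();
//   var getValidator = makeCachedCompiler(function (schema) {
//     return ajv.compile(schema);
//   });
//   var validate = getValidator('user', { type: 'object' });
```

Any slowdown in compilation is then incurred once per schema, not once per validated document.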

Could you demonstrate the actual schema that is very slow to compile because of this regular expression (i.e. when removing it would make schema compilation noticeably faster)?

Cristian-Alexandru STAICU · Answer 2 · Wed Sep 06 2017 00:11:43 GMT+0800 (China Standard Time)

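// Builds a long string by repeating chr.
// Note: because of "<=", the loop appends len + 1 copies.
function genstr(len, chr) {
    var result = "";
    for (var i = 0; i <= len; i++) {
        result = result + chr;
    }
    return result;
}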

var Ajv = require('ajv');
var ajv = new Ajv();

var validate = ajv.compile({
    "type": "object",
    "properties": {
        "foo": { "type": 'string',
            "oneOf": [
                {"pattern": genstr(12000, "if(") +"x" +  genstr(12000, ")")  }
            ]}
    }
});

This code blocks the main Node.js event loop on my PC for 5 seconds. But if indeed the schema is never under the user's control, I agree this is not a problem. My only concern is that I am not sure if the 850 modules that depend on this module are aware of this assumption.
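
For anyone who wants to reproduce the measurement, here is a minimal timing sketch. `timeSync` is a hypothetical helper written for this example; it relies only on `process.hrtime.bigint()`, which is available in Node.js 10.7 and later.

```javascript
// Hypothetical helper: runs a synchronous function and reports how long it
// kept the event loop busy, in milliseconds.
function timeSync(fn) {
  var start = process.hrtime.bigint();
  var result = fn();
  var elapsedNs = process.hrtime.bigint() - start;
  return { result: result, elapsedMs: Number(elapsedNs) / 1e6 };
}

// Possible usage with the schema above (assumes `ajv` and `schema` exist):
//   var timing = timeSync(function () { return ajv.compile(schema); });
//   console.log('compile took ' + timing.elapsedMs.toFixed(1) + ' ms');
```

Since the compilation is synchronous, the elapsed time is also the time during which no other callbacks can run.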

John Carlson · Answer 3 · Wed Sep 06 2017 00:48:09 GMT+0800 (China Standard Time)

Thank god my schemas are not provided by users. And I was just thinking about generating a schema based on data in the JSON document (profile, component). Naughty users.

More about my use case. I have certain objects that only appear under certain profiles (enumeration) and components (enumeration and integer level). The combinatorics are such that generating all possible schemas is unlikely. It is acceptable to accept a Full profile and check the entire schema, but that won't tell you if a certain profile or component is required for a given object. Thus some objects in a full profile probably shouldn't be passed through a lesser profile. Does something like this sound possible in draft-06? Or even draft-04? I know I can do oneOf's for profiles, but components that can be removed from documents at will that may cause some objects to become invalid seem much more difficult to validate. This is the equivalent of providing different schema views of a schema, I think.

Evgeny Poberezkin · Answer 4 · Wed Sep 06 2017 01:02:46 GMT+0800 (China Standard Time)

@cristianstaicu thank you

The compilation time in this particular case does not change if I remove the regex you mention (and a couple of others), so this needs further investigation. It may be that some other regular expressions used during schema compilation are responsible.

Evgeny Poberezkin · Answer 5 · Tue Sep 15 2020 15:43:21 GMT+0800 (China Standard Time)

Regular expressions are no longer used during schema compilation (from 6.12.3).