Pattern Passes - Should it?

varnerac opened this issue 7 years ago

The following pattern passes with the latest version installed via pip:

Enter a pattern to validate: [(V:'' <> -8.2 ) or M:M iSsuPerSet ''] anD [xox:o NOt LikE '\''] wiThiN .551 seconds wITHIn -59 SEcoNdS

PASS: [(V:'' <> -8.2 ) or M:M iSsuPerSet ''] anD [xox:o NOt LikE '\''] wiThiN .551 seconds wITHIn -59 SEcoNdS

Should negative seconds be allowed for within? I am not sure what that would mean.
Should an empty string be allowed as an object path selector?
Should two within qualifiers be allowed, back-to-back like this?

chisholm · Answer 1 · Fri Dec 15 2017 10:50:40 GMT+0800 (China Standard Time)

I think many of these findings are due to the fact that the pattern validator currently really just ensures the input is parseable. The set of parseable strings is a superset of the set of spec-compliant patterns. More could certainly be done to sanity-check particular parts of patterns.

Should two within qualifiers be allowed, back-to-back like this?

From a grammatical point of view, I think it makes sense: a qualified observation expression is still an observation expression. All observation expressions may be qualified. Therefore, it should be legal to stack qualifiers like this, even though some combinations may seem silly. (And the spec allows it.)
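The grammatical argument above can be sketched as a recursive data model. This is only an illustration, not the validator's parse tree: the class names `Observation` and `Within` and the helper `qualifier_chain` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Observation:
    """A bare observation expression, e.g. [a:b = 1] (hypothetical model)."""
    text: str


@dataclass(frozen=True)
class Within:
    """An observation expression qualified by WITHIN <seconds> SECONDS."""
    inner: "ObservationExpr"
    seconds: float


# A qualified expression is itself an observation expression,
# so it can be qualified again.
ObservationExpr = Union[Observation, Within]


def qualifier_chain(expr: ObservationExpr) -> List[float]:
    """Return the WITHIN values applied to expr, innermost first."""
    chain = []
    while isinstance(expr, Within):
        chain.append(expr.seconds)
        expr = expr.inner
    chain.reverse()
    return chain


# Two stacked qualifiers: ([a:b = 1] WITHIN 0.551 SECONDS) WITHIN 59 SECONDS
stacked = Within(Within(Observation("[a:b = 1]"), 0.551), 59)
```

Because `Within` wraps any `ObservationExpr`, nothing in the type prevents stacking, which mirrors why the grammar accepts back-to-back qualifiers.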

Andrew Varner · Answer 2 · Fri Dec 15 2017 10:56:23 GMT+0800 (China Standard Time)

I guess the question is: what should we expect from a validator?

The spec doesn't seem to allow mixed case true, false, WITHIN, etc. It seems like an easy fix in the lexer. It actually makes getting good coverage of the lexer easier.
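A rough way to see which keywords in the pattern from the original report are not upper case is sketched below. It is a regex heuristic, not the validator's lexer: the keyword list is an assumption drawn from the operators and qualifiers mentioned in this thread plus a few others, and the function name `non_uppercase_keywords` is hypothetical. A property name in an object path that happens to spell a keyword would be flagged incorrectly.

```python
import re

# Assumed keyword set; the patterning spec is the authority on the full list.
KEYWORDS = {
    "AND", "OR", "NOT", "LIKE", "MATCHES", "IN", "ISSUBSET", "ISSUPERSET",
    "FOLLOWEDBY", "WITHIN", "SECONDS", "REPEATS", "TIMES", "START", "STOP",
}

# Single-quoted string literal, allowing backslash escapes such as \'.
_STRING_LITERAL = re.compile(r"'(?:\\.|[^'\\])*'")


def non_uppercase_keywords(pattern):
    """Return keyword-like words that are not written in all upper case."""
    # Drop string literals so their contents are never mistaken for keywords.
    stripped = _STRING_LITERAL.sub(" ", pattern)
    words = re.findall(r"[A-Za-z]+", stripped)
    return [w for w in words if w.upper() in KEYWORDS and w != w.upper()]


PATTERN = (
    r"[(V:'' <> -8.2 ) or M:M iSsuPerSet ''] anD [xox:o NOt LikE '\''] "
    r"wiThiN .551 seconds wITHIn -59 SEcoNdS"
)
```

For the reported pattern, every keyword is flagged, while an all-upper-case pattern yields an empty list.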

Greg Back · Answer 3 · Tue Feb 27 2018 01:26:06 GMT+0800 (China Standard Time)

We've generally taken the attitude that we don't want to spend a lot of effort keeping people from doing "silly, but technically valid" things.

In terms of the mixed case, I think it would help to normalize to what's in the spec.

If there are easy ways to fix this, I'd be happy to merge a PR.

chisholm · Answer 4 · Tue Feb 27 2018 01:38:47 GMT+0800 (China Standard Time)

So, are you agreeing with @varnerac then? Hard to tell: "normalization" to me implies a transformation, and the validator's job isn't to transform the patterns. The only tools I can think of where normalization is applicable might be those for transforming between STIX versions, e.g. the elevator. It could be lenient in what it accepts, but produce output where all keywords are upper-cased (normalized), for example. Or maybe I'm completely misunderstanding what you meant :)

Greg Back · Answer 5 · Tue Feb 27 2018 22:17:14 GMT+0800 (China Standard Time)

The "negative seconds" is prohibited in the spec, so we should mark those patterns as invalid (similar to #41).
Empty strings as object path components is a tricky one. Python actually allows a dictionary to have a key that is the empty string, as does JSON. But the two places in STIX that allow user-defined keys are custom properties and Cyber Observable dictionaries, both of which require a minimum length of 3 (may change to 2). So I don't know of anywhere this would be valid, but I'm tempted to not have the validator reject it, since the patterning spec doesn't explicitly forbid it.
Empty strings as right-hand-side values in Comparison Expressions should be allowed, to represent (for example) "the X property is equal to an empty string".
Two WITHIN qualifiers should IMO be interpreted as needing to match the more restrictive (smaller) value. I'm not sure how the matcher handles this now, but I don't think the validator should reject this, despite the larger value having no semantic value.
I think we should require keywords (operators, qualifiers, etc.) to be all-upper-case to match what is in the spec, not allowing mixed case or lower case (but I am willing to be convinced otherwise on this). As far as I can tell, the spec doesn't say "keywords MUST be all-upper-case", but that's how they are defined in tables and how all the examples use them. "normalize" wasn't the right word, sorry.

@ikiril01 @treyka am I missing anything?
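The remark that both Python and JSON accept an empty string as a dictionary key can be checked with the standard library. This only demonstrates the language and serialization behaviour; it says nothing about whether such a key is valid in STIX.

```python
import json

# A Python dict may use the empty string as a key.
record = {"": "value"}

# JSON accepts it too, in both directions.
encoded = json.dumps(record)
decoded = json.loads('{"": "value"}')
```

Here `encoded` is the JSON text `{"": "value"}` and `decoded[""]` is `"value"`.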

Patrick Maroney · Answer 6 · Tue Feb 27 2018 23:29:00 GMT+0800 (China Standard Time)

Shouldn't "the X property is equal to an empty string" declaration be explicit?

Trey Darley · Answer 7 · Tue Feb 27 2018 23:44:31 GMT+0800 (China Standard Time)

@packet-rat You mean like oasis-tcs/cti-stix2#52?

Patrick Maroney · Answer 8 · Wed Feb 28 2018 02:09:56 GMT+0800 (China Standard Time)

Trey, sorry I'm not following the reference. I was commenting on Greg's earlier question:

"Empty strings as right-hand-side values in Comparison Expressions should be allowed, to represent (for example) 'the X property is equal to an empty string'."

I was answering/arguing/proposing that a search for an "Empty String" should be an explicit pattern declaration.

chisholm · Answer 9 · Wed Feb 28 2018 02:16:02 GMT+0800 (China Standard Time)

Two WITHIN qualifiers should IMO be interpreted as needing to match the more restrictive (smaller) value. I'm not sure how the matcher handles this now, but I don't think the validator should reject this, despite the larger value having no semantic value.

Yeah, I think that's how the matcher would behave. It's not explicitly programmed in, it's more of an emergent property.
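One way to see why the "smaller value wins" behaviour can fall out without being programmed in is the simplified model below. It assumes WITHIN is a constraint on the time span of the matched observations and that stacked qualifiers must all hold. The function names are hypothetical and this is not the matcher's code.

```python
def within_ok(timestamps, seconds):
    """True if all matched observation timestamps fit in a window of `seconds`."""
    return max(timestamps) - min(timestamps) <= seconds


def stacked_within_ok(timestamps, windows):
    """Stacked WITHIN qualifiers: every window must be satisfied."""
    return all(within_ok(timestamps, w) for w in windows)


windows = [0.551, 59]
close_together = [0.0, 0.4]   # span 0.4 s: fits both windows
far_apart = [0.0, 10.0]       # span 10 s: fits 59 s but not 0.551 s
```

Under this model, requiring every window to hold is equivalent to checking only `min(windows)`, so the larger value never changes the result.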

I think we should require keywords (operators, qualifiers, etc.) to be all-upper-case to match what is in the spec, not allowing mixed case or lower case (but I am willing to be convinced otherwise on this). As far as I can tell, the spec doesn't say "keywords MUST be all-upper-case", but that's how they are defined in tables and how all the examples use them. "normalize" wasn't the right word, sorry.

Ah, ok. Yeah, seems like it would be an easy lexer grammar fix. E.g. just use 'AND' instead of what is essentially [Aa][Nn][Dd]. Maybe we could get rid of all those single letter fragments and simplify the grammar.
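The difference between the two token definitions can be illustrated with Python regular expressions as a stand-in for the lexer rules. This is an analogy only; the real change would be made in the grammar file, which is not shown in this thread.

```python
import re

# Roughly what the per-letter fragments amount to: any mix of cases matches.
CASE_INSENSITIVE_AND = re.compile(r"[Aa][Nn][Dd]")

# The proposed stricter token: only the literal upper-case keyword matches.
STRICT_AND = re.compile(r"AND")


def accepted_by(token_regex, text):
    """True if the whole text is a match for the token definition."""
    return token_regex.fullmatch(text) is not None
```

With these definitions, `anD` is accepted by the first rule but rejected by the second, while `AND` is accepted by both.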

Andrew Varner · Answer 10 · Wed Feb 28 2018 04:18:11 GMT+0800 (China Standard Time)

For multiple qualifiers of the same type, I think the answer is to disallow multiple instances of the same qualifier on a single Observation Expression in the next spec revision. If we wanted to string them together, and I don't think we do, we'd provide boolean operators for them. I'll open a ticket on the spec.

Greg Back · Answer 11 · Wed Feb 28 2018 23:01:24 GMT+0800 (China Standard Time)

@packet-rat, I don't think @varnerac was explicitly asking about this, but his example included M:M iSsuPerSet ''. Depending on the semantics of the "null" operator @treyka linked to, these could be equivalent or not. But having an empty string on the RHS of an expression is allowed now, and I believe this is correct.

There's a related issue of whether ISSUPERSET '' should be valid:

b MUST be a valid string representation of the corresponding Object type.

But the pattern validator currently has no knowledge of the Cyber Observable object model, and I think I'd prefer to keep it that way.

@chisholm I agree. Getting rid of some of those fragments is probably a good idea, but I'd want confirmation from @treyka or @ikiril01 that that matches the intent of the spec.

@varnerac If it makes it into the spec, we can implement that here, but we (intentionally) cannot make up new spec requirements in this repo 😁 .

Greg Back · Answer 12 · Wed Jun 13 2018 05:08:25 GMT+0800 (China Standard Time)

There seem to be several different questions being discussed here. I need to go back and review this issue and determine the separate issues and what we should do with them.

Greg Back · Answer 13 · Tue Jun 19 2018 02:48:32 GMT+0800 (China Standard Time)

It looks like the WITHIN qualifier has been restricted to positive values only, so that's been taken care of.
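For readers who want to spot offending values in existing patterns, a positive-only rule could be approximated as below. This is a regex heuristic written for illustration, not how the validator or its grammar enforces the restriction, and the function name `non_positive_within_values` is hypothetical. Keyword matching is case-insensitive here so that the mixed-case pattern from the original report can be checked as written.

```python
import re

# Single-quoted string literal, allowing backslash escapes such as \'.
_STRING_LITERAL = re.compile(r"'(?:\\.|[^'\\])*'")

# WITHIN <number> SECONDS, where the number may be signed and may start with '.'.
_WITHIN = re.compile(
    r"\bWITHIN\s+([+-]?(?:\d+(?:\.\d*)?|\.\d+))\s+SECONDS\b",
    re.IGNORECASE,
)


def non_positive_within_values(pattern):
    """Return the WITHIN values in `pattern` that are zero or negative."""
    # Blank out string literals so quoted text is never read as a qualifier.
    stripped = _STRING_LITERAL.sub("''", pattern)
    values = [float(m.group(1)) for m in _WITHIN.finditer(stripped)]
    return [v for v in values if v <= 0]


PATTERN = (
    r"[(V:'' <> -8.2 ) or M:M iSsuPerSet ''] anD [xox:o NOt LikE '\''] "
    r"wiThiN .551 seconds wITHIn -59 SEcoNdS"
)
```

Applied to the reported pattern, only the `-59` value is returned; the `.551` window is positive and passes.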

Empty object-path components and empty RHS values are not explicitly disallowed by the spec, so for now I don't think there's anything we should change in the validator. The validator is deliberately ignorant of the cyber observable object structure and rules, so those types of restrictions don't need to be addressed here. The stix2-validator catches unknown object types and other related "mistakes" based on its knowledge of cyber observable objects.

I agree that we should remove the case-insensitivity on operators and qualifiers, but that should be done in the grammar and then updated here. I'll make a separate issue for that.