oasis-open / cti-pattern-validator

OASIS TC Open Repository: Validate patterns used to express cyber observable content in STIX Indicators

Home Page:https://stix2-patterns.readthedocs.io/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Validation fails on network-traffic:start

delliott90 opened this issue · comments

If I run the following STIX pattern
"[network-traffic:start = '2018-04-20T12:36:24.558Z']"
it fails validation and gives the following error
"FAIL: Error found at line 1:17. no viable alternative at input 'network-traffic:start'"

The pattern passes validation if I use the end attribute instead of start.

"start" is actually a pattern keyword (e.g. see the start/stop qualifier). Try putting it in single quotes:

[network-traffic:'start' = '2018-04-20T12:36:24.558Z']

That works, thank you.

@ikiril01 do we cover this in the patterning spec? quoting properties that match keywords in the pattern grammar? If not, should we?

You shouldn't have to put those quotes in there. The lexer should be able to handle this.. 'start' may be a reserved word, but there is no whitespace involved here. @gtback I would suggest reopen...

There doesn't need to be whitespace to have ambiguity. Seems like we need some context sensitivity. E.g. "start" means this in this context, and that in that context. I suspect this can be accomplished in antlr4 but would require some lexer trickery.

@chisholm Agree. We could have "start" show up all over the place in a pattern.

I'll re-open this, but I'm not a fan of anything requiring "lexer trickery" 😀 . More generally, @JasonKeirstead , I thought the reason we required quotes around "SHA-1" was to avoid potential ambiguity around the (potential future) subtraction operator. I realize that there's a difference between the proposed arithmetic operators (which are inside an observation expression) and the START/STOP operators (which are outside), but I can also see why that context sensitivity can lead to more complexity.

@gtback Perhaps something we should bring up to the TC - my question is, we're basically making a decision here on all SCO properties. Should you have to quote any property that is also either an observation or operator? There can be many future collisions.

This is unambiguous that start does not match the keyword start here... the open bracket [ means that START is not a keyword until it encounters a closing bracket.

If the grammar can get confused by this point, then the grammar needs to be fixed, not working around it like the above.

the open bracket [ means that START is not a keyword until it encounters a closing bracket.

That is context sensitivity, with respect to lexical analysis. Sometimes "start" is one thing, and sometimes it's another. Of course we as humans can see the difference and don't get confused. The lexer is pretty dumb.

If we made every string of letters into some kind of generic "ID" token (i.e. an "identifier"), then you'd have generic parser rules, like:

startStopQualifier : ID StringLiteral ID StringLiteral ;

And then something like hello '2018-06-14T19:16:29Z' world '2018-06-14T19:17:12Z' would be parser-grammar-valid. You could check that at a higher level, but it's nice if it can be caught earlier in the processing pipeline. So "start" was lex'd into a special start token and referenced in the parser grammar. It was like that when I got the grammar(s), and it made sense to me at the time. I'm no expert on parsing, but it makes sense to me to keep things as simple as possible until there's a demonstrated need for something more complicated.

The fix for this case is simple. The 2.0 Patterning Language spec defines the start literal as START. In the current ANTLR grammar it's [Ss][Tt][Aa][Rr][Tt]. It should only be START, based on the 2.0 spec where text highlighted in light blue is a literal. There's no need to allow StArt or other permutations.

I'll agree w/ requiring START keyword to be all caps. There is nothing in the spec that allows it to be lowercase.

This doesn't fix the problem as there could be a custom property on a SCO be START, and we still need to allow that to be unquoted.

You cannot have a custom property on an SCO named START

Custom Property names MUST be in ASCII and MUST only contain the characters a–z
(lowercase ASCII), 0–9, and underscore (_).

@varnerac is correct. The best way to fix this is to update the grammar (oasis-open/cti-stix2-json-schemas#68) and then incorporate those schemas into this repository.

@chisholm can you add tests and incorporate the new lexer code after fixing oasis-open/cti-stix2-json-schemas#68?

@delliott90 this should be fixed now. We'll be releasing a new version of the library pretty soon that incorporates this change.

Good to know, thank you.