[BUG] Failure to process "reserved" chars in regular expressions
michael-markevich opened this issue
Describe the bug
Similar to #3514, the expression parser fails on the regular expression example from the documentation: https://github.com/opensearch-project/data-prepper/blob/main/docs/expression_syntax.md#reference-table.
To Reproduce
Steps to reproduce the behavior:
- Create a pipeline with the following configuration:
log-pipeline:
  source:
    http:
      ssl: false
  processor:
    - parse_json:
        source: message
        parse_when: '/message=~"^\w*$"' # Fails
        # parse_when: '/message=~"^\w*\$"' # Also fails
        # parse_when: '/message =~ "^(\\{.*\\}|\\[.*\\])$"' # Also fails
  sink:
    - opensearch:
        hosts: [ 'https://opensearch:9200' ]
        insecure: true
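For reference, the documented pattern ^\w*$ is meant to match messages made up entirely of word characters. The minimal Java sketch below shows what the "Fails" condition would be expected to evaluate to once it parses. It assumes that the =~ operator follows java.util.regex semantics, which this issue does not confirm. WordOnlyPattern and isWordOnly are hypothetical names used only for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;

// Sketch only: assumes =~ behaves like java.util.regex full matching.
// WordOnlyPattern / isWordOnly are hypothetical names, not Data Prepper API.
class WordOnlyPattern {
    // In Java source the backslash is doubled; the regex itself is ^\w*$
    static final Pattern WORD_ONLY = Pattern.compile("^\\w*$");

    // True when the whole message consists of word characters [a-zA-Z_0-9]
    static boolean isWordOnly(String message) {
        return WORD_ONLY.matcher(message).matches();
    }

    public static void main(String[] args) {
        for (String msg : List.of("hello_world", "hello world", "")) {
            System.out.println("\"" + msg + "\" -> " + isWordOnly(msg));
        }
    }
}
```

Running main prints true for "hello_world" and for the empty string, and false for "hello world" because the space is not a word character.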
- Send any log message to the pipeline.
- See the error log:
2024-05-07T12:07:25,039 [log-pipeline-processor-worker-1-thread-1] ERROR org.opensearch.dataprepper.plugins.processor.parse.AbstractParseProcessor - An exception occurred while using the parse_json processor on Event [org.opensearch.dataprepper.model.log.JacksonLog@25c210a1]
org.opensearch.dataprepper.expression.ExpressionEvaluationException: Unable to evaluate statement "/message=~"^\w*$""
at org.opensearch.dataprepper.expression.GenericExpressionEvaluator.evaluate(GenericExpressionEvaluator.java:42) ~[data-prepper-expression-2.7.0.jar:?]
at org.opensearch.dataprepper.expression.ExpressionEvaluator.evaluateConditional(ExpressionEvaluator.java:28) ~[data-prepper-api-2.7.0.jar:?]
at org.opensearch.dataprepper.plugins.processor.parse.AbstractParseProcessor.doExecute(AbstractParseProcessor.java:70) ~[parse-json-processor-2.7.0.jar:?]
at org.opensearch.dataprepper.model.processor.AbstractProcessor.lambda$execute$0(AbstractProcessor.java:54) ~[data-prepper-api-2.7.0.jar:?]
at io.micrometer.core.instrument.composite.CompositeTimer.record(CompositeTimer.java:69) [micrometer-core-1.11.5.jar:1.11.5]
at org.opensearch.dataprepper.model.processor.AbstractProcessor.execute(AbstractProcessor.java:54) [data-prepper-api-2.7.0.jar:?]
at org.opensearch.dataprepper.pipeline.ProcessWorker.doRun(ProcessWorker.java:135) [data-prepper-core-2.7.0.jar:?]
at org.opensearch.dataprepper.pipeline.ProcessWorker.run(ProcessWorker.java:61) [data-prepper-core-2.7.0.jar:?]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: org.opensearch.dataprepper.expression.ParseTreeCompositeException
at org.opensearch.dataprepper.expression.ParseTreeParser.createParseTree(ParseTreeParser.java:78) ~[data-prepper-expression-2.7.0.jar:?]
at org.opensearch.dataprepper.expression.ParseTreeParser.parse(ParseTreeParser.java:101) ~[data-prepper-expression-2.7.0.jar:?]
at org.opensearch.dataprepper.expression.ParseTreeParser.parse(ParseTreeParser.java:27) ~[data-prepper-expression-2.7.0.jar:?]
at org.opensearch.dataprepper.expression.MultiThreadParser.parse(MultiThreadParser.java:35) ~[data-prepper-expression-2.7.0.jar:?]
at org.opensearch.dataprepper.expression.MultiThreadParser.parse(MultiThreadParser.java:20) ~[data-prepper-expression-2.7.0.jar:?]
at org.opensearch.dataprepper.expression.GenericExpressionEvaluator.evaluate(GenericExpressionEvaluator.java:38) ~[data-prepper-expression-2.7.0.jar:?]
... 12 more
Caused by: org.opensearch.dataprepper.expression.ExceptionOverview: Multiple exceptions (5)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
|-- org.antlr.v4.runtime.InputMismatchException: null
at org.antlr.v4.runtime.DefaultErrorStrategy.sync(DefaultErrorStrategy.java:270)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
line 1:11 token recognition error at: '^'
line 1:12 token recognition error at: '\'
line 1:13 token recognition error at: 'w*'
line 1:15 token recognition error at: '$"'
line 1:10 mismatched input '"' expecting {JsonPointer, EscapedJsonPointer, String}
2024-05-07T12:07:25,042 [log-pipeline-processor-worker-1-thread-1] ERROR org.opensearch.dataprepper.plugins.processor.parse.AbstractParseProcessor - An exception occurred while using the parse_json processor on Event [org.opensearch.dataprepper.model.log.JacksonLog@2c0f1eaf]
org.opensearch.dataprepper.expression.ExpressionEvaluationException: Unable to evaluate statement "/message=~"^\w*$""
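ANTLR reports error positions as line:column, where the column is a 0-based character offset within the line. Mapping the reported columns onto the statement shows that parsing breaks at the opening quote of the regex (column 10, "mismatched input '"' expecting {..., String}") and every character after it is rejected, i.e. the quoted regex is never lexed as a String token. A small Java sketch of the mapping; ColumnMapper is a hypothetical helper.

```java
// Sketch: map ANTLR's "line 1:C" positions back onto the failing statement.
// ANTLR's column is a 0-based character offset within the line.
// ColumnMapper is a hypothetical helper used only for this illustration.
class ColumnMapper {
    // Statement from "Unable to evaluate statement ..."; backslash doubled for Java source
    static final String STATEMENT = "/message=~\"^\\w*$\"";

    public static void main(String[] args) {
        // Columns reported in the error output above
        int[] reportedColumns = {10, 11, 12, 13, 15};
        for (int col : reportedColumns) {
            System.out.println("1:" + col + " -> '" + STATEMENT.charAt(col) + "'");
        }
    }
}
```

This prints 1:10 -> '"', 1:11 -> '^', 1:12 -> '\', 1:13 -> 'w' and 1:15 -> '$', matching the characters named in the token recognition errors.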
- If you escape the dollar sign, you still get an error:
line 1:11 token recognition error at: '^'
line 1:12 token recognition error at: '\'
line 1:13 token recognition error at: 'w*'
line 1:15 token recognition error at: '\'
line 1:16 token recognition error at: '$"'
line 1:10 mismatched input '"' expecting {JsonPointer, EscapedJsonPointer, String}
2024-05-07T12:16:50,189 [log-pipeline-processor-worker-1-thread-1] ERROR org.opensearch.dataprepper.plugins.processor.parse.AbstractParseProcessor - An exception occurred while using the parse_json processor on Event [org.opensearch.dataprepper.model.log.JacksonLog@686b21ea]
org.opensearch.dataprepper.expression.ExpressionEvaluationException: Unable to evaluate statement "/message=~"^\w*\$""
at org.opensearch.dataprepper.expression.GenericExpressionEvaluator.evaluate(GenericExpressionEvaluator.java:42) ~[data-prepper-expression-2.7.0.jar:?]
at org.opensearch.dataprepper.expression.ExpressionEvaluator.evaluateConditional(ExpressionEvaluator.java:28) ~[data-prepper-api-2.7.0.jar:?]
at org.opensearch.dataprepper.plugins.processor.parse.AbstractParseProcessor.doExecute(AbstractParseProcessor.java:70) ~[parse-json-processor-2.7.0.jar:?]
at org.opensearch.dataprepper.model.processor.AbstractProcessor.lambda$execute$0(AbstractProcessor.java:54) ~[data-prepper-api-2.7.0.jar:?]
at io.micrometer.core.instrument.composite.CompositeTimer.record(CompositeTimer.java:69) [micrometer-core-1.11.5.jar:1.11.5]
at org.opensearch.dataprepper.model.processor.AbstractProcessor.execute(AbstractProcessor.java:54) [data-prepper-api-2.7.0.jar:?]
at org.opensearch.dataprepper.pipeline.ProcessWorker.doRun(ProcessWorker.java:135) [data-prepper-core-2.7.0.jar:?]
at org.opensearch.dataprepper.pipeline.ProcessWorker.run(ProcessWorker.java:61) [data-prepper-core-2.7.0.jar:?]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: org.opensearch.dataprepper.expression.ParseTreeCompositeException
at org.opensearch.dataprepper.expression.ParseTreeParser.createParseTree(ParseTreeParser.java:78) ~[data-prepper-expression-2.7.0.jar:?]
at org.opensearch.dataprepper.expression.ParseTreeParser.parse(ParseTreeParser.java:101) ~[data-prepper-expression-2.7.0.jar:?]
at org.opensearch.dataprepper.expression.ParseTreeParser.parse(ParseTreeParser.java:27) ~[data-prepper-expression-2.7.0.jar:?]
at org.opensearch.dataprepper.expression.MultiThreadParser.parse(MultiThreadParser.java:35) ~[data-prepper-expression-2.7.0.jar:?]
at org.opensearch.dataprepper.expression.MultiThreadParser.parse(MultiThreadParser.java:20) ~[data-prepper-expression-2.7.0.jar:?]
at org.opensearch.dataprepper.expression.GenericExpressionEvaluator.evaluate(GenericExpressionEvaluator.java:38) ~[data-prepper-expression-2.7.0.jar:?]
... 12 more
Caused by: org.opensearch.dataprepper.expression.ExceptionOverview: Multiple exceptions (6)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
|-- org.antlr.v4.runtime.InputMismatchException: null
at org.antlr.v4.runtime.DefaultErrorStrategy.sync(DefaultErrorStrategy.java:270)
line 1:11 token recognition error at: '^'
line 1:12 token recognition error at: '\'
line 1:13 token recognition error at: 'w*'
line 1:15 token recognition error at: '\'
line 1:16 token recognition error at: '$"'
line 1:10 mismatched input '"' expecting {JsonPointer, EscapedJsonPointer, String}
2024-05-07T12:16:50,191 [log-pipeline-processor-worker-1-thread-1] ERROR org.opensearch.dataprepper.plugins.processor.parse.AbstractParseProcessor - An exception occurred while using the parse_json processor on Event [org.opensearch.dataprepper.model.log.JacksonLog@1c5deeb4]
org.opensearch.dataprepper.expression.ExpressionEvaluationException: Unable to evaluate statement "/message=~"^\w*\$""
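Note that even if the escaped form were accepted by the parser, it would not be an equivalent workaround: in java.util.regex (assumed here to be the engine behind =~), \$ matches a literal dollar sign instead of anchoring to the end of the input. A minimal sketch; EscapedDollar is a hypothetical name.

```java
import java.util.regex.Pattern;

// Sketch: "\$" is a literal '$' in java.util.regex, not an end anchor,
// so escaping it changes what the pattern means.
// EscapedDollar is a hypothetical helper used only for this illustration.
class EscapedDollar {
    // Full-match semantics (Matcher.matches), assumed for =~
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String anchored = "^\\w*$";   // regex ^\w*$  : word characters up to end of input
        String literal  = "^\\w*\\$"; // regex ^\w*\$ : word characters followed by a literal '$'
        System.out.println(matches(anchored, "abc"));  // true
        System.out.println(matches(literal, "abc"));   // false
        System.out.println(matches(literal, "abc$"));  // true
    }
}
```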
- Parsing also fails when checking whether the message is a JSON object or array with the following regex:
parse_when: '/message =~ "^(\{.*\}|\[.*\])$"'
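To illustrate the intended check, the sketch below applies the JSON object/array pattern ^(\{.*\}|\[.*\])$ with java.util.regex, assumed (not confirmed) to match the semantics of =~. The sample messages, including the syslog-style line, are made up for illustration, and JsonLikeCheck is a hypothetical name. One caveat worth knowing: by default "." does not match line terminators, so pretty-printed multi-line JSON only matches when Pattern.DOTALL is set.

```java
import java.util.regex.Pattern;

// Sketch: the JSON object/array heuristic ^(\{.*\}|\[.*\])$ in java.util.regex.
// JsonLikeCheck is a hypothetical name; sample messages are illustrative only.
class JsonLikeCheck {
    static final Pattern JSON_LIKE = Pattern.compile("^(\\{.*\\}|\\[.*\\])$");
    // Same pattern, but DOTALL lets '.' also match line terminators
    static final Pattern JSON_LIKE_MULTILINE =
            Pattern.compile("^(\\{.*\\}|\\[.*\\])$", Pattern.DOTALL);

    // Full-match test of the message against the given pattern
    static boolean looksLikeJson(Pattern pattern, String message) {
        return pattern.matcher(message).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "{\"level\":\"info\",\"msg\":\"started\"}",
            "[1,2,3]",
            "<34>Oct 11 22:14:15 mymachine su: 'su root' failed",
            "{\n  \"msg\": \"multi-line\"\n}"
        };
        for (String s : samples) {
            System.out.println(looksLikeJson(JSON_LIKE, s) + " / "
                    + looksLikeJson(JSON_LIKE_MULTILINE, s));
        }
    }
}
```

The output is "true / true" for the two single-line JSON samples, "false / false" for the syslog line, and "false / true" for the multi-line object.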
Expected behavior
The regular expressions above should be parsed and evaluated without errors.
Environment
- Data Prepper 2.7.0
Additional context
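This looks like a bug in the expression grammar: the lexer reports token recognition errors at the regex characters (^, \, $) and the parser then rejects the opening quote where it expects a String.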
Additional notes:
- Regular expressions without {, }, $ (or some other special characters) work perfectly fine. Our patterns were tested with a different regex engine and work there. As mentioned above, even the example from your documentation ("^\w*$") fails, apparently because of the dollar sign.
- This use case is important for us because it lets us distinguish JSON-structured log messages from other (syslog) messages, avoid parser errors, and improve overall performance. This kind of regex matching is also a standard feature in Graylog.