opensearch-project / data-prepper

Data Prepper is a component of the OpenSearch project that accepts, filters, transforms, enriches, and routes data at scale.

Home Page: https://opensearch.org/docs/latest/clients/data-prepper/index/

[BUG] Failure to process "reserved" chars in regular expressions

michael-markevich opened this issue · comments

Describe the bug
Similar to #3514, the expression parser fails on the regex example from the documentation: https://github.com/opensearch-project/data-prepper/blob/main/docs/expression_syntax.md#reference-table.

To Reproduce
Steps to reproduce the behavior:

  1. Create a pipeline with the following configuration
log-pipeline:
  source:
    http:
      ssl: false

  processor:
    - parse_json:
        source: message
        parse_when: '/message=~"^\w*$"' # Fails
        # parse_when: '/message=~"^\w*\ $"' # Also fails
        # parse_when: '/message =~ "^(\\{.*\\}|\\[.*\\])$"' # Also fails

  sink:
    - opensearch:
        hosts: [ 'https://opensearch:9200' ]
        insecure: true

  2. Send in any log message.
  3. See the error log:
2024-05-07T12:07:25,039 [log-pipeline-processor-worker-1-thread-1] ERROR org.opensearch.dataprepper.plugins.processor.parse.AbstractParseProcessor - An exception occurred while using the parse_json processor on Event [org.opensearch.dataprepper.model.log.JacksonLog@25c210a1]
org.opensearch.dataprepper.expression.ExpressionEvaluationException: Unable to evaluate statement "/message=~"^\w*$""
	at org.opensearch.dataprepper.expression.GenericExpressionEvaluator.evaluate(GenericExpressionEvaluator.java:42) ~[data-prepper-expression-2.7.0.jar:?]
	at org.opensearch.dataprepper.expression.ExpressionEvaluator.evaluateConditional(ExpressionEvaluator.java:28) ~[data-prepper-api-2.7.0.jar:?]
	at org.opensearch.dataprepper.plugins.processor.parse.AbstractParseProcessor.doExecute(AbstractParseProcessor.java:70) ~[parse-json-processor-2.7.0.jar:?]
	at org.opensearch.dataprepper.model.processor.AbstractProcessor.lambda$execute$0(AbstractProcessor.java:54) ~[data-prepper-api-2.7.0.jar:?]
	at io.micrometer.core.instrument.composite.CompositeTimer.record(CompositeTimer.java:69) [micrometer-core-1.11.5.jar:1.11.5]
	at org.opensearch.dataprepper.model.processor.AbstractProcessor.execute(AbstractProcessor.java:54) [data-prepper-api-2.7.0.jar:?]
	at org.opensearch.dataprepper.pipeline.ProcessWorker.doRun(ProcessWorker.java:135) [data-prepper-core-2.7.0.jar:?]
	at org.opensearch.dataprepper.pipeline.ProcessWorker.run(ProcessWorker.java:61) [data-prepper-core-2.7.0.jar:?]
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: org.opensearch.dataprepper.expression.ParseTreeCompositeException
	at org.opensearch.dataprepper.expression.ParseTreeParser.createParseTree(ParseTreeParser.java:78) ~[data-prepper-expression-2.7.0.jar:?]
	at org.opensearch.dataprepper.expression.ParseTreeParser.parse(ParseTreeParser.java:101) ~[data-prepper-expression-2.7.0.jar:?]
	at org.opensearch.dataprepper.expression.ParseTreeParser.parse(ParseTreeParser.java:27) ~[data-prepper-expression-2.7.0.jar:?]
	at org.opensearch.dataprepper.expression.MultiThreadParser.parse(MultiThreadParser.java:35) ~[data-prepper-expression-2.7.0.jar:?]
	at org.opensearch.dataprepper.expression.MultiThreadParser.parse(MultiThreadParser.java:20) ~[data-prepper-expression-2.7.0.jar:?]
	at org.opensearch.dataprepper.expression.GenericExpressionEvaluator.evaluate(GenericExpressionEvaluator.java:38) ~[data-prepper-expression-2.7.0.jar:?]
	... 12 more
Caused by: org.opensearch.dataprepper.expression.ExceptionOverview: Multiple exceptions (5)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
    at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
    at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
|-- org.antlr.v4.runtime.InputMismatchException: null
    at org.antlr.v4.runtime.DefaultErrorStrategy.sync(DefaultErrorStrategy.java:270)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
    at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
    at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
line 1:11 token recognition error at: '^'
line 1:12 token recognition error at: '\'
line 1:13 token recognition error at: 'w*'
line 1:15 token recognition error at: '$"'
line 1:10 mismatched input '"' expecting {JsonPointer, EscapedJsonPointer, String}
2024-05-07T12:07:25,042 [log-pipeline-processor-worker-1-thread-1] ERROR org.opensearch.dataprepper.plugins.processor.parse.AbstractParseProcessor - An exception occurred while using the parse_json processor on Event [org.opensearch.dataprepper.model.log.JacksonLog@2c0f1eaf]
org.opensearch.dataprepper.expression.ExpressionEvaluationException: Unable to evaluate statement "/message=~"^\w*$""
  4. If you escape the dollar sign, you still get an error:
line 1:11 token recognition error at: '^'
line 1:12 token recognition error at: '\'
line 1:13 token recognition error at: 'w*'
line 1:15 token recognition error at: '\'
line 1:16 token recognition error at: '$"'
line 1:10 mismatched input '"' expecting {JsonPointer, EscapedJsonPointer, String}
2024-05-07T12:16:50,189 [log-pipeline-processor-worker-1-thread-1] ERROR org.opensearch.dataprepper.plugins.processor.parse.AbstractParseProcessor - An exception occurred while using the parse_json processor on Event [org.opensearch.dataprepper.model.log.JacksonLog@686b21ea]
org.opensearch.dataprepper.expression.ExpressionEvaluationException: Unable to evaluate statement "/message=~"^\w*\$""
	at org.opensearch.dataprepper.expression.GenericExpressionEvaluator.evaluate(GenericExpressionEvaluator.java:42) ~[data-prepper-expression-2.7.0.jar:?]
	at org.opensearch.dataprepper.expression.ExpressionEvaluator.evaluateConditional(ExpressionEvaluator.java:28) ~[data-prepper-api-2.7.0.jar:?]
	at org.opensearch.dataprepper.plugins.processor.parse.AbstractParseProcessor.doExecute(AbstractParseProcessor.java:70) ~[parse-json-processor-2.7.0.jar:?]
	at org.opensearch.dataprepper.model.processor.AbstractProcessor.lambda$execute$0(AbstractProcessor.java:54) ~[data-prepper-api-2.7.0.jar:?]
	at io.micrometer.core.instrument.composite.CompositeTimer.record(CompositeTimer.java:69) [micrometer-core-1.11.5.jar:1.11.5]
	at org.opensearch.dataprepper.model.processor.AbstractProcessor.execute(AbstractProcessor.java:54) [data-prepper-api-2.7.0.jar:?]
	at org.opensearch.dataprepper.pipeline.ProcessWorker.doRun(ProcessWorker.java:135) [data-prepper-core-2.7.0.jar:?]
	at org.opensearch.dataprepper.pipeline.ProcessWorker.run(ProcessWorker.java:61) [data-prepper-core-2.7.0.jar:?]
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: org.opensearch.dataprepper.expression.ParseTreeCompositeException
	at org.opensearch.dataprepper.expression.ParseTreeParser.createParseTree(ParseTreeParser.java:78) ~[data-prepper-expression-2.7.0.jar:?]
	at org.opensearch.dataprepper.expression.ParseTreeParser.parse(ParseTreeParser.java:101) ~[data-prepper-expression-2.7.0.jar:?]
	at org.opensearch.dataprepper.expression.ParseTreeParser.parse(ParseTreeParser.java:27) ~[data-prepper-expression-2.7.0.jar:?]
	at org.opensearch.dataprepper.expression.MultiThreadParser.parse(MultiThreadParser.java:35) ~[data-prepper-expression-2.7.0.jar:?]
	at org.opensearch.dataprepper.expression.MultiThreadParser.parse(MultiThreadParser.java:20) ~[data-prepper-expression-2.7.0.jar:?]
	at org.opensearch.dataprepper.expression.GenericExpressionEvaluator.evaluate(GenericExpressionEvaluator.java:38) ~[data-prepper-expression-2.7.0.jar:?]
	... 12 more
Caused by: org.opensearch.dataprepper.expression.ExceptionOverview: Multiple exceptions (6)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
    at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
    at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
    at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
    at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
|-- org.antlr.v4.runtime.LexerNoViableAltException: null
    at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
|-- org.antlr.v4.runtime.InputMismatchException: null
    at org.antlr.v4.runtime.DefaultErrorStrategy.sync(DefaultErrorStrategy.java:270)
line 1:11 token recognition error at: '^'
line 1:12 token recognition error at: '\'
line 1:13 token recognition error at: 'w*'
line 1:15 token recognition error at: '\'
line 1:16 token recognition error at: '$"'
line 1:10 mismatched input '"' expecting {JsonPointer, EscapedJsonPointer, String}
2024-05-07T12:16:50,191 [log-pipeline-processor-worker-1-thread-1] ERROR org.opensearch.dataprepper.plugins.processor.parse.AbstractParseProcessor - An exception occurred while using the parse_json processor on Event [org.opensearch.dataprepper.model.log.JacksonLog@1c5deeb4]
org.opensearch.dataprepper.expression.ExpressionEvaluationException: Unable to evaluate statement "/message=~"^\w*\$""
  5. Parsing also fails when checking whether the message is a JSON object or array with the following regex:

parse_when: '/message =~ "^(\{.*\}|\[.*\])$"'
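
As a sanity check that the patterns themselves are well-formed regular expressions, here is a minimal sketch that compiles them with Python's `re` module. Python stands in for "a different parser" here and is an assumption of this sketch; it says nothing about how Data Prepper evaluates `=~` internally. The names `PATTERNS` and `matches` are hypothetical, and the JSON pattern is written with `.*`, as in the commented-out configuration line.

```python
import re

# Patterns taken from the reproduction steps (without the surrounding
# Data Prepper expression syntax). Raw strings keep the backslashes intact.
PATTERNS = {
    "word_only": r"^\w*$",
    "word_then_dollar": r"^\w*\$",
    "json_like": r"^(\{.*\}|\[.*\])$",
}

# Compiling raises re.error if a pattern is malformed; none of these do.
COMPILED = {name: re.compile(pattern) for name, pattern in PATTERNS.items()}


def matches(name: str, message: str) -> bool:
    """Return True if the named pattern matches somewhere in the message."""
    return COMPILED[name].search(message) is not None


if __name__ == "__main__":
    samples = [
        ("word_only", "hello_123"),
        ("word_then_dollar", "price$"),
        ("json_like", '{"a": 1}'),
        ("json_like", "<34>Oct 11 22:14:15 host su: failed"),
    ]
    for name, message in samples:
        print(f"{name!r} vs {message!r}: {matches(name, message)}")
```

All three patterns compile, which is consistent with the failure occurring before any regex engine sees them.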

Expected behavior
Regex should be parsed correctly.

Environment (please complete the following information):

  • Data Prepper 2.7.

Additional context

Looks like a bug in the expression grammar.
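
The ANTLR messages in the first error log carry zero-based column offsets into the statement, so they can be mapped back to the exact characters being rejected. The sketch below does that mapping for the first statement; `STATEMENT`, `REPORTED`, and `text_at` are hypothetical names used only for illustration.

```python
# The statement exactly as it appears in the log: /message=~"^\w*$"
STATEMENT = r'/message=~"^\w*$"'

# (column, rejected text) pairs copied from the
# "token recognition error" lines of the first error log.
REPORTED = [
    (11, "^"),
    (12, "\\"),  # a single backslash
    (13, "w*"),
    (15, '$"'),
]

# Column of the "mismatched input" message from the same log.
MISMATCHED_COLUMN = 10


def text_at(statement: str, column: int, length: int) -> str:
    """Return the slice of the statement that an ANTLR column refers to."""
    return statement[column:column + length]


if __name__ == "__main__":
    for column, text in REPORTED:
        found = text_at(STATEMENT, column, len(text))
        print(f"column {column}: log says {text!r}, statement has {found!r}")
    print(f"column {MISMATCHED_COLUMN}: {STATEMENT[MISMATCHED_COLUMN]!r}")
```

Every character after the opening double quote at column 10 is reported as unrecognized, which suggests the lexer never produced a String token for the regex literal. That would fit a problem in the grammar's string rule rather than in regex evaluation.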

Additional notes to the case:

  1. If the regular expression contains no { } $ (or certain other special characters), it works fine. Our pattern works when tested against a different regex parser. As mentioned above, even the example from your documentation ("^\w*$") fails because of the dollar sign.
  2. This use case is important for us: it lets us distinguish log messages with a JSON structure from other (syslog) messages, which avoids parser errors and improves overall performance. Equivalent behaviour is a standard feature in Graylog.
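
The intent behind the `parse_when` condition can be sketched outside Data Prepper as a cheap shape check followed by the actual JSON parse. This is an illustration of the use case only, not Data Prepper's implementation; `JSON_LIKE` and `parse_if_json` are hypothetical names.

```python
import json
import re
from typing import Any, Optional

# Same shape check as the parse_when condition: the whole message is
# wrapped in {...} or [...].
JSON_LIKE = re.compile(r"^(\{.*\}|\[.*\])$")


def parse_if_json(message: str) -> Optional[Any]:
    """Parse the message as JSON only when it looks like an object or array.

    Returns None for anything else (for example a syslog line), so the
    JSON parser is never invoked on messages that cannot be JSON.
    """
    if JSON_LIKE.search(message) is None:
        return None
    try:
        return json.loads(message)
    except json.JSONDecodeError:
        # The shape check is only a heuristic; malformed JSON still lands here.
        return None


if __name__ == "__main__":
    for message in [
        '{"level": "info"}',
        "[1, 2, 3]",
        "<34>Oct 11 22:14:15 host su: failed",
    ]:
        print(f"{message!r} -> {parse_if_json(message)!r}")
```

The regex guard keeps plain-text messages away from the parser entirely, which is where the reduction in parser errors would come from.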