logstash-plugins / logstash-patterns-core

%UNIXPATH can cause RegEx DOS

jsvd opened this issue

moved from elastic/logstash#5458

Repro is very simple. This will peg your CPU:

input {
        stdin {
                add_field => { "test" => "/web/asdf/my_input_file_000223.xml name"  }
        }
}


filter {
        grok {
                match => {
                        "test" => [
                                "%{UNIXPATH}/asdf/my_%{DATA}_file_%{DATA:fileName} name"
                        ]
                }
        }
}

output {
        stdout {
          codec  => rubydebug {
            metadata => true
          }
        }
}

The more /asdf/asdf you add, the longer it takes.

It works without delay on this online grok debugger app:

grokdebug

Investigation showed that the UNIXPATH definition differs between the two tools:

Online App: UNIXPATH (?>/(?>[\w_%!$@:.,-]+|\\.)*)+ (Works Better)
Logstash: UNIXPATH (/([\w_%!$@:.,~-]+|\\.)*)+ (Can Blow Up)
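
To compare the two definitions outside Logstash, here is a minimal Ruby sketch. It hand-expands the grok expression from the repro instead of calling the grok library, assuming %{DATA} is defined as .*? in the core patterns. The helper name expand_grok and the shortened input are hypothetical, introduced only for illustration. The non-atomic pattern is run only on the shortened input so the script stays cheap on any regex engine.

```ruby
# Patterns copied from the two UNIXPATH definitions quoted in this issue.
LOGSTASH_UNIXPATH = %r{(/([\w_%!$@:.,~-]+|\\.)*)+}     # plain groups: can backtrack inside a segment
ATOMIC_UNIXPATH   = %r{(?>/(?>[\w_%!$@:.,-]+|\\.)*)+}  # atomic groups: a matched segment is kept or dropped whole

# Hypothetical helper: hand-expands the grok expression from the repro,
# assuming %{DATA} is defined as .*? in the core patterns.
def expand_grok(unixpath)
  Regexp.new("#{unixpath.source}/asdf/my_.*?_file_(?<fileName>.*?) name")
end

full_input  = "/web/asdf/my_input_file_000223.xml name"
short_input = "/web/asdf/my_a_file_b name" # shortened so the non-atomic pattern stays cheap

atomic_match = expand_grok(ATOMIC_UNIXPATH).match(full_input)
plain_match  = expand_grok(LOGSTASH_UNIXPATH).match(short_input)

puts atomic_match[:fileName]
puts plain_match[:fileName]
```

Both expressions accept the same kind of input. The atomic form can only give back whole path segments when the rest of the expression fails to match, which is a plausible reason it returns without delay.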

This happens because the current UNIXPATH expression gets "confused" by the actual input data. With:

expression: UNIXPATH (/([\w_%!$@:.,+~-]+|\\.)*)+
input data: /web/asdf/my_input_file_000223.xml name
grok: %{UNIXPATH}/asdf/my_%{DATA}_file_%{DATA:fileName} name

the regexp tries to match as far as it can and does not recognize the first '/' as the start of a new, unrelated token, which is the intent. We can confirm this with a simple change:

input data: "/web /asdf/my_input_file_000223.xml name"
grok: %{UNIXPATH} /asdf/my_%{DATA}_file_%{DATA:fileName} name

With the space added, the current expression knows when to stop consuming characters.
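
One way to picture why the non-atomic pattern can blow up: inside a single path segment, the nested quantifiers in ([...]+|\\.)* allow the same run of characters to be cut into pieces in many different ways, and a backtracking engine may retry those cuts before it gives back a whole segment. The sketch below counts the cuts for each segment of the repro path. segment_splits is a hypothetical helper, and the numbers are an idealized count, not a measurement of what any particular engine does.

```ruby
# Hypothetical helper: number of ways a run of n characters, all matching the
# character class, can be cut into consecutive non-empty pieces. Each piece
# corresponds to one iteration of ([...]+)* in the non-atomic pattern.
def segment_splits(segment)
  return 1 if segment.empty?

  2**(segment.length - 1) # each of the n-1 gaps between characters is either a cut or not
end

path = "/web/asdf/my_input_file_000223.xml"
path.split("/").reject(&:empty?).each do |segment|
  puts "#{segment}: #{segment_splits(segment)} possible splits"
end
```

The count doubles with every extra character in a segment, which is consistent with longer inputs taking longer. With an atomic inner group, only the single greedy cut of each segment is ever tried.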

A possible solution would be to add end characters as they could appear in the expressions, but more debugging is needed.

An interesting fact from the current pattern rspec tests:

  1. This example generates a timeout error, as with the previous example:
  context "when using recursive paths" do

    let(:pattern) { "%{UNIXPATH}/bar/%{DATA:fileName}" }
    let(:value)   { "/foo/bar/my_input_file_000223.xml" }

    it "should match the path expression" do
      expect(grok_complex_match(pattern,value)).to pass
    end
  end
  2. This example does not generate a timeout error:
  context "when using recursive paths" do

    let(:pattern) { "%{UNIXPATH}/bar/%{DATA:fileName}" }
    let(:value)   { "/foo/bar/my_input_file_foo.xml" }

    it "should match the path expression" do
      expect(grok_complex_match(pattern,value)).to pass
    end
  end

Note that the only difference between the two is the input: the digits in the filename are replaced with letters. This situation is also reproducible on the master branch.

Relabeled to P3 because this only affects UNIXPATH and workarounds (don't use UNIXPATH) are available.