%UNIXPATH can cause RegEx DOS
jsvd opened this issue · comments
moved from elastic/logstash#5458
Repro is very simple. This will peg your CPU:
input {
  stdin {
    add_field => { "test" => "/web/asdf/my_input_file_000223.xml name" }
  }
}
filter {
  grok {
    match => {
      "test" => [
        "%{UNIXPATH}/asdf/my_%{DATA}_file_%{DATA:fileName} name"
      ]
    }
  }
}
output {
  stdout {
    codec => rubydebug {
      metadata => true
    }
  }
}
The more /asdf/asdf you add, the longer it takes.
It works without delay on this online grok debugger app:
Investigation showed that the UNIXPATH definition differs between the two tools:
Online App (works better): UNIXPATH (?>/(?>[\w_%!$@:.,-]+|\\.)*)+
Logstash (can blow up): UNIXPATH (/([\w_%!$@:.,~-]+|\\.)*)+
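To make the difference between the two definitions concrete, here is a minimal Ruby sketch. It assumes that %{DATA} expands to `.*?` and that the grok expression can be expanded by hand into a single regex; the helper names `expand` and `file_name` and the shortened input are hypothetical and used only for illustration. The Logstash-style pattern nests a `+` inside a `*` inside a `+`, so a single path segment can be split in many ways during backtracking, while the atomic groups `(?>...)` in the online app's version give up a segment as a whole. The Logstash form is run only on a shortened input here so that the sketch finishes quickly.

```ruby
# UNIXPATH definitions as quoted in this issue.
LOGSTASH_UNIXPATH = %r{(/([\w_%!$@:.,+~-]+|\\.)*)+}
ATOMIC_UNIXPATH   = %r{(?>/(?>[\w_%!$@:.,-]+|\\.)*)+}

# Hypothetical helper: hand-expanded form of
#   %{UNIXPATH}/asdf/my_%{DATA}_file_%{DATA:fileName} name
# assuming DATA is `.*?`. Interpolating a Regexp wraps it in (?-mix:...).
def expand(unixpath)
  %r{#{unixpath}/asdf/my_.*?_file_(?<fileName>.*?) name}
end

# Hypothetical helper: returns the captured fileName, or nil if no match.
def file_name(unixpath, input)
  m = expand(unixpath).match(input)
  m && m[:fileName]
end

FULL_INPUT  = "/web/asdf/my_input_file_000223.xml name"
# Shortened input (an assumption for this sketch) that keeps backtracking small.
SHORT_INPUT = "/web/asdf/my_a_file_b name"

puts file_name(ATOMIC_UNIXPATH, FULL_INPUT)     # => 000223.xml
puts file_name(ATOMIC_UNIXPATH, SHORT_INPUT)    # => b
puts file_name(LOGSTASH_UNIXPATH, SHORT_INPUT)  # => b
```

Both definitions agree on the result when they finish; what changes is how much work the engine does before it gets there. Timings depend on the Ruby version and regex engine, so none are shown.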
This happens because the current UNIXPATH expression is getting "confused" by the actual input data, with:
expression: UNIXPATH (/([\w_%!$@:.,+~-]+|\\.)*)+
input data: /web/asdf/my_input_file_000223.xml name
grok: %{UNIXPATH}/asdf/my_%{DATA}_file_%{DATA:fileName} name
the regexp tries to match as far as it can, without recognizing the first '/' after %{UNIXPATH} as the start of a new, unrelated token, which is what the pattern intends. We can see this by making a simple change:
input data: "/web /asdf/my_input_file_000223.xml name"
grok: %{UNIXPATH} /asdf/my_%{DATA}_file_%{DATA:fileName} name
so the current expression knows when it has to finish consuming characters.
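The effect of the space delimiter can be sketched in Ruby as follows. This again assumes %{DATA} expands to `.*?` and that the grok expression is expanded by hand; the constant names are hypothetical. Because the space is not in the UNIXPATH character class, the greedy match stops after "/web" and the rest of the expression matches on the first attempt, so this is safe to run with the current definition.

```ruby
# Current UNIXPATH definition as quoted in this issue.
CURRENT_UNIXPATH = %r{(/([\w_%!$@:.,+~-]+|\\.)*)+}

# Hand-expanded form of
#   %{UNIXPATH} /asdf/my_%{DATA}_file_%{DATA:fileName} name
# assuming DATA is `.*?` (note the space after UNIXPATH).
SPACED_GROK = %r{#{CURRENT_UNIXPATH} /asdf/my_.*?_file_(?<fileName>.*?) name}

SPACED_INPUT = "/web /asdf/my_input_file_000223.xml name"

match = SPACED_GROK.match(SPACED_INPUT)
puts match[:fileName]  # => 000223.xml
```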
A possible solution would be to add the end characters that could appear in the expressions, but more debugging is required to know more.
An interesting fact, from the current pattern rspec tests:
- This test will generate a timeout error, as with the previous example:
context "when using recursive paths" do
  let(:pattern) { "%{UNIXPATH}/bar/%{DATA:fileName}" }
  let(:value) { "/foo/bar/my_input_file_000223.xml" }
  it "should match the path expression" do
    expect(grok_complex_match(pattern,value)).to pass
  end
end
- This test will not generate a timeout error:
context "when using recursive paths" do
  let(:pattern) { "%{UNIXPATH}/bar/%{DATA:fileName}" }
  let(:value) { "/foo/bar/my_input_file_foo.xml" }
  it "should match the path expression" do
    expect(grok_complex_match(pattern,value)).to pass
  end
end
Note that the only difference here is the input: the digits in the filename are replaced with letters. This situation is also reproducible on the master branch.
Relabeled to P3 because this only affects UNIXPATH and workarounds (don't use UNIXPATH) are available.