HTTPDUSER pattern does not match an empty user in a standard apache log line, generates _grokparsefailure
Opened by Roland Sommer (rsommer)
Logstash information:
Using Logstash 8.6.1, installed as a Debian package from the official Elastic repository
JVM:
$ java --version
openjdk 11.0.18 2023-01-17
OpenJDK Runtime Environment (build 11.0.18+10-post-Debian-1deb11u1)
OpenJDK 64-Bit Server VM (build 11.0.18+10-post-Debian-1deb11u1, mixed mode, sharing)
OS version:
$ uname -a
Linux logstash 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
The default grok pattern for HTTPDUSER (derived from USER) does not match "", which is valid apache2 log output when the given remote user is empty (https://github.com/apache/httpd/blob/5c55d4c0600e7734030fa4d549913b4e94b2b0f2/modules/loggers/mod_log_config.c#L382).
Steps to reproduce:
- Configure an apache2 instance to use basic auth
- Send a request with an empty user name:
  curl -u :password --basic http://localhost:80/
- Example log output:
10.0.2.100 - "" [31/Jan/2023:07:59:58 +0000] "GET / HTTP/1.1" 401 381
Using the following simple config:
input { stdin { } }
filter {
  grok {
    match => { "message" => "%{COMMONAPACHELOG}" }
  }
}
output {
  stdout { codec => rubydebug }
}
leads to:
The stdin plugin is now waiting for input:
[2023-01-31T09:15:12,405][INFO ][logstash.agent ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
10.0.2.100 - "" [31/Jan/2023:07:59:58 +0000] "GET / HTTP/1.1" 401 381
{
    "@version" => "1",
    "@timestamp" => 2023-01-31T08:15:18.016253371Z,
    "message" => "10.0.2.100 - \"\" [31/Jan/2023:07:59:58 +0000] \"GET / HTTP/1.1\" 401 381",
    "event" => {
        "original" => "10.0.2.100 - \"\" [31/Jan/2023:07:59:58 +0000] \"GET / HTTP/1.1\" 401 381"
    },
    "host" => {
        "hostname" => "localhost"
    },
    "tags" => [
        [0] "_grokparsefailure"
    ]
}
Adjusting the HTTPDUSER pattern to
HTTPDUSER %{EMAILADDRESS}|%{USER}|""
allows this valid apache2 log line to be parsed. Running with the patched httpd pattern file:
The stdin plugin is now waiting for input:
[2023-01-31T09:30:51,040][INFO ][logstash.agent ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
10.0.2.100 - "" [31/Jan/2023:07:59:58 +0000] "GET / HTTP/1.1" 401 381
{
    "@timestamp" => 2023-01-31T08:30:53.599445217Z,
    "source" => {
        "address" => "10.0.2.100"
    },
    "url" => {
        "original" => "/"
    },
    "@version" => "1",
    "http" => {
        "request" => {
            "method" => "GET"
        },
        "version" => "1.1",
        "response" => {
            "status_code" => 401,
            "body" => {
                "bytes" => 381
            }
        }
    },
    "message" => "10.0.2.100 - \"\" [31/Jan/2023:07:59:58 +0000] \"GET / HTTP/1.1\" 401 381",
    "user" => {
        "name" => "\"\""
    },
    "host" => {
        "hostname" => "localhost"
    },
    "timestamp" => "31/Jan/2023:07:59:58 +0000",
    "event" => {
        "original" => "10.0.2.100 - \"\" [31/Jan/2023:07:59:58 +0000] \"GET / HTTP/1.1\" 401 381"
    }
}
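Until the bundled pattern is changed, the same adjustment can probably be applied without touching the installed pattern file. The sketch below is untested against this exact setup. It relies on the grok filter's pattern_definitions option, which, as far as I can tell from the plugin documentation, overrides a built-in pattern when the name matches. The pattern string is the adjusted HTTPDUSER definition from above; everything else is the reproduction config unchanged.

```
input { stdin { } }
filter {
  grok {
    # Assumption: redefining HTTPDUSER here takes precedence over the
    # shipped definition for this filter instance only.
    # Single quotes keep the literal "" alternative intact.
    pattern_definitions => {
      "HTTPDUSER" => '%{EMAILADDRESS}|%{USER}|""'
    }
    match => { "message" => "%{COMMONAPACHELOG}" }
  }
}
output {
  stdout { codec => rubydebug }
}
```

With this override, user.name would still contain the two literal quote characters, as in the patched output above, so downstream consumers may want to treat that value as "no user".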