logstash-plugins / logstash-patterns-core

HTTPDUSER pattern does not match for empty user for standard apache log, generates grokparsefailure

rsommer opened this issue

Logstash information:
Logstash 8.6.1, installed from the Debian package in the official Elastic repository

JVM:

$ java --version
openjdk 11.0.18 2023-01-17
OpenJDK Runtime Environment (build 11.0.18+10-post-Debian-1deb11u1)
OpenJDK 64-Bit Server VM (build 11.0.18+10-post-Debian-1deb11u1, mixed mode, sharing)

OS version:

$ uname -a
Linux logstash 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:
The default grok pattern HTTPDUSER (derived from USER) does not match "", which is what apache2 writes to the log when the remote user is set but empty (see https://github.com/apache/httpd/blob/5c55d4c0600e7734030fa4d549913b4e94b2b0f2/modules/loggers/mod_log_config.c#L382). The log line is valid, but grok tags the event with _grokparsefailure.
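
For context, the relevant definitions form a short chain. The sketch below reproduces them as I understand the stock pattern files; the comments are mine, and the exact text may differ between the legacy and ECS pattern sets:

```
# grok-patterns
USERNAME [a-zA-Z0-9._-]+
USER %{USERNAME}

# httpd
HTTPDUSER %{EMAILADDRESS}|%{USER}
```

USERNAME needs at least one character from a class that does not contain a double quote, and EMAILADDRESS needs an @, so neither alternative can match the two-character field "".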

Steps to reproduce:

  1. Configure an apache2 instance to use basic auth.
  2. Send a request with an empty user name: curl -u :password --basic http://localhost:80/
  3. Check the access log. Example output: 10.0.2.100 - "" [31/Jan/2023:07:59:58 +0000] "GET / HTTP/1.1" 401 381

Using the following simple config:

input { stdin { } }

filter {
  grok {
    match => { "message" => "%{COMMONAPACHELOG}" }
  }
}

output {
  stdout { codec => rubydebug }
}

leads to:

The stdin plugin is now waiting for input:
[2023-01-31T09:15:12,405][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
10.0.2.100 - "" [31/Jan/2023:07:59:58 +0000] "GET / HTTP/1.1" 401 381
{
      "@version" => "1",
    "@timestamp" => 2023-01-31T08:15:18.016253371Z,
       "message" => "10.0.2.100 - \"\" [31/Jan/2023:07:59:58 +0000] \"GET / HTTP/1.1\" 401 381",
         "event" => {
        "original" => "10.0.2.100 - \"\" [31/Jan/2023:07:59:58 +0000] \"GET / HTTP/1.1\" 401 381"
    },
          "host" => {
        "hostname" => "localhost"
    },
          "tags" => [
        [0] "_grokparsefailure"
    ]
}

Changing the definition to HTTPDUSER %{EMAILADDRESS}|%{USER}|"" allows this valid apache2 log line to be parsed. Output when running with the patched httpd pattern file:

The stdin plugin is now waiting for input:
[2023-01-31T09:30:51,040][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
10.0.2.100 - "" [31/Jan/2023:07:59:58 +0000] "GET / HTTP/1.1" 401 381
{
    "@timestamp" => 2023-01-31T08:30:53.599445217Z,
        "source" => {
        "address" => "10.0.2.100"
    },
           "url" => {
        "original" => "/"
    },
      "@version" => "1",
          "http" => {
         "request" => {
            "method" => "GET"
        },
         "version" => "1.1",
        "response" => {
            "status_code" => 401,
                   "body" => {
                "bytes" => 381
            }
        }
    },
       "message" => "10.0.2.100 - \"\" [31/Jan/2023:07:59:58 +0000] \"GET / HTTP/1.1\" 401 381",
          "user" => {
        "name" => "\"\""
    },
          "host" => {
        "hostname" => "localhost"
    },
     "timestamp" => "31/Jan/2023:07:59:58 +0000",
         "event" => {
        "original" => "10.0.2.100 - \"\" [31/Jan/2023:07:59:58 +0000] \"GET / HTTP/1.1\" 401 381"
    }
}
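
For anyone who cannot patch the shipped pattern file, the same change can probably be applied per filter. This is a minimal, untested sketch. It assumes that the grok filter's pattern_definitions option overrides the built-in HTTPDUSER for patterns that reference it, such as COMMONAPACHELOG:

```
filter {
  grok {
    # Assumption: a name defined here replaces the stock HTTPDUSER definition
    pattern_definitions => { "HTTPDUSER" => '%{EMAILADDRESS}|%{USER}|""' }
    match => { "message" => "%{COMMONAPACHELOG}" }
  }
}
```

As in the output above, user.name would then hold the two literal quote characters rather than an empty string.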