ankane / logstop

Keep personal data out of your logs

`URL_PASSWORD_REGEX` is matching on values outside of a URL

afolta opened this issue

Hi,

The URL_PASSWORD_REGEX appears to be overly greedy: it matches values that fall outside of the URL, so user values are filtered unnecessarily. For the test string below, the log comes back with the user filtered:
{\"foo\":\"app_name\",\"bar\":\"./file.rb\",\"level\":\"error\",\"error\":\"HTTP POST request to: http://localhost:3000//url/50000091/call\",\"request-id\":\"e6ce7cb8-054d-415c-a194-45d5df583648\",\"user\":\"[FILTERED]@oreilly.test\",\"time\":\"2023-06-30T14:25:43Z\"}

Test String:
{\"foo\":\"app_name\",\"bar\":\"./file.rb\",\"level\":\"error\",\"error\":\"HTTP POST request to: http://localhost:3000//url/50000091/call\",\"request-id\":\"e6ce7cb8-054d-415c-a194-45d5df583648\",\"user\":\"joey.grady@oreilly.test\",\"time\":\"2023-06-30T14:25:43Z\"}

Match:

//localhost:3000//url/50000091/call\",\"request-id\":\"e6ce7cb8-054d-415c-a194-45d5df583648\",\"user\":\"joey.grady@

Would it be possible to update the Regex to be a bit less greedy for url_passwords?
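
For anyone reproducing this, here is a minimal sketch of how such an over-match can arise. The two patterns and the `filter_url_password` helper are hypothetical stand-ins written for illustration, not the gem's actual `URL_PASSWORD_REGEX` or its fix, and `LOG_LINE` is a shortened, unescaped version of the test string. The assumption is that the user part is matched with `\S+`, which can run across JSON fields whenever the log line contains no whitespace after the `//`.

```ruby
# Hypothetical patterns for illustration only; not copied from logstop.
# GREEDY_PATTERN lets the "user" part span any run of non-whitespace characters.
GREEDY_PATTERN  = %r{(//\S+:)\S+(@)}
# TIGHTER_PATTERN also forbids "/" in the user and password parts,
# so the match cannot run past the URL's path.
TIGHTER_PATTERN = %r{(//[^\s/]+:)[^\s/]+(@)}

# Replace whatever sits between the ":" and the "@" with a placeholder.
def filter_url_password(line, pattern)
  line.gsub(pattern, '\1[FILTERED]\2')
end

# Shortened version of the test string, with unescaped quotes.
LOG_LINE = '{"error":"HTTP POST request to: http://localhost:3000//url/50000091/call","user":"joey.grady@oreilly.test"}'
# A URL that really does carry credentials (made-up example).
URL_LINE = 'connecting to postgres://admin:secret@db.example.test/app'

puts filter_url_password(LOG_LINE, GREEDY_PATTERN)   # the email's local part is filtered
puts filter_url_password(LOG_LINE, TIGHTER_PATTERN)  # line is left unchanged
puts filter_url_password(URL_LINE, TIGHTER_PATTERN)
# => connecting to postgres://admin:[FILTERED]@db.example.test/app
```

Excluding `/` only helps when a path separates the URL from the later `@`; a log line with a path-less URL followed by an email on the same whitespace-free run could still over-match with this sketch.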

Hi @afolta, thanks for reporting! Pushed a fix for this specific pattern in the commit above.

(also, I'd recommend not using emails as user identifiers in logs)

Thanks for the quick fix, @ankane! I tested the new regex in Rubular and am still seeing an overly inclusive match. Am I missing something here?

My bad, should be fixed now.

Thanks a ton for this update! It's much appreciated.