System Log messages should include fraction of second
youngm opened this issue · comments
- I have an idea for a new feature - please document as "As a user, I would
like to..."
As an operator of a CF deployment, I would like system log messages logged by loggregator components to include fractions of a second.
My log aggregation tools prefer the log message timestamp over the syslog timestamp because the log message timestamp is typically accurate to the millisecond. However, loggregator components don't log fractions of a second, which disrupts this model when comparing loggregator component logs against those of other CF components.
Log message timestamp example today: 2018/05/17 16:00:19
We have created an issue in Pivotal Tracker to manage this:
https://www.pivotaltracker.com/story/show/157668501
The labels on this github issue will be updated when the story is started.
@JohannaSmith I'm talking about component logs that get created in /var/vcap/sys/log and are picked up by the syslog-release for system administrators to use when debugging potential issues with loggregator components.
@youngm Ah. Looking at the logs that do emit partial seconds, it's a very different approach to logging. Those bosh logs have gone with a structured log approach which enables this. For loggregator, we found that relying heavily on structured logs can cause the following:
- high throughput on the system due to more reliance on logs
- less readability
We aren't planning on adding fractional seconds to our current log format.
We've tried to transition to heavy reliance on metrics. Is there a missing metric you could have used in this scenario that we could add?
@JohannaSmith The biggest issue for me is attempting to correlate cross component events or errors when debugging issues. Doing so is much easier with fractions of a second in log messages.
For example, say I'm attempting to diagnose an intermittent issue with a syslog drain. My drain server may be logging events with fractions of a second. I'd like to see if a particular error on my drain matches some kind of error on the adapter at the same time, potentially helping me discover the problem. Logging at second granularity makes it harder to correlate issues between components.
I'm not asking for structured log messages. Just more granular time signatures when loggregator does decide to log one of its unstructured messages.
Looks like `log.Lmicroseconds` could be tacked on to the log pkg flags:
https://golang.org/pkg/log/#pkg-constants
It doesn't add much noise to the output and is still human readable, vs other bosh components that just use Unix timestamps.
I should probably quit while I'm ahead. But, if it isn't much more trouble a nice ISO 8601 format is easier for splunk and other log aggregators to parse. For example: 2018-05-31T15:14:42.339Z
But, this would just be icing on the cake. :) I'd also be perfectly happy to just have sub seconds in the current format. Thanks @jasonkeene and @JohannaSmith
Yeah, I don't see support for ISO 8601 in the log pkg constants. I think sub seconds is a happy middle-ground. Like @JohannaSmith said if there are any metrics we can export that would help you in troubleshooting your issue please post them. We want to encourage folks not to rely on logs for debugging.
@youngm We're working on this now. Can you tell us more specifically which component logs you're referring to? Are you saying that the problem exists before the syslog-release picks up the logs, or after?
@toddboom I'm looking for subseconds in the actual log message before it is picked up by the syslog-release.
It seems pretty much all of the components produced by the logging and metrics team have this issue. Here are the ones I use most and am most interested in having changed:
- Metron
- Doppler
- Traffic Controller
- Reverse Log Proxy
- Adapter
- Scheduler
@youngm Thanks! That looks like the list I was putting together, but I just wanted to make sure we were on the same page. I'll get cracking on those and follow up here once it's done.
@toddboom We don't yet use log cache but I'm sure we will so don't forget about that one. :)
Ok, these commits should take care of it in pretty much everything I can think of:
- cloudfoundry-attic/scalable-syslog@3ff211a
- cloudfoundry/loggregator@b5b9368
- cloudfoundry/log-cache@f125835
- cloudfoundry/loggregator-agent@e5dffbd
They should be included in the next releases of each product.
Looks great! Thanks @toddboom!