cloudfoundry / loggregator-release

Cloud Native Logging

Randomly breaking messages under load

giner opened this issue

  • I found a bug - here are some steps to recreate it.

Versions of loggregator prior to 101.4 (tested versions 99, 100, 101 and 101.3) are randomly breaking messages under load.

Here is how to reproduce:

mkdir -p /tmp/testlog && cd /tmp/testlog && touch .testlog && cf push testlog -u process -c 'data=$(seq -s " " 1 20000); while true; do echo "$data"; done' -b https://github.com/cloudfoundry/binary-buildpack

# Wait a few minutes and check the doppler logs. You are likely to see this:
...
2018/11/21 01:03:24 Received bad envelope: proto: wrong wireType = 1 for field SourceInstance
2018/11/21 01:06:24 Received bad envelope: proto: wrong wireType = 0 for field SourceInstance
2018/11/21 01:07:24 Received bad envelope: proto: wrong wireType = 6 for field SourceInstance
2018/11/21 01:07:37 Received bad envelope: proto: wrong wireType = 7 for field SourceInstance
2018/11/21 01:07:37 Received bad envelope: proto: wrong wireType = 1 for field SourceInstance
...

cf stop testlog
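
To get a quick picture of how often the error occurs, the matching lines in a doppler log can be tallied by wire type. The following is only a sketch: the function name `tally_bad_envelopes` and the log path passed to it are assumptions for illustration and are not part of loggregator. It also marks wire types 6 and 7 as undefined, since the protobuf encoding only defines wire types 0 through 5.

```shell
# Hypothetical helper (not part of loggregator): count "Received bad envelope"
# errors per wire type in a doppler log file given as the first argument.
tally_bad_envelopes() {
  awk '
    /Received bad envelope: proto: wrong wireType = / {
      # Extract the number that follows "wireType = " (11 characters long).
      if (match($0, /wireType = [0-9]+/)) {
        wt = substr($0, RSTART + 11, RLENGTH - 11)
        count[wt]++
      }
    }
    END {
      for (wt in count) {
        # Wire types 0-5 are defined by the protobuf encoding; 6 and 7 are not.
        note = (wt + 0 > 5) ? "undefined" : "defined"
        printf "wireType=%s count=%d (%s)\n", wt, count[wt], note
      }
    }
  ' "$1" | sort
}
```

Usage, assuming the doppler log has been copied to a local file (the file name is an assumption): `tally_bad_envelopes ./doppler.stderr.log`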

Notes:

  • We cannot reproduce the issue with loggregator 101.4
  • We could not reproduce the issue on bosh-lite

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/162144887

The labels on this github issue will be updated when the story is started.

One more piece: upgrading only metron to the version from loggregator-release v101.4, without upgrading doppler, also solves the issue.

Hello, thanks for the bug report.

Do you have a use case for a Loggregator release older than 101.4? If not, we'd suggest you upgrade.

Closing for now, since this seems to be fixed in newer versions. @giner Please let us know if you have any concerns or problems around updating to a newer version of loggregator.

Thanks for reporting this, though. It is good to know, and I will add a line to the release notes for the affected version.

@jtuchscherer, we will upgrade to the fixed version soon. There seem to be no real blockers. Another problem (#387) was introduced in the newer releases, but we will reduce the chance of it happening by lowering the number of lines buffered in doppler.

@giner But even that issue should be fixed in a more recent version. I hope you would be able to upgrade to at least version 103.2. If not, please let us know. We would be happy to work with you on that.

@jtuchscherer if I remember correctly, the issue with recent logs was fixed after (or with) the introduction of logcache. I'm not sure we can safely upgrade to that one yet. It's less of an issue, though, so I think we can live with it for a while until we can upgrade.