cloudfoundry / loggregator-release

Cloud Native Logging

Consuming logs via the V2 Firehose will show reduced performance

pianohacker opened this issue

The Loggregator Firehose, which exposes application logs and application or component metrics, is accessible through a few different APIs:

  • The V1 Firehose API (wss://doppler.SYSTEM_DOMAIN), which is provided by the Traffic Controller. Historically, this has been used by most integrations.
  • The internal V2 Firehose API (over gRPC), which is provided by the Reverse Log Proxy (RLP). This has been present since v79, and is used by many internal components such as Log Cache, Healthwatch and Syslog Adapters.
  • The external V2 Firehose API (https://log-stream.SYSTEM_DOMAIN), which is provided by the RLP Gateway. This was added in v103.1 and some integrations have switched to it.

You may experience the following issues with integrations that use the external V2 Firehose API:

  • Overall throughput is lower: considerably fewer logs and metrics per second reach their destination for the same amount of system resources.
  • High CPU and memory usage on the nozzles/firehose consumers and the log-api VMs.
  • Loss of logs when clients disconnect. Clients are disconnected at least once every 14 minutes, when the RLP Gateway refreshes its connections. This loss is exacerbated by any additional load balancers in front of the foundation's Gorouter.

The first two issues are due to the conversion to and from JSON involved in logs/metrics transport through the RLP Gateway.

Resolution

At this time, there is no direct solution to these throughput and resource usage concerns. We recommend that nozzles/consumers do one of the following:

  1. Return to using the V1 Firehose API. This API was previously planned to be deprecated, but will now be available in all versions of cf-deployment going forward.
  2. Use the internal V2 Firehose API via gRPC, as in the example at firehose-nozzle-v2/rlp/. This involves:
    • Ensuring that the Application Security Groups in place on the platform allow applications to contact the log-api VMs.
    • Getting mutual TLS credentials, rather than OAuth credentials. For an example of generating these for tile-deployed apps, see firehose-nozzle-v2/rip-tile/.
    • Switching from the RLPGatewayClient to the EnvelopeStreamConnector.
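
As a rough illustration of the mutual TLS step in option 2, here is a minimal Go sketch that builds a client-side TLS configuration from a CA certificate, a client certificate, and a client key, using only the standard library. The helper name `newMutualTLSConfig`, the environment variable names, and the server name are assumptions made for this example and are not taken from firehose-nozzle-v2 or go-loggregator. A configuration of this shape is what a gRPC client such as the EnvelopeStreamConnector would typically be given; the connector's own constructor is not shown here.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"os"
)

// newMutualTLSConfig is a hypothetical helper (not part of go-loggregator).
// It returns a client TLS config that presents certPEM/keyPEM to the server
// and verifies the server's certificate against caPEM.
func newMutualTLSConfig(caPEM, certPEM, keyPEM []byte, serverName string) (*tls.Config, error) {
	clientCert, err := tls.X509KeyPair(certPEM, keyPEM)
	if err != nil {
		return nil, fmt.Errorf("parse client key pair: %w", err)
	}

	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no usable CA certificate in PEM input")
	}

	return &tls.Config{
		MinVersion:   tls.VersionTLS12,
		Certificates: []tls.Certificate{clientCert}, // client identity for mutual TLS
		RootCAs:      pool,                          // trust anchor for the server certificate
		ServerName:   serverName,                    // must match a name in the server certificate
	}, nil
}

func main() {
	// Assumed environment variable names; use whatever your deployment provides.
	ca, errCA := os.ReadFile(os.Getenv("RLP_CA_PATH"))
	cert, errCert := os.ReadFile(os.Getenv("RLP_CERT_PATH"))
	key, errKey := os.ReadFile(os.Getenv("RLP_KEY_PATH"))
	if err := errors.Join(errCA, errCert, errKey); err != nil {
		fmt.Println("could not read credentials:", err)
		return
	}

	// Assumed variable holding the name the RLP's certificate was issued for.
	cfg, err := newMutualTLSConfig(ca, cert, key, os.Getenv("RLP_SERVER_NAME"))
	if err != nil {
		fmt.Println("could not build TLS config:", err)
		return
	}
	fmt.Println("mutual TLS config ready for server name:", cfg.ServerName)
}
```

Taking PEM bytes rather than file paths keeps the helper usable whether the credentials arrive as files or as environment variables injected by a tile.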

We have created an issue in Pivotal Tracker to manage this. Unfortunately, the Pivotal Tracker project is private so you may be unable to view the contents of the story.

The labels on this GitHub issue will be updated when the story is started.