deis / logger

In-memory log buffer used by Deis Workflow.

Home Page: https://deis.com

Proposal: Use Redis for storing logs

krancour opened this issue

TL;DR

I'm proposing that we replace logger with Redis.

First things first

I cannot take all the credit for this proposal. @jchauncey and @arschles were also involved. I'll let them each take as much or as little credit as they like, depending on how this is received. 😉

Note that this proposal is a partial pivot from this proposal, which suggests using nsq as a messaging system for delivering metrics and logs to their respective, ultimate destinations. In the case of logs, that ultimate destination is the logger component, which holds logs for each Workflow-deployed application in memory. Insofar as this proposal is a partial pivot from that other proposal, this proposal outlines an alternative architecture for logging only. The other proposal still stands with respect to metrics collection and monitoring.

With that out of the way...

Some background

Logger used to do more than it does today. At one time it could do all of:

  1. Store logs on the file system (volumes backed by Ceph)
  2. Store logs in memory
  3. Drain (forward) logs to elsewhere for long-term archiving (e.g. Papertrail or Splunk)
  4. Serve logs to the controller over HTTP

Of these four capabilities, half have been eliminated in the current implementation. Capability 1 was simply eliminated: logs are only stored in memory now, and users are instructed to drain logs elsewhere if they are interested in keeping them for any length of time. Capability 3 (the aforementioned log draining) is now handled directly by fluentd (the log collection agent on each host).

The remaining capabilities are storing logs in memory (a ringbuffer per Workflow-deployed application) and serving those logs to the controller via HTTP.

Q: But why do we need a custom component to do this?

A: We probably don't.

Can we use Redis instead?

Redis is a fast and easy-to-use option for writing to and reading from an in-memory datastore over the network. In other words, it already implements both of the concerns the logger component handles.

Replacing logger with Redis would have the following advantages:

  1. Eliminating logger means we have one fewer custom component to maintain.
  2. Redis is fast and has its very own benchmarking tool, so we can easily quantify its performance.
  3. Redis is easy to work with.
  4. Redis has a very large and active community.
  5. Redis, as a product, has received a much higher degree of polish than our logger component.

How it would work

@jchauncey has been working on a fluentd plugin for writing logs to nsq (per this proposal). That particular effort could pivot, with the work repurposed and refactored to write logs to Redis.

Currently, controller retrieves logs from logger via HTTP. Controller could easily be refactored to retrieve logs directly from Redis (and delete logs from Redis when an application is destroyed).

Data structures

Currently in logger, logs for each Workflow-deployed application are stored in their own ring buffer. For anyone unfamiliar, a ring buffer is like a queue that is eating itself. When the data structure is filled to maximum capacity (which is configurable), addition of a new log entry is accompanied by removal of the oldest log entry.

Redis has no native ring buffer data structure; in fact, it has no native queue either. However, Redis does support lists and exposes operations such as the following:

  • lpush (left / head push)
  • lpop (left / head pop)
  • rpush (right / tail push)
  • rpop (right / tail pop)

Using the operations above, a Redis client can impose FIFO or LIFO semantics on a list to implement queues or stacks... or ring buffers.

To implement a ring buffer per Workflow-deployed app:

  1. rpush <app> <message> to append the new message (rpushing keeps the newest message at the tail and the oldest message at the head of the list, so that when the list is retrieved, logs appear in the correct order).
  2. If llen <app> exceeds the maximum message count, then lpop <app> to discard the oldest message (see the sketch below).
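
To make the two steps concrete, here is a minimal sketch in Go. The redigo client, its import path, and the maxMessages cap are illustrative assumptions, not part of the proposal:

```go
package main

import (
	"fmt"

	// Client choice and import path are assumptions for this sketch.
	"github.com/gomodule/redigo/redis"
)

// maxMessages is a hypothetical per-app cap, standing in for logger's
// configurable ring buffer size.
const maxMessages = 1000

// appendLog implements the two steps above: rpush the new message, then
// lpop the oldest entry if the list has grown past the cap.
func appendLog(conn redis.Conn, app, message string) error {
	if _, err := conn.Do("RPUSH", app, message); err != nil {
		return err
	}
	length, err := redis.Int(conn.Do("LLEN", app))
	if err != nil {
		return err
	}
	if length > maxMessages {
		_, err = conn.Do("LPOP", app)
	}
	return err
}

func main() {
	conn, err := redis.Dial("tcp", "localhost:6379")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	if err := appendLog(conn, "my-app", "sample log line"); err != nil {
		panic(err)
	}
	// Retrieve the whole buffer, oldest entry first.
	logs, _ := redis.Strings(conn.Do("LRANGE", "my-app", 0, -1))
	fmt.Println(logs)
}
```

Note that the push and the conditional pop are separate round trips here; whether they need to be atomic comes up later in this thread.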

Sizing

I would estimate the effort, judged relative to other in-progress work, to be "medium."

tl;dr - Redis could work, but only, in my opinion, as a ring buffer that is written to in a serialized or pseudo-serialized fashion, and preferably not accessed directly by the controller or fluentd.

I had to write a lot below because there are multiple changes listed in this proposal. I've listed the changes as I understand them, each in a separate section. Below each section header, I've provided my opinions in as much detail as possible without writing an essay.

Replacing Logger's Ring Buffer

I get the reasoning for replacing logger's in-memory ring buffer with redis and I'm cool with it from a technical perspective.

However, from both an architectural and planning perspective, I don't see much of a difference in short-term build or long-term maintenance work between the following:

  1. Building and maintaining our own in-memory ring buffer and communication mechanisms (in logger)
  2. Building and maintaining a component that can run Redis, and writing the code that can implement ring buffers (in fluentd as proposed) on top of Redis's data structures

Writing From Fluentd to Redis

In the proposed architecture, Fluentd will write directly to Redis. If someone runs a medium- or large-scale app, there will be many Fluentd processes writing to one or a few redis nodes (since an app's logs will need to live in a single circular buffer, we can run at most one Redis instance per app; note that we could further shard individual app logs, but that would require more complex log ordering semantics).

The large fan-in will eventually result in high connection counts to the redis server(s) and/or high contention on the data structures. The addition of middle-tier aggregators generally solves this problem. To do so, we can use the nsq component that we are already adding to facilitate metrics collection. We can pass log messages through nsq (on a dedicated topic), consume them, and batch-insert them into the appropriate ring buffers. That architecture will achieve a few things:

  1. Reduce contention on the redis node(s)
  2. Reduce incoming connections to the redis node(s)
  3. Allow us to handle log production throughput (from the app pods) independently from log insertion throughput (from the middle tier)

I'm deliberately not specifying a particular middle-tier component. The first implementation can be a simple multiplexer (consume from nsq, decide which redis node to write to, then insert into the appropriate ring buffer), and we can probably extend fluentd for that purpose (a sketch of such a multiplexer follows). The important change here is that, architecturally, we will have split log production from log recording.
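
For illustration, here is a minimal sketch of such a multiplexer in Go, assuming the go-nsq consumer and the redigo Redis client; the topic name, channel name, message envelope, and batching parameters are all hypothetical:

```go
package main

import (
	"encoding/json"
	"log"
	"time"

	"github.com/gomodule/redigo/redis" // client choices and import paths are assumptions
	nsq "github.com/nsqio/go-nsq"
)

// logLine is a hypothetical envelope for log messages on the nsq topic.
type logLine struct {
	App     string `json:"app"`
	Message string `json:"message"`
}

const (
	maxMessages   = 1000                   // hypothetical per-app ring buffer cap
	batchSize     = 100                    // flush after this many lines...
	flushInterval = 500 * time.Millisecond // ...or this much time, whichever comes first
)

func main() {
	conn, err := redis.Dial("tcp", "localhost:6379")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	lines := make(chan logLine, batchSize)

	consumer, err := nsq.NewConsumer("logs", "redis-writer", nsq.NewConfig())
	if err != nil {
		log.Fatal(err)
	}
	consumer.AddHandler(nsq.HandlerFunc(func(m *nsq.Message) error {
		var l logLine
		if err := json.Unmarshal(m.Body, &l); err != nil {
			return nil // drop malformed messages rather than requeue them forever
		}
		lines <- l
		return nil
	}))
	if err := consumer.ConnectToNSQD("127.0.0.1:4150"); err != nil {
		log.Fatal(err)
	}

	// Accumulate lines and write each batch to Redis in a single
	// transaction, pruning every touched list back to the cap.
	batch := make([]logLine, 0, batchSize)
	ticker := time.NewTicker(flushInterval)
	for {
		select {
		case l := <-lines:
			if batch = append(batch, l); len(batch) < batchSize {
				continue
			}
		case <-ticker.C:
			if len(batch) == 0 {
				continue
			}
		}
		conn.Send("MULTI")
		for _, l := range batch {
			conn.Send("RPUSH", l.App, l.Message)
			conn.Send("LTRIM", l.App, -maxMessages, -1) // keep only the newest maxMessages entries
		}
		if _, err := conn.Do("EXEC"); err != nil {
			log.Printf("batch insert failed: %v", err)
		}
		batch = batch[:0]
	}
}
```

The time-based flush bounds how stale a low-volume app's logs can get while waiting for a batch to fill.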

Controller Reading From Redis

I believe that using redis as a log store is a good idea, in addition to my point above, because we can get a historical window of logs to bring the end-user "up to date", and then start streaming current logs as they come in. There are some drawbacks to the specific approach presented, however.

First, if the controller reads directly from redis, as suggested, we'll be increasing contention on the applicable redis server. We'll also be increasing the complexity of the controller (it would need to understand how to read from redis circular buffers in a concurrency-safe manner). We'll also be adding another service dependency to the controller, which violates our principle of making our components as self-sufficient as possible.

Second, if we want to stream logs to the end-user, we'll need to atomically switch incoming logs from an old ring buffer to a new one, stream old logs to the user while pushing new logs onto a broadcast queue, and then stream the contents of that queue to the end-user while deleting streamed items. Using a single in-memory buffer, this entire process is relatively easy to do, and while doing so in redis is possible, implementing the logic to do so in the controller ventures significantly outside the scope of the API control plane that it currently is.

tl;dr

You raise some good concerns, but I don't think this change would leave us any worse off than we are right now, and I think it would provide a more solid foundation to build on.

Replacing Logger's Ring Buffer

From an architectural perspective, I don't see much of a difference in long-term maintenance work between the following:

  1. Building and maintaining our own ring buffer (in logger)
  2. Building and maintaining a component that can run redis

I would humbly suggest that this comparison is not accurate. Maintaining the code base for a custom log aggregator such as logger, written from scratch as it is, is a larger ongoing burden than the comparatively small "component that can run redis," which requires little more than a Dockerfile and a Makefile. The comparison also isn't taking into account the large volume of development, testing, and community participation that has gone into making Redis the popular, robust, and reliable product that it is. Compared to the logger, I believe Redis gives us more for less.

Also requires writing the code, in fluentd as proposed, to implement ring buffer semantics (i.e. list pruning logic), which adds complexity

This particular piece of the puzzle is actually quite easy. The work that @jchauncey has done to write to nsq via a custom fluentd plugin can be adapted to write to Redis with minimal effort. Implementing ring buffer semantics is not hard either. It's literally just a push and a conditional pop.

Writing From Fluentd to Redis

If someone runs a medium- or large-scale app, there will be many Fluentd processes writing to one or a few redis nodes

The large fan-in will eventually result in high connection counts to the redis server(s)

Unless I am mistaken, the number of fluentd processes is equal not to the number of application instances, but to the number of k8s worker nodes. This is generally a modest number. Of course, we do want to be able to scale so that Workflow remains viable on very large clusters. So, fwiw, Redis supports 10,000 connections by default (unless overridden, or unless the underlying availability of file descriptors is insufficient).

and/or high contention on the data structures.

Redis is single-threaded, with all requests served on a first-come-first-served basis, so contention on individual data structures isn't an issue so much as the Redis server's ability to keep up with the volume of incoming requests. That ability can easily be benchmarked using redis-benchmark.

I will readily concede that if a cluster and the apps within grow to a point where a single Redis instance cannot keep up, scaling Redis isn't the easiest thing in the world, but with a little outside-the-box thinking, there's an easy option (for this particular use case). Instead of operating a single Redis cluster that handles logs for all applications, we could provision a dedicated Redis instance per application. This is relatively easy. Fluentd would just write each log to the appropriate app-specific Redis and controller would read from the appropriate app-specific Redis.

Controller Reading From Redis

First, if the controller reads directly from redis, as suggested, we'll be increasing contention on the applicable redis server.

As stated, contention isn't a concern so much as a simple matter of whether the Redis server can keep up with request volume. The controller would be contributing marginal request volume in comparison to the fluentd instances.

We'll also be increasing the complexity of the controller (it would need to understand how to read from redis circular buffers in a concurrency-safe manner).

I believe this may be another overestimation of the complexity. While a fluentd plugin might enforce ring buffer-like semantics on a list, there is nothing special about the list itself, and all the controller needs to do is read and return it.
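
As a rough sketch (in Go for consistency with the other sketches here; the controller itself is a separate codebase, and this just shows the Redis call involved), the read amounts to a single lrange:

```go
import "github.com/gomodule/redigo/redis" // redigo, as in the earlier sketches

// fetchLogs returns the most recent n (n > 0) log lines for an app.
// A negative start index makes LRANGE count from the tail of the list,
// so this yields the last n entries, oldest first, which is the order
// logger serves them in today.
func fetchLogs(conn redis.Conn, app string, n int) ([]string, error) {
	return redis.Strings(conn.Do("LRANGE", app, -n, -1))
}
```

The call shape mirrors logger's existing endpoint: one resource identifier plus a line count.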

We'll also be adding another service dependency to the controller.

We'd also be removing one (logger).

Second, if we want to stream logs to the end-user, we'll need to atomically switch incoming logs from an old ring buffer to a new, stream old logs to the user while enqueueing new logs, and then stream that queue to the user while deleting streamed items. Using a single in-memory buffer, this is relatively easy to do, and while doing so in redis is possible (with a bit more complexity), implementing the logic to do so in the controller ventures significantly outside the scope of the API control plane that it currently is.

Admittedly, you've lost me here. I'm not really thinking about streaming logs (yet).

My updated thoughts (far from the final word)

I think we'd do no worse than we're doing today by swapping logger for Redis as proposed. In fact, I'd argue we'd be doing quite a bit better in terms of simplicity and reduced maintenance burden. That being said, I think some of the concerns over how this will scale are fair. But at least we can benchmark Redis' performance and get real data on where the breaking point is. (This would be harder to do with logger.) If we're not comfortable with where that breaking point is, some of the suggestions you've offered above can be applied to incrementally improve our scalability. There could be other options as well, like giving each application its own dedicated Redis instance, as mentioned above.

Ultimately, I feel like swapping logger for Redis is a good starting point and that we can go from there.

... Compared to the logger, I believe Redis gives us more for less

Agree

This particular piece of the puzzle is actually quite easy. The work that @jchauncey has done to write to nsq via a custom fluentd plugin can be adapted to write to Redis with minimal effort. Implementing ring buffer semantics is not hard either. It's literally just a push and a conditional pop.

It's not as easy as pushing onto nsq. The push and conditional pop represent 3 (maybe 2) RPCs, which in a distributed environment is logically not as easy to reason about and technically not as easy to build, test and manage long term.

Unless I am mistaken, the number of fluentd processes is equal not to the number of application instances, but to the number of k8s worker nodes. This is generally a modest number. Of course, we do want to be able to scale so that Workflow remains viable on very large clusters. So, fwiw, Redis supports 10,000 connections by default (unless overridden, or unless the underlying availability of file descriptors is insufficient).

Good point, I forgot that it runs as a DaemonSet, not inside each pod.

Redis is single-threaded, with all requests served on a first-come-first-served basis, so contention on individual data structures isn't an issue so much as the Redis server's ability to keep up with the volume of incoming requests. That ability can easily be benchmarked using redis-benchmark.

Contention is still an issue. As proposed, concurrent data structures will be accessed by fluentd (2 or 3 RPCs per operation) and the controller (N RPCs per operation, where N is the size of the circular buffer). Those operations either need to be lockless (which in practice means very carefully thought out to minimize or eliminate the possibility of "bad" race conditions) or use Redis's MULTI primitive. The former strategy will inevitably let a bad race condition slip through, and the latter is what would cause the contention.

I will readily concede that if a cluster and the apps within grow to a point where a single Redis instance cannot keep up, scaling Redis isn't the easiest thing in the world, but with a little outside-the-box thinking, there's an easy option (for this particular use case). Instead of operating a single Redis cluster that handles logs for all applications, we could provision a dedicated Redis instance per application. This is relatively easy. Fluentd would just write each log to the appropriate app-specific Redis and controller would read from the appropriate app-specific Redis.

Agree RE: sharding redis. I'm not concerned about data volume or raw request volume either. The concurrent, compound operations are what concern me.

We'd also be removing one (logger).

Good point. Compared to the simple interface we would expose from the logger (or similar component), Redis has a complicated interface.

Admittedly, you've lost me here. I'm not really thinking about streaming logs (yet).

I think we have to, since a large goal of ours in evolving our logging/metrics transport & collection architecture is to treat applicable data as streams.

I'd argue we'd be doing quite a bit better in terms of simplicity and reduced maintenance burden

The architecture would be simpler, and we'd have less code to maintain, but I believe that we'd be shipping a sub-par logging architecture if we write directly to and read directly from redis, for reasons I've stated above.

But at least we can benchmark Redis' performance and get real data on where the breaking point is. (This would be harder to do with logger.)

Why is it harder with logger + nsq? Also, benchmarking Redis alone is not the same as benchmarking the entire system, which is fluentd, Redis, and the controller. I believe we'd be more interested in the latter.

the suggestions you've offered above can be applied to incrementally improve our scalability

Absolutely right for most of my above suggestions. The only one that I think we should pursue now is to put a simple API between redis and the controller. Other pieces can be implemented internally (to the logging/metrics system) as needed.

Ultimately, I feel like swapping logger for Redis is a good starting point and that we can go from there

Agree with your proposal of swapping logger for Redis, but the original proposal also seems to remove nsq from the logging system, and that is my second-biggest concern (behind the controller <--> redis interface).

Rather than tackle your latest feedback (which I appreciate, by the way) excerpt by excerpt, I will break it down into a couple of general themes that I think I hear you expressing.

Contention

I agree that we would have multiple fluentd daemons and the controller all vying for access to a common set of objects (lists) stored in Redis. There's no question that is the case; however, as you've pointed out, the use of transactions can mitigate it. I was already anticipating that would be necessary. I'm sorry I did not make that clear sooner.

Where I think we're diverging is on what the actual consequence of using transactions might be. It would seem you believe that using transactions will increase contention insofar as a transaction will (necessarily) lock the objects in question. I would absolutely agree with that assessment if it weren't for the fact that Redis is single-threaded to begin with. If you're using MULTI to atomically group together a series of two or three operations, it wouldn't matter if any of those objects were locked, because with Redis serving only one request at a time, any request waiting for access to any of those objects is already waiting anyway, by virtue of the fact that Redis' only thread is working on another request. (See the sketch after the list below.)

Essentially, I do not believe there is any performance penalty for using transactions. I could be wrong about that, but if I am right, I believe two things follow:

  1. There would be no reason not to use transactions.
  2. With transactions, as without them, the bottleneck isn't a lock on any object, but rather the one thread. So the only question at that point is one of whether that one thread can keep up with the request volume. It sounds like we agree that if it cannot, then sharding would be a good way to mitigate that.
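
To illustrate what such a transaction might look like, here is a sketch, again assuming the redigo client. Note that the conditional lpop from the earlier sketch is swapped for an unconditional ltrim, since commands queued inside MULTI cannot branch on earlier replies:

```go
import "github.com/gomodule/redigo/redis" // redigo, as in the earlier sketches

const maxMessages = 1000 // hypothetical per-app cap, as before

// appendLogTx pushes a line and prunes the list as one atomic unit.
// MULTI/EXEC runs the queued commands back to back, so no other client
// observes the list between the push and the prune. LTRIM with a
// negative start index keeps only the newest maxMessages entries,
// standing in for the conditional LPOP of the original sketch.
func appendLogTx(conn redis.Conn, app, message string) error {
	conn.Send("MULTI")
	conn.Send("RPUSH", app, message)
	conn.Send("LTRIM", app, -maxMessages, -1)
	_, err := conn.Do("EXEC")
	return err
}
```

Because Redis serves one request at a time anyway, grouping the two commands this way shouldn't add meaningful overhead.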

Controller

There seems to be a concern that, from the controller's perspective, talking to Redis is more complicated than talking to logger. I have to disagree with that. Again, I could be wrong, but I feel that far and wide, Redis is regarded as simple. If we want to compare or contrast the complexity of Redis' interface with logger's, we should zero in on the narrow subset of Redis' interface that the controller would use. Currently, logger exposes a RESTful interface by which controller fetches a specified number of log entries for a given app. Doing so requires the controller to know two things: a URL and a parameter (number of lines to fetch). Fetching the same information from Redis would also require only two pieces of information: the object (list) name and a parameter (number of lines to fetch). Ultimately, there's hardly any difference between these two. The URL in the case of the RESTful interface and the object name in the case of the Redis interface are the same thing: they are both resource identifiers.

NSQ and general disagreement re: overall architecture

Forgetting this proposal for a moment: it's not quite clear to me how we came to the conclusion that NSQ is an improvement over what we previously had (a queue in memory within the logger process itself). If it seems I am deliberately excluding NSQ from my proposal, it's because I'm not yet convinced that adding it bought us anything to begin with.

If I give you the benefit of the doubt and say you're right about the Redis contention issues, it seems you would have logger not go away, but would have it pull from NSQ as @jchauncey had been planning and write those messages to Redis (instead of having fluentd write directly to Redis). What I still have to ask is what benefit NSQ buys us in that case that couldn't have been achieved without it. How is it better than the in-memory queue logger used previously?

My second question would be how this middle-tier NSQ --> Redis adapter would reduce the contention that you think will be an issue. It wouldn't reduce request volume, so it's little different from having fluentd write directly to Redis... unless you batch the inserts, and I do know you had mentioned that at some point. Assuming that is the answer, I would question both its benefit and its practicality. If inserts are done in batches large enough to make any sort of difference, two things will happen: 1. requests will take longer to complete, which may negate the benefit of a lower request volume, and 2. logs will not be quite as near to real time as they are now. Imagine an application that produces a very low volume of logs: say, hypothetically, one per minute, and, again hypothetically, that we're writing in batches of 50. It could be 50 minutes before a log written by the app now would be visible via deis logs.

If I can succinctly express the overall theme of my latest round of comments, it is a general worry about over-engineering to solve problems we're not certain exist.

Quick update on this... #88 is where we landed on this. I'm going to re-assign this to 2.2.

@krancour now that #88 is merged, is this complete?

Yes, we can close this. To bring closure to the thread: we did end up incorporating Redis as the log store, but we didn't eliminate logger itself from the architecture as this issue originally proposed. @arschles, @helgi, and others had some good arguments for keeping logger around, so it stayed.