valkey-io / valkey

A new project to resume development on the formerly open-source Redis project. We're calling it Valkey, since it's a twist on the key-value datastore.

Home Page: https://valkey.io

[NEW] add a management-port

daniel-house opened this issue · comments

Problem:

Sometimes the primary workload of a Valkey server will become so large that it causes the latency of even trivial commands like PING to become unacceptable. When this happens the server is still healthy and functioning, but it fails health checks that come in on the client port (default 6379). This can then cause monitoring software such as Sentinel to cause failovers, which just makes the problem worse.

Proposed solution:

Add a new port dedicated to a low frequency stream of high urgency management commands. The PING is a good example. Other possibilities might be SHUTDOWN and BGSAVE. It would NOT be suitable for a replica-client, and is not intended for internal use for clustering, but it might be reasonable to use it for the CLUSTER administrative sub-commands.

Additional Benefit

This new management port could be assigned to a control-plane while the client port and cluster port remain on the data-plane (as described in Release It! by Michael T. Nygard).

Other solutions considered:

Prioritizing commands. This would require pre-parsing the beginning of each command so that it could be handled out-of-order.

So, the main problem we are trying to solve is that the Valkey engine should remain "reliable" for health checks and management operations, and not be stalled by long-running application commands. An alternative port works well; we could even have it return a very small subset of data in the RESP format from a separate thread so it is always reliable.
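The separate-thread idea above can be sketched roughly as follows. This is a toy illustration, not Valkey's design: the port number, function names, and the PING-only protocol are all my assumptions. A dedicated thread accepts connections and answers PING in RESP, independent of the main event loop, so health checks succeed even while the main workload is busy.

```python
# Toy sketch of a management-port responder (hypothetical, not the
# Valkey implementation): a dedicated thread answers RESP PINGs on a
# separate port, independent of the main event loop.
import socket
import threading

MGMT_PORT = 6380  # hypothetical default; any free port would do

def handle_client(conn):
    """Answer RESP PINGs until the client disconnects."""
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:
                break
            # Accept both inline "PING" and the RESP array form.
            if b"PING" in data.upper():
                conn.sendall(b"+PONG\r\n")
            else:
                conn.sendall(b"-ERR unsupported on management port\r\n")

def accept_loop(sock):
    while True:
        conn, _ = sock.accept()
        threading.Thread(target=handle_client, args=(conn,),
                         daemon=True).start()

def start_management_port(port=MGMT_PORT):
    """Bind the management port and serve it from a background thread."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("127.0.0.1", port))
    sock.listen(8)
    threading.Thread(target=accept_loop, args=(sock,), daemon=True).start()
    return sock.getsockname()[1]  # actual port (useful when port=0)
```

Because the thread never touches the keyspace, it stays responsive no matter what the main thread is doing, which is exactly the property the health-check use case needs.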

However, I think the ideal solution would be to make it so that the engine is practically always available to commands. We can do this in three parts:

  1. Make it so that health checks can always be served, even during long-running commands. Basically, implement some way to "yield" out of a command to serve a very specific subset of commands. We started this work in Redis by letting module commands yield while disallowing any commands from being executed.
  2. Add support to mark a client as "higher priority" on the event loop, so that it gets served with some regular cadence. There are a few strategies we could use to mark the clients (local connections, unix connections, or an admin flag like in #469). There are also ways to make the client higher priority; we could introduce a second event loop just to poll for those higher-priority connections.
  3. Make sure that we don't spend too much time on anything else. This includes cleaning up deep command pipelines, and making sure that anything like a flush also periodically does this yielding.
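The yield idea in step 1 could look something like this toy sketch (all names and the queue-based dispatch are my assumptions, not an actual design): a KEYS-like scan that, every N iterations, drains a queue of requests from high-priority clients, serving only PING and rejecting everything else.

```python
# Minimal sketch of "yielding" out of a long-running command
# (hypothetical; names and structure are illustrative only).
from collections import deque

YIELD_EVERY = 1000  # check for priority work every 1000 keys scanned

priority_queue = deque()   # requests from "higher priority" clients
replies = []               # stand-in for writing replies to sockets

def serve_priority():
    """Serve only the minimal allowed subset while mid-command."""
    while priority_queue:
        cmd = priority_queue.popleft()
        if cmd == "PING":
            replies.append("+PONG")
        else:
            replies.append("-ERR busy")  # everything else is rejected

def keys_command(keyspace):
    """A KEYS-like scan that yields periodically."""
    result = []
    for i, key in enumerate(keyspace):
        if i % YIELD_EVERY == 0:
            serve_priority()           # the "yield" point
        result.append(key)             # pattern matching elided
    serve_priority()                   # drain once more before replying
    return result
```

The key property is that the long command stays single-threaded and atomic with respect to the keyspace; only a tiny, side-effect-free subset of commands is interleaved at well-defined points.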

IO Threads might also be a pathway here. We could have a separate thread start polling on the main event loop if a command has been running too long.

The benefit of prioritization is that it requires no end-user changes and also helps solve the cluster-bus issues.

I agree yielding inside "long commands" (probably all commands that lack the CMD_FAST flag need to be considered) is a good option. I think we will have to think about which commands we allow while yielding. For example, it might not be good to start a BGSAVE process while we yield, since that would break save atomicity. I would suggest we start by introducing client/user prioritization and then build a good yielding infrastructure allowing a minimal set of commands. I guess we can start by using the same mechanism used for lua timeout, where we use a minimized cron and handle some commands like SCRIPT KILL and SHUTDOWN NOSAVE. Maybe we can work to incrementally extend the commands we allow?
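The flag-based reasoning above might be sketched like this. The flag names and values are illustrative only, not the actual Valkey command flags: commands lacking a FAST flag are the candidates for yielding, and only a small allowlist may run *during* a yield, with BGSAVE excluded because it would break save atomicity.

```python
# Hypothetical command-flag sketch (illustrative names, not Valkey's).
CMD_FAST = 1 << 0                  # trivially cheap commands
CMD_ALLOWED_DURING_YIELD = 1 << 1  # minimal subset served mid-yield

COMMAND_FLAGS = {
    "PING": CMD_FAST | CMD_ALLOWED_DURING_YIELD,
    "SHUTDOWN": CMD_ALLOWED_DURING_YIELD,  # e.g. SHUTDOWN NOSAVE
    "BGSAVE": 0,   # never mid-yield: would break save atomicity
    "KEYS": 0,     # slow: itself a candidate for yielding
}

def may_yield(cmd):
    """Slow commands (no FAST flag) are the ones that should yield."""
    return not COMMAND_FLAGS.get(cmd, 0) & CMD_FAST

def allowed_during_yield(cmd):
    """Only the minimal allowlist may execute while yielded."""
    return bool(COMMAND_FLAGS.get(cmd, 0) & CMD_ALLOWED_DURING_YIELD)
```

Starting from an empty allowlist and extending it command by command, as suggested above, keeps each addition reviewable for atomicity hazards like the BGSAVE case.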

guess we can start by using the same mechanism used for lua timeout

One problem with the lua timeout mechanism is that we throw errors to end users; let's not do that here.

Is there anything to read that would help me understand more about any existing thoughts on how to make long-running commands (e.g., KEYS *) yield to some other commands (e.g., PING)? Requirements, designs, or PRs (rejected or paused)?

Is there anything to read that would help me understand more about any existing thoughts on how to make long-running commands (e.g., KEYS *) yield to some other commands (e.g., PING)? Requirements, designs, or PRs (rejected or paused)?

Nothing in the open AFAIK.

The Async IO threads refactoring says "Read Operation: The IO thread will only read and parse a single command." This means that they will not "look ahead" to see if there is a PING that needs to be prioritized. This is one possible solution: add another thread and a work buffer so that we can dequeue commands, scan them to see if they are PING, and then put them in the work buffer. I do not claim that this is the only way, or even a tolerable way, just "a" way. Can anyone suggest a better way, so that we can compare it to adding a management port?
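The look-ahead idea could be sketched as follows (a toy illustration of the scan-and-reorder step, not a claim about the IO-threads design): once a client's pipeline has been parsed, pull any PINGs to the front of the work buffer so they are answered before slow commands.

```python
# Toy sketch of prioritizing PING within a parsed pipeline
# (hypothetical; not how Valkey's IO threads actually work).
from collections import deque

def prioritize_pings(parsed_pipeline):
    """Stable reorder: PINGs first, all other commands keep their order."""
    fast, rest = deque(), deque()
    for cmd in parsed_pipeline:
        (fast if cmd[0].upper() == "PING" else rest).append(cmd)
    fast.extend(rest)
    return list(fast)
```

Note that any such reordering changes per-connection command ordering guarantees, which is one reason the thread above treats it as a trade-off to compare against a management port rather than an obvious win.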

Can we allow I/O threads to serve PING? Next we can allow them to serve cached CLUSTER SLOTS...

If a command like KEYS is yielding, we can only execute some subset of commands, in particular no write commands, so yielding sounds like a somewhat difficult way forward.

If a command like KEYS is yielding, we can only execute some subset of commands, in particular no write commands, so yielding sounds like a somewhat difficult way forward.

I would say no read commands either. I think the intention is just to serve ping/pong messages for health checks. The point is not to make the core multi-threaded, just to allow it to be responsive while processing a command.

There are two ways that PING could be used: health check and latency check. When doing a latency check you don't want PING to be given special priority. Is there any risk that someone is using PING this way?