mtrudel / bandit

Bandit is a pure Elixir HTTP server for Plug & WebSock applications

Error on redirect when using bandit with phoenix_live_view

jadengis opened this issue · comments

Hi! I am encountering an issue when I try to use the Bandit adapter with Phoenix LiveView. In my application, I have a GenServer called ApplicationContext that is started under a DynamicSupervisor and registered via a Registry in the following way:

    DynamicSupervisor.start_child(
      ApplicationContext.Supervisor,
      %{
        id: {:via, Registry, {ApplicationContext.Registry, self()}},
        start:
          {GenServer, :start_link,
           [ApplicationContext.Server, init, [name: {:via, Registry, {ApplicationContext.Registry, self()}}]]},
        restart: :transient
      }
    )

That is, only one instance of ApplicationContext can be created for any given calling process, since self() is used in the process name. For a user's LiveView session, I start the application context server in a mount hook.
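For reference, a minimal sketch of the relevant supervision tree (exact layout and options simplified):

    # A sketch of the supervision-tree setup assumed above (exact layout simplified):
    children = [
      # :unique keys mean a given key (here, a pid) maps to at most one process
      {Registry, keys: :unique, name: ApplicationContext.Registry},
      {DynamicSupervisor, name: ApplicationContext.Supervisor, strategy: :one_for_one}
    ]

    Supervisor.start_link(children, strategy: :one_for_one)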

My issue is that if I use the redirect(socket, to: ~p"/some/path") pattern in a live view, I get the following error when attempting to start the application context.

Error starting context: {:error, {:already_started, #PID<0.2000.0>}}

Is Bandit reusing connection processes under the hood? I'm a bit confused why this error would happen, since :already_started should only be returned if an ApplicationContext.Server had already been created for that process. It's worth noting that this issue does not happen when using Cowboy with the same code.

I'd like to start using bandit but I'm worried about breaking my implementation.

For context I am using:

* bandit 1.2.0 (Hex package) (mix)
  locked at 1.2.0 (bandit) 05688b88
* phoenix 1.7.11 (Hex package) (mix)
  locked at 1.7.11 (phoenix) b1ec57f2
* phoenix_live_view 0.20.4 (Hex package) (mix)
  locked at 0.20.4 (phoenix_live_view) d8930c9c

Thanks.

P.S. If you need a minimal reproduction, I can try to whip something up. I mostly just want to ask the question in case anything immediately obvious pops out.

One thing that jumps out initially: the :already_started error you're seeing occurs because the id you provide in your child_spec already exists within the supervisor you're starting it in. So the core of your issue is about the id value you're using here. But the thing is: the id field has nothing to do with the concept of the Registry; that's for process names. So I think right off the top there may be some confusion about what exactly you're trying to accomplish here.

The id field is a property of the child spec (ie: of how supervisors start and supervise a process). It has no scope or meaning outside the particular supervisor it's declared in. It's simply used by the supervisor to identify and de-dupe operations on its directly supervised processes. Specifically, ids have nothing to do with the Registry.

On the other hand, names are a property of an actual running process itself. They're used to identify the process you wish to send messages to, or otherwise interact with. This is why you specify name as an argument to the GenServer: it's a property that's fundamental to the running process itself, and has nothing to do with how it's supervised.

The registry pattern is intended to be used for process names and not ids. So right off the top, passing it as your child_spec's id is not going to work exactly how you expect. What I believe will happen here is that your supervisor will interpret the id as the opaque literal tuple {:via, Registry, ....} without interpreting it as anything special. It's just another tuple as far as it's concerned.

So that's one thing that's odd here. ids are used to identify and de-dupe a supervisor's children, and names are used to find processes on a node. Different properties for different purposes, and there's some overlap between the two in your code.
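To make the distinction concrete, here's a small sketch (module names are made up, not taken from your code):

    # Illustrative only; module names are made up.
    # A name identifies the running process node-wide, resolved through the Registry:
    name = {:via, Registry, {MyApp.Registry, :some_key}}

    # An id only identifies the child within its supervisor; it's treated as an opaque term:
    child_spec = %{
      id: :my_worker,
      start: {GenServer, :start_link, [MyApp.Worker, :ok, [name: name]]},
      restart: :transient
    }

    Supervisor.start_child(MyApp.Supervisor, child_spec)

    # Later interactions go through the name, never the id
    # (assuming MyApp.Worker handles a :ping call):
    GenServer.call(name, :ping)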

All this being said, the id tuples you're passing should be unique (even if they're not being interpreted as you think they are). So there's definitely some case here where the same process is attempting to start more than one child. A few ideas as to how this may be happening:

  • Bandit serves HTTP/1 requests on the Thousand Island process that represents the underlying client connection. Subsequent keepalive requests from the same client connection will be handled on the same process. This is different from Cowboy, which spins off a one-off handler process for every separate request. You could identify this by logging self() at the start of each of your Plug.call/2 implementations (see the sketch after this list), and seeing if those process IDs are the ones you're seeing in the id that errors.

  • Cowboy is notorious for silently swallowing log output. Bandit, on the other hand, is designed quite specifically to swallow no logs at all. This may well have been happening forever on your Cowboy install and you just didn't know.

  • Are you sure that the process starting this is an HTTP process at all? There's a really delicate back and forth at LiveView startup where mounts may be called from either or both of the HTTP process and the LiveView process (see eg here). If the above ends up being called from a LiveView process, it may well be re-used from previous connections entirely within the LiveView stack (I'm not familiar enough with the internal process model used there to speak with any authority, however).
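As a sketch of that first diagnostic, something like the following plug would do (the module name is made up):

    # A sketch of the suggested diagnostic; the module name is made up.
    defmodule MyAppWeb.LogPidPlug do
      @behaviour Plug
      require Logger

      @impl Plug
      def init(opts), do: opts

      @impl Plug
      def call(conn, _opts) do
        # Log the process handling this request so it can be compared against
        # the pid embedded in the registry key that fails with :already_started.
        Logger.info("#{conn.method} #{conn.request_path} handled in #{inspect(self())}")
        conn
      end
    end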

@mtrudel Thanks for the response.

> The registry pattern is intended to be used for process names and not ids. So right off the top, passing it as your child_spec's id is not going to work exactly how you expect. What I believe will happen here is that your supervisor will interpret the id as the opaque literal tuple {:via, Registry, ....} without interpreting it as anything special. It's just another tuple as far as it's concerned.

I think this is irrelevant. In this case the Registry I'm using is running in :unique mode and I am passing the same value for name in the GenServer start_link call as I am for the id value in the child_spec. The uniqueness would be guaranteed in both places.

As some additional context, the ApplicationContext.Server monitors the process that spawned it and terminates itself on :DOWN. Also, this server is being created in a LiveView on_mount hook; I believe these hooks run both on the initial HTTP request and when the WebSocket is starting.
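For illustration, the hook looks roughly like this sketch (module name, init argument, and error handling are simplified, not my exact code):

    # A simplified sketch of the on_mount hook; names and the init argument are illustrative.
    defmodule MyAppWeb.ApplicationContextHook do
      require Logger

      def on_mount(:default, _params, _session, socket) do
        # Runs once in the HTTP (dead render) process and once in the LiveView
        # (connected) process, so self() differs between the two invocations.
        case start_context() do
          {:ok, _pid} ->
            {:cont, socket}

          {:error, _reason} = error ->
            # In the scenario above, {:error, {:already_started, pid}} shows up here.
            Logger.error("Error starting context: #{inspect(error)}")
            {:cont, socket}
        end
      end

      defp start_context do
        name = {:via, Registry, {ApplicationContext.Registry, self()}}

        DynamicSupervisor.start_child(ApplicationContext.Supervisor, %{
          id: name,
          start: {GenServer, :start_link, [ApplicationContext.Server, %{owner: self()}, [name: name]]},
          restart: :transient
        })
      end
    end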

> Bandit serves HTTP/1 requests on the Thousand Island process that represents the underlying client connection. Subsequent keepalive requests from the same client connection will be handled on the same process. This is different from Cowboy, which spins off a one-off handler process for every separate request. You could identify this by logging self() at the start of each of your Plug.call/2 implementations, and seeing if those process IDs are the ones you're seeing in the id that errors.

After running some tests, I think that this is the issue. As suggested, I started logging self() in the part of the code where the error was being thrown. The behavior was like this:

    # during the initial HTTP request
    self: #PID<0.1919.0>

    # during the socket mount
    self: #PID<0.2111.0>

    # after clicking refresh in the browser and triggering another HTTP request
    self: #PID<0.1919.0>

    ** Error

It appears as if the initial HTTP connection process is being kept alive even after the app transitions to the live view process, and it is this process that is being reused during redirect or when refreshing the page.

From what you've described, this sounds like the intended behavior and I understand this is probably better for performance (keeping the connection alive and reusing the connection process). It is a bit unintuitive however, because I'm not used to having to "clean up" a process in Elixir.

Would you have a recommendation on how to work around this? Is there a way to hook into the request lifecycle and "clean up" the process before it is reused?

Thanks!

> I think this is irrelevant. In this case the Registry I'm using is running in :unique mode and I am passing the same value for name in the GenServer start_link call as I am for the id value in the child_spec. The uniqueness would be guaranteed in both places.

Yes, 100%. It's still unique, but maybe not quite doing what you thought. That's all I was trying to say.

> From what you've described, this sounds like the intended behavior and I understand this is probably better for performance (keeping the connection alive and reusing the connection process). It is a bit unintuitive however, because I'm not used to having to "clean up" a process in Elixir.

That's exactly what's happening. The browser is keeping a single HTTP connection to the server open and is reusing it for multiple HTTP calls, in this case even on different reloads of the same page. Bandit is quite explicitly designed to reuse processes here; this is a large part of what makes it so simple and fast. This has always been 'behind the curtain' and not part of the Plug API. There's never been a sanctioned way to interact with or control the underlying process model behind a given Plug connection, by design. The fact that Bandit and Cowboy have differing process models internally ought to be entirely opaque to the user, though this is getting blurrier and blurrier by the day with liveview, long running processes, etc.

I've started sketching out some formalizations of controls & hooks with the Phoenix folks, but it's all a long way away (there's usefulness beyond your use case, for hosting things like grpc connections, liveview processes directly on the connection, etc).

Just so I'm clear, what lifecycle are you looking to attach this process to? Are you interested in hooking into the request lifetime, the underlying client connection lifetime, or the liveview session lifetime?

> That's exactly what's happening. The browser is keeping a single HTTP connection to the server open and is reusing it for multiple HTTP calls, in this case even on different reloads of the same page. Bandit is quite explicitly designed to reuse processes here; this is a large part of what makes it so simple and fast. This has always been 'behind the curtain' and not part of the Plug API. There's never been a sanctioned way to interact with or control the underlying process model behind a given Plug connection, by design. The fact that Bandit and Cowboy have differing process models internally ought to be entirely opaque to the user, though this is getting blurrier and blurrier by the day with liveview, long running processes, etc.

This makes a lot of sense. Thanks for sharing the context.

> Just so I'm clear, what lifecycle are you looking to attach this process to? Are you interested in hooking into the request lifetime, the underlying client connection lifetime, or the liveview session lifetime?

Ideally, what I would like is for this process to live for the lifetime of the LiveView session. As mentioned, I have explicit monitors for the :DOWN event of the LiveView session that shut down the GenServer when received, e.g.:

    @impl GenServer
    def handle_info({:DOWN, _ref, :process, _pid, reason}, state) do
      # Shutdown this server when the init process dies.
      {:stop, reason, state}
    end
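The corresponding monitor is established in init/1, roughly along these lines (a sketch; the %{owner: pid} argument shape is illustrative):

    # A sketch of how the monitor is set up; the %{owner: pid} init argument
    # shape is an assumption.
    @impl GenServer
    def init(%{owner: owner_pid} = state) do
      # Monitor the process that requested this server (its pid captured via
      # self() by the caller and passed in), so its exit produces the :DOWN
      # message handled above.
      Process.monitor(owner_pid)
      {:ok, state}
    end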

I think the only challenge here is that LiveView on_mount hooks run both on the HTTP process and the LiveView process, and the HTTP process isn't shut down when I expect it to be, e.g. at the end of the request.

I can probably work around this by forcefully terminating the application context process on my end when I detect it's already been started, but this isn't ideal because it could potentially mask other issues. A platform solution would be preferred in the future if possible.

Alternatively, perhaps I can use Plug.Conn.register_before_send/2 to terminate the process at the end of the HTTP request. @mtrudel to your knowledge, would this callback still be run even if the underlying Bandit connection process is reused? I think this would be the least intrusive solution.

Thanks!

The liveview process is guaranteed to call your mount function in its process; see the connected? casing described at #264 (comment). Would that help you?

Otherwise, yep, register_before_send runs on every request. It's actually part of Plug, not Bandit: it's implemented entirely within Plug, and gets called 'within' the Plug lifecycle, before the response is sent out to the client.

> The liveview process is guaranteed to call your mount function in its process; see the connected?

I thought about using connected? but unfortunately I need the process during the HTTP call as well because the tenant id is stored in the context and used for data fetching.

> Otherwise, yep, register_before_send runs on every request. It's actually part of Plug, not Bandit: it's implemented entirely within Plug, and gets called 'within' the Plug lifecycle, before the response is sent out to the client.

I will give this a try. Thanks for the help. I'm excited to get bandit working in my application. During my initial testing, apart from this error, everything was super smooth and consistent.

> I thought about using connected? but unfortunately I need the process during the HTTP call as well because the tenant id is stored in the context and used for data fetching.

This is more LiveView than Plug, but could you split out those two steps, and pass things from the HTTP mount to the liveview mount via assigns? (ie: grab your properties as needed in your HTTP mount and stash them in assigns, then pull them out and start your process up in the liveview mount?)

Another idea: instead of using self() as the unique bit in your registry call, could you use something that liveview maintains on your socket? (I think this is what id on your LV socket is.) Then you can stop worrying about process concerns almost entirely.
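A rough sketch of that idea (illustrative only; the init argument is reduced to %{}, and it assumes socket.id is stable across the dead render and the connected mount):

    # Illustrative sketch; assumes socket.id is available and stable in both mounts.
    def on_mount(:default, _params, _session, socket) do
      # Key the context on the LiveView id rather than on self(), so it no
      # longer matters which process happens to run this mount.
      name = {:via, Registry, {ApplicationContext.Registry, socket.id}}

      _ = DynamicSupervisor.start_child(ApplicationContext.Supervisor, %{
        id: socket.id,
        start: {GenServer, :start_link, [ApplicationContext.Server, %{}, [name: name]]},
        restart: :transient
      })

      {:cont, socket}
    end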

As an update, register_before_send ended up working pretty smoothly and unobtrusively for me.
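For reference, such a cleanup looks roughly like the following sketch (illustrative names, not my exact code):

    # A sketch of such a cleanup hook; the module name is illustrative.
    defmodule MyAppWeb.ContextCleanupPlug do
      @behaviour Plug

      @impl Plug
      def init(opts), do: opts

      @impl Plug
      def call(conn, _opts) do
        Plug.Conn.register_before_send(conn, fn conn ->
          # This callback runs in the connection process right before the
          # response is sent, so self() here is the same pid used as the
          # Registry key during the request.
          case Registry.lookup(ApplicationContext.Registry, self()) do
            [{pid, _value}] -> GenServer.stop(pid, :normal)
            [] -> :ok
          end

          conn
        end)
      end
    end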

> This is more LiveView than Plug, but could you split out those two steps, and pass things from the HTTP mount to the liveview mount via assigns? (ie: grab your properties as needed in your HTTP mount and stash them in assigns, then pull them out and start your process up in the liveview mount?)

> Another idea: instead of using self() as the unique bit in your registry call, could you use something that liveview maintains on your socket? (I think this is what id on your LV socket is.) Then you can stop worrying about process concerns almost entirely.

I appreciate the ideas, though in my case I use this process for more than just LiveViews (channels, background jobs, etc.), so keeping it tied to the calling process is important. (Too much detail, but I have Ecto autogenerate look up values like the current user and tenant_id from the context when writing to the database, for bulletproof multitenancy.)

I will close this issue since I have an easy workaround. Thanks for all the help and context @mtrudel!