ActsAsTenant's usage of thread-local variables does not play well with the WebSocket thread pool
shepmaster opened this issue
Bug Report
Describe the bug
The StimulusReflex documentation suggests using ActsAsTenant like this:
module ApplicationCable
  class Connection < ActionCable::Connection::Base
    identified_by :current_user

    def connect
      self.current_user = env["warden"].user
      # Stored thread-locally (via request_store) on whichever worker thread runs #connect
      ActsAsTenant.current_tenant = current_user.account
    end
  end
end
However, ActsAsTenant uses thread-local variables as provided by the request_store gem. Setting thread-local variables in #connect is a bad idea because the thread that services a WebSocket request may not be the same thread that ran #connect!
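To see why, here is a plain-Ruby sketch with no Rails or gems involved. A single long-lived worker thread (standing in for one thread of Action Cable's worker pool) pulls jobs off a queue. The jobs and the `:tenant` key are hypothetical. A value written to `Thread.current` by one connection's job is still there when the next connection's job runs on the same thread.

```ruby
# A one-thread "worker pool": every job runs on the same long-lived thread,
# much as a pool thread may be reused for many WebSocket connections.
jobs    = Queue.new
results = Queue.new

worker = Thread.new do
  while (job = jobs.pop) != :stop
    results << job.call
  end
end

# Job for user 1's connection: sets a thread-local "current tenant".
jobs << -> { Thread.current[:tenant] = "account-1"; [:user1, Thread.current[:tenant]] }
# Job for user 2's connection: sets nothing, so it reads user 1's stale value.
jobs << -> { [:user2, Thread.current[:tenant]] }
jobs << :stop
worker.join

observed = Array.new(2) { results.pop }
p observed # => [[:user1, "account-1"], [:user2, "account-1"]]
```

Nothing resets the value between jobs, so correctness depends entirely on every job overwriting it first.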
Example debugging output
In this case, we were not yet following the StimulusReflex documentation's suggestion, so we were not setting anything in #connect; we relied only on setting ActsAsTenant.current_tenant inside our Rails controller helpers. Here, both WebSocket connections used the same thread (69500), and thus the data "leaked" from one user to another:
(thread id, action, transport, user, note)
62860 #new -- HTTP -- user 1 -- load page
69500 #connect -- WS -- user 1 -- connects
69500 #new -- WS -- user 1 -- performs reflex action, setting thread-local value
62860 #new -- HTTP -- user 2 -- load page
69500 #connect -- WS -- user 2 -- connects
69500 #new -- WS -- user 2 -- performs reflex action, using *wrong* thread-local value
After discovering this, we checked the docs and updated our code to set the current tenant in #connect. Further testing showed that this does not fix all cases. Here, user 1 performs an action after user 2 connects and sets the thread-local value. Since both users' requests are serviced by the same thread (73060), the data again leaks across users:
(thread id, action, transport, user, note)
62860 #new -- HTTP -- user 1 -- load page
73060 #connect -- WS -- user 1 -- connects, sets thread-local value
73060 #new -- WS -- user 1 -- performs reflex action, using correct thread-local value
62860 #new -- HTTP -- user 2 -- load page
73060 #connect -- WS -- user 2 -- connects, sets thread-local value
73060 #new -- WS -- user 1 -- performs reflex action, using *wrong* thread-local value
To Reproduce
This is not easy to reproduce, by any means. I set config.action_cable.worker_pool_size = 2 in application.rb to make it more likely that a thread would be reused. A value of 1 may also work.
I then printed Thread.current.object_id inside #connect and inside a Rails action (#new in the logs above).
I opened two concurrent browser windows to my application, each logged in as a different user. I then interleaved requests in various ways while watching the debugging output.
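Why a small pool helps: the plain-Ruby sketch below is a hypothetical fixed-size pool (`run_jobs` is illustrative, not Action Cable's executor) that records `Thread.current.object_id` for every job, like the debugging output above. Assuming Action Cable never runs more worker threads than `worker_pool_size`, the same reasoning applies there: with one worker, every job lands on the same thread, so reuse is certain rather than merely likely.

```ruby
# Hypothetical fixed-size pool: `size` threads share one job queue.
# Returns [job, thread_object_id] pairs, one per job.
def run_jobs(size, job_count)
  jobs = Queue.new
  ids  = Queue.new
  workers = Array.new(size) do
    Thread.new do
      # Record which thread handled each job, like printing Thread.current.object_id.
      while (job = jobs.pop) != :stop
        ids << [job, Thread.current.object_id]
      end
    end
  end
  job_count.times { |i| jobs << i }
  size.times { jobs << :stop } # one stop marker per worker
  workers.each(&:join)
  Array.new(job_count) { ids.pop }
end

single = run_jobs(1, 6)
puts single.map(&:last).uniq.size # => 1: every job ran on the same thread
```

With a larger pool the jobs spread over at most `size` threads, but which job lands on which thread is up to the scheduler, which is why the interleaving has to be tried several ways.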
Expected behavior
A few things I'd like:
- Document a working solution for ActsAsTenant.
- Document how reflex work is scheduled on threads. I was surprised that multiple WebSocket users shared the same thread (although in retrospect it is the better design choice).
Attempted fixes
In a standard Rails application, request_store registers a middleware that resets the thread-local variables on each request. This does not automatically apply to StimulusReflex. Adding this middleware to StimulusReflex doesn't appear to work, perhaps because it waits until the end of the stream before resetting anything.
Writing my own middleware would perform poorly because of #564.
My current solution is to add an around_reflex callback to StimulusReflex::Reflex itself (as it is what handles the default reflex, AFAICT):
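StimulusReflex::Reflex.class_eval do
  around_reflex :stimulus_reflex_reset_request_store

  # Give every reflex a fresh RequestStore and clear it afterwards (even on error),
  # so values such as ActsAsTenant.current_tenant don't leak to the next reflex
  # that runs on the same worker thread.
  def stimulus_reflex_reset_request_store
    RequestStore.begin!
    yield
  ensure
    RequestStore.end!
    RequestStore.clear!
  end
end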
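Stripped of Rails, the callback above is the familiar "reset around a unit of work" pattern. The plain-Ruby sketch below uses a hypothetical `MiniStore` (not request_store's real implementation) and a `with_fresh_store` wrapper playing the role of `stimulus_reflex_reset_request_store`. Because the clearing happens in `ensure`, a reflex that raises still leaves a clean store for the next reflex on the same thread.

```ruby
# Hypothetical stand-in for a request-scoped store kept on the current thread.
module MiniStore
  def self.store
    Thread.current[:mini_store] ||= {}
  end

  def self.clear!
    Thread.current[:mini_store] = {}
  end
end

# Plays the role of the around_reflex callback: fresh store in, cleared store out.
def with_fresh_store
  MiniStore.clear!
  yield
ensure
  MiniStore.clear! # runs even if the wrapped work raises
end

# User 1's reflex sets the tenant; user 2's reflex on the same thread does not see it.
with_fresh_store { MiniStore.store[:tenant] = "account-1" }
leaked = with_fresh_store { MiniStore.store[:tenant] }

# A failing reflex also leaves nothing behind.
begin
  with_fresh_store do
    MiniStore.store[:tenant] = "account-2"
    raise "reflex failed"
  end
rescue RuntimeError
  # The error still reaches the caller; the store was already cleared by `ensure`.
end
after_error = MiniStore.store[:tenant]

p [leaked, after_error] # => [nil, nil]
```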
Versions
StimulusReflex
- Gem: 3.4.1
- Node package: 3.4.1
External tools
- Ruby: 3.0.2p107 (2021-07-07 revision 0db68f0233) [arm64-darwin21]
- Rails: 6.1.4.1
- Node: v16.13.0
Browser
- Chrome 98.0.4706.0 (Official Build) canary (arm64)
I've also created https://github.com/shepmaster/stimulus_reflex_harness
Hey Jake, look what Mr. @palkan cooked up for us: https://anycable.io/blog/multi-tenancy-vs-cables/
Truly, I didn't fully understand the problem scenario before. I'm not a concurrency/threading expert, and I am still learning more about how Action Cable works behind the scenes every day.
Anyhow, the post details solutions that should work for Action Cable and AnyCable today, and Rails 7.1 will have a generalized solution that doesn't require reopening any framework classes.
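As I read the linked post, the general shape of the fix is to make the connection, not the thread, the owner of the tenant: remember it in #connect, then set the thread-local value around each command that connection performs and restore it afterwards. Rails 7.1's Action Cable command callbacks (such as `around_command`) are, I believe, the generalized hook mentioned above. The plain-Ruby sketch below replays the second log with a hypothetical `FakeConnection` class and `with_thread_tenant` helper; neither is part of Action Cable or ActsAsTenant.

```ruby
# Hypothetical helper: set the thread-local tenant for the block, then restore it.
def with_thread_tenant(tenant)
  previous = Thread.current[:tenant]
  Thread.current[:tenant] = tenant
  yield
ensure
  Thread.current[:tenant] = previous
end

# Hypothetical connection: the tenant lives on the object, not on the thread.
class FakeConnection
  def initialize(account)
    @account = account
  end

  # Like #connect: remember the tenant on this connection only.
  def connect
    @tenant = @account
  end

  # Like a reflex/command: apply this connection's tenant just for the call.
  def perform
    with_thread_tenant(@tenant) { Thread.current[:tenant] }
  end
end

# Replay of the second log, all on one thread:
user1 = FakeConnection.new("account-1")
user2 = FakeConnection.new("account-2")
user1.connect
first = user1.perform
user2.connect
second = user1.perform # in the log, this is where user 1 saw the wrong tenant

p [first, second, Thread.current[:tenant]] # => ["account-1", "account-1", nil]
```

Because the value is restored after every command, nothing is left on the thread for the next connection to pick up.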
Sorry that it took so long to get a meaningful resolution on this! Definitely going into the docs.