stimulusreflex / stimulus_reflex

Build reactive applications with the Rails tooling you already know and love.

Home Page: https://docs.stimulusreflex.com


ActsAsTenant's usage of thread-local variables does not play well with the WebSocket thread pool

shepmaster opened this issue

Bug Report

Describe the bug

The documentation suggests a usage of ActsAsTenant:

module ApplicationCable
  class Connection < ActionCable::Connection::Base
    identified_by :current_user

    def connect
      self.current_user = env["warden"].user
      ActsAsTenant.current_tenant = current_user.account
    end
  end
end

However, ActsAsTenant uses thread-local variables as provided by the request_store gem.

Setting thread-local variables in #connect is a bad idea because the thread that services a WebSocket request may not be the same thread that ran #connect!
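To make the failure mode concrete, here is a minimal pure-Ruby sketch, not taken from ActsAsTenant or request_store. `DemoTenant` and the `:demo_tenant` key are hypothetical stand-ins for the thread-local "current tenant" slot. One thread plays the role of #connect and a second plays the role of the thread that later services the reflex:

```ruby
# Hypothetical stand-in for a thread-local "current tenant" slot.
# Note: Thread.current[:key] is technically fiber-local in Ruby; for
# threads that never switch fibers it behaves like a thread-local.
module DemoTenant
  def self.set(name)
    Thread.current[:demo_tenant] = name
  end

  def self.get
    Thread.current[:demo_tenant]
  end

  # Returns what each thread observes.
  def self.run
    # Plays the role of #connect: sets the value, then reads it back.
    connect_thread = Thread.new do
      set("account-1")
      get
    end
    connect_seen = connect_thread.value

    # Plays the role of a different pool thread servicing the reflex.
    reflex_thread = Thread.new { get }
    reflex_seen = reflex_thread.value

    { connect: connect_seen, reflex: reflex_seen }
  end
end

result = DemoTenant.run
puts "connect thread saw: #{result[:connect].inspect}"
puts "reflex thread saw:  #{result[:reflex].inspect}"
```

The second thread never sees the value that the first thread set, which is the situation a reflex is in when it runs on a different pool thread than #connect.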

Example debugging output

In this case, we were not yet following the SR documentation's suggestion and so we were not setting anything in #connect, only relying on setting ActsAsTenant.current_tenant inside our Rails controller helpers. Here, both WebSocket connections used the same thread (69500) and thus the data "leaked" from one user to another:

(thread id, action, transport, user, note)
62860 #new     -- HTTP -- user 1 -- load page 
69500 #connect -- WS   -- user 1 -- connects
69500 #new     -- WS   -- user 1 -- performs reflex action, setting thread-local value
62860 #new     -- HTTP -- user 2 -- load page
69500 #connect -- WS   -- user 2 -- connects
69500 #new     -- WS   -- user 2 -- performs reflex action, using *wrong* thread-local value

After discovering this, we checked the docs and updated our code to set the current tenant in #connect. Further testing showed that this does not fix all cases. Here, user 1 performs an action after user 2 has connected and overwritten the thread-local value. Since both users' requests are serviced by the same thread (73060), the data again leaks across users:

(thread id, action, transport, user, note)
62860 #new     -- HTTP -- user 1 -- load page
73060 #connect -- WS   -- user 1 -- connects, sets thread-local value
73060 #new     -- WS   -- user 1 -- performs reflex action, using correct thread-local value
62860 #new     -- HTTP -- user 2 -- load page
73060 #connect -- WS   -- user 2 -- connects, sets thread-local value
73060 #new     -- WS   -- user 1 -- performs reflex action, using *wrong* thread-local value
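The second trace can be replayed without Rails. The sketch below is only an illustration: it uses a single worker thread fed from a queue as a rough stand-in for a WebSocket worker pool of size 1, and the `LeakDemo` module and `:demo_tenant` key are hypothetical names, not part of Action Cable or ActsAsTenant.

```ruby
# A one-thread "pool": jobs are queued and run in order on the same worker.
module LeakDemo
  def self.run
    jobs = Queue.new
    log = []

    worker = Thread.new do
      # Run jobs until the :stop sentinel arrives.
      while (job = jobs.pop) != :stop
        job.call
      end
    end

    # user 1 connects, sets the thread-local value
    jobs << -> { Thread.current[:demo_tenant] = "account-1" }
    # user 1 performs an action, sees the correct value
    jobs << -> { log << [:user1, Thread.current[:demo_tenant]] }
    # user 2 connects on the same worker, overwriting the value
    jobs << -> { Thread.current[:demo_tenant] = "account-2" }
    # user 1 performs another action, now sees user 2's value
    jobs << -> { log << [:user1, Thread.current[:demo_tenant]] }
    jobs << :stop

    worker.join
    log
  end
end

LeakDemo.run.each { |user, tenant| puts "#{user} saw #{tenant}" }
```

Because the value lives on the worker rather than on the connection, whichever connection wrote last wins for every later job on that worker.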

To Reproduce

This is not easy to reproduce, by any means. I set config.action_cable.worker_pool_size = 2 in application.rb to make it more likely that a thread would be reused. A value of 1 may even work?

I then printed out Thread.current.object_id inside of #connect and a Rails action (#new in the above logs).

I opened two concurrent browser windows to my application, each one as a different user. I then interleaved requests in various ways, watching the debugging output.

Expected behavior

A few things I'd like:

  1. Document a working solution for ActsAsTenant.
  2. Document how reflex work is scheduled on threads. I was surprised that multiple WebSocket users shared the same thread (although in retrospect it is the better design choice).

Attempted fixes

In a standard Rails application, request_store registers a middleware that resets the thread-local variables on each request. This does not automatically apply to the Stimulus Reflex case. Attempting to add this middleware to Stimulus Reflex doesn't appear to work, perhaps because it tries to wait until the end of the stream.
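As a rough illustration of the per-request reset described above, here is a self-contained sketch of a Rack-style middleware that clears a thread-local store around each call. It is not the request_store gem's implementation: `DemoStore`, `ResetStoreMiddleware`, and `ResetDemo` are hypothetical names, and the "app" is a plain lambda rather than a real Rack application.

```ruby
# Hypothetical stand-in for a thread-local key/value store.
module DemoStore
  def self.store
    Thread.current[:demo_store] ||= {}
  end

  def self.clear!
    Thread.current[:demo_store] = {}
  end
end

# Clears the store before and after each call, even if the app raises.
class ResetStoreMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    DemoStore.clear!
    @app.call(env)
  ensure
    DemoStore.clear!
  end
end

module ResetDemo
  # The "app" returns whatever tenant was left in the store by the
  # previous request, then records its own tenant.
  APP = lambda do |env|
    previous = DemoStore.store[:tenant]
    DemoStore.store[:tenant] = env[:tenant]
    previous
  end

  def self.run
    requests = [{ tenant: "account-1" }, { tenant: "account-2" }]

    DemoStore.clear!
    bare = requests.map { |env| APP.call(env) }

    wrapped_app = ResetStoreMiddleware.new(APP)
    wrapped = requests.map { |env| wrapped_app.call(env) }

    { bare: bare, wrapped: wrapped }
  end
end

p ResetDemo.run
```

Without the middleware, the second request on the same thread sees the first request's tenant. With it, each request starts from an empty store. Reflexes do not pass through this kind of per-request wrapper by default, which is why the reset has to be added somewhere else.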

Creating my own middleware is non-performant due to #564.

My current solution is to add an around_reflex to StimulusReflex::Reflex itself (as it is what handles the default reflex, AFAICT):

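# Wrap every reflex so RequestStore's thread-local state is reset around it.
StimulusReflex::Reflex.around_reflex :stimulus_reflex_reset_request_store

def stimulus_reflex_reset_request_store
  RequestStore.begin!
  yield
ensure
  # Runs even if the reflex raises, so nothing carries over to the next
  # reflex serviced by this thread.
  RequestStore.end!
  RequestStore.clear!
end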

Versions

StimulusReflex

  • Gem: 3.4.1
  • Node package: 3.4.1

External tools

  • Ruby: 3.0.2p107 (2021-07-07 revision 0db68f0233) [arm64-darwin21]
  • Rails: 6.1.4.1
  • Node: v16.13.0

Browser

  • Chrome 98.0.4706.0 (Official Build) canary (arm64)

Hey Jake, look what Mr. @palkan cooked up for us: https://anycable.io/blog/multi-tenancy-vs-cables/

Truly, I didn't fully understand the problem scenario before. I'm not a concurrency/threading expert, and I am still learning more about how Action Cable works behind the scenes every day.

Anyhow, the post details solutions that should work for Action Cable and AnyCable today, and Rails 7.1 will have a generalized solution that doesn't require reopening any framework classes.

Sorry that it took so long to get a meaningful resolution on this! Definitely going into the docs.