Support Ruby 3.3's `Process.warmup` before fork
bensheldon opened this issue · comments
Is your feature request related to a problem? Please describe.
Ruby 3.3 introduces Process.warmup
which does the following:
- Performs a major GC
- Compacts the heap.
- Promotes all surviving objects to the old generation.
- Precomputes the coderange of all strings.
- Frees all empty heap pages and increments the allocatable pages counter by the number of pages freed.
- Invoke malloc_trim if available to free empty malloc pages.
The functionality is somewhat similar to the removed nakayoshi fork functionality, but (hopefully) performs better.
I'm imagining this would be exposed/implemented in a similar manner to the nakayoshi_fork
DSL directive.
The functionality is somewhat similar to the #2925, but (hopefully) performs better.
Is it possible to find out? Can we construct benchmarks?
One of the reasons we removed nakayoshi fork was it's tendency to surface extremely difficult to debug problems in C-extensions due to compaction. I'm not sure that will have really improved since the time that the feature was considered.
was it's tendency to surface extremely difficult to debug problems in C-extensions due to compaction.
That is indeed a problem. I (and others) fixed many compaction issues in popular extensions, but there is a long tail of potential problems in less popular ones.
I'm always happy to help debug and fix these, but as you mention, often the problem is to identify the extension responsible for it. In most case it's not too hard, but sometimes it's super tricky.
I guess it's not much practical difference to do this:
before_fork { ::Process.warmup }
...versus having in the Puma DSL something like this:
ruby_process_warmup
Probably the vast majority of the value is having it officially documented and visible in Puma as something one could or should do. Though I totally understand if it creates an untenable support burden by surfacing other issues (though maybe something that could be addressed through documentation, though I also recognize that doesn't redirect everyone who might open an issue).
I have tested this in our pretty complex Rails application with Puma running in clustered mode and it led to segfaults. There's risk in making it a default. At the very least my suggestion is to keep it configurable unlike Sidekiq.
No problem with
on_worker_fork do
Process.warmup
end
I have tested this in our pretty complex Rails application with Puma running in clustered mode and it led to segfaults
I know it's not always easy, but it would be very valuable to figure out which gems are impacted and report the issue upstream. If you have trouble identifying which gem, I'd be happy to look at your crash reports to try to help.
No problem with
on_worker_fork
Doing this would entirely defeat the benefits of Process.warmup
, it would even be counter-productive.
I had some time to dig into this, I was able to reproduce with our codebase locally, but could not narrow down it entirely.
It crashes right on the Process.warmup
. The C stack trace looks like this:
-- C level backtrace information -------------------------------------------
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(rb_vm_bugreport+0xb4c) [0x101404c28]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(rb_bug_for_fatal_signal+0x100) [0x101247ebc]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(sig_do_nothing+0x0) [0x10136c840]
/usr/lib/system/libsystem_platform.dylib(_sigtramp+0x38) [0x18d67f584]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(coderange_scan+0x34) [0x101393c5c]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(rb_enc_str_coderange+0x60) [0x10137b904]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(gc_set_candidate_object_i+0x68) [0x1012699ac]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(objspace_each_objects_try+0x2f0) [0x101278ee4]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(rb_ensure+0xcc) [0x1012544cc]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(rb_gc_prepare_heap+0xa8) [0x101269458]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(proc_warmup+0x34) [0x101318b60]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(vm_call_cfunc_with_frame_+0xf0) [0x1013f791c]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(vm_exec_core+0x2048) [0x1013dd674]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(rb_vm_exec+0x3d8) [0x1013da6a4]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(invoke_block_from_c_bh+0x368) [0x1013ff09c]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(rb_yield+0xa8) [0x1013e881c]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(rb_ary_each+0x40) [0x1011af43c]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(vm_call_cfunc_with_frame_+0xf0) [0x1013f791c]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(vm_exec_core+0x1df8) [0x1013dd424]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(rb_vm_exec+0x3d8) [0x1013da6a4]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(load_iseq_eval+0x1fc) [0x1012b5a18]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(require_internal+0x374) [0x1012b3914]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(rb_require_string+0x68) [0x1012b2c74]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(vm_call_cfunc_with_frame_+0xf0) [0x1013f791c]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(vm_call_alias+0x70) [0x1013f3660]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(vm_exec_core+0x2048) [0x1013dd674]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(rb_vm_exec+0x1ec) [0x1013da4b8]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(rb_ec_exec_node+0xa0) [0x101253644]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(ruby_run_node+0x60) [0x10125353c]
/Users/tomas/.rbenv/versions/3.3.0/bin/ruby(main+0x68) [0x100747f24]
In our before_fork
we have some warm up logic that triggers certain Rails controller actions:
before_fork do
StartupWarmup.warmup_puma
ApplicationRecord.connection.disconnect!
Process.warmup
end
Skipping StartupWarmup.warmup_puma
or moving Process.warmup
before it actually solves the issue. It invokes several Rails controller actions to warm up caches. Unfortunately I cannot share the actual code, I will share findings if I dig out something more.
I think for simple Rails apps this is not an issue, and if your before_fork
is not doing anything complex then the segfault is probably low risk.
coderange_scan
: this means it's crashing while trying to read the content of a String. Which means some extension created a corrupted RString (pointing to an invalid memory region).
That's 100% a bug in a C extension. The hard part is to track which one. A start is to grep with something like:
$ rg -F rb_str_ %(bundle show --paths)
There is a couple C API that allow creating such broken string, the most common is rb_str_new_static
.
If you share the result of that search in a gist, I'd be happy to see if I can figure out which one it is.
Thanks to your tip I managed to narrow it down and find the gem with the issue. It's memcached. This gem is abandoned and we use our own fork with extra fixes. We're not on dalli because we have an mcrouter cluster of memcached instances, and unfortunately it only works with the text protocol of Memcached.
So the good news is that most likely no one else is using this gem and shouldn't encounter this issue. 😄
Ah, that rings a bell. 99% sure I fixed that bug not so long ago, let me look.
Alright, I'm bad at writing commit messages, so I can't be fully certain, but I think this was the fix: Shopify/memcached@28b5840
But overall you can use this branch, we use it with Process.warmup
and even reforking without issue. So if the issue persists it's either not in memcached
or it's in a area of memcached
you use and we don't.: https://github.com/Shopify/memcached/commits/1-0-stable-shopify/
@byroot I can confirm that with your fork there is no issue. 👍