jruby / jruby

JRuby, an implementation of Ruby on the JVM

Home Page: https://www.jruby.org

Process.wait and Process.wait2 not behaving as expected on Windows with JDK > 8

jcouball opened this issue

On Windows, Process.wait is not setting $? (or $CHILD_STATUS) and Process.wait2 is returning nil.

This is happening within my GitHub Actions. I created a simple test here that shows the behavior:
https://github.com/jcouball/process_spawn_test/actions/runs/3632082827

This workflow is a matrix build that runs the tests on two different JRuby and MRI versions on both Ubuntu and Windows. Only the JRuby builds on Windows fail.

The test code is very simple and can be found here:
https://github.com/jcouball/process_spawn_test/blob/main/spec/test_spec.rb

Duplicated here:

require 'English'

describe 'Process#wait' do
  it 'should set the global $CHILD_STATUS variable' do
    pid = Process.spawn('exit 0')
    Process.wait(pid)
    expect($CHILD_STATUS).not_to be_nil
    expect($CHILD_STATUS.pid).to eq(pid)
  end
end

describe 'Process#wait2' do
  it 'should return a non-nil status' do
    pid = Process.spawn('exit 0')
    exited_pid, status = Process.wait2(pid)
    expect(status).not_to be_nil
    expect(status.pid).to eq(pid)
  end
end

I suspect this might be a configuration error with the Windows image I am using. I don't have access to a Windows computer or virtual environment so can't debug it on my own. Also, my Windows experience is rusty.

Environment Information

The environment where I am seeing unexpected behavior:

  • JRuby 9.3.9.0 and JRuby 9.4.0.0
  • Windows Server 2022 (20221127 update) (see this page for more details of what is included in this Windows image)
  • The following environment variables are included:
    • JAVA_OPTS: -Djdk.io.File.enableADS=true
    • JRUBY_OPTS: --debug -Xnative.verbose=true
  • The Gemfile only includes the rspec gem

Environments where I see the expected behavior:

  • JRuby 9.3.9.0 and JRuby 9.4.0.0 on Ubuntu
  • MRI Rubies on both Ubuntu and Windows

Expected Behavior

  • Process.wait should set $? / $CHILD_STATUS
  • Process.wait2 should return [pid, status] instead of nil.

Actual Behavior

  • After calling Process.wait, $? / $CHILD_STATUS is nil.
  • Process.wait2 is returning nil instead of [pid, status].
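
For anyone who wants to reproduce this without RSpec, the same check can be sketched as a plain script. The helper name check_wait2 and the report hash are made up for illustration and are not part of the linked test repository. On an affected setup, the nil branch is the one expected to trigger.

```ruby
# Spawn a trivial child and report what Process.wait2 hands back.
# check_wait2 is a hypothetical helper name used only in this sketch.
def check_wait2(command = 'exit 0')
  pid = Process.spawn(command)
  result = Process.wait2(pid)

  # On the affected JRuby/Windows combination, wait2 reportedly returns nil.
  return { spawned_pid: pid, wait2_result: nil } if result.nil?

  waited_pid, status = result
  {
    spawned_pid: pid,
    waited_pid: waited_pid,
    status_pid: status.pid,
    exitstatus: status.exitstatus
  }
end

report = check_wait2
puts report.inspect
```

If spawned_pid and waited_pid differ, or the report only contains wait2_result: nil, the environment shows the behavior described above.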

I think I just ran into this as well.
Perhaps the pid values are wrong? This snippet:

out = IO.popen("pause")
puts out.pid

prints implausibly large numbers with JDK 17:

c:\ java --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.base/java.io=ALL-UNNAMED -jar jruby-complete-9.4.0.0.jar test2.rb
1801196366

With JDK 8 it prints plausible pids like 15952. Does yours work with JDK 8?
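
One rough way to tell a real pid from a bogus one, assuming a child Ruby can be launched through RbConfig.ruby: have the child print its own Process.pid and compare that with IO#pid in the parent. The helper name popen_pid_report is made up for this sketch.

```ruby
require 'rbconfig'

# Launch a child Ruby that prints its own pid, then compare it with the
# pid the parent sees through IO#pid. popen_pid_report is a hypothetical name.
def popen_pid_report
  IO.popen([RbConfig.ruby, '-e', 'print Process.pid']) do |io|
    child_view = io.read.to_i # pid as reported by the child itself
    { parent_view: io.pid, child_view: child_view, match: io.pid == child_view }
  end
end

pid_report = popen_pid_report
puts pid_report.inspect
```

When the two values disagree, the parent is most likely holding something other than the operating system's pid for the child.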

It looks like it hits the exception below and then falls back to the process's hash code (unrelated suggestion: just raise instead of defaulting to the hash code?):

java.lang.IllegalAccessException: class org.jruby.util.ShellLauncher$3 (in module org.jruby.dist) cannot access a member of class java.lang.ProcessImpl (in module java.base) with modifiers "private final"
        at java.base/jdk.internal.reflect.Reflection.newIllegalAccessException(Reflection.java:392)
        at java.base/java.lang.reflect.AccessibleObject.checkAccess(AccessibleObject.java:674)
        at java.base/java.lang.reflect.Field.checkAccess(Field.java:1102)
        at java.base/java.lang.reflect.Field.get(Field.java:423)
        at org.jruby.dist/org.jruby.util.ShellLauncher$3.getPid(ShellLauncher.java:745)
        at org.jruby.dist/org.jruby.util.ShellLauncher.reflectPidFromProcess(ShellLauncher.java:775)
        at org.jruby.dist/org.jruby.util.ShellLauncher.getPidFromProcess(ShellLauncher.java:656)
        at org.jruby.dist/org.jruby.util.ShellLauncher.runWithoutWait(ShellLauncher.java:644)
        at org.jruby.dist/org.jruby.util.ShellLauncher.runExternalWithoutWait(ShellLauncher.java:585)
        at org.jruby.dist/org.jruby.RubyProcess.spawn(RubyProcess.java:1779)
        at org.jruby.dist/org.jruby.RubyProcess$INVOKER$s$0$0$spawn.call(RubyProcess$INVOKER$s$0$0$spawn.gen)
        at org.jruby.dist/org.jruby.internal.runtime.methods.JavaMethod$JavaMethodN.call(JavaMethod.java:824)
        at org.jruby.dist/org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:220)
        at org.jruby.dist/org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:372)
        at org.jruby.dist/org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:175)
        at bad_them.invokeOther0:spawn(bad_them.rb:1)
        at bad_them.RUBY$script(bad_them.rb:1)
        at bad_them.run(bad_them.rb)
        at java.base/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:732)
        at org.jruby.dist/org.jruby.ir.Compiler$1.load(Compiler.java:114)
        at org.jruby.dist/org.jruby.Ruby.runScript(Ruby.java:1277)
        at org.jruby.dist/org.jruby.Ruby.runNormally(Ruby.java:1194)
        at org.jruby.dist/org.jruby.Ruby.runNormally(Ruby.java:1176)
        at org.jruby.dist/org.jruby.Ruby.runNormally(Ruby.java:1212)
        at org.jruby.dist/org.jruby.Ruby.runFromMain(Ruby.java:991)
        at org.jruby.dist/org.jruby.Main.doRunFromMain(Main.java:398)
        at org.jruby.dist/org.jruby.Main.internalRun(Main.java:282)
        at org.jruby.dist/org.jruby.Main.run(Main.java:227)
        at org.jruby.dist/org.jruby.Main.main(Main.java:199)

This seems to work around it, but I'm not sure whether you want a more global fix, unit tests, etc.:

diff --git a/core/src/main/java/org/jruby/util/ShellLauncher.java b/core/src/main/java/org/jruby/util/ShellLauncher.java
index 17b861c..91e76f4 100644
--- a/core/src/main/java/org/jruby/util/ShellLauncher.java
+++ b/core/src/main/java/org/jruby/util/ShellLauncher.java
@@ -42,6 +42,8 @@ import java.io.PipedInputStream;
 import java.io.PipedOutputStream;
 import java.io.PrintStream;
 import java.lang.reflect.Field;
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
 import java.nio.ByteBuffer;
 import java.nio.channels.FileChannel;
 import java.util.ArrayList;
@@ -745,7 +747,14 @@ public class ShellLauncher {
                         }

                     } catch (Exception e) {
-                        // ignore and use hashcode
+                        // JDK > 8 has a new way to look it up and also can't use "handle" field anymore
+                        try {
+                            Method pid = process.getClass().getSuperclass().getMethod("pid");
+                            Java.trySetAccessible(pid);
+                            return (long) pid.invoke(process);
+                        } catch (Exception e2) {
+                            // ignore and use hashcode
+                        }
                     }
                     return process.hashCode();
                 }

Somewhat related: flapdoodle-oss/de.flapdoodle.embed.mongo#120

This may be an issue running the complete jar with module support on more recent JDKs. There are various flags that need to be passed to open up the internals of the classes in question. Some earlier JDKs will just complain, but later ones will outright deny access.

I've added the "windows" label and will be getting a test environment set up soon to work on this and other issues.

I don't think it's an issue with the complete jar (it still fails with the --add-opens flags). I think it is a Windows issue on JDK > 8, yeah.

Assigning myself to keep track of this for subspawn-win32

@byteit101 and @headius: I was wondering what the plan is for this issue?

The plan, as far as I'm aware, is for the next release (or the one after that) to ship subspawn-win32. However, that is contingent on my getting access to a Windows development environment where master is actually green, so that I can validate that my changes work and don't break anything.

That sounds great. Forgive my ignorance, what is subspawn-win32? Is that a Win32 library? I searched here and googled but didn't find anything obvious.

It's a backend for my SubSpawn project: https://github.com/byteit101/subspawn/

Each component is a gem; you can find subspawn-posix and subspawn on RubyGems.

JRuby currently uses SubSpawn for PTY.spawn and other PTY functions on Linux & macOS as of 9.4

Hey Patrick, wondering if progress is being made on this issue? JRuby support for my gem is blocked on this issue -- which isn't an emergency for me but important to some users of my gem. I noted that you have made quite a few changes on the subspawn project so thought you might have an update.

Yeah! If you are lucky, subspawn-win32 now works. You are generally lucky about 2/3 of the time. When you are unlucky, a read of a pipe from a dead process hangs indefinitely due to some native IO issue I haven't been able to figure out. @headius offered to help me debug it, but he's been at conferences recently. If you don't need to read STDOUT/STDERR, though, you can build master and use it without any problems, at least none that I've come across. I've been unable to run it in the JRuby test suite, as that uses blocking STDOUT reads and thus hangs.

If you would like to help test, or to solve the IO issue, either would be much appreciated. I was going to link you to the GHA builds, but I realized that I left them broken, and I'm about to head off on vacation for a week. Building master locally should be fine though, and you can also pop by #jruby on Matrix, as I'm usually there. I will be back and able to help again on the 21st.

I will mention that `` (backtick, or %x{}) is affected by this, but spawn + wait should be fine
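
If the hang really is limited to pipe reads, as described above, one possible way to capture output without a pipe is to redirect the child's stdout to a temporary file and read it after waiting. This is only a sketch under that assumption and has not been verified against subspawn-win32. The helper name capture_via_file is hypothetical.

```ruby
require 'tempfile'

# Capture a command's stdout through a temporary file instead of a pipe.
# capture_via_file is a hypothetical helper name used only in this sketch.
def capture_via_file(command)
  Tempfile.create('spawn-out') do |file|
    # Redirect the child's stdout to the file path rather than a pipe.
    pid = Process.spawn(command, out: file.path)
    _, status = Process.wait2(pid)
    [File.read(file.path), status]
  end
end

output, status = capture_via_file('echo hello')
puts output
```

The trade-off is that output only becomes available once the child has exited, so this is no substitute for streaming reads.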

This behavior sounds familiar since that was exactly what I faced in my Ruby MRI implementation. However, I confess that I do not have much experience in Java (it's been about 20 years!). Unfortunately, I don't think I can be of much help.

I appreciate all your efforts and will continue to wait.

Seems to be working OK for me with that patch :) If you trust random builds: https://drive.google.com/file/d/1F523EobGRxITEWue3JF5BbhqF_XReWO1/view?usp=share_link

When do you think this might be merged?

Do we have a PR for this patch yet?

Sorry this fell off the priority list but let's wake it up and get it fixed for 9.4.5.0.

That's a great question. This is waiting on debugging the Windows hangs in subspawn-win32. I've been very busy this year, though things should be calming down for me in a few weeks, in November.

@headius and @byteit101

❄️ Hello and Happy Holidays ❄️

Hoping to get an update as to when you think this might land in JRuby. I see this is targeted to 9.4.6.0. Is that still the plan?

I know you are busy folks so I am just trying to get your current thoughts.

Currently nobody is working on the Windows support in subspawn, so this is waiting for some resource to step up.

If nobody else does it, I will try to do it myself, but it might be a little while. We would like to have this in the next release.

I have a Windows VM and will try to look at this for 9.4.9!

This is actually simpler than it seems and I missed the details.

On Windows we do not dig out the PID properly, and instead return a hash code. This is obviously not waitable, and for whatever reason we hang trying to wait for bogus PIDs on Windows.

Subspawn fixes this by actually implementing process management with FFI on Windows, returning a real PID. It is largely complete for common platforms other than Windows, but Windows needs some help due to byteit101/subspawn#3. Once that's resolved and the rest of Windows support is cleaned up and completed, we can switch to Subspawn on all platforms and resolve this issue.

We probably will not make that move until JRuby 10, but we can update the optionally-enabled Subspawn in any JRuby 9.4 release. I think we should also patch the pid logic from ShellLauncher as described above.

I have pushed a version of the patch from @rdp in #8310 but I am unable to test it due to us not having full FFI support on Win32 AArch64 yet.