Hardware remains powered on app restart

Question

gordonm867 opened this issue 5 years ago

For safety reasons, if an OpMode does not begin stopping within a few seconds of the "stop" button being pressed, fails to initialize in a timely manner, or fails to finish stopping in a timely manner, the OpModeManagerImpl.OpModeStuckCodeMonitor.Runner.run() method sends a kill request that shuts down the RC app, which is restarted a few seconds later.

However, the monitor class does not force the REV hub to stop powering motors. In fact, since the active OpMode is shut down when the app is killed, motors stay in their last state until the app relaunches. This uncontrolled period can damage the robot, the field, or nearby people, objects, and other robots, and is itself a safety issue.

Ensuring that all hardware is powered off is something that is taken care of in other places in the OpModeManagerImpl class. Specifically, this code seems to shut down all powered motors:

Iterator var1 = this.hardwareMap.getAll(DcMotorSimple.class).iterator();
while(var1.hasNext()) {
    DcMotorSimple motor = (DcMotorSimple)var1.next();
    if (motor.getPower() != 0.0D) {
        motor.setPower(0.0D);
    }
}

The fact that the RC app restarts itself successfully after one of these shutdowns shows that OpModeManagerImpl.OpModeStuckCodeMonitor.Runner.run() does run, since killing the app is part of that method. Just as the code shows a message on the phones to tell the user what is happening, it seems plausible to shut down hardware before the app shuts down, which would greatly improve the safety of the SDK.
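
To make the suggestion concrete, here is a rough sketch of "zero the motors, then kill the app". It is not SDK code: the Motor interface is a hypothetical stand-in for DcMotorSimple (only the getPower/setPower calls quoted above), FakeMotor exists only to exercise the sketch, and the kill step is passed in as a plain Runnable because the real kill request lives inside the SDK.

```java
import java.util.ArrayList;
import java.util.List;

public class StopBeforeKillSketch {
    /** Hypothetical stand-in for DcMotorSimple; not the SDK interface. */
    interface Motor {
        double getPower();
        void setPower(double power);
    }

    /** In-memory motor used only to exercise the sketch. */
    static class FakeMotor implements Motor {
        private double power;
        FakeMotor(double power) { this.power = power; }
        public double getPower() { return power; }
        public void setPower(double power) { this.power = power; }
    }

    /** Zero every motor that is still powered, then run the kill action. */
    static void stopMotorsThenKill(List<? extends Motor> motors, Runnable killApp) {
        try {
            for (Motor motor : motors) {
                if (motor.getPower() != 0.0) {
                    motor.setPower(0.0);
                }
            }
        } finally {
            // The restart must still happen even if a motor write throws.
            killApp.run();
        }
    }

    public static void main(String[] args) {
        List<FakeMotor> motors = new ArrayList<>();
        motors.add(new FakeMotor(0.8));
        motors.add(new FakeMotor(0.0));
        stopMotorsThenKill(motors, () -> System.out.println("kill request sent"));
        for (FakeMotor motor : motors) {
            System.out.println("power after stop: " + motor.getPower());
        }
    }
}
```

The try/finally is a deliberate choice in the sketch: a failed motor write should never prevent the restart that the monitor exists to trigger.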

Windwoes · Answer 1 · Fri Jun 07 2019 02:12:26 GMT+0800 (China Standard Time)

So I've actually been looking into this myself. The issue is that if the user code is holding the main USB RX/TX reentrant lock for the Lynx module, then the SDK cannot squeeze in to send a failsafe command. Furthermore, even assuming the SDK was able to grab the lock, it would need to continue to hold it until the app restarts to prevent the user code from sending other rogue commands after the SDK sent the failsafe command. However, that main reentrant lock is set to force unlock itself after a timeout to prevent possible deadlock situations; thus a slightly different locking approach would be needed.

I've been experimenting with a 2-layer locking system with a "master-master" lock which the SDK can grab to forcibly hang user code during the restart.
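
For anyone trying to picture the two-layer idea, here is a minimal sketch of how such a scheme could look. This is an illustration only, not the code in the internal repo: every name in it is made up, the outer layer is a one-permit Semaphore rather than a true reentrant lock (so that "never released" does not depend on which thread took it), and the "bus" is just a string log.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class MasterLockSketch {
    // Outer layer (hypothetical): user commands hold it briefly; the SDK takes it for good.
    private final Semaphore masterGate = new Semaphore(1);
    // Inner layer: stands in for the existing per-module USB lock.
    private final ReentrantLock busLock = new ReentrantLock();
    // Record of what was "sent"; guarded by busLock.
    private final StringBuilder sent = new StringBuilder();

    private boolean acquireGate(long timeoutMs) {
        try {
            return masterGate.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    private void write(String command) {
        busLock.lock();
        try {
            sent.append(command).append(';');
        } finally {
            busLock.unlock();
        }
    }

    /** User-code path: returns false if the gate could not be taken in time. */
    boolean sendUserCommand(String command, long timeoutMs) {
        if (!acquireGate(timeoutMs)) {
            return false; // the SDK has frozen the bus
        }
        try {
            write(command);
            return true;
        } finally {
            masterGate.release();
        }
    }

    /** SDK path: take the gate, send the failsafe, and deliberately never release. */
    boolean freezeAndFailsafe(long timeoutMs) {
        if (!acquireGate(timeoutMs)) {
            return false;
        }
        write("FAILSAFE");
        return true; // gate stays held until the process is killed
    }

    String sentCommands() {
        busLock.lock();
        try {
            return sent.toString();
        } finally {
            busLock.unlock();
        }
    }

    public static void main(String[] args) {
        MasterLockSketch bus = new MasterLockSketch();
        System.out.println(bus.sendUserCommand("power=0.8", 50));
        System.out.println(bus.freezeAndFailsafe(50));
        System.out.println(bus.sendUserCommand("power=1.0", 50));
        System.out.println(bus.sentCommands());
    }
}
```

Once freezeAndFailsafe succeeds, any later user command times out at the outer gate, so nothing can follow the failsafe onto the bus. A real version would also have to cope with user code that re-enters the send path, which a plain semaphore does not allow.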

Windwoes · Answer 2 · Fri Jun 07 2019 02:16:42 GMT+0800 (China Standard Time)

Also note that this is not the only problem in the SDK pertaining to rogue user code - see this other issue I had posted about

Windwoes · Answer 3 · Fri Jun 07 2019 02:20:28 GMT+0800 (China Standard Time)

One more thing I will add, though, is that unlike MR, the Lynx module has a 2500 ms timeout before entering failsafe mode. Thus, the hardware is in fact stopped at some point during the app restart, just not quite as soon as would be desirable :)
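
As a rough model of what that timeout means in practice, the sketch below tracks the time of the last command and reports failsafe once 2500 ms pass without one. It is not the hub firmware: the class and method names are invented, timestamps are passed in by the caller, and the assumption that a new command restarts the timer is mine.

```java
public class FailsafeWatchdogSketch {
    private final long timeoutMs;
    private long lastCommandMs;

    FailsafeWatchdogSketch(long timeoutMs, long startMs) {
        this.timeoutMs = timeoutMs;
        this.lastCommandMs = startMs;
    }

    /** Record that a command arrived at the given time (assumed to restart the timer). */
    void onCommand(long nowMs) {
        lastCommandMs = nowMs;
    }

    /** True once the full timeout has elapsed with no command. */
    boolean isFailsafe(long nowMs) {
        return nowMs - lastCommandMs >= timeoutMs;
    }

    public static void main(String[] args) {
        FailsafeWatchdogSketch watchdog = new FailsafeWatchdogSketch(2500, 0);
        watchdog.onCommand(1000); // last command before the app is killed
        System.out.println(watchdog.isFailsafe(3499)); // motors still in last state
        System.out.println(watchdog.isFailsafe(3500)); // timeout reached
    }
}
```

In this model the motors can keep running for up to the full timeout after the last command, which is the window the original report is worried about.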

Robert Atkinson · Answer 4 · Fri Jun 07 2019 02:31:24 GMT+0800 (China Standard Time)

MR also has a similar timeout.

Windwoes · Answer 5 · Fri Oct 04 2019 22:16:52 GMT+0800 (China Standard Time)

Update: A fix for this has been merged into the internal repo and will be included in the v5.3 release.

Windwoes · Answer 6 · Sun Nov 03 2019 06:43:06 GMT+0800 (China Standard Time)

v5.3 is out, this can be closed now