Run an expression on all persistent background processes

Question

krlmlr opened this issue a year ago

Use case: load a particular package, perhaps with pkgload::load_all(), in all background processes.

From what I understand, it is currently not possible to override the cleanup settings for a single mirai. Would that be easy to implement, and would it be useful?

Charlie Gao · Answer 1 · Mon Oct 02 2023 17:26:34 GMT+0800 (China Standard Time)

This is a valid use-case.

For parallel clusters, where the default is persistence (i.e. no cleanup), something like parallel::clusterExport() already works simply by assigning to the .GlobalEnv of each node.
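As a minimal illustration of that persistence, using only the base `parallel` package: a value exported once stays visible to later calls on the same workers. The cluster size and the variable `b` are arbitrary choices made for this sketch.

```r
library(parallel)

cl <- makeCluster(2)                            # two persistent worker processes
b <- 2
clusterExport(cl, "b", envir = environment())   # assigns b in each worker's global environment
res <- clusterEvalQ(cl, b + 1)                  # a later call on the same workers still sees b
stopCluster(cl)
print(res)
```

Because nothing is cleaned up between calls, the second call finds `b` without it being passed again.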

A solution I had in mind previously was to provide a 'snapshot' tool to capture a state (global environment variables, options, packages), which cleanup would then revert to instead of the initial state. So a mirai could call:

m <- mirai({ .GlobalEnv[["var"]] <- var; mirai::snapshot() }, var = var)

with var then persisting across future evaluations.

The additional work would be to add an interface equivalent to clusterExport() that makes it available on all daemons.

Would something like the above work for you?

Kirill Müller · Answer 2 · Mon Oct 02 2023 20:19:18 GMT+0800 (China Standard Time)

Thanks for your feedback.

I see how mirai tries hard to keep the global state unchanged between invocations. It seems like an uphill battle to me.

Instead of trying to restore state (which might fail for various reasons), how do you feel about capturing state, and reporting state changes to the user? This would then allow the user to take action if needed. The state changes could be reported as part of the output of a mirai. Perhaps targets/crew could pick it up from there and report those as warnings, CC @wlandau?

We could then split up the work: initialization of the workers (which might want to use packages other than the process that launched them, or require custom initialization such as load_all()), and computation, where no state changes are supposed to happen.

I'm late to the game and wasn't following the development process, so I might be suggesting things that have already been considered and rejected.

Charlie Gao · Answer 3 · Mon Oct 02 2023 21:44:46 GMT+0800 (China Standard Time)

I guess you found afterwards that specifying daemons(..., cleanup = 0L) will allow you to amend the global state.

But if I read this correctly, I think you're after a 'setup' argument to daemons(), where you can run a one-time expression when setting up the session, record the initial state once this has run, and then have cleanup run after each subsequent evaluation. Would this be sufficient for your requirements?

Reporting state changes would seem to be technically possible, but it would involve wrapping the evaluation result along with the condition, and unwrapping at the other end. At the moment, the evaluation result is returned as is. Seems rather heavy if other methods can achieve the desired outcome.

Kirill Müller · Answer 4 · Mon Oct 02 2023 22:20:08 GMT+0800 (China Standard Time)

From my experience, undoing state changes is very difficult, sometimes impossible, and brittle.

In my opinion, the desired outcome would be to give the user a way to ascertain that their mirais don't have side effects, or only have the side effects they intended. What is your take here?

The wrapping that you mentioned -- are you referring to the actual transport, or to the presentation to the user? For the former, I see how this causes overhead, but perhaps this could be opt-in? Regarding presentation to the user, the mirai could have a $state component if we detected state changes?

A one-time setup would solve most of my problems, I think.

Charlie Gao · Answer 5 · Tue Oct 03 2023 06:21:40 GMT+0800 (China Standard Time)

> From my experience, undoing state changes is very difficult, sometimes impossible, and brittle.

Does any particular example come to mind? mirai doesn't try to do anything fancy in its cleanup, but it should be enough to ensure there is no cross-contamination across evaluations.

> In my opinion, the desired outcome would be to give the user a way to ascertain that their mirais don't have side effects, or only have the side effects they intended. What is your take here?

My take is that just letting the user know is not particularly helpful, unless it can be handled by the caller in an automated fashion. It would leave users with the choice of either ignoring it, or else resetting their daemons if they are concerned enough. A lot of the time it will not be obvious which is the better option.

> The wrapping that you mentioned -- are you referring to the actual transport, or to the presentation to the user?

Yes, I mean there is additional data to transport.

> A one-time setup would solve most of my problems, I think.

It would perhaps help to share the use case you have in mind, if it's possible to disclose. The suggestion of state reporting feels a bit alien at the moment, but might make a lot of sense in context.

Will Landau · Answer 6 · Wed Oct 04 2023 23:12:10 GMT+0800 (China Standard Time)

> Perhaps targets/crew could pick it up from there and report those as warnings, CC @wlandau?

Yes, both targets and crew store warnings and errors in a stateful way.

Charlie Gao · Answer 7 · Sun Oct 15 2023 17:58:51 GMT+0800 (China Standard Time)

The above comment demonstrates that it is easy to insert your own internal evaluation wrapper inside a mirai, which can then return whatever state you had in mind. mirai itself will return any errors, but otherwise gets out of your way.
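A minimal sketch of such a wrapper in base R follows. The helper `eval_with_state()` and its return structure are hypothetical, invented for this illustration, and are not part of mirai. It evaluates an expression and returns the value together with any variables or loaded namespaces the evaluation added.

```r
# Hypothetical helper (not a mirai function): evaluate `expr` in `envir` and
# report which variables and loaded namespaces appeared as a side effect.
eval_with_state <- function(expr, envir = globalenv()) {
  vars_before <- ls(envir, all.names = TRUE)
  ns_before <- loadedNamespaces()
  value <- eval(expr, envir)
  list(
    value = value,
    state = list(
      new_variables = setdiff(ls(envir, all.names = TRUE), vars_before),
      new_namespaces = setdiff(loadedNamespaces(), ns_before)
    )
  )
}

# Use a fresh environment as a stand-in for a worker's global environment.
sandbox <- new.env()
out <- eval_with_state(quote({ leaked <- 42; leaked + 1 }), envir = sandbox)
str(out)
```

The caller can then inspect the `state` element and decide whether to warn, ignore it, or reset the workers. Only additions are detected here; modified or removed objects would need a richer comparison.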

Returning to the original request to have an expression evaluated on all daemons without the result being cleaned up, this is now implemented by the function everywhere() in 38f8c32, which evaluates an expression 'everywhere' on all connected daemons.

The arguments map to those of mirai(), so I hope this is straightforward enough for you to use. I've deliberately kept the scope wider than, for example, parallel::clusterExport() so that you can evaluate arbitrary code.

A simple example below:

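library(mirai)
daemons(8)   # launch 8 persistent daemons
Sys.sleep(1) # allow the daemons time to connect
status()
everywhere(list2env(list(b = 2), envir = .GlobalEnv)) # assign b in the global environment of every daemon
status() # can see a task has been completed on all daemons
m <- mirai(b) # b is not passed in, so it is found in the daemon's global environment
call_mirai(m)$data # 2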

Again, works the same way with or without dispatcher.
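For the original use case of loading a package in all background processes, a sketch along the same lines might look as follows. The choice of `splines` (a package shipped with R) and of two daemons is arbitrary, and the final check is expected to return TRUE only on the assumption that packages attached via everywhere() are not detached by cleanup.

```r
library(mirai)

daemons(2)                     # two persistent daemons
Sys.sleep(1)                   # allow the daemons time to connect
everywhere(library(splines))   # attach the package on every connected daemon

# Check from a later, unrelated evaluation whether the package is still attached.
m <- mirai("package:splines" %in% search())
attached <- call_mirai(m)$data
print(attached)

daemons(0)                     # shut the daemons down again
```

Replacing the `library()` call with `pkgload::load_all()` pointed at a package directory should cover the development workflow mentioned at the top of the thread, although that variant is untested here.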

With both #80 and #81 implemented, let me know if you think anything is still missing.

Kirill Müller · Answer 8 · Tue Oct 17 2023 05:28:20 GMT+0800 (China Standard Time)

Thank you, I'll give it a try!

Regarding state, I think I'm mostly referring to loaded packages. I'm told that DLLs don't always unload cleanly on all systems, which makes unloading packages with native code brittle at best. Loading a package also has side effects such as method registration, but are these registrations always cleaned up correctly when unloading? I doubt it.

This is different from the search path (attached packages), I don't care much about those.

Charlie Gao · Answer 9 · Tue Oct 17 2023 17:47:16 GMT+0800 (China Standard Time)

Yes, you're right about DLLs. This is well documented, e.g. in

?library.dynam.unload

Because it is unreliable, some packages will have an unload hook to unload the DLL, some will not. Some libraries are not designed to be unloaded and reloaded, so if there's a chance it will be re-used in the same session, it is safer not to attempt this.

{mirai} will call detach(unload = TRUE), but whether this unloads the DLL depends on the package itself, as described above.
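One way to check this for a given package is to look at getLoadedDLLs() before and after unloading its namespace. The sketch below uses `splines`, a package shipped with R that contains compiled code, purely as an example. The result after unloading is deliberately not asserted, since it depends on the package's own unload hook.

```r
pkg <- "splines"   # illustrative choice: a base R package with native code
was_loaded <- pkg %in% loadedNamespaces()

loadNamespace(pkg)
dll_after_load <- pkg %in% names(getLoadedDLLs())

# Only unload if this script loaded the namespace itself, so that nothing
# else in the session that depends on it is disturbed.
if (!was_loaded) unloadNamespace(pkg)
dll_after_unload <- pkg %in% names(getLoadedDLLs())

cat("DLL registered after load:  ", dll_after_load, "\n")
cat("DLL registered after unload:", dll_after_unload, "\n")
```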

However, stepping back, cleanup is designed foremost to ensure correctness of computations. In this respect, the most crucial part is global environment cleanup. This prevents inadvertent user error and surprises if something like mirai(a + 1) is attempted after forgetting to export a. If a previous evaluation has put a in the global environment, then the evaluation will erroneously succeed. The risk of this is minimised in the first place, as 'mirai' evaluations do not occur in the global environment, but cleanup makes sure of it.
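The failure mode can be reproduced in plain R without any daemons. In this sketch a fresh environment whose parent is the global environment stands in for a task's evaluation environment. That setup is an assumption made for illustration, not a description of mirai internals.

```r
# Stand-in for a task's evaluation environment: it is not the global
# environment itself, but variable lookups fall through to it.
task_env <- new.env(parent = globalenv())

# 'a' was never supplied, so the task fails, as it should.
first <- tryCatch(eval(quote(a + 1), task_env), error = function(e) e)

# A previous evaluation leaks 'a' into the global environment ...
assign("a", 1, envir = globalenv())
# ... and the same faulty task now succeeds by accident.
second <- eval(quote(a + 1), new.env(parent = globalenv()))

# Cleanup amounts to removing the leaked variable, after which the
# task fails again and the mistake is visible.
rm("a", envir = globalenv())
third <- tryCatch(eval(quote(a + 1), task_env), error = function(e) e)

print(class(first)[1])
print(second)
print(class(third)[1])
```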

As for loaded/attached packages, the assumption is that once a package is detached/unloaded it will no longer affect subsequent computations, even if not all resources are released. I believe this to be reasonable, and if it does not hold, then this seems to be an issue with the package in question rather than something that needs to be handled more broadly.

Charlie Gao · Answer 10 · Thu Nov 02 2023 16:48:40 GMT+0800 (China Standard Time)

> Thank you, I'll give it a try!

Just a courtesy note @krlmlr that mirai is now frozen for release (perhaps as early as next week). If you have any usability issues with the new interfaces (for your intended purposes) please let me know as there is still time.

Charlie Gao · Answer 11 · Sun Nov 05 2023 19:25:46 GMT+0800 (China Standard Time)

mirai 0.11.1 has now been released. Thanks again for the feature suggestion here. For any further comments, please feel free to raise another issue.

Kirill Müller · Answer 12 · Mon Nov 06 2023 02:39:38 GMT+0800 (China Standard Time)

Thanks, great! I'll get back to you when I next spend time in this space.