Understanding the current state of mirai clusters / use with {plumber}
gangstR opened this issue
@shikokuchuo, thank you so much for the wonderful contribution to the R community. Bravo!

I'm not sure this is really an issue. I pose a question at the bottom and provide a short reprex with code and commands at the end. Please forgive the verbosity, but let me set up the context first. I noticed you'd recently added support for `make_cluster()`, so I'd hoped it could be a drop-in replacement for the persistent parallelly PSOCK clusters I've been using in a very involved high-performance R endpoint solution I've written. Due to very specific requirements and heavy initialization, the cluster must be persistent (i.e., `future::plan(cluster, workers = cl, persistent = TRUE)`). To be clear, calling that line with a mirai cluster raises no issues. I'll also mention that I've used `daemons()` with the `promises::future_promise()` wrappers rather extensively without any issue at all (and with remarkable speed improvements). Unfortunately, when I swap the clusters into my solution, I get various errors and warnings, the most relevant being:

> Warning: resolved() is not yet implemented for workers of class 'miraiNode'. Will use value() instead and return TRUE

I realize this warning is not coming from your package; however, it seems you and Henrik have been working together very closely. My question is: am I trying to use the mirai cluster feature inappropriately (i.e., inappropriately with respect to your design and plans for the feature)? If so, then my apologies and absolutely no worries. You've done us all a tremendously good service already; `mirai` provides very low latency in all of the places I've used it. Thank you!
reprex
plumber.R — swap the comments on lines 2 and 3 to toggle between the PSOCK and mirai clusters:

```r
core.cnt <- parallelly::availableCores(logical = FALSE)
# cl <- mirai::make_cluster(n = core.cnt)
cl <- parallelly::makeClusterPSOCK(workers = core.cnt, autoStop = TRUE)
future::plan(cluster, workers = cl, persistent = TRUE)
```
```r
#* @post /lollygag
lollygag <- function(req, res) {
  result <- promises::future_promise({
    if (jsonlite::fromJSON(req$postBody)$speed == "fast") {
      Sys.sleep(0.1)
    } else {
      Sys.sleep(10)
    }
    list(status = 200L, body = list(thread_id = Sys.getpid()))
  }, seed = TRUE, stdout = FALSE, globals = c("req", "res")) %...>% (function(result) {
    res$status <- result$status
    res$body <- result$body
  })
}
```
launch it:

```r
router <- plumber::plumb("plumber.R")
plumber::pr_run(router, port = 8080)
```
test with curl or tool of choice:

```shell
seq 1 8 | xargs -I $ -n1 -P4 curl -w"\n" -X POST "http://127.0.0.1:8080/lollygag" -H "accept: application/json" -d '{"speed": "slow"}'
```
Thanks for posting. I should add some examples to the documentation at some point.

Your example works with the setup chunk below, with no changes to the `lollygag` function.
```r
cl <- mirai::make_cluster(n = 4L)
parallel::setDefaultCluster(cl)
future::plan("cluster")
```
I set up a hardcoded 4-node cluster for illustration so it's easy to see it working.

`mirai` is designed to provide a backend for `parallel` clusters (an extension of base R). Support for 'promises' is also built in: a 'mirai' can be piped directly using the promises pipe to return a promise, so `future_promise()` is not needed where a future is not otherwise required. Anything beyond that (here, 'cluster' type futures) is incidental and may not be 100% supported.
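To illustrate the built-in promises support mentioned above, here is a minimal sketch (my own, assuming the `mirai`, `promises`, and `later` packages are installed; `later::run_now()` is used only to flush the promise callbacks in a plain script, since plumber or Shiny would normally drive the event loop for you):

```r
library(promises) # provides the %...>% promise pipe
library(mirai)

daemons(1L) # one background daemon, for illustration

# a 'mirai' can be piped directly: no future_promise() wrapper is needed
mirai(Sys.getpid()) %...>%
  (function(pid) cat("daemon pid:", pid, "\n"))

later::run_now(1) # flush pending promise callbacks in a plain script
daemons(0L)       # reset daemons when done
```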
Thank you for the quick reply! I was able to test my more involved local example with this today and can confirm it's working. Yay! Thank you. So the real culprit was that I cannot supply `cl` as an argument to the `workers` parameter in `future::plan()` (i.e., no difference with or without having called `parallel::setDefaultCluster(cl)`). My hurried reprex didn't show that I'd tried that line, but that wasn't the issue. Consider me closed! Thank you once more for the awesome package!
@gangstR I decided to take another look in response to your message:

> So, the real culprit was that I cannot supply `cl` as an argument to the `workers` parameter in `future::plan()` (i.e., no difference with or without having called `parallel::setDefaultCluster(cl)`).
That didn't seem right, and of course it wasn't.
Although the `miraiCluster` had been set up, the future promises were not in fact using it. I delved into the `future` code, and it seems that `plan("cluster")` without specifying 'workers' defaults to `availableCores()`, i.e. a new cluster setup, rather than the default cluster. Setting the default cluster is correct usage for the `parallel` package, and I had assumed that carried over to `future`, but I was mistaken.
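Assuming the `future` package is installed, the behaviour described above can be checked directly; per the observation, `nbrOfWorkers()` should report `availableCores()` rather than the size of the default cluster:

```r
cl <- mirai::make_cluster(n = 2L)
parallel::setDefaultCluster(cl)

future::plan("cluster")  # no 'workers' argument given
future::nbrOfWorkers()   # reports availableCores(), not the 2 default-cluster nodes

mirai::stop_cluster(cl)  # clean up
```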
In light of this, I have to say that at present, integration with the `future` framework is only experimental, through `future.mirai`, and not through the use of `parallel` clusters.

However, it is possible to run the equivalent using only `mirai`, as I show below, without the need for additional abstraction layers. An additional benefit is that you can directly specify `daemons()` options such as dispatcher, cleanup, etc.
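As a sketch of those options (argument names and defaults have varied across mirai versions, so check `?daemons` for your installed version):

```r
# launch 4 daemons with a dispatcher process for FIFO task scheduling
mirai::daemons(4L, dispatcher = TRUE)

mirai::status()    # inspect connections and daemon activity
mirai::daemons(0L) # tear everything down, terminating the daemons
```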
Revised code for plumber.R:

```r
library(promises) # make the promises pipe available
mirai::daemons(4L)

#* @post /lollygag
lollygag <- function(req, res) {
  mirai::mirai(
    {
      speed <- req[["HEADERS"]][["speed"]]
      Sys.sleep(if (speed == "fast") 0.1 else 10)
      list(status = 200L, body = list(speed = speed, thread_id = Sys.getpid()))
    },
    .args = list(req, res)
  ) %...>% (function(x) {
    res$status <- x$status
    res$body <- x$body
  })
}
```
I am encountering an error when trying to pass in POST data (accessing `req$postBody`). I haven't spent the time to figure this out yet, but I find I can easily work around it by passing in a request header instead.
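One possible workaround, untested here and purely my assumption: since only objects explicitly passed as arguments are serialized to the daemon (the `req` environment itself may not survive the trip), extract `req$postBody` into a plain character string first and pass that instead:

```r
#* @post /lollygag
lollygag <- function(req, res) {
  body <- req$postBody # extract as a plain string in the main process
  mirai::mirai(
    {
      speed <- jsonlite::fromJSON(body)$speed
      Sys.sleep(if (speed == "fast") 0.1 else 10)
      list(status = 200L, body = list(speed = speed, thread_id = Sys.getpid()))
    },
    body = body # pass the string, not the req environment
  ) %...>% (function(x) {
    res$status <- x$status
    res$body <- x$body
  })
}
```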
To query the API, I can use `nanonext` from another interactive R session:

```r
library(nanonext)
ncurl("http://127.0.0.1:8080/lollygag",
      method = "POST",
      headers = c(accept = "application/json", speed = "fast"))
#> $status
#> [1] 200
#>
#> $headers
#> NULL
#>
#> $data
#> [1] "{\"speed\":[\"fast\"],\"thread_id\":[14374]}"
```
I can even test it by performing 8 asynchronous "fast" followed by "slow" requests:

```r
library(mirai.promises) # for the as.promise method for nanonext aios
library(promises)       # make the promises pipe available

for (i in 1:8) {
  ncurl_aio("http://127.0.0.1:8080/lollygag",
            method = "POST",
            headers = c(accept = "application/json", speed = "fast")) %...>% print()
}
for (i in 1:8) {
  ncurl_aio("http://127.0.0.1:8080/lollygag",
            method = "POST",
            headers = c(accept = "application/json", speed = "slow")) %...>% print()
}
#> [1] "{\"speed\":[\"fast\"],\"thread_id\":[14374]}"
#> [1] "{\"speed\":[\"fast\"],\"thread_id\":[14374]}"
#> [1] "{\"speed\":[\"fast\"],\"thread_id\":[14376]}"
#> [1] "{\"speed\":[\"fast\"],\"thread_id\":[14379]}"
#> [1] "{\"speed\":[\"fast\"],\"thread_id\":[14383]}"
#> [1] "{\"speed\":[\"fast\"],\"thread_id\":[14374]}"
#> [1] "{\"speed\":[\"fast\"],\"thread_id\":[14379]}"
#> [1] "{\"speed\":[\"fast\"],\"thread_id\":[14376]}"
#> [1] "{\"speed\":[\"slow\"],\"thread_id\":[14374]}"
#> [1] "{\"speed\":[\"slow\"],\"thread_id\":[14383]}"
#> [1] "{\"speed\":[\"slow\"],\"thread_id\":[14379]}"
#> [1] "{\"speed\":[\"slow\"],\"thread_id\":[14376]}"
#> [1] "{\"speed\":[\"slow\"],\"thread_id\":[14374]}"
#> [1] "{\"speed\":[\"slow\"],\"thread_id\":[14383]}"
#> [1] "{\"speed\":[\"slow\"],\"thread_id\":[14376]}"
#> [1] "{\"speed\":[\"slow\"],\"thread_id\":[14379]}"
```
Do make sure to wait for the "slow" messages to print to the console!
Hope the above is clear enough for you. Let me know if you have any further questions.
@shikokuchuo, thank you for correcting my conclusion; I was too hasty. I confirmed your findings with `futureSessionInfo()`. That was unfortunate, but I agree there are ways for me to work around this by taking advantage of other aspects of your package. The good news is that I already have a production-grade non-mirai solution that works today, but I'm exploring mirai and other options to further scale that capability. The example you provided illustrates one potential path. Thank you for your assistance.
I've added examples for both GET and POST endpoints to a new vignette.
https://shikokuchuo.net/mirai/articles/plumber.html
You may refer to those.
Am closing this issue, but if you or other {plumber} experts come up with better solutions, documentation contributions are most welcome!