Piscina for single use workers
theinterned opened this issue · comments
Hello, I am trying to use Piscina to manage a pool of single use SSR render workers. My use case is that my render workers have two tasks: an initialization that is very slow, and a render task that is very fast. What I want is a pool of pre-initialized workers that I can use to render, then throw out, with the disposal triggering the pool to initialize more workers ready to work.
// worker.js
let renderContext;

// I want to keep this fast
function render(renderArgs) {
  const result = renderToString(renderContext, renderArgs);
  renderContext = null; // dispose of the context so it doesn't get re-used
  setTimeout(() => process.exit(0), 0); // or some other way to shut down this process?
  return result;
}

// I want this to run on server start up and to re-run as workers are disposed of and the pool is re-filled
async function initialize() {
  renderContext = await doExpensiveInitialization(); // create a new render context here
  return render;
}

export default initialize();
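// render.js
import { Piscina } from 'piscina';

let pool;

function start() {
  // Assign the module-level pool; declaring `const pool` here would shadow it
  pool = new Piscina({ filename: new URL('./worker.js', import.meta.url).href });
}

async function render(renderArgs) {
  const result = await pool.run(renderArgs); // I want the worker that handled this request to be shut down immediately after returning
  return result;
}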
I have not been able to figure out how to manually dispose of a worker so that Piscina doesn't re-use it and instead spins up a new one. I tried process.exit(0), but this doesn't seem to be well supported. Is there another way?
I have also tried to define a separate reinitialization that I call immediately after render, but there appear to be a few issues with this:
// worker.js
let renderContext;

function render(renderArgs) {
  const result = renderToString(renderContext, renderArgs);
  renderContext = null;
  return result;
}

async function createRenderContext() {
  renderContext = await doExpensiveInitialization();
}

const tasks = { render, reinitialize: createRenderContext };

// The caller sends { task, renderArgs }, so destructure that key name here
function work({ task, renderArgs }) {
  return tasks[task](renderArgs);
}

async function initialize() {
  await createRenderContext();
  return work;
}

export default initialize();
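// render.js
import { Piscina } from 'piscina';

let pool;

function start() {
  // Assign the module-level pool; declaring `const pool` here would shadow it
  pool = new Piscina({ filename: new URL('./worker.js', import.meta.url).href });
}

async function render(renderArgs) {
  const result = await pool.run({ task: 'render', renderArgs });
  pool.run({ task: 'reinitialize' }); // intentionally not awaited
  return result;
}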
- I am not sure there's any guarantee that reinitialize will be called on the same worker as render. Is there any way to guarantee this? Or is there another way to think about this?
- I think I'm seeing render tasks being called on workers before they reinitialize. Is there any way to explicitly mark a worker as busy?
And I guess, taking a step back, does this use case make sense for Piscina? Maybe there is a different way to achieve what I'm trying to achieve with Piscina? Maybe Piscina is not the right tool for this job?
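For reference, the single-use pattern described above can be sketched without Piscina, directly on Node's built-in node:worker_threads module. This is only an illustration under stated assumptions, not something Piscina provides: SingleUsePool and the inlined worker source are hypothetical names, and the "expensive" initialization is a stand-in object. Each worker announces when it is initialized, handles exactly one task, and is terminated, while a replacement starts warming as soon as a worker is taken.

```javascript
const { Worker } = require('node:worker_threads');

// Worker source inlined so the sketch fits in one file. The render context
// here is a stand-in for the real, slow initialization.
const workerSource = `
  const { parentPort } = require('node:worker_threads');
  const renderContext = { prefix: 'rendered:' }; // hypothetical expensive init
  parentPort.postMessage({ type: 'ready' });
  parentPort.once('message', (renderArgs) => {
    parentPort.postMessage({ type: 'result', value: renderContext.prefix + renderArgs });
  });
`;

class SingleUsePool {
  constructor(size) {
    this.ready = [];    // initialized workers waiting for a task
    this.waiting = [];  // resolvers of callers waiting for a worker
    this.closed = false;
    for (let i = 0; i < size; i++) this.spawn();
  }

  // Start one worker and hand it out once it reports that it is initialized.
  spawn() {
    const worker = new Worker(workerSource, { eval: true });
    worker.once('message', () => {
      if (this.closed) { worker.terminate(); return; }
      const waiter = this.waiting.shift();
      if (waiter) waiter(worker); else this.ready.push(worker);
    });
  }

  acquire() {
    const worker = this.ready.shift();
    if (worker) return Promise.resolve(worker);
    return new Promise((resolve) => this.waiting.push(resolve));
  }

  // Run one task on a fresh worker, then dispose of that worker.
  async run(renderArgs) {
    const worker = await this.acquire();
    this.spawn(); // begin warming a replacement right away
    try {
      return await new Promise((resolve, reject) => {
        worker.once('message', (msg) => resolve(msg.value));
        worker.once('error', reject);
        worker.postMessage(renderArgs);
      });
    } finally {
      await worker.terminate();
    }
  }

  // Stop idle workers; workers still initializing are stopped when ready.
  async close() {
    this.closed = true;
    await Promise.all(this.ready.splice(0).map((w) => w.terminate()));
  }
}
```

The trade-off is the one raised later in this thread: every task pays for a full worker start-up in the background, so this only helps when the pool stays ahead of incoming requests.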
Hey! Just have a few questions to gather more context:
- Is the initialization idempotent?
- What's the size of the initialization outcome?
It might make more sense to run the tasks in series rather than repeating initialization and task over and over. I'm just guessing, as I don't know the reasoning behind your strategy.
Given the situation described in the issue you shared, it is not easy to get this right with the current implementation.
- Yes, the initialization is idempotent.
- Initialization creates a VM that runs a bundled React app, so it can be quite large: at least several MB.
As to 1. I could probably more accurately write it as
// I want this to run on server start up and to re-run as workers are disposed of and the pool is re-filled
async function initialize() {
  renderContext ||= await doExpensiveInitialization(); // create a new render context here
  return render;
}

export default initialize();
so that renderContext is only re-assigned if it doesn't already exist.
That can be an option. Having single-use workers is not the most performant way of managing heavy single-task workloads. It is better to treat them as a single task unit, which will allow you to reduce the overhead of thread initialization, and get the benefits of optimizations from the engine itself.
You can customize the minimum number of threads available (using minThreads), which can give you some gains before you receive traffic or start processing your workloads.
okay thank you. In our case we do need to do some work between renders and this pattern seems to be working out pretty well.
Specifically, we're doing something like the second version I posted in my original comment, which calls reinitialize asynchronously immediately after every render.
async function render(renderArgs) {
  const result = await pool.run({ task: 'render', renderArgs });
  pool.run({ task: 'reinitialize' }); // intentionally not awaited
  return result;
}
This is working out pretty well and we're finding that we are able to split rendering and re-initialization pretty well. It also seems like there is some sort of guarantee that the reinitialize task will run on the same worker as the render task did, although it would be nice to have docs specifying exactly how worker pool order is governed.
If you mean the order in which workers are used, it follows a FIFO strategy, but you can provide your own custom ordering; see: https://github.com/piscinajs/piscina#custom-task-queues
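As a rough illustration of custom ordering, the sketch below is a last-in, first-out queue with the four members that the linked README section lists for a task queue (size, push, shift, remove). LifoTaskQueue is a hypothetical name, and the exact interface should be checked against the Piscina version in use. The wiring into a pool is left as a comment because it needs the piscina package.

```javascript
// Last-in, first-out task queue: the most recently queued task runs first.
class LifoTaskQueue {
  constructor() {
    this.tasks = [];
  }

  // Number of tasks currently waiting.
  get size() {
    return this.tasks.length;
  }

  // Called when a task has to wait for a free worker.
  push(task) {
    this.tasks.push(task);
  }

  // Called when a worker is free; returns the next task, or null if empty.
  shift() {
    return this.tasks.length > 0 ? this.tasks.pop() : null;
  }

  // Called when a queued task is cancelled before it starts.
  remove(task) {
    const index = this.tasks.indexOf(task);
    if (index !== -1) this.tasks.splice(index, 1);
  }
}

// Assumed wiring, with workerPath being the absolute path to worker.js:
// const pool = new Piscina({ filename: workerPath, taskQueue: new LifoTaskQueue() });
```

A queue only decides which waiting task is dispatched next; it does not by itself pin a task to a particular worker.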
@theinterned feel free to reopen this if you think it remains valid.