piscinajs / piscina

A fast, efficient Node.js Worker Thread Pool implementation

Home Page: https://piscinajs.github.io/piscina/

Piscina for single use workers

theinterned opened this issue

Hello, I am trying to use Piscina to manage a pool of single use SSR render workers. My use case is that my render workers have two tasks: an initialization that is very slow, and a render task that is very fast. What I want is a pool of pre-initialized workers that I can use to render, then throw out, with the disposal triggering the pool to initialize more workers ready to work.

// worker.js
// renderToString and doExpensiveInitialization are app-specific helpers (not shown)
let renderContext;

// I want to keep this fast
function render(renderArgs) {
  const result = renderToString(renderContext, renderArgs);
  renderContext = null; // dispose of the context so it doesn't get re-used
  setTimeout(() => process.exit(0), 0); // or some other way to shut down this worker?
  return result;
}

// I want this to run on server start up and to re-run as workers are disposed of and the pool is re-filled
async function initialize() {
  renderContext = await doExpensiveInitialization(); // create a new render context here
  return render;
}

// Exporting a Promise delays the worker's availability until initialize() resolves
export default initialize();

// render.js
import { Piscina } from 'piscina';

let pool;

function start() {
  // Assign the module-level `pool`; `const pool = ...` here would only shadow it
  pool = new Piscina({ filename: new URL('./worker.js', import.meta.url).href });
}

// `await` is only valid inside an async function
async function render(renderArgs) {
  // I want the worker that handled this request to be shut down immediately after returning
  const result = await pool.run(renderArgs);
  return result;
}

I have not been able to figure out how to manually dispose of a worker so that piscina doesn't re-use it and instead spins up a new one. I tried process.exit(0), but this doesn't seem to be well supported. Is there another way?
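To make the lifecycle described above concrete, here is a minimal in-process sketch of a "use once, then refill" pool. It is not Piscina's API and does not show how to retire a Piscina worker; it only models the desired behavior. `SingleUsePool` and `createContext` are hypothetical names, with `createContext` standing in for `doExpensiveInitialization()` and each context playing the role of one pre-initialized worker.

```javascript
// In-process sketch of a "use once, then refill" pool (not a Piscina API).
// `createContext` is a hypothetical async factory standing in for
// doExpensiveInitialization(); each context models one pre-initialized worker.
class SingleUsePool {
  constructor(createContext, size) {
    this.createContext = createContext;
    // Start `size` initializations up front so they are ready before traffic.
    this.ready = Array.from({ length: size }, () => this.createContext());
  }

  async use(fn) {
    // Claim the oldest pending context (FIFO); fall back to a fresh one if empty.
    const claimed = this.ready.shift() ?? this.createContext();
    // Immediately start a replacement so the pool is refilled in the background.
    this.ready.push(this.createContext());
    const context = await claimed;
    // The claimed context is never put back into `ready`, so it is used only once.
    return fn(context);
  }
}
```

The replacement starts before the claimed context is even awaited, so the slow initialization overlaps with the fast render. In real code a replacement that rejects would surface as an unhandled rejection, so it would need error handling.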

I have also tried to define a separate reinitialize task that I call immediately after render, but there appear to be a few issues with this:

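// worker.js
// renderToString and doExpensiveInitialization are app-specific helpers (not shown)
let renderContext;

function render(renderArgs) {
  const result = renderToString(renderContext, renderArgs);
  renderContext = null;
  return result;
}

async function createRenderContext() {
  renderContext = await doExpensiveInitialization();
}

const tasks = { render, reinitialize: createRenderContext };

// Dispatch on the task name; the payload must be sent as `args`
function work({ task, args }) {
  return tasks[task](args);
}

async function initialize() {
  await createRenderContext();
  return work;
}

export default initialize();

// render.js
import { Piscina } from 'piscina';

let pool;

function start() {
  // Assign the module-level `pool` instead of shadowing it with a new const
  pool = new Piscina({ filename: new URL('./worker.js', import.meta.url).href });
}

async function render(renderArgs) {
  // Send the payload as `args` so that work({ task, args }) receives it
  const result = await pool.run({ task: 'render', args: renderArgs });
  // Fire-and-forget; catch so a failed reinitialization is not an unhandled rejection
  pool.run({ task: 'reinitialize' }).catch((err) => console.error(err));
  return result;
}

  1. I am not sure there's any guarantee that reinitialize will be called on the same worker as render. Is there any way to guarantee this? Or is there another way to think about this?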
  2. I think I'm seeing render tasks being called on workers before they reinitialize. Is there any way to explicitly mark a worker as busy?
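On question 2: one way to keep a render from ever seeing a consumed context, whichever worker or task order the pool picks, is to have the worker module claim contexts itself and start a fresh one on demand. A minimal sketch, assuming the `{ task, args }` message shape from the code above; `createWorkerTasks` is a hypothetical helper, and the init and render functions are injected stand-ins so the logic can run outside a worker.

```javascript
// Hypothetical helper: builds the work({ task, args }) handler with
// single-use render contexts. `doExpensiveInitialization` and `renderToString`
// are injected stand-ins for the app's real functions.
function createWorkerTasks(doExpensiveInitialization, renderToString) {
  // Promise for the next unused context; null once a render has claimed it.
  let contextPromise = doExpensiveInitialization();

  const tasks = {
    async render(renderArgs) {
      // Claim the context synchronously, before any await, so two overlapping
      // renders never receive the same context. If it was already consumed,
      // start a fresh initialization instead of rendering with null.
      const claimed = contextPromise ?? doExpensiveInitialization();
      contextPromise = null;
      const context = await claimed;
      return renderToString(context, renderArgs);
    },
    async reinitialize() {
      // Prepare the next context only if one is not already pending.
      contextPromise ??= doExpensiveInitialization();
      await contextPromise;
    },
  };

  return ({ task, args }) => tasks[task](args);
}
```

With this shape, `reinitialize` becomes an optimization that warms the next context early rather than a step that correctness depends on.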

And I guess, taking a step back, does this use case make sense for Piscina? Maybe there is a different way to achieve what I'm trying to achieve with Piscina? Maybe Piscina is not the right tool for this job?

Hey! I just have a few questions to gather more context:

  1. Is the initialization idempotent?
  2. How large is the result of the initialization?

It might make more sense to run the tasks in series rather than repeating initialization and task over and over. I'm just guessing, as I don't know the reasoning behind your strategy.

Given the situation described in the issue you shared, it is not easy to get this right with the current implementation.
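The suggestion to run tasks in series, paying for initialization once instead of before every task, can be sketched as follows. `runInSeries`, `initialize`, and `task` are hypothetical names used for illustration, not Piscina APIs; applied to a Piscina worker, one reading of the idea is a single task that receives a batch of inputs.

```javascript
// Hypothetical helper: initialize once, then process inputs one at a time
// with the same context, instead of re-initializing before every task.
async function runInSeries(initialize, task, inputs) {
  const context = await initialize(); // the expensive step runs exactly once
  const results = [];
  for (const input of inputs) {
    // Awaiting inside the loop keeps the tasks strictly sequential.
    results.push(await task(context, input));
  }
  return results;
}
```

This only fits if the context can safely be reused between tasks, which is why the idempotency question matters.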

  1. Yes, the initialization is idempotent.
  2. Initialization creates a VM that runs a bundled React app, so it can be quite large: at least several MB.

As to 1, I could probably write it more accurately as:

// I want this to run on server start up and to re-run as workers are disposed of and the pool is re-filled
async function initialize() {
  renderContext ||= await doExpensiveInitialization(); // create a new render context here
  return render;
}

export default initialize()

so that renderContext is only assigned if it doesn't already exist.
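One caveat with `||=` here: the assignment happens only after `doExpensiveInitialization()` resolves, so two calls that overlap would both start an initialization. Caching the promise instead of the resolved value avoids that. A minimal sketch, where `doExpensiveInitialization` is a hypothetical counting stand-in and `getRenderContext` is an illustrative name:

```javascript
// Hypothetical stand-in for the real initialization; counts how often it runs.
let initCalls = 0;
async function doExpensiveInitialization() {
  initCalls += 1;
  return { createdBy: initCalls };
}

let renderContextPromise;

function getRenderContext() {
  // Assigned synchronously on the first call, before any await,
  // so later (even overlapping) callers reuse the same pending promise.
  renderContextPromise ??= doExpensiveInitialization();
  return renderContextPromise;
}
```

If initialization can fail, clear the cached promise when it rejects so the next call retries instead of reusing the failure.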

That can be an option. Still, single-use workers are not the most performant way to manage heavy single-task workloads. It is better to treat them as a single task unit: this reduces the overhead of thread initialization and lets you benefit from optimizations made by the engine itself, which are lost each time a thread is discarded.

You can also customize the minimum number of threads kept available (using the minThreads option), which can give you some gains by having workers started before you receive traffic or start processing your workloads.

Okay, thank you. In our case we do need to do some work between renders, and this pattern seems to be working out pretty well.

Specifically, we're doing something like the second version from my original comment, which calls reinitialize asynchronously immediately after every render.

async function render(renderArgs) {
  const result = await pool.run({ task: 'render', args: renderArgs });
  pool.run({ task: 'reinitialize' }).catch((err) => console.error(err));
  return result;
}

This is working out well, and we are able to split rendering and re-initialization cleanly. It also seems like there is some sort of guarantee that reinitialize will run on the same worker as the render task, although it would be nice to have docs specifying exactly how the order in which workers are used is governed.

If you are referring to the order in which each worker is used, it follows a FIFO strategy, but you can provide your own custom ordering; see: https://github.com/piscinajs/piscina#custom-task-queues

@theinterned feel free to reopen this if you think it is still valid.
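The linked README section describes plugging in your own task queue. Below is a minimal sketch of a priority-ordered queue, assuming the `size` / `push` / `shift` / `remove` shape described there (check it against your Piscina version); `PriorityTaskQueue` and `getPriority` are hypothetical, and how a real Piscina task exposes its metadata is covered in that README section. It is exercised here with plain objects rather than real tasks. Note that a task queue decides which pending task runs next; it does not pin a task to a particular worker.

```javascript
// Priority-ordered task queue sketch. The size/push/shift/remove shape is
// assumed from Piscina's custom task queue docs; `getPriority` is hypothetical.
class PriorityTaskQueue {
  constructor(getPriority) {
    this.getPriority = getPriority;
    this.tasks = [];
  }

  get size() {
    return this.tasks.length;
  }

  push(task) {
    // Insert before the first task with strictly lower priority, so higher
    // priorities run first and equal priorities keep FIFO order.
    const priority = this.getPriority(task);
    const index = this.tasks.findIndex((t) => this.getPriority(t) < priority);
    this.tasks.splice(index === -1 ? this.tasks.length : index, 0, task);
  }

  shift() {
    return this.tasks.shift() ?? null;
  }

  remove(task) {
    const index = this.tasks.indexOf(task);
    if (index !== -1) this.tasks.splice(index, 1);
  }
}
```

Keeping FIFO order among equal priorities preserves the default behavior for tasks that do not set a priority.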