piscinajs / piscina

A fast, efficient Node.js Worker Thread Pool implementation

Home Page: https://piscinajs.github.io/piscina/

Unbalanced assignment of jobs to workers when concurrentTasksPerWorker > 1

0xgeert opened this issue · comments

commented

I'm seeing a weird issue where jobs are assigned to workers very unevenly (in terms of load) when concurrentTasksPerWorker > 1.

I prepared a snippet to illustrate/reproduce.
Since all jobs take ~100 ms, I expect each worker to be assigned roughly the same number of jobs. This clearly does not happen; see the log below.

Thoughts?

Master.js

const Promise = require("bluebird");
const Piscina = require('piscina');
const path = require("path");

start();

async function start () {
  // Build the list of task inputs
  const blocks = [];
  for (let i = 0; i <= 1000 * 1000; i++) {
    blocks.push(i);
  }

  const pool = new Piscina({
    maxThreads: 10,
    concurrentTasksPerWorker: 5,
    filename: path.resolve(__dirname, 'worker.js'),
  });

  // Submit every block to the pool and count completions
  let blockCounter = 0;
  for (let i = 0; i < blocks.length; i++) {
    pool.run(blocks[i]).then((dto) => {
      blockCounter++;
      console.log(blockCounter, JSON.stringify(dto));
    });
  }

  // Poll until every task has completed
  while (true) {
    if (blockCounter === blocks.length) break;
    await Promise.delay(100);
  }
}
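
A side note on the snippet above, unrelated to the balancing question: instead of polling blockCounter, the wait can also be expressed by collecting the promises returned by pool.run and awaiting them all, for example:

// Inside start(): collect each task's promise and wait for all of them,
// instead of polling a counter.
const results = await Promise.all(blocks.map((block) => pool.run(block)));
console.log('all done,', results.length, 'tasks completed');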

worker.js

const Promise = require("bluebird");
const { v4: uuidv4 } = require('uuid');

// Unique id per worker thread, to track which worker handled each task
const workerId = uuidv4();

let counter = 0;
module.exports = async (block) => {
  // Simulate ~100 ms of work
  await Promise.delay(100);
  counter++;

  return {
    block,
    workerId,
    counter, // per-worker task count, to check worker reuse
  };
};

As can be seen from the (partial) log below, roughly 3,000 jobs have completed so far, distributed over 10 threads,
with concurrentTasksPerWorker = 5.

I would expect each thread to receive ~1/10 of the jobs, i.e. ~300 jobs. As tracked by counter, some threads (identified by workerId) have processed far more (~580), while others are far behind (~114).

580/114 ≈ 5.1, which is close to concurrentTasksPerWorker = 5. This ratio seems to hold for every concurrentTasksPerWorker setting I've checked.

(partial) Logs

2984 {"block":2983,"workerId":"04385347-ec52-41d9-bc54-08786988a108","counter":575}
2985 {"block":2984,"workerId":"04385347-ec52-41d9-bc54-08786988a108","counter":576}
2986 {"block":2985,"workerId":"04385347-ec52-41d9-bc54-08786988a108","counter":577}
2987 {"block":2986,"workerId":"04385347-ec52-41d9-bc54-08786988a108","counter":578}
2988 {"block":2987,"workerId":"04385347-ec52-41d9-bc54-08786988a108","counter":579}
2989 {"block":2988,"workerId":"73453c94-cb29-4468-a640-f0f9752f51b2","counter":572}
2990 {"block":2989,"workerId":"73453c94-cb29-4468-a640-f0f9752f51b2","counter":573}
2991 {"block":2990,"workerId":"73453c94-cb29-4468-a640-f0f9752f51b2","counter":574}
2992 {"block":2991,"workerId":"73453c94-cb29-4468-a640-f0f9752f51b2","counter":575}
2993 {"block":2992,"workerId":"88fb8877-4d49-4ff2-a130-6adfb67b9e23","counter":114}
2994 {"block":2993,"workerId":"778fde13-7637-41df-8fef-754e24132add","counter":114}
2995 {"block":2995,"workerId":"e2469002-03d6-4ce5-82fc-31b3a8660d37","counter":114}
2996 {"block":2994,"workerId":"23137b7d-e9e8-487f-acb3-80ffb1f546dd","counter":114}
2997 {"block":2996,"workerId":"8131b0c3-16e3-40cf-9717-41237f0e2bd5","counter":114}
2998 {"block":2997,"workerId":"425c7002-5d49-4a21-af6a-57d766ffefc3","counter":581}
2999 {"block":2998,"workerId":"425c7002-5d49-4a21-af6a-57d766ffefc3","counter":582}
3000 {"block":2999,"workerId":"425c7002-5d49-4a21-af6a-57d766ffefc3","counter":583}
3001 {"block":3000,"workerId":"425c7002-5d49-4a21-af6a-57d766ffefc3","counter":584}
3002 {"block":3001,"workerId":"425c7002-5d49-4a21-af6a-57d766ffefc3","counter":585}
3003 {"block":3002,"workerId":"dd65c4b5-e598-4fb9-84cb-69064a48184e","counter":581}
3004 {"block":3003,"workerId":"dd65c4b5-e598-4fb9-84cb-69064a48184e","counter":582}
3005 {"block":3004,"workerId":"dd65c4b5-e598-4fb9-84cb-69064a48184e","counter":583}
3006 {"block":3005,"workerId":"dd65c4b5-e598-4fb9-84cb-69064a48184e","counter":584}
3007 {"block":3006,"workerId":"dd65c4b5-e598-4fb9-84cb-69064a48184e","counter":585}

Hi @0xgeert, nice find!

But this is somewhat expected with Piscina: it distributes load across workers using a FIFO strategy, so depending on each worker's load, the rate of incoming tasks, and the time spent on each task, it is entirely normal for some workers to be used more than others.

If you're interested in using a custom strategy (e.g. round robin), you can provide a custom TaskQueue that fulfills your needs 🙂
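
For reference, a minimal sketch of the shape such a custom queue takes, assuming the TaskQueue interface (a size getter plus push, shift and remove) described in Piscina's documentation on custom task queues; verify the contract against the docs for your Piscina version. This sketch mirrors the default FIFO behaviour; an alternative ordering strategy would live in push/shift.

const Piscina = require('piscina');
const path = require('path');

// Minimal array-backed task queue implementing Piscina's documented
// custom TaskQueue contract (size, push, shift, remove).
class ArrayTaskQueue {
  constructor () {
    this.tasks = [];
  }

  get size () {
    return this.tasks.length;
  }

  push (task) {
    this.tasks.push(task);
  }

  shift () {
    // Return the next queued task, or null when the queue is empty
    return this.tasks.shift() || null;
  }

  remove (task) {
    const index = this.tasks.indexOf(task);
    if (index !== -1) this.tasks.splice(index, 1);
  }
}

const pool = new Piscina({
  filename: path.resolve(__dirname, 'worker.js'),
  taskQueue: new ArrayTaskQueue()
});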

commented

OK, thanks for the explanation.
FWIW: the problem disappears entirely now that I've moved away from a synthetic test to jobs that don't all have exactly the same delay.
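
For illustration, "not exactly the same delay" can be as simple as adding jitter to the simulated work. A hypothetical variant of worker.js (the 50–150 ms range is arbitrary):

const Promise = require("bluebird");
const { v4: uuidv4 } = require('uuid');

const workerId = uuidv4();

let counter = 0;
module.exports = async (block) => {
  // Simulate work whose duration varies (50-150 ms, arbitrary range)
  // so tasks do not all complete in lockstep.
  await Promise.delay(50 + Math.random() * 100);
  counter++;

  return { block, workerId, counter };
};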