[fix] Unexpected high CPU use with 1s interval job
brenc opened this issue · comments
Describe the bug
Node.js version: v16.17.0
OS version: Docker node:16-bullseye-slim
Description: Just started using Bree to schedule some jobs inside a Docker container. Today I was working on a job that runs every 1s. I noticed that the job runner was using unusually high CPU for a very simple job. If I run my job in a while loop with a 1s sleep, it uses barely any CPU. If I run it using Bree, the job runner will consistently use 20-40% CPU.
Actual behavior
High CPU use.
Expected behavior
Much lower CPU use.
Code to reproduce
Here's the entire job runner:
```js
const Bree = require('bree');
const Graceful = require('@ladjs/graceful');
const logger = require('./logger'); // this is just winston going to console

const bree = new Bree({
  jobs: [
    {
      name: 'aggregate-icecast-status',
      interval: '1s',
    },
  ],
  logger,
});

const graceful = new Graceful({ brees: [bree] });
graceful.listen();

(async () => {
  logger.error('starting...');
  await bree.start();
})();
```
Here's the job:
```js
const { forEach } = require('modern-async');
const dns = require('dns').promises;
const fetch = require('node-fetch');
const Redis = require('ioredis');

const logger = require('../logger').child({
  extraInfo: 'aggregate-icecast-status',
});

const redis = new Redis({
  host: 'redis',
  keyPrefix: 'ngradio:status:',
});

const resolver = new dns.Resolver();

async function main() {
  logger.debug('aggregating icecast status');

  let addresses;
  try {
    addresses = await resolver.resolve4('icecast');
  } catch (err) {
    logger.warn(`error resolving Icecast hosts: ${err.message}`);
    return;
  }

  logger.debug('Icecast addresses: %o', addresses);

  let totalListeners = 0;
  let listenersPerHost = new Map();
  let listenerPeaksPerHost = new Map();
  let title;

  await forEach(addresses, async (address) => {
    logger.debug(`fetching status from ${address}`);

    let data;
    let response;
    try {
      response = await fetch(`http://${address}:8000/status-json.xsl`);
      data = await response.json();
      // logger.debug('%o', data);
    } catch (err) {
      logger.warn(`error collecting status from ${address}: ${err.message}`);
      return;
    }

    const listeners = parseInt(data?.mounts?.['/radio.mp3']?.listeners, 10);
    const listenerPeak = parseInt(
      data?.mounts?.['/radio.mp3']?.listener_peak,
      10
    );

    if (!title) {
      title = data?.mounts?.['/radio.mp3']?.title;
    }

    if (isNaN(listeners)) {
      logger.warn(`error parsing response from ${address}: listeners was NaN`);
    } else {
      totalListeners += listeners;
      listenersPerHost.set(address, listeners);
    }

    if (isNaN(listenerPeak)) {
      logger.warn(
        `error parsing response from ${address}: listener peak was NaN`
      );
    } else {
      listenerPeaksPerHost.set(address, listenerPeak);
    }
  });

  await redis.del('listenerPeaksPerHost');
  await redis.del('listenersPerHost');
  await redis.hset('listenerPeaksPerHost', listenerPeaksPerHost);
  await redis.hset('listenersPerHost', listenersPerHost);
  await redis.set('totalListeners', totalListeners, 'EX', 5);
  await redis.set('title', title, 'EX', 5);

  logger.debug(
    'Total listeners: %d, Listener peaks: %o, Listeners per host: %o, ' +
      'title: "%s"',
    totalListeners,
    listenerPeaksPerHost,
    listenersPerHost,
    title
  );

  await redis.quit();
}

(async () => {
  await main();
})();
```
Checklist
- I have searched through GitHub issues for similar issues.
- I have completely read through the README and documentation.
- I have tested my code with the latest version of Node.js and this package and confirmed it is still not working.
I'm seeing a similar issue running inside k8s, regardless of the interval: I get an extremely high CPU usage spike every time a job runs. I could just be doing something stupid in the jobs, but I figured you'd be able to spot that quickly if so.
Job(s) initialized here: https://github.com/idaholab/Deep-Lynx/blob/master/src/main.ts#L33
Job(s) in question: https://github.com/idaholab/Deep-Lynx/blob/master/src/jobs/data_staging_emitter.ts , https://github.com/idaholab/Deep-Lynx/blob/master/src/jobs/edge_queue_emitter.ts
I've noticed this too even with 5s interval jobs. Bree spikes to > 100% CPU on each run (running the job manually doesn't do this).
Would one of you be willing to give us a basic repo that reproduces this with the Dockerfile in it? We haven't seen this on our servers and it doesn't seem to happen on my local setup even when running in Docker.
Sure, I'll try to whip something up this week.
https://github.com/brenc/bree-issue-201-demo
I set up a simple job that only connects to Redis and prints to the console. This causes Bree to consistently use 10-20% CPU. Let me know if you need anything else.
Thanks for putting this together. Since the job runs so frequently, CPU usage will consistently be relatively high: every second you incur the cost of building a new worker. One thing to keep in mind, specifically with Docker, is that the CPUs are limited by Docker itself, so depending on your settings that could be 18-38% of one core.
I would suggest for tasks like these to define a long running job and use something like p-queue to run the task on an interval.
There might be a better way for us to handle this in Bree using a worker pool, but I will have to think on it some more.
That seems reasonable. A loop with a sleep would probably suffice as well. Thanks!
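For reference, a long-running job body along those lines might look like this sketch. `doWork` is a hypothetical stand-in for the real task (e.g. the Icecast status fetch), and `maxRuns` only exists to keep the example finite.

```javascript
// Sketch of a single long-running job that repeats its task itself,
// so Bree starts one worker instead of a new one every second.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function doWork(runNumber) {
  // placeholder for the real task body
  return `run ${runNumber}`;
}

async function runForever({ intervalMs, maxRuns = Infinity }) {
  const results = [];
  for (let run = 1; run <= maxRuns; run += 1) {
    results.push(await doWork(run)); // strictly serial: next run waits
    await delay(intervalMs);
  }
  return results;
}

// In a real Bree job this would be `runForever({ intervalMs: 1000 })` with
// no maxRuns; a small maxRuns here just lets the sketch terminate.
runForever({ intervalMs: 10, maxRuns: 3 }).then((results) => {
  console.log(results.join(', '));
});
```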
I'll keep this open in case you want to do anything more with it.
Yes, this is a code structure problem. You should be using something like p-queue, and definitely not spawning a new worker thread every 1s. You should spawn one long-running worker that then runs setInterval (or something else that runs in serial, starting the next run only once the CPU-intensive work is complete). See https://github.com/sindresorhus/promise-fun for more packages like p-queue.
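A minimal sketch of that setInterval pattern (all names hypothetical): a single long-running worker ticks on an interval but skips any tick that fires while the previous run is still in flight, so the work stays serial and runs never overlap.

```javascript
// Serial setInterval sketch: one long-lived worker, no overlapping runs.
function startSerialInterval(task, intervalMs) {
  let running = false;
  let runs = 0;
  const timer = setInterval(async () => {
    if (running) return; // previous run not finished: skip this tick
    running = true;
    try {
      runs += 1;
      await task(runs); // stand-in for the real job body
    } finally {
      running = false;
    }
  }, intervalMs);
  return { stop: () => clearInterval(timer), getRuns: () => runs };
}
```

Inside a Bree job this would be started once at the top of the worker (e.g. `startSerialInterval(main, 1000)`), with the 1s Bree interval removed, so the thread is spawned a single time.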