queue method should be a promise
CristianMR opened this issue · comments
When `skipDuplicates` is set to true, the 'drain' event can be emitted before the crawler has checked whether `queue(uri)` has already been seen. The solution is quite straightforward: return a promise from `queue` so the caller can tell when that check has resolved. This would allow us to `await c.queue(uri)` inside the callback function before calling `done()`.
```javascript
Crawler.prototype.queue = function queue(options) {
  var self = this;

  // Did you get a single object or string? Make it compatible.
  options = _.isArray(options) ? options : [options];
  options = _.flattenDeep(options);

  const promises = options.map((option) => {
    if (self.isIllegal(option)) {
      log('warn', 'Illegal queue option: ', JSON.stringify(option));
      return;
    }
    return self._pushToQueue(
      _.isString(option) ? { uri: option } : option
    );
  });

  return Promise.all(promises);
};

Crawler.prototype._pushToQueue = function _pushToQueue(options) {
  var self = this;
  // ...
  // Just return the promise from the seen check.
  return self.seen.exists(options, options.seenreq).then(rst => {
    if (!rst) {
      self._schedule(options);
    }
  }).catch(e => log('error', e));
};
```
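With both methods returning a promise, the caller can wait for the duplicate check to settle before treating the queue as drained. A minimal, self-contained sketch of the pattern; the `Set`-based `seen` store and the `scheduled` array are stand-ins for the crawler's internals, not its real API:

```javascript
// Stand-ins for the crawler's internals: a seen-URI store and
// the list of URIs actually scheduled for fetching.
const seen = new Set();
const scheduled = [];

// Async duplicate check, mirroring seen.exists() returning a promise.
function pushToQueue(uri) {
  return Promise.resolve().then(() => {
    if (!seen.has(uri)) {
      seen.add(uri);
      scheduled.push(uri);
    }
  });
}

// queue() resolves only after every duplicate check has settled.
function queue(options) {
  const list = Array.isArray(options) ? options : [options];
  return Promise.all(list.map(pushToQueue));
}

queue(['http://a.example', 'http://b.example', 'http://a.example'])
  .then(() => {
    // The duplicate URI is scheduled only once, and because we
    // waited on queue(), a 'drain' check afterwards cannot fire
    // before the duplicate checks have finished.
    console.log(scheduled);
  });
```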
Look, I totally understand, but it would break current API usage. Also, if one does not await `queue`, an 'unhandled promise' warning will always be there. What's worse, accepting a promise and a callback at the same time makes the API confusing. To be honest, it is better to deduplicate outside the crawler, which means it should be handled by the developer. That keeps the flexibility and consistency. Hope this helps.
Thanks for your answer Mike. I already did exactly that. It took me some hours to figure this issue out, so others will probably hit it too. Have a nice year btw ✨
Sorry to hear that you spent hours on this issue, so let's keep the issue details here to help others. Thanks, and the same to you, have a nice year.