This is a queue implementation for simplecrawler powered by MongoDB.
npm install --save simplecrawler-mongo-queue
First, create a new simplecrawler instance as described in the simplecrawler documentation. Then create the queue instance and assign it to the crawler.queue property.
const Crawler = require('simplecrawler');
const MongoQueue = require('simplecrawler-mongo-queue');
const crawler = new Crawler('http://example.com');
crawler.queue = new MongoQueue(datastore, name);
The MongoQueue constructor takes two arguments:
datastore - a MongoDB collection, provided by the application, where the queue will be stored.
name (optional) - a name for the queue that distinguishes it from the queues of other crawlers. If the argument is omitted, the constructor creates a random queue name.
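Because an omitted name results in a random queue name, an application that wants a predictable name may prefer to choose one itself before constructing the queue. The sketch below illustrates that idea on the caller's side. The resolveQueueName helper and the 'queue-' prefix are hypothetical and are not part of simplecrawler-mongo-queue; how the library generates its own random names is not described here.

```javascript
const crypto = require('crypto');

// Hypothetical caller-side helper (not part of simplecrawler-mongo-queue):
// returns the given name, or a random fallback when none is supplied.
function resolveQueueName(name) {
  if (typeof name === 'string' && name.length > 0) {
    return name;
  }
  // 8 random bytes rendered as 16 hex characters; the prefix is an assumption.
  return 'queue-' + crypto.randomBytes(8).toString('hex');
}

const explicit = resolveQueueName('mycrawler'); // the explicit name is kept
const generated = resolveQueueName();           // a random fallback name
```

The resulting string could then be passed as the second argument, for example new MongoQueue(collection, resolveQueueName('mycrawler')).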
Below is a minimal usage example that connects to MongoDB.
const Crawler = require('simplecrawler');
const MongoQueue = require('simplecrawler-mongo-queue');
const MongoClient = require('mongodb').MongoClient;
const client = new MongoClient('mongodb://localhost:27017', { useNewUrlParser: true });
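client.connect(err => {
    // Abort if the connection to MongoDB could not be established.
    if (err) {
        throw err;
    }
    const db = client.db('crawler');
    const collection = db.collection('queue');
    const crawler = new Crawler('http://example.com');
    crawler.queue = new MongoQueue(collection, 'mycrawler');
    // Close the connection and exit once the crawl has finished.
    crawler.on('complete', () => {
        client.close();
        process.exit();
    });
    crawler.start();
});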
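The shutdown step in the example above relies on the crawler's 'complete' event. In promise-based code, that event can be wrapped in a promise so that cleanup runs after an await. The following is a minimal sketch, assuming the crawler exposes the standard EventEmitter once method; waitForComplete is a hypothetical helper, and a plain EventEmitter stands in for the crawler so the snippet runs without simplecrawler or MongoDB.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: returns a promise that resolves the first time
// the given emitter fires its 'complete' event.
function waitForComplete(emitter) {
  return new Promise(resolve => emitter.once('complete', resolve));
}

// Stand-in for a crawler; a real crawler would emit 'complete' itself.
const fakeCrawler = new EventEmitter();
let finished = false;

waitForComplete(fakeCrawler).then(() => {
  // In a real application, this is where client.close() would be called.
  finished = true;
});

// Simulate the end of a crawl.
fakeCrawler.emit('complete');
```

With a real crawler, the same helper would be used as await waitForComplete(crawler) followed by closing the MongoDB client.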
The charts reflect the database performance during a test run.