bymaximus / simplecrawler-mongo-queue

MongoDB queue implementation for simplecrawler

Repository from Github: https://github.com/bymaximus/simplecrawler-mongo-queue

MongoDB queue for Simplecrawler


This is a queue implementation for simplecrawler powered by MongoDB.

Installation

npm install --save simplecrawler-mongo-queue

Usage

First, create a new simplecrawler instance as described in the simplecrawler documentation. Then create the queue instance and assign it to the crawler.queue property.

const Crawler = require('simplecrawler');
const MongoQueue = require('simplecrawler-mongo-queue');

const crawler = new Crawler('http://example.com');
crawler.queue = new MongoQueue(datastore, name);

The MongoQueue constructor takes two arguments.

  • datastore - a MongoDB collection, provided by the application, in which the queue will be stored.
  • name (optional) - the name of the queue, used to distinguish different crawlers. If the argument is omitted, the constructor generates a random queue name.
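As a rough sketch of how the name argument might be chosen, the hypothetical helper below derives a queue name from the crawler's start URL, so that each crawl target gets its own name. The helper queueNameFor and the "crawl:" prefix are assumptions made for illustration; they are not part of simplecrawler-mongo-queue.

```javascript
// Hypothetical helper (not part of the package): derive a stable queue name
// from the start URL. The "crawl:" prefix is an arbitrary, assumed convention.
function queueNameFor(startUrl) {
  // The WHATWG URL class is a global in current Node.js versions.
  const { hostname } = new URL(startUrl);
  return `crawl:${hostname}`;
}

// Possible usage, assuming `collection` is an existing MongoDB collection:
// crawlerA.queue = new MongoQueue(collection, queueNameFor('http://example.com'));
// crawlerB.queue = new MongoQueue(collection, queueNameFor('http://example.org'));
```

Any string that is unique per crawler would serve equally well; deriving it from the hostname is only one option.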

Example

Below is a minimal usage example that connects to MongoDB.

const Crawler = require('simplecrawler');
const MongoQueue = require('simplecrawler-mongo-queue');
const MongoClient = require('mongodb').MongoClient;

const client = new MongoClient('mongodb://localhost:27017', { useNewUrlParser: true });
client.connect(err => {
  // Stop early if the connection to MongoDB failed.
  if (err) throw err;

  const db = client.db('crawler');
  const collection = db.collection('queue');

  const crawler = new Crawler('http://example.com');
  crawler.queue = new MongoQueue(collection, 'mycrawler');

  crawler.on('complete', () => {
    client.close();
    process.exit();
  });

  crawler.start();
});
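The same flow can also be written with promises. The sketch below is one possible variant, not the package's documented usage: runCrawl and its option names are hypothetical, and the client, Crawler and MongoQueue are passed in as arguments so the function does not depend on how they are created. It assumes that MongoClient's connect() and close() return promises when called without a callback.

```javascript
// Hypothetical promise-based wrapper around the callback example above.
// `runCrawl` and its option names are assumptions, not part of either library.
async function runCrawl({ client, Crawler, MongoQueue, url, queueName }) {
  // Assumes connect() returns a promise when no callback is given.
  await client.connect();
  try {
    const collection = client.db('crawler').collection('queue');

    const crawler = new Crawler(url);
    crawler.queue = new MongoQueue(collection, queueName);

    // Wait until simplecrawler emits its "complete" event.
    await new Promise(resolve => {
      crawler.on('complete', resolve);
      crawler.start();
    });
  } finally {
    // Close the connection whether the crawl completed or an error was thrown.
    await client.close();
  }
}

// Possible usage:
// runCrawl({
//   client: new (require('mongodb').MongoClient)('mongodb://localhost:27017'),
//   Crawler: require('simplecrawler'),
//   MongoQueue: require('simplecrawler-mongo-queue'),
//   url: 'http://example.com',
//   queueName: 'mycrawler'
// }).catch(console.error);
```

Closing the client in a finally block lets the process exit on its own once the crawl is done, without an explicit process.exit() call.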

Performance

The charts reflect the database performance during a test run.
