javilobo8 / news-scrapper

Application for saving and sending posts scraped from websites. Useful for news, forums, lists...

news-scrapper

TODO

  • Retrieve from Reddit with the API? - DONE
  • Add Express so a client can consume the data. - DONE
  • Error handling. - DONE
  • Implement maxRetries client option.
  • Tests.
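
The pending `maxRetries` item could be approached with a small retry wrapper. The following is only a sketch, not the project's implementation: `withRetries` is a hypothetical helper, and it assumes that `maxRetries` counts the extra attempts made after the first failure, so the default of 0 means a single attempt.

```javascript
// Hypothetical helper (not part of the project): calls `request` once,
// then up to `maxRetries` more times if it keeps failing.
async function withRetries(request, maxRetries = 0) {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt += 1) {
    try {
      // `attempt` is passed along so callers can log or back off if they wish
      return await request(attempt);
    } catch (error) {
      lastError = error; // remember the failure; retry if attempts remain
    }
  }
  throw lastError; // every attempt failed
}
```

With axios this might be used as `withRetries(() => axios.get(url), 2)`. An `undefined` value, as in the target examples, falls back to the default of 0.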

Targets

Each webpage to scrape is defined as a target:

const targetWithAxios = {
  name: 'Reddit - SwitchHaxing/New', // Name
  scrapperName: 'reddit-api', // Scrapper name (in this case src/scrappers/reddit-api.scrapper.js)
  description: 'Reddit SwitchHaxing New posts', // Description
  tags: ['switch'], // Some optional tags
  http: {
    client: 'axios', // Could be axios or puppeteer
    clientOptions: {
      maxRetries: undefined, // Defaults to 0 (only axios)
      timeout: undefined, // TBI
      waitPageLoad: undefined, // Defaults to 0 (only puppeteer)
    },
    host: 'https://www.reddit.com', // Host
    path: '/r/SwitchHaxing/new.json', // Path
    method: 'get', // Method
    params: { // Query Params
      search: 'optional',
    }
  },
};
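
The comments on `clientOptions` describe per-client defaults. As a rough illustration of how those defaults could be applied, here is a minimal sketch. `resolveClientOptions` is a hypothetical name, not a function from the project, and rejecting unknown clients is an assumption.

```javascript
// Hypothetical helper: fills in the documented defaults for a target's clientOptions.
function resolveClientOptions(target) {
  const { client, clientOptions = {} } = target.http;
  if (client !== 'axios' && client !== 'puppeteer') {
    // Assumption: anything other than the two documented clients is an error
    throw new Error(`Unsupported client: ${client}`);
  }
  return {
    // maxRetries only applies to axios and defaults to 0
    maxRetries: client === 'axios' ? (clientOptions.maxRetries ?? 0) : undefined,
    // timeout is marked TBI, so it is passed through untouched
    timeout: clientOptions.timeout,
    // waitPageLoad only applies to puppeteer and defaults to 0
    waitPageLoad: client === 'puppeteer' ? (clientOptions.waitPageLoad ?? 0) : undefined,
  };
}
```

Resolving the defaults in one place keeps the scrappers from each having to check for `undefined` themselves.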

const targetWithPuppeteer = {
  name: 'PreDB',
  scrapperName: 'predb',
  description: 'Switch releases list page',
  tags: ['switch'],
  http: {
    client: 'puppeteer',
    clientOptions: {
      maxRetries: undefined,
      timeout: undefined,
      waitPageLoad: 5000,
    },
    host: 'https://predb.me',
    path: '',
    method: 'get',
    params: {
      search: 'nsw',
    },
  },
};
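
To show how `host`, `path` and `params` fit together, the sketch below builds the final request URL with Node's built-in `URL` class. `buildTargetUrl` is a hypothetical helper; how the project actually assembles its requests is not shown in this README.

```javascript
// Hypothetical helper: combines host, path and params into the URL to request.
function buildTargetUrl({ http }) {
  // An empty path (as in the PreDB target) is treated as the site root
  const url = new URL(http.path || '/', http.host);
  for (const [key, value] of Object.entries(http.params || {})) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const predbTarget = {
  http: { host: 'https://predb.me', path: '', params: { search: 'nsw' } },
};
console.log(buildTargetUrl(predbTarget)); // https://predb.me/?search=nsw
```

The same string could be passed to either client, for example to `axios.get` or to Puppeteer's `page.goto`.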

Tasks

const taskExample = {
  name: 'TASK #1',
  description: 'Test task number 1',
  active: true, // Enable or disable task
  startOnCreate: true, // Start when application starts (depends on active)
  cron: '* * * * *', // Cron-based syntax
  targets: [], // Target IDs
  emails: ['test@test.com'], // Email recipients (sending is TBI)
};
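
Two of the task fields interact in ways worth spelling out: `startOnCreate` only takes effect when `active` is true, and `cron` holds a schedule expression. The sketch below assumes the standard five-field cron syntax (minute, hour, day of month, month, day of week), under which `'* * * * *'` means every minute. The scheduling library the project uses may accept other forms. `describeCron` and `startsOnCreate` are hypothetical helpers written for illustration.

```javascript
const CRON_FIELDS = ['minute', 'hour', 'dayOfMonth', 'month', 'dayOfWeek'];

// Hypothetical helper: labels the five fields of a standard cron expression.
function describeCron(expression) {
  const parts = expression.trim().split(/\s+/);
  if (parts.length !== CRON_FIELDS.length) {
    throw new Error(`Expected 5 cron fields, got ${parts.length}`);
  }
  return Object.fromEntries(CRON_FIELDS.map((field, i) => [field, parts[i]]));
}

// Hypothetical helper: a task starts with the application only if it is
// both active and flagged with startOnCreate.
function startsOnCreate(task) {
  return Boolean(task.active && task.startOnCreate);
}
```

For the example task, `startsOnCreate` would return `true`, and setting `active: false` would make it return `false` regardless of `startOnCreate`.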

