dumebi / scrapper

Web scraper in Node.js

scrapper

Web scraper for Nairaland posts, written in Node.js

Installing

Download the repository and run

npm install

Code Process

Requirements

Node.js with an Express server
Cheerio - scrapes HTML
bluebird - promise library for asynchronous functions
json2csv - converts JSON to CSV
fs - Node.js built-in file system module

Process

Cheerio grabs the HTML, and each post is cataloged in an array of texts. The array is then passed through the functions below to grab the required fields.

Functions

extractEmails - get emails from each text using a regular expression
extractPhones - get phone numbers from each text using a regular expression
extractAddresses - get location from each text using string manipulation
extractSpecialization - get specialization from each text using string manipulation
extractBusiness - get business from each text using string manipulation
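A rough sketch of how such functions might look is given here. The patterns and logic are assumptions for illustration, not the repository's implementation: the email pattern is deliberately simple, the phone pattern assumes Nigerian mobile numbers (11 digits starting with 0, or the same number with a 234 country code), and extractAfterLabel is a hypothetical helper showing one form of string manipulation on posts that label their fields.

```javascript
// Simple email pattern; it does not cover every valid address.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Assumed format: Nigerian mobile numbers such as 08031234567
// or +2348031234567, not embedded in a longer run of digits.
const PHONE_RE = /(?<!\d)(?:\+?234|0)[789][01]\d{8}(?!\d)/g;

function extractEmails(text) {
  return text.match(EMAIL_RE) || [];
}

function extractPhones(text) {
  return text.match(PHONE_RE) || [];
}

// Hypothetical helper: return the value after "Label:" on the first
// line that starts with that label, or null if no line matches.
function extractAfterLabel(text, label) {
  const prefix = label.toLowerCase() + ':';
  for (const line of text.split('\n')) {
    const trimmed = line.trim();
    if (trimmed.toLowerCase().startsWith(prefix)) {
      return trimmed.slice(prefix.length).trim();
    }
  }
  return null;
}

// Made-up sample post.
const post = [
  'Location: Lagos, Nigeria',
  'Mail jane.doe@example.com or call 08031234567 / +2348031234567',
].join('\n');

console.log(extractEmails(post));
console.log(extractPhones(post));
console.log(extractAfterLabel(post, 'Location'));
```

Free-form posts rarely follow a fixed layout, so a label-based lookup like this will miss many entries, which is one reason the extracted data can be unreliable.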

NB: The project still has kinks, as much of the scraped data is bad data.

Languages

JavaScript 86.0%
HTML 14.0%