arshadkazmi42 / scraplink

Scraplink library, for scraping links and images url from a webpage

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

scraplink

Build NPM Version NPM Downloads Github Repo Size LICENSE Contributors Commit

Scralink library, for scraping links and assets url from a webpage

Install

npm install scraplink

Usage

const { Scrapper } = require('scraplink');

(async () => {
  const { assets, links } = await Scrapper('http://kaspat.com');
  console.log(assets);
  console.log(links);
})();

// Assets URLS
// 'http://www.theie6countdown.com/images/upgrade.jpg',
// 'http://kaspat.com/images/img1.jpg',
// 'http://kaspat.com/images/img2.jpg',
// 'http://kaspat.com/images/img3.jpg',
// 'http://kaspat.com/images/img4.jpg',
// 'http://kaspat.com/images/page1_img1.jpg',
// 'http://kaspat.com/images/icon1.jpg',
// 'http://kaspat.com/images/icon2.jpg',
// 'http://kaspat.com/images/icon3.jpg',
// 'http://kaspat.com/images/icon4.jpg',
// 'http://www.e-zeeinternet.com/count.php?page=986859&style=odometer&nbdigits=8&reloads=1'

// Links
// 'http://www.microsoft.com/windows/internet-explorer/default.aspx?ocid=ie6_countdown_bannercode',
// 'http://kaspat.com/index.php',
// 'http://kaspat.com/index.php',
// 'http://kaspat.com/News.php',
// 'http://kaspat.com/Services.php',
// 'http://kaspat.com/Kaspat.php',
// 'http://kaspat.com/Clients.php',

API

  • Scrapper

    • Takes url input and scraps assets url and links from the page
  • Parse

    • Parse exposes two functions, as defined below

    • assets

      • Fetches all the assets from the html data
    • links

      • Fetches all the links from the html data
  • ScrapperUtil

    • formatRelativeUrls
      • Formats relative urls to absolute (takes rootUrl and array urls as input)

Contributing

Interested in contributing to this project? You can log any issues or suggestion related to this library here

Read our contributing guide on getting started with contributing to the codebase

About

Scraplink library, for scraping links and images url from a webpage

License:MIT License


Languages

Language:JavaScript 100.0%