futurist / fetch-site

Fetch website with all the resources/responses/requests to local files, using puppeteer.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

fetch-site

Fetch website with all the resources/responses/requests to local files, using puppeteer.

npm Build Status

Install

CLI

npm install -g fetch-site

Module

npm install --save fetch-site

Usage

CLI Usage

$ fetch-site url [options]

Options
--version           Show version info
--help              Show help info
--no-headless, -h   Set {headless: false} for 'launch-option'
--dir, -d           Dir to save result to
--shot, -s          Filename to save a screenshot after page open
--user-agent, -u    Set userAgent, string
--viewport, -v      Set viewport, e.g. `'{width:1024, height: 768}'`
--timeout, -t       Set maximum navigation time in milliseconds
--cookies, -c       Set cookies
--wait-for, -w      Wait for milliseconds/selector/function/closed(true)
--index-file        Default name of index file, like index.html
--on-response       onResponse event, function(response) as string
--launch-option, -l Launch option passed into puppeteer, object as string
--open-option, -o   Open option to passed into page, object as string
--on-before-open    Before open page event, function(page) as string
--on-after-open     After open page event, function(page) as string
--on-finish         Finish fetch event, function(page) as string

Examples
$ fetch-site http://baidu.com -h -w -o '{waitUntil:"networkidle0"}' -u 'My-UA-String' -t 0

Above example will open the url with {headless:false}, wait until networkidle0, and wait for page close to exit, using UAString "My-UA-String".

Module Usage

const fetchSite = require('fetch-site')
fetchSite({
  url: 'http://www.baidu.com',
  // whether to save a screenshot
  shot: 'shot.png',
  dir: 'baidu.com',
  launchOption:{
    headless: false
  },
  openOption:{
    timeout: 100*1e3,
    waitUntil: 'networkidle0'
  },
  onResponse: async response=>{
    if(/his/.test(response.url)) return false
    // return false will skip the resource
  },
  onBeforeOpen: async page=>{
    page.setViewport({width: 1440, height: 1000})
  },
  onAfterOpen: async page=>{
    await new Promise(r=>setTimeout(r, 5000))
  },
  onFinish: async page=>console.log('ok')
})

Then check the folder baidu.com for results.

About

Fetch website with all the resources/responses/requests to local files, using puppeteer.


Languages

Language:HTML 95.8%Language:JavaScript 4.2%Language:CSS 0.1%