program-in-chinese / ChromeCrawlerWildSpider

Web crawler: a Chrome extension that loads multiple pages at the same time in the Chrome browser and scrapes their content.

Chrome extension in webstore: https://chrome.google.com/webstore/detail/wild-spider/aanpchnfojihjddlocpgoekffmjkhbbe

#WATCH OUT: the more tabs you use, the more computer resources (CPU, memory) are consumed, and each saved page takes up a little disk space. Page content is stored in IndexedDB, which is accessible from Chrome Extensions -> Wild Spider, Inspect views: background page.

The "spider" works in this way:

    1. The current URL is used as the starting point, and it is loaded again in a new tab.
    1. After the page has loaded, collect all the links on it, including relative URLs.
    1. Save the text content of the page.
    1. Open the extracted links in parallel across all the tabs in use (3 by default, set in eventPage.js).
    1. Repeat steps 2-4 for each newly loaded page.
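
The loop above can be sketched in plain JavaScript. This is not the code in eventPage.js, only a minimal illustration under stated assumptions: `resolveLinks`, `crawl`, `loadPage`, `tabs` and `maxPages` are hypothetical names, and `loadPage(url)` stands in for "open the URL in a tab and read its text and links", so the sketch runs outside Chrome. Relative links are resolved with the standard `URL` constructor, and pages are loaded in batches of at most `tabs` at a time.

```javascript
// Hypothetical helper: turn the raw href values found on a page into
// absolute http(s) URLs, resolving relative links against the page URL.
function resolveLinks(hrefs, pageUrl) {
  const resolved = new Set();
  for (const href of hrefs) {
    try {
      const url = new URL(href, pageUrl); // handles "/a", "b.html", "#top", ...
      if (url.protocol === "http:" || url.protocol === "https:") {
        url.hash = ""; // "#fragment" variants point at the same page
        resolved.add(url.href);
      }
    } catch (e) {
      // Malformed href: skip it.
    }
  }
  return [...resolved];
}

// Hypothetical crawl loop. `loadPage(url)` is an assumed stand-in for a tab:
// it must return a Promise of { text, hrefs }.
async function crawl(startUrl, loadPage, { tabs = 3, maxPages = 50 } = {}) {
  const saved = new Map();          // url -> text content (step 3)
  const seen = new Set([startUrl]); // URLs already queued or visited
  const queue = [startUrl];         // step 1: start from the current URL

  while (queue.length > 0 && saved.size < maxPages) {
    // Step 4: load up to `tabs` pages in parallel.
    const batch = queue.splice(0, Math.min(tabs, maxPages - saved.size));
    const pages = await Promise.all(batch.map((url) => loadPage(url)));

    pages.forEach((page, i) => {
      saved.set(batch[i], page.text);
      // Step 2: collect links, including relative ones, and queue new ones.
      for (const link of resolveLinks(page.hrefs, batch[i])) {
        if (!seen.has(link)) {
          seen.add(link);
          queue.push(link);
        }
      }
    });
  } // step 5: repeat until no links remain or the page limit is reached

  return saved;
}

// Usage with an in-memory fake site instead of real tabs.
const site = {
  "https://example.com/": { text: "home", hrefs: ["/a", "b.html", "#top"] },
  "https://example.com/a": { text: "page a", hrefs: ["/", "/c"] },
  "https://example.com/b.html": { text: "page b", hrefs: [] },
  "https://example.com/c": { text: "page c", hrefs: [] },
};
const fakeLoad = async (url) => site[url] ?? { text: "", hrefs: [] };

crawl("https://example.com/", fakeLoad).then((saved) => {
  console.log([...saved.keys()]);
});
```

With the fake site above, the crawl saves four pages. The `seen` set keeps a page from being loaded twice when several pages link to it, and `maxPages` is an added safety limit that the description above does not mention.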

The control logic is written mainly in Chinese: eventPage.js

Languages

Language:JavaScript 100.0%