ttavni / PyWebScraper

A set of functions and classes to help with web scraping and simple web audits

Web Scraping for Python

The package provides functions and classes for web scraping and simple web audits. For example, to collect the page URLs from a sitemap and scrape them in a batch:

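from pyscraper.sitemapper import Sitemapper
from pyscraper.scrapper import BatchScrape

# Sitemap to audit
sitemap = 'https://www.datascience.com/sitemap.xml'

# Collect the page URLs listed in the sitemap
page_urls = Sitemapper(sitemap)
# Scrape every page; URLs are split into those that completed and those that are broken
completed_urls, broken_urls = BatchScrape(page_urls)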
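
The internals of Sitemapper are not shown here, so the following is only a minimal sketch of the general idea of pulling page URLs out of a sitemap, using the Python standard library. The function name `extract_sitemap_urls` and the example.com URLs are hypothetical and are not part of pyscraper; the sketch assumes a sitemap that follows the sitemaps.org schema and is already available as a string.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol for <urlset>, <url> and <loc>.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_sitemap_urls(xml_text):
    """Hypothetical helper: return the <loc> values of a sitemap, in document order."""
    root = ET.fromstring(xml_text)
    urls = []
    for loc in root.iterfind(".//sm:loc", SITEMAP_NS):
        # Skip empty <loc> elements and trim surrounding whitespace.
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls


# Illustrative sitemap content (not taken from any real site).
example_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/post-1</loc></url>
</urlset>"""

for url in extract_sitemap_urls(example_sitemap):
    print(url)
```

A real sitemap would first need to be downloaded, and a sitemap index file would need one more level of traversal; both are left out to keep the sketch short.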

In addition, you can now visualise the hierarchical nature of the sitemap and produce a d3.js visualisation:

# Visualise pages
from pyscraper.viz import VisualiseSitemap
VisualiseSitemap(page_urls)
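
How VisualiseSitemap builds its hierarchy is not documented here. As a rough illustration of what "hierarchical" means for a sitemap, the sketch below nests URLs by path segment into the name/children shape that d3.js hierarchy layouts commonly consume. The function `build_url_tree` and the example URLs are hypothetical and not part of pyscraper.

```python
import json
from urllib.parse import urlparse


def build_url_tree(urls, root_name="root"):
    """Hypothetical helper: nest URLs by path segment into name/children nodes."""
    root = {"name": root_name, "children": []}
    for url in urls:
        # "/blog/post-1" -> ["blog", "post-1"]; the bare domain yields no segments.
        segments = [s for s in urlparse(url).path.split("/") if s]
        node = root
        for segment in segments:
            # Reuse an existing child with this name, or create a new one.
            child = next((c for c in node["children"] if c["name"] == segment), None)
            if child is None:
                child = {"name": segment, "children": []}
                node["children"].append(child)
            node = child
    return root


# Illustrative URLs (not taken from any real sitemap).
example_urls = [
    "https://example.com/",
    "https://example.com/blog/post-1",
    "https://example.com/blog/post-2",
    "https://example.com/about",
]

tree = build_url_tree(example_urls, root_name="example.com")
# JSON in this shape can be loaded by a d3.js page and passed to d3.hierarchy.
print(json.dumps(tree, indent=2))
```

Pages that share a path prefix end up under the same parent node, so the depth of the tree mirrors the depth of the site's URL structure.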

Visualisation

The text from each page could then be visualised using this repository

Languages

Python 59.5%, HTML 40.5%