dionajie / pyScrapPage

Scraping a single page and its file contents

pyScrapPage

Scrapes an HTML website template (including its CSS, JS, and images) and recreates the original folder structure locally. pyScrapPage is one of the apps on a challenge list I made; you can read about it on [my blog].
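
As a rough illustration of the first step (finding which CSS, JS, and image files a page depends on), here is a minimal sketch. It assumes Python 3 and uses only the standard library (html.parser, urllib.parse); AssetCollector and collect_assets are hypothetical names and are not taken from scraping.py, which relies on BeautifulSoup according to the installation list.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetCollector(HTMLParser):
    """Hypothetical helper: collects stylesheet, script, and image URLs from a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # boolean attributes map to None
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" in rel and attrs.get("href"):
            self._add(attrs["href"])   # <link rel="stylesheet" href="...">
        elif tag == "script" and attrs.get("src"):
            self._add(attrs["src"])    # <script src="...">
        elif tag == "img" and attrs.get("src"):
            self._add(attrs["src"])    # <img src="...">

    def _add(self, ref):
        # Resolve relative references against the page URL and skip duplicates
        absolute = urljoin(self.base_url, ref)
        if absolute not in self.assets:
            self.assets.append(absolute)


def collect_assets(html, base_url):
    """Return absolute URLs of the assets referenced by html, in document order."""
    parser = AssetCollector(base_url)
    parser.feed(html)
    return parser.assets
```

Each returned URL could then be downloaded and saved next to the page; relative references are resolved with urljoin the same way a browser would resolve them.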

Installation

Download this repository:

git clone https://github.com/dionajie/pyScrapPage.git

Install the required packages using pip:

pip install validators
pip install BeautifulSoup
pip install url

urllib2 is part of the Python 2 standard library, so it does not need to be (and cannot be) installed with pip.

How to use

Open scraping.py and set these variables:

url = 'path/to/url'
filenamePage = 'your page filename'
pathfolder = 'path/to/your/folder'

Example:

url = 'http://blackrockdigital.github.io/startbootstrap-creative/' 
filenamePage = 'index'	
pathfolder = 'startbootstrap/'

Then go to the repository folder (where scraping.py is located) and run this command in a terminal:

python -u scraping.py
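
To make the role of url, filenamePage, and pathfolder more concrete, here is a minimal sketch of how downloaded files could be mapped onto pathfolder so that the original folder layout is kept. It is a hypothetical illustration, not the logic in scraping.py: local_path is an invented helper, and saving the page as filenamePage plus '.html' is an assumption.

```python
import os
import posixpath
from urllib.parse import urlparse


def local_path(asset_url, page_url, pathfolder):
    """Hypothetical helper: map an asset URL to a file under pathfolder."""
    # Directory of the page on the server, e.g. '/startbootstrap-creative' ('' at the site root)
    page_dir = posixpath.dirname(urlparse(page_url).path).rstrip("/")
    parsed = urlparse(asset_url)
    if parsed.path.startswith(page_dir + "/"):
        # Asset sits under the page's directory: keep its relative path, e.g. 'css/style.css'
        relative = parsed.path[len(page_dir) + 1:]
    else:
        # Asset lives elsewhere (another directory or a CDN): prefix the host to avoid clashes
        relative = parsed.netloc + parsed.path
    return os.path.join(pathfolder, *relative.split("/"))


url = 'http://blackrockdigital.github.io/startbootstrap-creative/'
filenamePage = 'index'
pathfolder = 'startbootstrap/'

# Assumption: the page itself is saved as <filenamePage>.html inside pathfolder
page_file = os.path.join(pathfolder, filenamePage + '.html')
print(page_file)
print(local_path(url + 'css/style.css', url, pathfolder))  # 'css/style.css' is a made-up asset
```

Prefixing off-site assets with their host name is one simple way to avoid two files with the same path overwriting each other; the real script may handle this differently.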

TO DO

  • Download images referenced in inline styles (e.g. style="background-image: url(...)")
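
One possible approach to this to-do item is sketched below, assuming Python 3 and only the standard library. It scans every style attribute for CSS url(...) references, skips data: URIs, and resolves the rest against the page URL. InlineStyleImages and inline_style_images are hypothetical names, and the regular expression covers common url() forms rather than the full CSS grammar.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Matches url(...) with optional single or double quotes around the reference
CSS_URL = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


class InlineStyleImages(HTMLParser):
    """Hypothetical helper: collects image URLs found in inline style attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "style" and value:
                for _quote, ref in CSS_URL.findall(value):
                    # Embedded data: URIs need no download
                    if ref and not ref.startswith("data:"):
                        self.images.append(urljoin(self.base_url, ref))


def inline_style_images(html, base_url):
    """Return absolute URLs referenced by url(...) in inline styles."""
    parser = InlineStyleImages(base_url)
    parser.feed(html)
    return parser.images
```

The resulting URLs could be fed into the same download-and-save step used for regular images.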

Warning

Use this app wisely. You are responsible for how you use it.

[my blog] : https://blog.dionajie.com/python-apps-challenge-1bc71acbdc5f#.tgqxlbw31

Languages

JavaScript 59.9%, HTML 21.0%, CSS 17.5%, Python 1.5%