ntflix / PearsonKBDownloader

Pearson Knowledge Base Downloader

Downloads the entire Pearson Qualifications, Resource & Digital Support Knowledge Base.

This was made out of frustration after traditional web scraping proved ineffective due to the JavaScript-heavy nature of the site.

This will noticeably slow down your computer while it runs, because it spins up hundreds of instances of Google Chrome to download the site. Give it a couple of minutes to finish on an average computer.

Usage

Before running, edit URLS_FILE, OUTPUT_DIR, and CHROME_PATH in download.py. URLS_FILE is the list of URLs to download. A list of all the URLs is already included, but you can add more if you want. I created this list using Screaming Frog SEO Spider.
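As a rough illustration of how the three settings fit together, here is a minimal sketch, not the actual contents of download.py. The values shown for URLS_FILE, OUTPUT_DIR, and CHROME_PATH are placeholders, the helper names (load_urls, output_path_for, build_chrome_command, download) are hypothetical, and it assumes a one-URL-per-line list and Chrome's --headless and --dump-dom flags. It also fetches pages one at a time, whereas the real script launches many Chrome instances at once.

```python
import subprocess
from pathlib import Path
from urllib.parse import urlparse

# Placeholder values (assumptions); set the real ones in download.py.
URLS_FILE = "urls.txt"
OUTPUT_DIR = "output"
CHROME_PATH = "/usr/bin/google-chrome"


def load_urls(urls_file):
    """Return the non-empty, stripped lines of the URL list (assumed one URL per line)."""
    with open(urls_file, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def output_path_for(url, output_dir):
    """Map a URL to a flat .html filename inside output_dir (hypothetical naming scheme)."""
    parsed = urlparse(url)
    name = (parsed.netloc + parsed.path).strip("/").replace("/", "_") or "index"
    return Path(output_dir) / (name + ".html")


def build_chrome_command(chrome_path, url):
    """Build a headless Chrome command that prints the rendered DOM to stdout."""
    return [chrome_path, "--headless", "--disable-gpu", "--dump-dom", url]


def download(url, chrome_path=CHROME_PATH, output_dir=OUTPUT_DIR):
    """Render one page in headless Chrome and save the resulting HTML."""
    result = subprocess.run(
        build_chrome_command(chrome_path, url),
        capture_output=True,
        text=True,
        timeout=120,
        check=True,
    )
    target = output_path_for(url, output_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(result.stdout, encoding="utf-8")
    return target


if __name__ == "__main__":
    # Sequential for clarity; the real tool runs many Chrome instances in parallel.
    for url in load_urls(URLS_FILE):
        print(download(url))
```

Note that this naming scheme drops query strings, so two URLs that differ only in their query would overwrite each other. Adjust it if your URL list contains such pages.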

Languages

Language: Python 100.0%