Lab | Advanced Web Scraping

Introduction

This lab is a guided tutorial in which you write an advanced web scraping class. You are given the code of a functional baseline class, along with instructions for instantiating it and starting web scraping tasks. In each challenge, you then implement one feature of the class, such as HTML parsing, error handling, rate control, faking request headers, or making asynchronous requests. By the end of the lab, you will have an advanced web scraping class that you can use and keep building on in your API and Web Scraping Project.
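
To give a rough idea of how several of these features can fit together, here is a minimal sketch using only the Python standard library. It is not the baseline class provided in main.ipynb: the names `SketchScraper`, `TitleParser`, `parse_title`, `fetch_with_urllib`, and the `delay_seconds` and `user_agent` parameters are all hypothetical and chosen for illustration. The sketch covers HTML parsing, error handling, a fixed delay between requests, and a custom User-Agent header. Asynchronous requests are left out to keep it short.

```python
import time
import urllib.request
from html.parser import HTMLParser
from urllib.error import URLError


class TitleParser(HTMLParser):
    """Collects the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_title(html):
    """HTML parsing step: return the page title with surrounding whitespace removed."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def fetch_with_urllib(url, headers):
    """Default fetch step: download a page and decode it as UTF-8."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


class SketchScraper:
    """Hypothetical scraper class; all names here are assumptions for illustration."""

    def __init__(self, fetch=fetch_with_urllib, parse=parse_title,
                 delay_seconds=1.0,
                 user_agent="Mozilla/5.0 (compatible; lab-sketch)"):
        self.fetch = fetch
        self.parse = parse
        self.delay_seconds = delay_seconds
        # Custom request header: send a browser-like User-Agent string.
        self.headers = {"User-Agent": user_agent}
        self.errors = {}

    def scrape(self, urls):
        """Fetch each URL in order and return a dict of url -> parsed result."""
        results = {}
        for index, url in enumerate(urls):
            # Rate control: pause between consecutive requests.
            if index > 0:
                time.sleep(self.delay_seconds)
            try:
                html = self.fetch(url, self.headers)
            except (URLError, TimeoutError, ValueError) as error:
                # Error handling: record the failure and move on to the next URL.
                results[url] = None
                self.errors[url] = str(error)
                continue
            results[url] = self.parse(html)
        return results
```

Passing the fetch and parse steps in as functions is one possible design choice: it lets you swap in a different parser or a fake fetch function for testing without touching the loop. Calling `SketchScraper().scrape(...)` with real URLs performs live network requests, so keep the delay in place when trying it out.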

Getting Started

Open the main.ipynb file in the your-code directory. Follow the instructions and complete each challenge.

Deliverables

main.ipynb with your responses to each of the exercises.

Submission

Upon completion, add your deliverables to git. Then commit your changes and push your branch to the remote.

Resources

5 strategies to write unblock-able web scrapers in Python
