hellyworld/udemy_web_scraping

Web Scraping (also termed Screen Scraping, Web Data Extraction, Web Harvesting, etc.) is a technique for extracting large amounts of data from websites and saving the extracted data to a local file or a database.
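The snippet below is a minimal sketch of the final "save" step. It writes a few extracted records to a CSV file and to a SQLite database using only Python's standard library. The records, the field names (`title`, `price`), and the table name `items` are hypothetical placeholders and are not part of the course material.

```python
import csv
import sqlite3
from pathlib import Path

# Hypothetical records standing in for data extracted from a web page.
records = [
    {"title": "Book A", "price": 12.5},
    {"title": "Book B", "price": 8.0},
]


def save_to_csv(rows, path, fieldnames):
    """Write a list of dicts to a CSV file, with a header row."""
    path = Path(path)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path


def save_to_sqlite(rows, db_path):
    """Insert the rows into a SQLite table (created if missing) and return the row count."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("CREATE TABLE IF NOT EXISTS items (title TEXT, price REAL)")
            # Named placeholders are filled from each dict's keys.
            conn.executemany(
                "INSERT INTO items (title, price) VALUES (:title, :price)", rows
            )
        return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    finally:
        conn.close()
```

For example, `save_to_csv(records, "items.csv", ["title", "price"])` produces a CSV file with a header row. `save_to_sqlite` appends on every call, so rerunning it against the same database file will duplicate rows.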

In this course, you will learn how to perform web scraping using Python 3 and Beautiful Soup, a free, open-source Python library for parsing HTML.
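Here is a minimal sketch of what parsing with Beautiful Soup looks like. It assumes the `beautifulsoup4` package is installed. The HTML string, the `book` class, and the `extract` helper are made up for illustration. The sketch uses Python's built-in `"html.parser"`, so no extra parser is required.

```python
from bs4 import BeautifulSoup

# A small, made-up HTML document standing in for a downloaded page.
HTML = """
<html>
  <head><title>Example Books</title></head>
  <body>
    <h1 id="heading">Catalogue</h1>
    <a class="book" href="/a">Book A</a>
    <a class="book" href="/b">Book B</a>
  </body>
</html>
"""


def extract(html):
    """Return the page title and a (text, href) pair for every book link."""
    # "html.parser" ships with Python, so no additional install is needed.
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.string
    links = [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", class_="book")
    ]
    return title, links
```

Calling `extract(HTML)` returns the title `"Example Books"` together with the two link pairs.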

We will use lxml, an extensive library that parses XML and HTML documents very quickly and can even handle malformed tags. We will also use the Requests module instead of the built-in urllib module (urllib2 in Python 2), because Requests offers a simpler, more readable API.
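The sketch below shows one typical way to combine the two libraries. It assumes `requests`, `beautifulsoup4`, and `lxml` are installed. The helper names `fetch_soup` and `parse` are hypothetical, and the 10-second timeout is an arbitrary choice.

```python
import requests
from bs4 import BeautifulSoup


def parse(html):
    """Parse HTML with the lxml parser, which tolerates malformed markup."""
    # Requires the lxml package; bs4 raises FeatureNotFound if it is missing.
    return BeautifulSoup(html, "lxml")


def fetch_soup(url, timeout=10):
    """Download a page with Requests and return it as a BeautifulSoup tree."""
    response = requests.get(url, timeout=timeout)
    # Raise an HTTPError for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return parse(response.text)
```

As an example, `fetch_soup("https://example.com")` would download that page and return a parse tree whose `.title.string` holds the page title. Here, example.com is only a placeholder URL.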

Finally, we will use Selenium alongside Beautiful Soup to scrape pages driven by JavaScript and AJAX.
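A common pattern is to let Selenium render the page and then hand `driver.page_source` to Beautiful Soup. The sketch below makes these assumptions:

* Selenium 4 is installed.
* A recent Chrome with a matching driver is available.
* The caller supplies a CSS selector for the content that the page loads via AJAX.

It is an illustration only, not the course's own code. The helper names `scrape_rendered` and `parse_items` are hypothetical.

```python
from bs4 import BeautifulSoup


def parse_items(html, css):
    """Return the text of every element matching a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.select(css)]


def scrape_rendered(url, css, timeout=10):
    """Load a JavaScript-driven page in headless Chrome, wait for content, then parse it."""
    # Imported here so parse_items() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # assumes a recent Chrome version
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until a matching element appears, or raise TimeoutException.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css))
        )
        # page_source reflects the current DOM, including JavaScript-added content.
        return parse_items(driver.page_source, css)
    finally:
        driver.quit()  # always close the browser, even on errors
```

An explicit `WebDriverWait` is generally more reliable than a fixed `time.sleep()`, because AJAX responses can take varying amounts of time to arrive.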

The course covers the following topics: accessing web pages programmatically; scraping web pages and extracting the required data, using Beautiful Soup to parse them; interacting with web pages programmatically; and using Selenium for web scraping, including when it is needed.

By the end of this course, you will understand how websites and servers work, know a range of data extraction techniques, and be able to handle and organize the data you collect.

This Web Scraping course covers the following topics:

Review of data structures (Lists, Dictionaries, Tuples, File Handling)
How websites are hosted on servers
Calls to the server (GET, POST methods)
Review of HTML and CSS
Requests Module and BeautifulSoup Module overview
Parsing HTML using BeautifulSoup
Filtering elements using BeautifulSoup and navigating the Parse Tree
JavaScript and AJAX overview
Selenium and the need for it
Selecting elements using Selenium 
CSS selectors 
XPath selectors 
Navigating pages using Selenium 
Practical Projects
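One small, made-up HTML table can illustrate several of these topics: filtering elements, navigating the parse tree, CSS selectors, and XPath selectors. This sketch assumes `beautifulsoup4` and `lxml` are installed. Beautiful Soup itself does not evaluate XPath, so the last step uses `lxml.html` directly. Selenium's `find_elements(By.XPATH, ...)` accepts the same kind of expression.

```python
from bs4 import BeautifulSoup
from lxml import html as lxml_html

# Made-up product table used only for illustration.
PAGE = """
<table id="products">
  <tr><th>Name</th><th>Price</th></tr>
  <tr class="row"><td class="name">Pen</td><td class="price">1.50</td></tr>
  <tr class="row"><td class="name">Ink</td><td class="price">4.00</td></tr>
</table>
"""

soup = BeautifulSoup(PAGE, "html.parser")

# 1. Filtering: find_all() with a tag name and a class.
names = [td.get_text() for td in soup.find_all("td", class_="name")]

# 2. Navigating the parse tree: from a cell to its sibling and to its parent row.
first_name = soup.find("td", class_="name")
first_price = first_name.find_next_sibling("td").get_text()
row_class = first_name.parent["class"]  # "class" is multi-valued, so this is a list

# 3. CSS selectors via select().
prices = [td.get_text() for td in soup.select("#products tr.row td.price")]

# 4. XPath selectors: Beautiful Soup has no XPath support, so use lxml instead.
tree = lxml_html.fromstring(PAGE)
xpath_names = tree.xpath('//tr[@class="row"]/td[@class="name"]/text()')
```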

What you’ll learn

Python Refresher: Review of Data Structures, Conditionals, File Handling
How Websites are Hosted on Servers; Basic Calls to Server (GET, POST Methods)
Web Scraping with Python Beautiful Soup and Requests
Using Selenium to handle JavaScript and AJAX
Diverse Web Scraping Exercises
Source code (*.py files) for all exercises can be downloaded
Q&A board to send your questions and get them answered quickly
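The sketch below shows the difference between GET and POST without contacting a server. It builds both requests with the Requests library and inspects them before sending. The URLs and form fields are hypothetical, and `example.com` is a reserved placeholder domain.

```python
import requests


def build_requests():
    """Prepare (but do not send) a GET and a POST request to show where the data goes."""
    # GET: parameters are encoded into the URL's query string.
    get_req = requests.Request(
        "GET", "https://example.com/search", params={"q": "python", "page": 2}
    ).prepare()

    # POST: form fields travel in the request body, URL-encoded by default.
    post_req = requests.Request(
        "POST", "https://example.com/login", data={"user": "alice", "pw": "secret"}
    ).prepare()
    return get_req, post_req
```

To actually send a prepared request, pass it to `requests.Session().send(...)`. In everyday code, the shortcuts `requests.get(url, params=...)` and `requests.post(url, data=...)` perform the same preparation implicitly.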

Are there any course requirements or prerequisites?

Some prior programming experience in Python (e.g. Data Structures and OOP) will help. The course includes a full Python refresher section.
Complete beginners may wish to take an introductory Python course first and then move on to this one.
This course adopts a step-by-step approach and requires you to open a Python editor, download available *.py code files, and start applying the provided examples and exercises.
Python 3: The code in this course is tested on Python 3. If you want to run it in Python 2, you will need to adapt it yourself.

Who this course is for:

Those who want to learn how to use Python for web scraping and data extraction.
