News Scraping
This article covers what you need to know about news scraping: its benefits and use cases, and how to build an article scraper in Python.
For a detailed explanation, see our blog post.
Fetch HTML Page
pip3 install requests
Create a new Python file and enter the following code:
import requests
response = requests.get('https://quotes.toscrape.com')
print(response.text) # Prints the entire HTML of the webpage.
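The request above assumes the server answers promptly and successfully. As a minimal sketch of a more defensive fetch, the helper below adds a timeout and raises on HTTP error codes. The name `fetch_html` and the 10-second default are illustrative assumptions, not part of the original tutorial.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the HTML body of `url`, raising on HTTP or network errors.

    `fetch_html` and the 10-second default are illustrative choices.
    """
    # Without a timeout, requests.get() can wait indefinitely on a stalled server.
    response = requests.get(url, timeout=timeout)
    # Turn 4xx/5xx status codes into requests.HTTPError instead of
    # silently returning the body of an error page.
    response.raise_for_status()
    return response.text
```

It could be called as `html = fetch_html('https://quotes.toscrape.com')`, with the call wrapped in `try`/`except requests.RequestException` if the scraper should keep going after a failed page.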
Parsing HTML
pip3 install lxml beautifulsoup4
import requests
from bs4 import BeautifulSoup
response = requests.get('https://quotes.toscrape.com')
soup = BeautifulSoup(response.text, 'lxml')
title = soup.find('title')
Extracting Text
print(title.get_text()) # Prints page title.
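Note that `find()` returns `None` when no tag matches, so calling `get_text()` on the result can fail on pages without a title. The sketch below guards against that case using an inline HTML string, so it runs without a network connection. The helper name `page_title` is hypothetical, and it uses Python's built-in `html.parser` as an assumption so that it works even where `lxml` is not installed.

```python
from bs4 import BeautifulSoup


def page_title(html):
    """Return the text of the <title> tag, or None if the page has none."""
    # "html.parser" ships with Python; swap in "lxml" if it is installed.
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("title")
    # find() returns None when nothing matches, so guard before get_text().
    return title.get_text(strip=True) if title is not None else None


# Hand-written sample pages, used here instead of a live request.
sample = "<html><head><title>Quotes to Scrape</title></head><body></body></html>"
print(page_title(sample))                  # Quotes to Scrape
print(page_title("<p>No title here</p>"))  # None
```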
Fine Tuning
soup.find('small', itemprop="author")  # Match a <small> tag by its itemprop attribute.
soup.find('small', class_="author")    # Match a <small> tag by its CSS class.
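The two lookups above narrow a search by attribute and by class. As a rough illustration, the same element can also be reached with an `attrs` dictionary or a CSS selector. The markup below is a hand-written sample (an assumption, modeled only loosely on the quotes page), and `html.parser` is used so the sketch has no dependency on `lxml`.

```python
from bs4 import BeautifulSoup

# Hand-written sample markup; the real page may differ.
sample = """
<div class="quote">
  <span class="text" itemprop="text">First quote.</span>
  <span>by <small class="author" itemprop="author">Jane Doe</small></span>
</div>
"""

soup = BeautifulSoup(sample, "html.parser")

by_itemprop = soup.find("small", itemprop="author")           # keyword attribute filter
by_class = soup.find("small", class_="author")                # class_ avoids the reserved word "class"
by_attrs = soup.find("small", attrs={"itemprop": "author"})   # explicit attrs dictionary
by_css = soup.select_one("small.author")                      # CSS selector

# All four lookups reach the same <small> element in this sample.
authors = [tag.get_text() for tag in (by_itemprop, by_class, by_attrs, by_css)]
print(authors)
```

The `attrs` form is useful for attribute names that are not valid Python keyword arguments, such as `data-id`.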
Extracting Headlines
headlines = soup.find_all(itemprop="text")
for headline in headlines:
    print(headline.get_text())
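Printing is fine for a quick check, but a scraper usually needs structured records. The sketch below collects each headline together with its author into a list of dictionaries. The function name `extract_quotes`, the `div.quote` wrapper, and the sample markup are all assumptions made for illustration; adjust the selectors to the page you are scraping.

```python
from bs4 import BeautifulSoup


def extract_quotes(html):
    """Return a list of {"text", "author"} dicts, one per quote block."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # Assumes each item is wrapped in <div class="quote">.
    for block in soup.find_all("div", class_="quote"):
        text = block.find(itemprop="text")
        author = block.find(itemprop="author")
        if text is None:
            continue  # Skip blocks that carry no headline text.
        results.append({
            "text": text.get_text(strip=True),
            "author": author.get_text(strip=True) if author is not None else None,
        })
    return results


# Hand-written sample markup; the second block deliberately has no author.
sample = """
<div class="quote">
  <span itemprop="text">First quote.</span>
  <small itemprop="author">Jane Doe</small>
</div>
<div class="quote">
  <span itemprop="text">Second quote.</span>
</div>
"""

for item in extract_quotes(sample):
    print(item["author"], "-", item["text"])
```

Searching inside each block, rather than across the whole page, keeps every headline paired with its own author even when some blocks are missing fields.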
To find out more about news scraping, see our blog post.