nicholaishaw / web-scraping-challenge

Michigan State University Data Analytics HTML Scraping Challenge

Data Analysis With Web Scraping

Background

In this challenge, I am taking on a full web-scraping and data analysis project for a fictional company. Throughout my data analysis education, I have learned to identify HTML elements on a page, identify their id and class attributes, and use this knowledge to extract information via both automated browsing with Splinter and HTML parsing with Beautiful Soup. I have also learned to scrape various types of information, including HTML tables and recurring elements—like multiple news articles on a webpage.

In this challenge, I am working on two deliverables:

  • Deliverable 1: Scrape titles and preview text from Mars news articles.
  • Deliverable 2: Scrape and analyze Mars weather data, which exists in a table.

Deliverable 1: Scrape Titles and Preview Text from Mars News

In the Jupyter folder, the file named deliverable_1_mars_news.ipynb will be used for this section. To scrape the Mars News website, I followed the steps below:

  1. I used automated browsing to visit the Mars news site, then inspected the page with Chrome Developer Tools to identify which elements to scrape.
  2. I created a Beautiful Soup object and used it to extract text elements from the website.

image

Figure 1. Using the Beautiful Soup object to extract all of the text elements on the Mars webpage.


  3. I extracted the titles and preview text from the website.

image

Figure 2. Extracting all of the titles and previews from the website.
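The extraction step can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the inline HTML stands in for the page source (which in the notebook would come from the Splinter browser session), and the class names `list_text`, `content_title`, and `article_teaser_body` are assumptions about the page layout.

```python
from bs4 import BeautifulSoup

# Stand-in for the page source; the structure and class names are assumed.
html = """
<div class="list_text">
  <div class="content_title">Sample title one</div>
  <div class="article_teaser_body">Sample preview one.</div>
</div>
<div class="list_text">
  <div class="content_title">Sample title two</div>
  <div class="article_teaser_body">Sample preview two.</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

titles = []
previews = []
# Each article block holds one title element and one preview element.
for article in soup.find_all("div", class_="list_text"):
    titles.append(article.find("div", class_="content_title").get_text(strip=True))
    previews.append(article.find("div", class_="article_teaser_body").get_text(strip=True))

print(titles)
print(previews)
```

Searching inside each article block, rather than collecting all titles and all previews separately, keeps each title matched with its own preview.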


  4. I stored the titles and previews scraped above in Python data structures as follows:
    • Stored each title-and-preview pair in a Python dictionary, giving each dictionary two keys: title and preview.
    • Stored all the dictionaries in a Python list.
    • Printed the list in the notebook.
    • An example of this output is as follows:

image

Figure 3. Code to store all of the titles and previews in a Python dictionary.
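The storage step above amounts to pairing two parallel lists into a list of dictionaries. A minimal sketch, assuming the titles and previews have already been collected in order; the helper name `pair_titles_and_previews` and the sample strings are hypothetical:

```python
def pair_titles_and_previews(titles, previews):
    """Return one {'title': ..., 'preview': ...} dictionary per article."""
    return [{"title": t, "preview": p} for t, p in zip(titles, previews)]


# Placeholder values, not real scraped articles.
titles = ["Sample title one", "Sample title two"]
previews = ["Sample preview one.", "Sample preview two."]

articles = pair_titles_and_previews(titles, previews)
print(articles)
```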

Deliverable 2: Scrape and Analyze Mars Weather Data

In the Jupyter folder, the file named deliverable_2_mars_weather.ipynb is used for this deliverable. To scrape and analyze the Mars weather data, I completed the following steps:

  1. I used automated browsing to visit the Mars Temperature Data Site. I inspected the page to identify which elements to scrape.
  2. I created a Beautiful Soup object and used it to scrape the data in the HTML table. The same result can be achieved with the Pandas read_html function, but I used Beautiful Soup here to showcase my web scraping skills.

image

Figure 4. The Beautiful Soup object used to scrape the HTML information from the Mars Weather Data website.
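Scraping a table with Beautiful Soup typically means reading the header cells once and then collecting the data cells row by row. The sketch below assumes a plain `table` with `th` and `td` cells; the inline HTML and its values are placeholders, and the real page's markup may differ.

```python
from bs4 import BeautifulSoup

# Stand-in for the page source; the values are placeholders, not scraped data.
html = """
<table>
  <tr><th>id</th><th>terrestrial_date</th><th>sol</th><th>ls</th>
      <th>month</th><th>min_temp</th><th>pressure</th></tr>
  <tr><td>1</td><td>2012-08-16</td><td>10</td><td>155</td>
      <td>6</td><td>-75.0</td><td>739.0</td></tr>
  <tr><td>2</td><td>2012-08-17</td><td>11</td><td>156</td>
      <td>6</td><td>-76.0</td><td>740.0</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# Column headings come from the <th> cells.
headers = [th.get_text(strip=True) for th in table.find_all("th")]

rows = []
for tr in table.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:  # the header row has no <td> cells, so it is skipped
        rows.append(cells)

print(headers)
print(rows)
```

Every value comes back as a string at this stage, which is why a later type-conversion step is needed.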


  3. I assembled the scraped data into a Pandas DataFrame. I gave the columns the same headings as the table on the website. Below is an explanation of the column headings:
    • id: the identification number of a single transmission from the Curiosity rover
    • terrestrial_date: the date on Earth
    • sol: the number of elapsed sols (Martian days) since Curiosity landed on Mars
    • ls: the solar longitude
    • month: the Martian month
    • min_temp: the minimum temperature, in Celsius, of a single Martian day (sol)
    • pressure: the atmospheric pressure at Curiosity's location

image

Figure 5. Storing all of the scraped data in a Pandas DataFrame.
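Assembling the DataFrame can be sketched as below, assuming the scraped rows are lists of strings in the same order as the column headings. The row values are placeholders.

```python
import pandas as pd

headers = ["id", "terrestrial_date", "sol", "ls", "month", "min_temp", "pressure"]

# Placeholder rows; scraped cells arrive as strings.
rows = [
    ["1", "2012-08-16", "10", "155", "6", "-75.0", "739.0"],
    ["2", "2012-08-17", "11", "156", "6", "-76.0", "740.0"],
]

df = pd.DataFrame(rows, columns=headers)
print(df.head())
```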


  4. I examined the data types associated with each column and converted the data to the appropriate datetime, int, or float data types.
  5. I analyzed the dataset by using Pandas functions to answer the following questions:
    • How many months exist on Mars?
    • How many Martian (and not Earth) days' worth of data exist in the scraped dataset?
    • What are the coldest and the warmest months on Mars (at the location of Curiosity)? To answer this question:
      • Find the average minimum daily temperature for all of the months.
      • Plot the results as a bar chart.
    • Which months have the lowest and the highest atmospheric pressure on Mars? To answer this question:
      • Find the average daily atmospheric pressure of all the months.
      • Plot the results as a bar chart.
    • About how many terrestrial (Earth) days exist in a Martian year? To answer this question:
      • Consider how many days elapse on Earth in the time that Mars circles the Sun once.
      • Visually estimate the result by plotting the daily minimum temperature.
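The type conversion described in the steps above might look like the following sketch. The mapping of columns to types is an assumption based on the column descriptions, the `id` column is left unchanged here, and the values are placeholders.

```python
import pandas as pd

# Placeholder data with every column stored as strings, as after scraping.
df = pd.DataFrame({
    "id": ["1", "2"],
    "terrestrial_date": ["2012-08-16", "2012-08-17"],
    "sol": ["10", "11"],
    "ls": ["155", "156"],
    "month": ["6", "6"],
    "min_temp": ["-75.0", "-76.0"],
    "pressure": ["739.0", "740.0"],
})

# Dates become datetimes; counts become integers; measurements become floats.
df["terrestrial_date"] = pd.to_datetime(df["terrestrial_date"])
df = df.astype({
    "sol": int,
    "ls": int,
    "month": int,
    "min_temp": float,
    "pressure": float,
})

print(df.dtypes)
```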

image

Figure 6. Sample analyses and graphs from the scraped data.
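Most of the questions above reduce to counting unique values or averaging per month. The sketch below uses a small toy DataFrame with made-up values, so the printed answers say nothing about the real dataset; it only illustrates the Pandas calls.

```python
import pandas as pd

# Toy data with invented values, for illustration only.
df = pd.DataFrame({
    "sol": [10, 11, 200, 201, 400, 401],
    "month": [6, 6, 12, 12, 3, 3],
    "min_temp": [-75.0, -77.0, -70.0, -72.0, -84.0, -82.0],
    "pressure": [740.0, 742.0, 850.0, 854.0, 880.0, 876.0],
})

# Number of distinct Martian months and sols present in the data.
n_months = df["month"].nunique()
n_sols = df["sol"].nunique()

# Average minimum temperature per month, then the coldest and warmest months.
avg_min_temp = df.groupby("month")["min_temp"].mean()
coldest_month = avg_min_temp.idxmin()
warmest_month = avg_min_temp.idxmax()

# Average pressure per month, then the lowest- and highest-pressure months.
avg_pressure = df.groupby("month")["pressure"].mean()
lowest_pressure_month = avg_pressure.idxmin()
highest_pressure_month = avg_pressure.idxmax()

print(n_months, n_sols)
print(coldest_month, warmest_month)
print(lowest_pressure_month, highest_pressure_month)
```

With matplotlib installed, calling `avg_min_temp.plot(kind="bar")` or `avg_pressure.plot(kind="bar")` would draw the corresponding bar chart.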


  6. I exported the DataFrame to a CSV file.
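A minimal sketch of the export step, assuming the index column is not wanted in the file. The file name `mars_weather.csv` and the temporary directory are hypothetical choices for this example, and the data is a placeholder.

```python
import os
import tempfile

import pandas as pd

# Placeholder data standing in for the full weather DataFrame.
df = pd.DataFrame({"sol": [10, 11], "min_temp": [-75.0, -76.0]})

# Write to a temporary directory so the example leaves the working folder untouched.
out_path = os.path.join(tempfile.mkdtemp(), "mars_weather.csv")
df.to_csv(out_path, index=False)  # index=False omits the row index column

# Read the file back to confirm what was written.
round_trip = pd.read_csv(out_path)
print(round_trip)
```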
