HTML-CSS-Scraping

NU Bootcamp Module 11

File Contents

  1. part_1_mars_news.ipynb - contains the code used to generate mars.json by scraping the HTML/CSS elements on the Mars news website and turning the scraped data into a list of Python dictionaries
  2. part_2_mars_weather.ipynb - contains the code used to generate mars_data.csv (by scraping the HTML table on the Mars weather site), as well as the supplemental analysis of the Mars weather data
  3. mars.json - output file from Part 1
  4. mars_data.csv - output file from Part 2
  5. chromedriver.exe - the WebDriver executable that Splinter uses to automate Google Chrome from the notebooks (a setup sketch follows this list)
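
Both notebooks rely on automated browsing. The snippet below is a minimal sketch of launching that browser, assuming Splinter and Selenium are installed and that chromedriver.exe is discoverable (on the PATH, or managed automatically by newer Selenium releases); it is illustrative rather than copied from the notebooks.

```python
# Minimal sketch: launch an automated Chrome session with Splinter.
# Assumes chromedriver.exe is on the PATH (newer Selenium versions can also
# download and manage the driver automatically).
from splinter import Browser

browser = Browser("chrome")            # Chrome window driven by chromedriver
browser.visit("https://example.com")   # placeholder URL; the notebooks visit the Mars sites
html = browser.html                    # raw page source, ready for Beautiful Soup
browser.quit()                         # close the automated browser when done
```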

Background

You’re now ready to take on a full web-scraping and data analysis project. You’ve learned to identify HTML elements on a page, identify their id and class attributes, and use this knowledge to extract information via both automated browsing with Splinter and HTML parsing with Beautiful Soup. You’ve also learned to scrape various types of information. These include HTML tables and recurring elements, like multiple news articles on a webpage.

As you work on this Challenge, remember that you’re strengthening the same core skills that you’ve been developing until now: collecting data, organizing and storing data, analyzing data, and then visually communicating your insights.

What You're Creating

This new assignment consists of two technical products. You will submit the following deliverables:

Deliverable 1: Scrape titles and preview text from Mars news articles.

Deliverable 2: Scrape and analyze Mars weather data, which exists in a table.

Instructions

Part 1: Scrape Titles and Preview Text from Mars News

  1. Open the Jupyter Notebook in the starter code folder named part_1_mars_news.ipynb. You will work in this code as you follow the steps below to scrape the Mars News website.
  2. Use automated browsing to visit the Mars news site. Inspect the page to identify which elements to scrape.
  3. Extract the titles and preview text of the news articles that you scraped, and store the scraping results in Python data structures as follows:
     - Store each title-and-preview pair in a Python dictionary, and give each dictionary two keys: title and preview. An example is the following:
       {'title': "NASA's MAVEN Observes Martian Light Show Caused by Major Solar Storm", 'preview': "For the first time in its eight years orbiting Mars, NASA’s MAVEN mission witnessed two different types of ultraviolet aurorae simultaneously, the result of solar storms that began on Aug. 27."}
     - Store all the dictionaries in a Python list.
  4. Print the list in your notebook.
  5. Optionally, store the scraped data in a file to make it easier to share with others. To do so, export the scraped data to a JSON file. (Note: there will be no extra points for completing this.) A sketch of this workflow follows the list.
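
The following is a rough sketch of the Part 1 workflow. The URL is a placeholder, and the tag/class names (list_text, content_title, article_teaser_body) are assumptions about the page layout; inspect the live Mars news page to confirm the selectors you actually need.

```python
# Sketch of Part 1: scrape article titles and previews into a list of dictionaries.
# The URL is a placeholder and the class names are assumptions; verify them by
# inspecting the Mars news page.
import json
from bs4 import BeautifulSoup
from splinter import Browser

browser = Browser("chrome")
browser.visit("https://example.com/mars_news")   # placeholder for the Mars news site URL
soup = BeautifulSoup(browser.html, "html.parser")
browser.quit()

articles = []
for item in soup.find_all("div", class_="list_text"):   # one block per article (assumed class)
    title = item.find("div", class_="content_title").get_text(strip=True)
    preview = item.find("div", class_="article_teaser_body").get_text(strip=True)
    articles.append({"title": title, "preview": preview})

print(articles)

# Optional: export the list of dictionaries to mars.json for easier sharing.
with open("mars.json", "w", encoding="utf-8") as f:
    json.dump(articles, f, indent=2)
```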

Part 2: Scrape and Analyze Mars Weather Data

  1. Open the Jupyter Notebook in the starter code folder named part_2_mars_weather.ipynb. You will work in this code as you follow the steps below to scrape and analyze Mars weather data.
  2. Use automated browsing to visit the Mars Temperature Data Site. Inspect the page to identify which elements to scrape. Note that the URL is https://static.bc-edx.com/data/web/mars_facts/temperature.html.
  3. Assemble the scraped data into a Pandas DataFrame. The columns should have the same headings as the table on the website. Here’s an explanation of the column headings (a scraping sketch follows the list):


  1. id: the identification number of a single transmission from the Curiosity rover
  2. terrestrial_date: the date on Earth
  3. sol: the number of elapsed sols (Martian days) since Curiosity landed on Mars
  4. ls: the solar longitude
  5. month: the Martian month
  6. min_temp: the minimum temperature, in Celsius, of a single Martian day (sol)
  7. pressure: the atmospheric pressure at Curiosity's location
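
Below is a sketch of assembling the scraped table into a DataFrame. It assumes the page contains a single standard HTML table whose header cells are th elements and whose data rows use td cells; confirm this by inspecting the page.

```python
# Sketch of Part 2, step 3: scrape the HTML table and build a Pandas DataFrame.
# Assumes one <table> with <th> header cells and <td> data cells.
import pandas as pd
from bs4 import BeautifulSoup
from splinter import Browser

browser = Browser("chrome")
browser.visit("https://static.bc-edx.com/data/web/mars_facts/temperature.html")
soup = BeautifulSoup(browser.html, "html.parser")
browser.quit()

table = soup.find("table")
headers = [th.get_text(strip=True) for th in table.find_all("th")]
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in table.find_all("tr")
    if tr.find_all("td")                 # skip the header row, which has no <td> cells
]

mars_df = pd.DataFrame(rows, columns=headers)
print(mars_df.head())
```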

  4. Examine the data types that are currently associated with each column. If necessary, cast (or convert) the data to the appropriate datetime, int, or float data types.
  5. Analyze your dataset by using Pandas functions to answer the following questions:
     - How many months exist on Mars?
     - How many Martian (and not Earth) days' worth of data exist in the scraped dataset?
     - What are the coldest and the warmest months on Mars (at the location of Curiosity)? To answer this question, find the average minimum daily temperature for each month and plot the results as a bar chart.
     - Which months have the lowest and the highest atmospheric pressure on Mars? To answer this question, find the average daily atmospheric pressure for each month and plot the results as a bar chart.
     - About how many terrestrial (Earth) days exist in a Martian year? Consider how many days elapse on Earth in the time that Mars circles the Sun once, and visually estimate the result by plotting the daily minimum temperature.
  6. Export the DataFrame to a CSV file. (A sketch of these steps follows this list.)
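
The following sketch covers the conversion, analysis, and export steps. To keep it standalone it loads the table with pandas.read_html instead of the scraped DataFrame built above, and the column names follow the headings listed earlier; adjust either if your scraped data differs.

```python
# Sketch of Part 2, steps 4-6: cast column types, analyze by Martian month, export.
# read_html is used here only to keep the snippet self-contained; in the notebook
# these steps run on the DataFrame assembled by scraping.
import matplotlib.pyplot as plt
import pandas as pd

url = "https://static.bc-edx.com/data/web/mars_facts/temperature.html"
mars_df = pd.read_html(url)[0]

# Convert columns to appropriate types (column names assumed from the list above).
mars_df["terrestrial_date"] = pd.to_datetime(mars_df["terrestrial_date"])
mars_df[["sol", "ls", "month"]] = mars_df[["sol", "ls", "month"]].astype(int)
mars_df[["min_temp", "pressure"]] = mars_df[["min_temp", "pressure"]].astype(float)

# How many Martian months and how many sols of data exist in the dataset?
print(mars_df["month"].nunique())
print(mars_df["sol"].nunique())

# Average minimum temperature and average pressure by month, as bar charts.
mars_df.groupby("month")["min_temp"].mean().plot(kind="bar", ylabel="Avg min temp (C)")
plt.show()
mars_df.groupby("month")["pressure"].mean().plot(kind="bar", ylabel="Avg pressure")
plt.show()

# Plot daily minimum temperature over time to visually estimate how many
# terrestrial days make up one Martian year (one full repeat of the pattern).
mars_df.plot(x="terrestrial_date", y="min_temp")
plt.show()

# Export the cleaned DataFrame to CSV.
mars_df.to_csv("mars_data.csv", index=False)
```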