A web application that scrapes various websites for real-time data related to the Mission to Mars, stores scraped data in MongoDB, and displays the latest information in a single HTML page. Check out the web application here!
- `mission_to_mars.ipynb`: the Jupyter Notebook that outlines all the scraping.
  - Scrape the latest news title and paragraph text from the NASA Mars News Site.
  - Scrape the image URL for the current featured Mars image from JPL Featured Space Image.
  - Scrape the latest Mars weather tweet from the Mars Weather Twitter account.
  - Scrape Mars metrics (e.g. diameter, mass) from the Mars Facts webpage.
  - Scrape Mars hemisphere images and names from the USGS Astrogeology site.
- `scrape_mars.py`: declares a function called `scrape` that executes all of the above scraping and returns the scraped data.
- `app.py`: creates an app route called `/scrape` that calls the `scrape` function and stores the data in the Mongo database; creates a root route `/` that queries the Mongo database and passes the Mars data into an HTML template for display.
- `templates`
  - `index.html`: a template HTML file that displays all data in the appropriate HTML elements.
- `static`
  - `css`
    - `reset.css`, `style.css`: CSS stylesheets.
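
The flow described in the file list above (scrape, store in MongoDB, read back for the template) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's actual code: the helper names, the dictionary keys, the placeholder values, and the `FakeCollection` stand-in are all made up for the example. The only real API it mirrors is the shape of PyMongo's `update_one(filter, update, upsert=True)` and `find_one()` calls.

```python
"""Sketch of the scrape -> store -> display flow (all names hypothetical)."""


def scrape(scrapers):
    """Run every scraper and merge the results into one dictionary."""
    data = {}
    for key, func in scrapers.items():
        data[key] = func()
    return data


def store(collection, data):
    """Overwrite the single stored document with the latest scrape.

    `collection` is expected to behave like a PyMongo collection.
    """
    collection.update_one({}, {"$set": data}, upsert=True)


class FakeCollection:
    """In-memory stand-in for a MongoDB collection, for illustration only."""

    def __init__(self):
        self.doc = None

    def update_one(self, query, update, upsert=False):
        # Only one document is ever kept, so the query is ignored here.
        if self.doc is None:
            if not upsert:
                return
            self.doc = {}
        self.doc.update(update["$set"])

    def find_one(self):
        return self.doc


# Placeholder scrapers; the real ones would use splinter and BeautifulSoup.
scrapers = {
    "news_title": lambda: "placeholder title",
    "news_paragraph": lambda: "placeholder paragraph",
    "featured_image_url": lambda: "placeholder url",
}

mars = FakeCollection()
store(mars, scrape(scrapers))  # roughly what the /scrape route would do
latest = mars.find_one()       # roughly what the / route passes to index.html
```

Using an upsert with an empty filter keeps exactly one document in the collection, so the root route always reads the most recent scrape. Whether the actual app does it this way is not stated in this README.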
- Clone this repository.
- Make sure MongoDB is installed. Here are the installation instructions from Northwestern Data Science Bootcamp.
- In a terminal (or the command line on Windows), type `mongod`.
- Open another terminal window and navigate to the `mars-web-scraping/` directory (where this README file is located).
- In that terminal, type `python app.py`. The webpage will display in a local browser.
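
Before running the app, it can help to confirm that `mongod` is actually listening. The following is a minimal sketch, assuming MongoDB runs on localhost with its default port 27017; the function name `mongo_is_up` is made up for this example and is not part of the repository.

```python
import socket


def mongo_is_up(host="localhost", port=27017, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if mongo_is_up():
        print("Port 27017 is open; start the app with: python app.py")
    else:
        print("Nothing is listening on port 27017; start mongod first.")
```

This only checks that the port is open, not that the process behind it is really MongoDB.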
The code was developed using the Anaconda distribution of Python version 3.6. The following dependencies were used.
- pandas
- Flask
- BeautifulSoup
- splinter
- Flask-PyMongo
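
To check whether these dependencies are available in the current environment, something like the sketch below may be useful. It assumes the usual import names for each package (`bs4` for BeautifulSoup and `flask_pymongo` for Flask-PyMongo); the `missing_modules` helper is hypothetical and not part of the repository.

```python
from importlib.util import find_spec

# Import names differ from the package names for two dependencies:
# BeautifulSoup is imported as bs4, Flask-PyMongo as flask_pymongo.
REQUIRED = ["pandas", "flask", "bs4", "splinter", "flask_pymongo"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("Missing:", ", ".join(missing) if missing else "none")
```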