JavoJavo / ecotec-catalog-webscrapping

Web scraping of stove information from cleancooking.org with Selenium (Python) and of lighting information from lighting.philips.com.mx, visualized with Streamlit.

Home Page: https://javojavo-stoves-catalog-webs-streamlit-appvisualizations-dym14c.streamlit.app/

Streamlit App

ecotec-catalog-webscrapping

Ecotecs still pending scraping ...

Visualizations

https://javojavo-stoves-catalog-webs-streamlit-appvisualizations-dym14c.streamlit.app/

DEMO visualizations

demo1 demo2 demo3 demo4 demo5

Resources

Steps for scraping the stove catalog

  1. Download the file capture_catalog.py.
  2. Check your Google Chrome version.
  3. Download the Chrome WebDriver whose version matches your Google Chrome version and unzip it, preferably into the same directory as capture_catalog.py.
  4. If you saved the Chrome WebDriver in a different directory, add its path at line 70 of capture_catalog.py, where the variable driver is initialized.
  5. Run capture_catalog.py and, when the new browser window opens, zoom out as far as possible. Depending on your screen size, some elements may not be clickable at the default zoom level.
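
Steps 2 to 4 can be sketched in Python as follows. This is not the code in capture_catalog.py, only a minimal illustration: the helper names `same_major_version`, `resolve_driver_path`, and `make_driver` are hypothetical, the version strings are made up, and it assumes Selenium 4 and that comparing major version numbers is a good enough check.

```python
from pathlib import Path


def same_major_version(chrome_version: str, driver_version: str) -> bool:
    """Compare only the major version, e.g. the "114" in "114.0.5735.90"."""
    return chrome_version.split(".")[0] == driver_version.split(".")[0]


def resolve_driver_path(script_dir, custom_path=None, exe_name="chromedriver"):
    """Return custom_path if given, else expect the driver next to the script.

    exe_name is an assumption: on Windows the file is usually chromedriver.exe.
    """
    if custom_path is not None:
        return Path(custom_path)
    return Path(script_dir) / exe_name


def make_driver(driver_path):
    """Start Chrome through Selenium 4's Service API.

    Selenium is imported here so the helpers above work without it installed.
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(str(driver_path)))


if __name__ == "__main__":
    # Made-up version strings, for illustration only.
    print(same_major_version("114.0.5735.90", "114.0.5735.16"))
    print(resolve_driver_path(Path(__file__).parent))
```

Calling `make_driver(resolve_driver_path(...))` would open the browser window; the manual zoom-out in step 5 still applies afterwards.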

Results

  • Some stove entries contained double quotes, which corrupted the resulting CSV. For now this was fixed by hand, and one entry was deleted completely because it could not be made sense of.
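
One possible way to avoid the double-quote problem is to let Python's standard `csv` module do the quoting instead of building lines by hand. The sketch below is an assumption about how that could look, not the repository's code; the field names `name` and `fuel` and the sample row are invented for illustration.

```python
import csv
import io


def write_rows(rows, fieldnames):
    """Serialize rows to CSV text; embedded double quotes are doubled ("")."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, quoting=csv.QUOTE_ALL)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def read_rows(text):
    """Parse CSV text back into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


if __name__ == "__main__":
    # Hypothetical stove entry whose name contains a double quote.
    sample = [{"name": 'Stove 12" model', "fuel": "wood"}]
    text = write_rows(sample, ["name", "fuel"])
    print(text)
    print(read_rows(text) == sample)
```

Because the writer escapes each embedded quote by doubling it, a reader that follows the same convention recovers the original value unchanged.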

Lighting catalog (philips)

Scraped using scrap_phillips_2.ipynb.

Lighting catalog (home depot)

Scraped using homedepot_lighting_scrapping.ipynb.

TODO

  1. Handle double quotes so they don't corrupt the resulting CSV file.
  2. Check for empty fields and add an error label to the CSV.
  3. Fix bugs that prevent fields from being captured even though they exist on the page.
  4. Look for fields that are missing from the output: the program was based on the first stoves, so fields that only appear on later stoves were omitted.
  5. Develop visualizations, maybe with Streamlit.
  6. Remove repeated columns. Find out why they are repeated and make sure no data is lost.
  7. Add demo visualizations here in the README.
  8. Add more ecotecs.
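
TODO items 2 and 6 could be approached along the following lines. This is only a sketch under assumptions: the label text `MISSING`, the helper names, and the idea that a repeated column is safe to drop only when its copies hold identical values in every row are all illustrative choices, not decisions taken in the repository.

```python
ERROR_LABEL = "MISSING"  # hypothetical label; the README does not specify one


def label_empty_fields(rows, label=ERROR_LABEL):
    """Return copies of the rows with blank or None values replaced by label."""
    return [
        {
            key: (label if value is None or str(value).strip() == "" else value)
            for key, value in row.items()
        }
        for row in rows
    ]


def duplicate_columns(header):
    """Map each repeated column name to the positions where it appears."""
    positions = {}
    for index, name in enumerate(header):
        positions.setdefault(name, []).append(index)
    return {name: idxs for name, idxs in positions.items() if len(idxs) > 1}


def columns_identical(data_rows, idxs):
    """True if the columns at idxs agree in every row, so extras can be dropped."""
    return all(len({row[i] for i in idxs}) == 1 for row in data_rows)


if __name__ == "__main__":
    # Invented header and rows, for illustration only.
    header = ["name", "fuel", "name"]
    data = [["A", "wood", "A"], ["B", "", "B"]]
    for name, idxs in duplicate_columns(header).items():
        print(name, idxs, columns_identical(data, idxs))
    print(label_empty_fields([{"name": "B", "fuel": ""}]))
```

Checking that repeated columns agree before removing them is one way to make sure no data is lost; columns that disagree would need a manual look.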

Languages

Jupyter Notebook 98.1%, Python 1.9%