Creator: Seok Yim (Noah)
Do you want to be the PIONEER of soon-to-POP-OFF games? Then you're gonna like this...
Title: Steam Data Pipeline
A data pipeline that regularly scrapes, cleans, stores, and publishes data for newly released games on Steam. The data visualization is taken care of by Apache Superset (publicly accessible).
*** Preview ***
Website link:
http://18.212.126.33:8080/superset/dashboard/1/?standalone=3&show_filters=1
Authentication for anonymous users (Anyone can view it with these credentials):
ID: public
password: public
I frequently saw websites and projects presenting Steam data for popular (top 100) games, but never one focused primarily on new releases on Steam. So I decided to make one myself.
- Python, MySQL, AWS (EC2, RDS), Docker, Scrapy, Apache Superset, Selenium
- Created a Scrapy project that scrapes data from the official Steam website (https://store.steampowered.com/search/?sort_by=Released_DESC&supportedlang=english).
- Added Selenium to deal with infinite scrolling. Created a Python scheduler with APScheduler along with Python asyncio.
- Launched an EC2 instance to keep the program running and an RDS instance to host the MySQL database.
- Created a Docker image that installs the Python dependencies along with the Chrome browser.
- On EC2, started the containerized project alongside a containerized Apache Superset instance.
- Made the dashboard publicly available.
- The dashboard contains visualizations that help viewers understand the latest trends in games.
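
The infinite-scrolling step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `scroll_to_bottom` and the parameters `pause_seconds` and `max_rounds` are hypothetical. It assumes `driver` is a Selenium WebDriver (or any object exposing `execute_script`) that has already loaded the Steam search page, and it keeps scrolling until the page height stops growing.

```python
import time


def scroll_to_bottom(driver, pause_seconds=1.0, max_rounds=50):
    """Scroll down until the page height stops growing or max_rounds is hit.

    Returns the number of scrolls that actually loaded new content.
    `driver` only needs an `execute_script(script)` method, as Selenium
    WebDriver objects provide.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    loaded = 0
    for _ in range(max_rounds):
        # Jump to the current bottom of the page to trigger the next batch.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Give the page time to fetch and render more results.
        time.sleep(pause_seconds)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            # Height did not change, so no more results were appended.
            break
        last_height = new_height
        loaded += 1
    return loaded
```

With a real browser, one would typically create the driver with `webdriver.Chrome()`, call `driver.get(...)` on the search URL, run `scroll_to_bottom(driver)`, and then hand `driver.page_source` to the Scrapy parsing code. The `max_rounds` cap is a design choice in this sketch so that a scheduled run cannot scroll forever if the page keeps growing.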