Christytky / StreamWebScraping

Stream Web Scraping

This is a 4-day web-scraping project: we derive insights from scraped data and present them to an imaginary client. It is our first project after a month of first-hand experience with Python, Pandas, Seaborn, Matplotlib, BeautifulSoup, and API requests. We planned a 4-day project flow and followed it, adjusting the timeline as challenges came up. Web scraping is genuinely challenging, and so is data cleaning: as a reminder to ourselves, data sourcing and data cleaning took 85% of our time as a team.

Project responsibilities. Initially we explored different ways to source information from the Steam online game website through GET requests to its API. David was responsible for the main framework of the code, such as the API GET requests. Mandy managed the project throughout, kept us on track with our progress, and derived data insights for better business decisions. I, Hui-Ee, gathered the information together to create a compelling presentation. I also dug into our project flow to study how a Steam ID is formed, which improved our project flow and the accuracy of our data source.

My group members are David Chueng [cheungyuk123@gmail.com], Mandy [mandy200525@gmail.com], and Hui Ee [huiee.wong@gmail.com]

The project challenge is to acquire valid Steam IDs (gamer IDs) for gamer statistics, since we have to input a Steam ID to use the API. Generating Steam IDs is the challenging part. A Steam ID is a 17-digit number that looks random. We observed a pattern: the first 8 digits are similar across IDs, while the last 9 digits vary. After decoding how the ID is formed, we take a Steam ID in its simple format, convert it into the 64-bit number, and write that number out as the 17-digit ID. Ultimately, about 99% of the Steam IDs obtained this way belong to accounts with valid gamer statistics, such as games played, gamer location, total time spent, and money spent.


Project Framework -- Data Collection

Screenshot 2022-11-18 at 11 06 41 PM

Get Steam Community ID

Approach 1 --- generate random account numbers
  • by observation, valid steam IDs usually have a pattern (e.g. 76561198XXXXXXXXX)
  • "76561198" + randomly generate 9-digit numbers
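The candidate generation in Approach 1 can be sketched as follows. This is a minimal illustration, not the project's actual notebook code: the function names `random_candidate_id` and `candidate_ids` are hypothetical, and the step that checks each candidate against the Steam API is left out.

```python
import random

# Fixed prefix observed in valid Steam IDs (taken from the pattern above).
PREFIX = "76561198"


def random_candidate_id(rng: random.Random) -> str:
    """Return a 17-digit candidate: the fixed prefix plus 9 random digits."""
    # randrange(10**9) yields 0..999999999; zero-pad so the suffix is always 9 digits.
    suffix = f"{rng.randrange(10**9):09d}"
    return PREFIX + suffix


def candidate_ids(n: int, seed=None) -> list:
    """Generate n candidate IDs; pass a seed to make the run reproducible."""
    rng = random.Random(seed)
    return [random_candidate_id(rng) for _ in range(n)]


if __name__ == "__main__":
    for cid in candidate_ids(5, seed=0):
        print(cid)
```

Every candidate has the right shape, but nothing guarantees that an account exists behind it, so each one still has to be validated separately.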

Results --- Extremely low efficiency

  • < 1% of SteamIDs generated are valid
  • Time required to find valid IDs: about 1 valid ID per second

Approach 2 --- generate the ID from its simple format (e.g. STEAM_1:1:66138017)

Transforming SteamID to 17-digit number

Screenshot 2022-11-18 at 11 28 56 PM
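As a sketch of this transformation, the code below uses the commonly documented relation for individual accounts: for a simple ID written STEAM_X:Y:Z, the 17-digit number is 76561197960265728 + 2·Z + Y. The function names are hypothetical and this is not necessarily the exact code used in the project. It assumes an individual account in the public universe, and the pattern accepts either an underscore or a space after `STEAM`.

```python
import re

# 64-bit ID of account number 0 for individual accounts in the public universe.
# Assumption: every ID handled here is of that account type.
STEAMID64_BASE = 76561197960265728

# STEAM_X:Y:Z, where Y is 0 or 1 and Z is the account number component.
_SIMPLE_ID = re.compile(r"STEAM[_ ](\d):([01]):(\d+)")


def simple_to_steamid64(simple_id: str) -> int:
    """Convert a simple-format Steam ID to its 17-digit (64-bit) form."""
    match = _SIMPLE_ID.fullmatch(simple_id.strip())
    if match is None:
        raise ValueError(f"not a simple-format Steam ID: {simple_id!r}")
    y = int(match.group(2))
    z = int(match.group(3))
    return STEAMID64_BASE + 2 * z + y


def steamid64_to_simple(steamid64: int, universe: int = 1) -> str:
    """Inverse conversion; the universe digit is supplied by the caller."""
    account = steamid64 - STEAMID64_BASE
    if account < 0:
        raise ValueError("ID is below the base for individual accounts")
    # Y is the lowest bit of the account number, Z is the remaining bits.
    return f"STEAM_{universe}:{account % 2}:{account // 2}"


if __name__ == "__main__":
    print(simple_to_steamid64("STEAM_1:1:66138017"))  # 76561198092541763
```

The example ID from Approach 2 maps to a number starting with 76561198, which matches the prefix observed in Approach 1.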

Meaning of the simple SteamID

Screenshot 2022-11-18 at 11 27 13 PM

License:MIT License


Languages

Language:Jupyter Notebook 100.0%