Scraping Tested Comic Episodes on Webtoons using Python and Selenium
The script's output contains the following columns:
- Episode Name
- Date
- Loves
- Episode Number
- Comments Count
- Comment Username
- Comment Description
- Comment Likes
- Comment Dislikes
- Reply Username
- Reply Description
- Reply Likes
- Reply Dislikes
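The README does not say how comments and replies are spread across rows. One common layout, assumed here, is one row per reply, with the episode and comment columns repeated on each, plus a single row for any comment that has no replies. The sketch below shows that layout using the standard csv module. The input dict shape and the helper names flatten_episode and write_csv are hypothetical and are not taken from webtoons_scraping.py.

```python
import csv
import io

# Column names exactly as listed in this README.
COLUMNS = [
    "Episode Name", "Date", "Loves", "Episode Number", "Comments Count",
    "Comment Username", "Comment Description", "Comment Likes", "Comment Dislikes",
    "Reply Username", "Reply Description", "Reply Likes", "Reply Dislikes",
]

def flatten_episode(episode):
    """Hypothetical helper: turn one scraped episode into flat rows.

    Assumes `episode` is a dict with "name", "date", "loves", "number" and a
    "comments" list; each comment has "user", "text", "likes", "dislikes"
    and a "replies" list of dicts with the same four keys.
    """
    base = {
        "Episode Name": episode["name"],
        "Date": episode["date"],
        "Loves": episode["loves"],
        "Episode Number": episode["number"],
        # Assumption: the count is the number of scraped top-level comments.
        "Comments Count": len(episode["comments"]),
    }
    rows = []
    for comment in episode["comments"]:
        comment_cols = {
            "Comment Username": comment["user"],
            "Comment Description": comment["text"],
            "Comment Likes": comment["likes"],
            "Comment Dislikes": comment["dislikes"],
        }
        # One row per reply; a comment without replies still gets one row
        # whose reply columns are left empty.
        for reply in comment["replies"] or [None]:
            reply_cols = {}
            if reply is not None:
                reply_cols = {
                    "Reply Username": reply["user"],
                    "Reply Description": reply["text"],
                    "Reply Likes": reply["likes"],
                    "Reply Dislikes": reply["dislikes"],
                }
            rows.append({**base, **comment_cols, **reply_cols})
    return rows

def write_csv(rows, stream):
    """Write rows in the README's column order; missing columns become ''."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    writer.writerows(rows)
```

Repeating the episode and comment fields on every reply row keeps the file flat, so it opens directly in a spreadsheet at the cost of some duplication.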
How to Run the Script on Windows
Download the repository to your system as a ZIP file
Click the arrow next to the downloaded file and click "Show in folder"
Right click the ZIP file and click "Extract All..."
Input your desired directory to save the folder (you will need this later)
Click "Extract"
Download the Python 3.8 installer (the script requires Python 3.8 or lower)
Open the installer
Check the "Add Python 3.8 to PATH" box, then click "Install Now"
Once the installation is complete, press the "Windows" key and search for Command Prompt by typing "CMD"
Click "Open" to open the Command Prompt
Type "python --version" and press Enter to verify that Python is installed and on your PATH
Navigate to the extracted folder using the "cd" command: Type "cd C:\YOUR\DIRECTORY\HERE\webtoons-comments-in-python-main" and press enter
Use the "dir" command to verify you're in the correct folder
Run the command "py -m pip install -r requirements.txt" to install all of the required dependencies
Wait for the installations to complete, then run the command "python webtoons_scraping.py" to execute the script
The script prints the data as it is scraped and finishes with the message "EXECUTION COMPLETE"
After execution, two output files appear in the folder: one in CSV format and one in XLSX format
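The version and folder checks from the steps above can also be scripted. This is a minimal sketch that assumes only what the steps state: Python 3.8 or lower is required, and the extracted folder contains requirements.txt and webtoons_scraping.py. The function names are hypothetical and not part of the repository.

```python
import sys
from pathlib import Path

# The README states the script requires Python 3.8 or lower.
MAX_SUPPORTED = (3, 8)

def python_version_ok(version_info=sys.version_info):
    """Return True if (major, minor) is no newer than MAX_SUPPORTED."""
    return tuple(version_info[:2]) <= MAX_SUPPORTED

def in_project_folder(folder=None):
    """Return True if folder (default: current directory) holds the files the steps use."""
    folder = Path.cwd() if folder is None else Path(folder)
    return (folder / "requirements.txt").is_file() and (folder / "webtoons_scraping.py").is_file()

if __name__ == "__main__":
    version = ".".join(str(part) for part in sys.version_info[:3])
    print(f"Python {version}:", "OK" if python_version_ok() else "too new, install 3.8 or lower")
    print("Project folder:", "found" if in_project_folder() else "not found, cd into the extracted folder")
```

Run it with "python" from the same Command Prompt, after the "cd" step, so it checks the same interpreter and folder that the install and run commands will use.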
We checked the webtoons.com robots.txt file against the URL https://www.webtoons.com/en/challenge/tested/list?title_no=231173&page=1 and found that scraping its comic data is permitted.
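robots.txt rules are defined per site and the file lives at the root of the host, so the file that applies to the URL above is https://www.webtoons.com/robots.txt. Python's standard urllib.robotparser module can repeat this check. In the sketch below, allowed_by_rules works offline on whatever robots.txt lines you pass in, and allowed_live downloads the current file. Both helper names are hypothetical, and the "*" user agent is an assumption; substitute the user agent your scraper actually sends.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

URL = "https://www.webtoons.com/en/challenge/tested/list?title_no=231173&page=1"

def robots_url_for(url):
    """robots.txt always sits at the root of the URL's host."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

def allowed_by_rules(url, robots_lines, user_agent="*"):
    """Check url against robots.txt lines you already have (no network)."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)

def allowed_live(url, user_agent="*"):
    """Download the site's current robots.txt and check url (needs network)."""
    parser = RobotFileParser(robots_url_for(url))
    parser.read()
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    print(robots_url_for(URL), "->", allowed_live(URL))
```

Because allowed_live fetches the file at run time, its answer reflects the site's current rules, which may have changed since this README was written.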