anilabhadatta / educative.io_scraper Course Downloader developed using Python and Selenium. Refer for setup instructions.



abhinavm24 opened this issue · comments


` 2023-11-26 10:47:32,159 - INFO - ApiUtility - Course URL Selector: //a[contains(@href, 'courses/distributed-systems-practitioners/')]
2023-11-26 10:47:32,185 - INFO - LoginAccount - Checking if logged in...
2023-11-26 10:47:32,194 - INFO - ApiUtility - Getting Course Collections JSON from URL:
2023-11-26 10:47:32,194 - INFO - ApiUtility - Executing JS to get JSON from URL
2023-11-26 10:47:32,946 - INFO - ExtensionScraper - API Urls: 166 == 166 :Topic Urls
2023-11-26 10:47:32,947 - INFO - ExtensionScraper - ----------------------------------------------------------------------------------
Scraping Topic: 165-some more things to discover:

2023-11-26 10:47:32,954 - INFO - LoginAccount - Checking if logged in...
2023-11-26 10:47:32,961 - INFO - ApiUtility - Getting Course API Content JSON from URL:
2023-11-26 10:47:32,962 - INFO - ApiUtility - Executing JS to get JSON from URL
2023-11-26 10:47:33,008 - INFO - ApiUtility - Successfully fetched JSON API data
2023-11-26 10:47:33,008 - INFO - OSUtility - Sleeping for 10 seconds
2023-11-26 10:47:46,328 - INFO - SeleniumBasicUtility - Loading page and checking if something went wrong
2023-11-26 10:47:46,328 - INFO - OSUtility - Sleeping for 10 seconds
2023-11-26 10:47:56,397 - INFO - SeleniumBasicUtility - Waiting for webdriver to load topic page
2023-11-26 10:47:56,418 - INFO - SeleniumBasicUtility - Adding name attribute in next/back button
2023-11-26 10:47:56,422 - INFO - BrowserUtility - Scrolling Page
2023-11-26 10:47:56,808 - INFO - OSUtility - Sleeping for 2 seconds
2023-11-26 10:47:58,811 - INFO - RemoveUtility - Removing blur with CSS
2023-11-26 10:47:58,837 - INFO - RemoveUtility - Removing mark-as-completed/completed tick mark
2023-11-26 10:47:58,900 - INFO - RemoveUtility - Removing unwanted elements
2023-11-26 10:47:58,905 - INFO - ShowUtility - Showing single markdown quiz solution
2023-11-26 10:47:58,909 - INFO - ShowUtility - No single markdown quiz solution found
2023-11-26 10:47:58,910 - INFO - ShowUtility - Showing code solutions
2023-11-26 10:47:58,914 - INFO - ShowUtility - No code solution found
2023-11-26 10:47:58,914 - INFO - ShowUtility - Showing hints
2023-11-26 10:47:58,917 - INFO - ShowUtility - No hints found
2023-11-26 10:47:58,918 - INFO - ShowUtility - Showing slides
2023-11-26 10:47:58,920 - INFO - ShowUtility - No slides found
2023-11-26 10:47:58,921 - INFO - SingleFileUtility - Fixing all object tags
2023-11-26 10:47:58,924 - INFO - SingleFileUtility - No object tag found
2023-11-26 10:47:58,924 - INFO - SingleFileUtility - Injecting important scripts
2023-11-26 10:47:58,931 - INFO - OSUtility - Sleeping for 5 seconds
2023-11-26 10:48:03,951 - INFO - OSUtility - Sleeping for 5 seconds
2023-11-26 10:48:08,954 - INFO - SingleFileUtility - Making code selectable
2023-11-26 10:48:08,964 - INFO - SingleFileUtility - No code found
2023-11-26 10:48:08,965 - INFO - SingleFileUtility - getSingleFileHtml: Getting SingleFile Html...
2023-11-26 10:48:10,015 - INFO - ExtensionScraper - Topic File Successfully Created
2023-11-26 10:48:10,015 - INFO - ExtensionScraper - Downloading Code and Quiz Files if found...
2023-11-26 10:48:10,015 - INFO - ExtensionScraper - Code and Quiz Files Downloaded if found.
2023-11-26 10:48:10,092 - INFO - ExtensionScraper - Started Scraping from Text File URL: ?showContent=true
2023-11-26 10:48:10,092 - INFO - BrowserUtility - Loading Browser...
2023-11-26 10:48:12,555 - INFO - BrowserUtility - Browser Initiated
2023-11-26 10:48:12,631 - ERROR - StartScraper - start: 20: ExtensionScraper:start: 52: ExtensionScraper:scrapeCourse: 64: ApiUtility:getCourseUrl: 131: Message: invalid argument
(Session info: chrome=116.0.5845.96)
0 chromedriver 0x00000001007da65c chromedriver + 4318812
1 chromedriver 0x00000001007d2d00 chromedriver + 4287744
2 chromedriver 0x0000000100404644 chromedriver + 296516
3 chromedriver 0x00000001003ec430 chromedriver + 197680
4 chromedriver 0x00000001003e9fe0 chromedriver + 188384
5 chromedriver 0x00000001003eaafc chromedriver + 191228
6 chromedriver 0x00000001004067d4 chromedriver + 305108
7 chromedriver 0x000000010047b380 chromedriver + 783232
8 chromedriver 0x000000010047ad28 chromedriver + 781608
9 chromedriver 0x0000000100436178 chromedriver + 500088
10 chromedriver 0x0000000100436fc0 chromedriver + 503744
11 chromedriver 0x000000010079ac40 chromedriver + 4058176
12 chromedriver 0x000000010079f160 chromedriver + 4075872
13 chromedriver 0x0000000100762e68 chromedriver + 3829352
14 chromedriver 0x000000010079fc4c chromedriver + 4078668
15 chromedriver 0x0000000100777f08 chromedriver + 3915528
16 chromedriver 0x00000001007bc140 chromedriver + 4194624
17 chromedriver 0x00000001007bc2c4 chromedriver + 4195012
18 chromedriver 0x00000001007cc4d0 chromedriver + 4261072
19 libsystem_pthread.dylib 0x0000000187ec1034 _pthread_start + 136
20 libsystem_pthread.dylib 0x0000000187ebbe3c thread_start + 8`

Runs fine up to this page but keeps failing from here onwards.

@abhinavm24 Your text file might contain a blank URL. Please upload the text file with the URL list and the full log file here.


This is actually the last URL of that course, so after it the next course URL is fetched from the text file.
Unfortunately, you may have a blank line in between.
Judging from the log, the URL is empty:
Started Scraping from Text File URL: ?showContent=true
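
One way to guard against this on the user side is to clean the URL list before handing it to the scraper. The snippet below is only a rough sketch and not part of educative.io_scraper: `clean_urls` and `load_course_urls` are hypothetical helper names, and the assumption is that the text file holds one course URL per line. It drops blank lines and anything that does not parse as an absolute http(s) URL, so an empty entry never reaches the browser.

```python
from urllib.parse import urlparse


def clean_urls(lines):
    """Return the entries that look like absolute http(s) URLs.

    Hypothetical helper: blank or whitespace-only lines are skipped,
    as are entries without a scheme and host (e.g. "?showContent=true").
    """
    urls = []
    for raw in lines:
        url = raw.strip()
        if not url:
            continue  # blank line in the text file
        parsed = urlparse(url)
        if parsed.scheme in ("http", "https") and parsed.netloc:
            urls.append(url)
    return urls


def load_course_urls(path):
    """Read a one-URL-per-line text file and return only the usable URLs."""
    with open(path, encoding="utf-8") as handle:
        return clean_urls(handle)
```

Running `clean_urls` over the lines of the list (or rewriting the file with its output) before starting a run should avoid a blank line between two courses being treated as a course URL.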

yep, realized that later