anilabhadatta / educative.io_scraper

Educative.io Course Downloader developed using Python and Selenium. Refer to Readme.md for setup instructions.



failure

abhinavm24 opened this issue · comments

log

` 2023-11-26 10:47:32,159 - INFO - ApiUtility - Course URL Selector: //a[contains(@href, 'courses/distributed-systems-practitioners/')]
2023-11-26 10:47:32,185 - INFO - LoginAccount - Checking if logged in...
2023-11-26 10:47:32,194 - INFO - ApiUtility - Getting Course Collections JSON from URL: https://educative.io/api/collection/10370001/4891237377638400?work_type=collection
2023-11-26 10:47:32,194 - INFO - ApiUtility - Executing JS to get JSON from URL
2023-11-26 10:47:32,946 - INFO - ExtensionScraper - API Urls: 166 == 166 :Topic Urls
2023-11-26 10:47:32,947 - INFO - ExtensionScraper - ----------------------------------------------------------------------------------
Scraping Topic: 165-some more things to discover: https://www.educative.io/courses/distributed-systems-practitioners/some-more-things-to-discover?showContent=true

2023-11-26 10:47:32,954 - INFO - LoginAccount - Checking if logged in...
2023-11-26 10:47:32,961 - INFO - ApiUtility - Getting Course API Content JSON from URL: https://educative.io/api/collection/10370001/4891237377638400/page/6134167328260096?work_type=collection
2023-11-26 10:47:32,962 - INFO - ApiUtility - Executing JS to get JSON from URL
2023-11-26 10:47:33,008 - INFO - ApiUtility - Successfully fetched JSON API data
2023-11-26 10:47:33,008 - INFO - OSUtility - Sleeping for 10 seconds
2023-11-26 10:47:46,328 - INFO - SeleniumBasicUtility - Loading page and checking if something went wrong
2023-11-26 10:47:46,328 - INFO - OSUtility - Sleeping for 10 seconds
2023-11-26 10:47:56,397 - INFO - SeleniumBasicUtility - Waiting for webdriver to load topic page
2023-11-26 10:47:56,418 - INFO - SeleniumBasicUtility - Adding name attribute in next/back button
2023-11-26 10:47:56,422 - INFO - BrowserUtility - Scrolling Page
2023-11-26 10:47:56,808 - INFO - OSUtility - Sleeping for 2 seconds
2023-11-26 10:47:58,811 - INFO - RemoveUtility - Removing blur with CSS
2023-11-26 10:47:58,837 - INFO - RemoveUtility - Removing mark-as-completed/completed tick mark
2023-11-26 10:47:58,900 - INFO - RemoveUtility - Removing unwanted elements
2023-11-26 10:47:58,905 - INFO - ShowUtility - Showing single markdown quiz solution
2023-11-26 10:47:58,909 - INFO - ShowUtility - No single markdown quiz solution found
2023-11-26 10:47:58,910 - INFO - ShowUtility - Showing code solutions
2023-11-26 10:47:58,914 - INFO - ShowUtility - No code solution found
2023-11-26 10:47:58,914 - INFO - ShowUtility - Showing hints
2023-11-26 10:47:58,917 - INFO - ShowUtility - No hints found
2023-11-26 10:47:58,918 - INFO - ShowUtility - Showing slides
2023-11-26 10:47:58,920 - INFO - ShowUtility - No slides found
2023-11-26 10:47:58,921 - INFO - SingleFileUtility - Fixing all object tags
2023-11-26 10:47:58,924 - INFO - SingleFileUtility - No object tag found
2023-11-26 10:47:58,924 - INFO - SingleFileUtility - Injecting important scripts
2023-11-26 10:47:58,931 - INFO - OSUtility - Sleeping for 5 seconds
2023-11-26 10:48:03,951 - INFO - OSUtility - Sleeping for 5 seconds
2023-11-26 10:48:08,954 - INFO - SingleFileUtility - Making code selectable
2023-11-26 10:48:08,964 - INFO - SingleFileUtility - No code found
2023-11-26 10:48:08,965 - INFO - SingleFileUtility - getSingleFileHtml: Getting SingleFile Html...
2023-11-26 10:48:10,015 - INFO - ExtensionScraper - Topic File Successfully Created
2023-11-26 10:48:10,015 - INFO - ExtensionScraper - Downloading Code and Quiz Files if found...
2023-11-26 10:48:10,015 - INFO - ExtensionScraper - Code and Quiz Files Downloaded if found.
2023-11-26 10:48:10,092 - INFO - ExtensionScraper - Started Scraping from Text File URL: ?showContent=true
2023-11-26 10:48:10,092 - INFO - BrowserUtility - Loading Browser...
2023-11-26 10:48:12,555 - INFO - BrowserUtility - Browser Initiated
2023-11-26 10:48:12,631 - ERROR - StartScraper - start: 20: ExtensionScraper:start: 52: ExtensionScraper:scrapeCourse: 64: ApiUtility:getCourseUrl: 131: Message: invalid argument
(Session info: chrome=116.0.5845.96)
Stacktrace:
0 chromedriver 0x00000001007da65c chromedriver + 4318812
1 chromedriver 0x00000001007d2d00 chromedriver + 4287744
2 chromedriver 0x0000000100404644 chromedriver + 296516
3 chromedriver 0x00000001003ec430 chromedriver + 197680
4 chromedriver 0x00000001003e9fe0 chromedriver + 188384
5 chromedriver 0x00000001003eaafc chromedriver + 191228
6 chromedriver 0x00000001004067d4 chromedriver + 305108
7 chromedriver 0x000000010047b380 chromedriver + 783232
8 chromedriver 0x000000010047ad28 chromedriver + 781608
9 chromedriver 0x0000000100436178 chromedriver + 500088
10 chromedriver 0x0000000100436fc0 chromedriver + 503744
11 chromedriver 0x000000010079ac40 chromedriver + 4058176
12 chromedriver 0x000000010079f160 chromedriver + 4075872
13 chromedriver 0x0000000100762e68 chromedriver + 3829352
14 chromedriver 0x000000010079fc4c chromedriver + 4078668
15 chromedriver 0x0000000100777f08 chromedriver + 3915528
16 chromedriver 0x00000001007bc140 chromedriver + 4194624
17 chromedriver 0x00000001007bc2c4 chromedriver + 4195012
18 chromedriver 0x00000001007cc4d0 chromedriver + 4261072
19 libsystem_pthread.dylib 0x0000000187ec1034 _pthread_start + 136
20 libsystem_pthread.dylib 0x0000000187ebbe3c thread_start + 8`

Runs fine up to this page but keeps failing from there onwards.

@abhinavm24 your text file might contain a blank URL. Please upload the text file with the URL list and the full log file here.

@abhinavm24 https://www.educative.io/courses/distributed-systems-practitioners/some-more-things-to-discover?showContent=true

This is actually the last URL of that course, so after it the next course URL is fetched from the text file.
Unfortunately, you may have a blank link in between.
The log data shows that the URL is empty:
Started Scraping from Text File URL: ?showContent=true
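
For anyone hitting the same error, a blank entry like this can be filtered out of the URL list before it reaches the scraper. The snippet below is only a minimal sketch, not the scraper's own code: `load_course_urls` is a hypothetical helper, and it assumes the text file holds one course URL per line.

```python
def load_course_urls(lines):
    """Return the non-empty URLs from an iterable of text lines.

    Hypothetical helper (not part of educative.io_scraper): it strips
    surrounding whitespace and drops lines that are empty afterwards,
    so a stray blank line never becomes an empty course URL.
    """
    urls = []
    for raw in lines:
        url = raw.strip()
        if not url:
            # Blank or whitespace-only line: skip it.
            continue
        urls.append(url)
    return urls


# Example input: two course URLs separated by blank lines, as they might
# appear in a URL list file.
sample_lines = [
    "https://www.educative.io/courses/distributed-systems-practitioners\n",
    "\n",
    "   \n",
    "https://www.educative.io/courses/another-course\n",
]
cleaned = load_course_urls(sample_lines)
```

With a real file, the same helper can be fed an open file object, for example `with open(path) as f: urls = load_course_urls(f)`, where `path` is wherever your URL list lives.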

yep, realized that later