jonbakerfish / TweetScraper

TweetScraper is a simple crawler/spider for Twitter Search without using the API

Throws Error

gautampal1947 opened this issue

Running the code gives the following error:

2020-09-30 09:49:20 [twisted] CRITICAL:
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/opt/anaconda3/lib/python3.7/site-packages/scrapy/crawler.py", line 87, in crawl
    self.engine = self._create_engine()
  File "/opt/anaconda3/lib/python3.7/site-packages/scrapy/crawler.py", line 101, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/opt/anaconda3/lib/python3.7/site-packages/scrapy/core/engine.py", line 69, in __init__
    self.downloader = downloader_cls(crawler)
  File "/opt/anaconda3/lib/python3.7/site-packages/scrapy/core/downloader/__init__.py", line 83, in __init__
    self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
  File "/opt/anaconda3/lib/python3.7/site-packages/scrapy/middleware.py", line 53, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/opt/anaconda3/lib/python3.7/site-packages/scrapy/middleware.py", line 35, in from_settings
    mw = create_instance(mwcls, settings, crawler)
  File "/opt/anaconda3/lib/python3.7/site-packages/scrapy/utils/misc.py", line 146, in create_instance
    return objcls.from_crawler(crawler, *args, **kwargs)
  File "/opt/anaconda3/lib/python3.7/site-packages/scrapy_selenium/middlewares.py", line 71, in from_crawler
    browser_executable_path=browser_executable_path
  File "/opt/anaconda3/lib/python3.7/site-packages/scrapy_selenium/middlewares.py", line 51, in __init__
    self.driver = driver_klass(**driver_kwargs)
  File "/opt/anaconda3/lib/python3.7/site-packages/selenium/webdriver/firefox/webdriver.py", line 174, in __init__
    keep_alive=True)
  File "/opt/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "/opt/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/opt/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/opt/anaconda3/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.SessionNotCreatedException: Message: Unable to find a matching set of capabilities
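
One way to narrow this down is to check whether Selenium can start Firefox at all, outside Scrapy, since SessionNotCreatedException is raised during the geckodriver/Firefox handshake. A minimal sketch; the driver path is an assumption, and Selenium 3.x syntax is used to match the traceback:

    # Smoke test outside Scrapy: if this also fails, the problem is the
    # geckodriver/Firefox pairing, not TweetScraper.
    # The executable path is an assumption -- point it at your own geckodriver.
    from selenium import webdriver

    driver = webdriver.Firefox(executable_path=r'C:\webdrivers\geckodriver.exe')
    try:
        driver.get('https://twitter.com/explore')
        print(driver.title)  # if this prints, geckodriver and Firefox are compatible
    finally:
        driver.quit()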

I'm working on Windows. I added a line to point Selenium at my driver: webdriver.Firefox(executable_path=r'C:\webdrivers\geckodriver.exe') (I work with Firefox), as suggested in https://stackoverflow.com/questions/40208051/selenium-using-python-geckodriver-executable-needs-to-be-in-path
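
If the driver path is the issue, scrapy-selenium can also be pointed at geckodriver from settings.py instead of editing the installed package. A minimal sketch, assuming the stock scrapy-selenium setup visible in the first traceback; the driver location is an assumption:

    # settings.py -- a sketch, assuming the standard scrapy-selenium settings;
    # the geckodriver location is an assumption, adjust it to your machine.
    SELENIUM_DRIVER_NAME = 'firefox'
    SELENIUM_DRIVER_EXECUTABLE_PATH = r'C:\webdrivers\geckodriver.exe'
    SELENIUM_DRIVER_ARGUMENTS = ['-headless']  # run Firefox without a window

    DOWNLOADER_MIDDLEWARES = {
        'scrapy_selenium.SeleniumMiddleware': 800,
    }

Editing webdriver.py inside site-packages works too, but the change is lost on the next reinstall, so the settings route is usually safer.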

Then I got this error:
2020-09-30 22:32:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-09-30 22:32:19 [scrapy.extensions.telnet] INFO: Telnet console listening on ****
2020-09-30 22:32:20 [scrapy.core.scraper] ERROR: Spider error processing <GET https://twitter.com/explore> (referer: None)
Traceback (most recent call last):
  File "C:\Users\Pc\miniconda3\envs\tweetscraper\lib\site-packages\scrapy\utils\defer.py", line 120, in iter_errback
    yield next(it)
  File "C:\Users\Pc\miniconda3\envs\tweetscraper\lib\site-packages\scrapy\utils\python.py", line 347, in __next__
    return next(self.data)
  File "C:\Users\Pc\miniconda3\envs\tweetscraper\lib\site-packages\scrapy\utils\python.py", line 347, in __next__
    return next(self.data)
  File "C:\Users\Pc\miniconda3\envs\tweetscraper\lib\site-packages\scrapy\core\spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "C:\Users\Pc\miniconda3\envs\tweetscraper\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 29, in process_spider_output
    for x in result:
  File "C:\Users\Pc\miniconda3\envs\tweetscraper\lib\site-packages\scrapy\core\spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "C:\Users\Pc\miniconda3\envs\tweetscraper\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 340, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "C:\Users\Pc\miniconda3\envs\tweetscraper\lib\site-packages\scrapy\core\spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "C:\Users\Pc\miniconda3\envs\tweetscraper\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "C:\Users\Pc\miniconda3\envs\tweetscraper\lib\site-packages\scrapy\core\spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "C:\Users\Pc\miniconda3\envs\tweetscraper\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "C:\Users\Pc\miniconda3\envs\tweetscraper\lib\site-packages\scrapy\core\spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "C:\Users\Pc\Desktop\TweetScraper-master\TweetScraper\spiders\TweetCrawler.py", line 71, in parse_home_page
    self.update_cookies(response)
  File "C:\Users\Pc\Desktop\TweetScraper-master\TweetScraper\spiders\TweetCrawler.py", line 78, in update_cookies
    driver = response.meta['driver']
KeyError: 'driver'

How should I deal with this?

commented

@ASIAMI I'm not quite familiar with the differences between Windows and *nix for Selenium and Scrapy. If you are working on Windows 10, you can try this out on WSL. You can also inspect the response to see what is different, using inspect_response.
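
For context on the KeyError: scrapy-selenium's SeleniumMiddleware is what places the driver into response.meta, so the key is missing whenever the middleware failed to start (as in the first traceback) or the request bypassed it. A sketch of the defensive check plus the inspect_response suggestion; update_cookies comes from the traceback above, while the meta.get() fallback and the get_cookies() line are illustrations, not TweetScraper's actual code:

    # Sketch only: update_cookies is named in the traceback; the meta.get()
    # fallback and inspect_response call are illustrations.
    from scrapy.shell import inspect_response

    def update_cookies(self, response):
        # scrapy_selenium.SeleniumMiddleware sets this key only for requests
        # it actually handled; a plain scrapy.Request never gets it.
        driver = response.meta.get('driver')
        if driver is None:
            # Drop into an interactive shell to see what the response holds.
            inspect_response(response, self)
            return
        cookies = driver.get_cookies()  # hypothetical continuation for illustration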

OK, so regarding my KeyError: 'driver' on Windows 10:

1. Download geckodriver for Firefox.
2. Store it in a folder somewhere under C:\.
3. Go to the system environment variables and add that folder to PATH.
4. Make sure the executable is named geckodriver.exe. Then everything is OK.
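
A quick way to verify those steps from the same environment the spider runs in (plain standard library, nothing TweetScraper-specific):

    # Confirms the PATH change took effect in this shell/environment.
    import shutil

    print(shutil.which('geckodriver')
          or 'geckodriver not found on PATH -- recheck the environment variable')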