clemfromspace / scrapy-selenium

Scrapy middleware to handle javascript pages using selenium

How to perform a click button with scrapy-selenium?

Houssemaster opened this issue

Hello, I want to perform some actions after getting the response from a page, like clicking, hovering, scrolling, etc.

Requests have an additional meta key named driver, containing the selenium driver that processed the request.
You can perform those actions with it, like this:

import scrapy
from scrapy_selenium import SeleniumRequest


class WhateverSpider(scrapy.Spider):
	name = 'whatever'

	def start_requests(self):
		urls = ['https://www.google.com']
		for url in urls:
			yield SeleniumRequest(
				url=url,
				callback=self.parse,
				wait_time=10)

	def parse(self, response):
		driver = response.request.meta['driver']
		# Do some stuff..
		# Click a button.
		button = driver.find_element_by_xpath('//*[@id="clickable-button-foo"]')
		button.click()
		# Do more stuff
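
For this to work, the middleware has to be enabled in settings.py as described in the project README; roughly (adjust the driver name and path to your setup):

from shutil import which

SELENIUM_DRIVER_NAME = 'firefox'
SELENIUM_DRIVER_EXECUTABLE_PATH = which('geckodriver')
SELENIUM_DRIVER_ARGUMENTS = ['-headless']

DOWNLOADER_MIDDLEWARES = {
    'scrapy_selenium.SeleniumMiddleware': 800,
}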

Hello, I think your solution solves part of the problem. However, there is still a problem with this snippet of code, since downloading requests and parsing responses are asynchronous in Scrapy. Thus, it is possible that Scrapy invoked

driver.get(another_url)

in the middleware's process_request method before Scrapy reached the line:

driver.find_element_by_xpath('//*[@id="clickable-button-foo"]')

which means that by the time Scrapy reached that line, the page source may already have changed.

Because the code runs asynchronously, this will cause problems.
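
For context, the middleware keeps a single shared driver and navigates it for every SeleniumRequest; a simplified sketch (see the library source for the full version, which also handles cookies, screenshots, scripts and wait_until):

from scrapy.http import HtmlResponse
from scrapy_selenium import SeleniumRequest


class SeleniumMiddleware:
    def __init__(self, driver):
        # One webdriver instance is created when the middleware starts
        # and reused for the whole crawl.
        self.driver = driver

    def process_request(self, request, spider):
        if not isinstance(request, SeleniumRequest):
            return None
        # The same self.driver is reused for every request, so a later request
        # can navigate it away before an earlier response has been parsed.
        self.driver.get(request.url)
        request.meta.update({'driver': self.driver})
        return HtmlResponse(
            self.driver.current_url,
            body=str.encode(self.driver.page_source),
            encoding='utf-8',
            request=request,
        )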

But there is another solution: you can use the request option wait_until to perform such actions, like this:

from selenium.webdriver.common.by import By
from scrapy_selenium import SeleniumRequest


def some_action(driver):
    # wait_until_conditions stands for whatever check you need; the callback
    # is polled until it returns a truthy value (or wait_time runs out).
    if wait_until_conditions:
        driver.find_element(By.CLASS_NAME, 'klass')
        # ...
        return True


SeleniumRequest(
    url='http://xxx.ofg',
    wait_until=some_action,
)

# If you forget to return True in the wait_until callback, it will run again and again.
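
Putting this together for the original question, the click can be done inside the wait_until callback, so it happens while the driver is still on the page for that request, before the middleware grabs the page source. A minimal sketch; the URL and XPath are placeholders:

import scrapy
from scrapy_selenium import SeleniumRequest
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException


def click_the_button(driver):
    # Polled by the middleware's WebDriverWait until it returns a truthy value
    # or wait_time expires.
    try:
        button = driver.find_element(By.XPATH, '//*[@id="clickable-button-foo"]')
    except NoSuchElementException:
        return False  # button not rendered yet, keep polling
    button.click()
    return True  # stop waiting; the response body is taken after this point


class ClickSpider(scrapy.Spider):
    name = 'click_spider'

    def start_requests(self):
        yield SeleniumRequest(
            url='https://www.google.com',
            callback=self.parse,
            wait_time=10,
            wait_until=click_the_button,
        )

    def parse(self, response):
        # response.body here reflects the page source right after the click.
        pass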

I have the same requirement.
You can check this repo until the pull request is accepted.

You are right. There is only one driver, so response.request.meta['driver'] points at whatever URL the driver is currently on, which can differ from response.url. See #22
Any solution to this?
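
One way to at least detect the race in the callback (not from this thread, just a sketch): compare the driver's current page with the response it was built from, before touching any elements.

def parse(self, response):
    driver = response.request.meta['driver']
    if driver.current_url != response.url:
        # The shared driver has already been navigated to another page by a
        # later SeleniumRequest, so element lookups here would hit the wrong page.
        self.logger.warning('driver is on %s, but this response is for %s',
                            driver.current_url, response.url)
        return
    button = driver.find_element_by_xpath('//*[@id="clickable-button-foo"]')
    button.click()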

Note: the correct driver method is find_element_by_xpath, not get_element_by_xpath.
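
Worth noting: Selenium 4 deprecates (and later removes) the find_element_by_* helpers, so on newer versions the equivalent call is:

from selenium.webdriver.common.by import By

button = driver.find_element(By.XPATH, '//*[@id="clickable-button-foo"]')
button.click()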