clemfromspace / scrapy-selenium

Scrapy middleware to handle javascript pages using selenium

TypeError: WebDriver.__init__() got an unexpected keyword argument 'executable_path'

GANGHSUN opened this issue.

Ubuntu 22.04.3 LTS (Jammy Jellyfish) ARM64
Selenium 4.10.0
scrapy-selenium 0.0.7
Mozilla Firefox 115.0.2
geckodriver 0.33.0 (2023-07-11)

After configuring everything as described in the README, I get the error: TypeError: WebDriver.__init__() got an unexpected keyword argument 'executable_path'
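The error is a plain keyword-argument mismatch: as the later comments in this thread explain, scrapy-selenium 0.0.7 passes the driver path to the WebDriver constructor as executable_path, and from Selenium 4.10.0 the constructor no longer accepts that keyword. A minimal sketch of the mechanism, using a hypothetical stand-in class (not Selenium's real one) and a hypothetical driver path:

```python
class WebDriver:
    """Hypothetical stand-in for a Selenium 4.10+ driver: no executable_path parameter."""

    def __init__(self, options=None, service=None):
        self.options = options
        self.service = service


def build_driver(**driver_kwargs):
    # The old middleware unpacks a kwargs dict containing executable_path into the driver class
    return WebDriver(**driver_kwargs)


try:
    build_driver(executable_path="/usr/local/bin/geckodriver", options=None)
except TypeError as exc:
    message = str(exc)
    print(message)  # exact wording of the message varies between Python versions
```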

My spider.py:

import scrapy
from quotes_js_scraper.items import QuoteItem
from scrapy_selenium import SeleniumRequest
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

class QuotesSpider(scrapy.Spider):
    name = 'quotes'

    def start_requests(self):
        url = 'https://quotes.toscrape.com/js/'

        # Wait up to 10 s for the JavaScript-rendered quotes before calling parse()
        yield SeleniumRequest(
            url=url,
            callback=self.parse,
            wait_time=10,
            wait_until=EC.element_to_be_clickable((By.CLASS_NAME, 'quote')),
        )

    def parse(self, response):
        for quote in response.css('div.quote'):
            # Create a fresh item per quote instead of re-filling one shared instance
            quote_item = QuoteItem()
            quote_item['text'] = quote.css('span.text::text').get()
            quote_item['author'] = quote.css('small.author::text').get()
            quote_item['tags'] = quote.css('div.tags a.tag::text').getall()
            yield quote_item
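Where the item is created inside parse() matters: if one item object is created before the loop and re-filled on every iteration, each yielded reference points at the same object, so any consumer that keeps items around ends up seeing only the last quote. The sketch below uses plain dicts as a stand-in for QuoteItem (an assumption, so it runs without Scrapy) and collects the yielded items into a list to make the difference visible; parse_reusing and parse_fresh are hypothetical names:

```python
def parse_reusing(texts):
    item = {}  # one dict shared by every iteration
    for text in texts:
        item["text"] = text
        yield item  # always the same object


def parse_fresh(texts):
    for text in texts:
        yield {"text": text}  # a new dict per quote


quotes = ["first", "second"]
print([i["text"] for i in list(parse_reusing(quotes))])  # ['second', 'second']
print([i["text"] for i in list(parse_fresh(quotes))])    # ['first', 'second']
```

Building the item inside the loop avoids the aliasing regardless of how downstream code handles the yielded items.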

Same issue here.

I encountered the same issue and spent several hours devising a solution. Here's what I did to make it work:

  1. Install Python 3.8.0: I did a custom installation on Windows without adding it to PATH, and unchecked pip and all other optional features so it would not interfere with my existing Python setup.

  2. Create a Virtual Environment (venv): I ran python -m venv venv in the terminal. The newly created 'venv' folder contains a pyvenv.cfg file, which I edited to point at the Python 3.8 installation:

    home = C:\Users\User\AppData\Local\Programs\Python\Python38-32
    include-system-site-packages = false
    version = 3.8.0
    executable = C:\Users\User\AppData\Local\Programs\Python\Python38-32\python.exe
    command =C:\Users\User\AppData\Local\Programs\Python\Python38-32\python.exe -m venv C:\Users\User\Desktop\scraping\venv
    

    Make sure to set your own paths.

  3. Activate the venv: I used the command .\venv\Scripts\activate.

  4. Check Python Version: I ran python --version to ensure it was using Python 3.8.0.

  5. Install Necessary Packages: I installed pip, scrapy, scrapy_selenium, selenium (version 3.141.0), and urllib3 (version 1.25.11) using the following commands:

    python -m ensurepip
    python -m pip install scrapy
    python -m pip install scrapy_selenium
    pip install selenium==3.141.0
    pip install urllib3==1.25.11
    
  6. Download and Set Up Geckodriver: I downloaded geckodriver, put it in its own folder, and added that folder to the PATH environment variable.

  7. Modify settings.py: I added the following lines to my settings.py file:

    from shutil import which
    SELENIUM_DRIVER_NAME = 'firefox'
    SELENIUM_DRIVER_EXECUTABLE_PATH = which('geckodriver')
    SELENIUM_DRIVER_ARGUMENTS=['-headless']
    DOWNLOADER_MIDDLEWARES = {
       'scrapy_selenium.SeleniumMiddleware': 800
    }
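In step 7, which('geckodriver') quietly returns None when geckodriver is not on PATH (for example if step 6 was skipped, or the terminal was opened before PATH was updated), and the middleware then receives no driver path at all. A small guard in settings.py, sketched here with a hypothetical require_executable helper, turns that into an immediate, readable error:

```python
from shutil import which


def require_executable(name: str) -> str:
    """Return the full path of `name` found on PATH, raising instead of returning None."""
    path = which(name)
    if path is None:
        raise FileNotFoundError(
            f"{name!r} was not found on PATH; add its folder to PATH "
            "or set SELENIUM_DRIVER_EXECUTABLE_PATH to the full path"
        )
    return path


# Hypothetical use in settings.py, replacing the bare which() call:
# SELENIUM_DRIVER_EXECUTABLE_PATH = require_executable('geckodriver')
```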
    

After these steps, I was able to successfully execute my Scrapy spider.
I hope this can help someone.
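Step 2 hand-edits pyvenv.cfg so the venv points at Python 3.8. An alternative is to let the target interpreter create the environment itself, which writes the home and version entries for you. A minimal sketch with the standard-library venv module, assuming you run it with the Python 3.8 executable (create_env is a hypothetical helper name):

```python
import venv
from pathlib import Path


def create_env(env_dir: str) -> dict:
    """Create a venv for the interpreter running this script; return its pyvenv.cfg entries."""
    # with_pip=True would also bootstrap pip, like `python -m ensurepip` in step 5
    venv.create(env_dir, with_pip=False)
    cfg = {}
    for line in (Path(env_dir) / "pyvenv.cfg").read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip any line that is not a key = value pair
            cfg[key.strip()] = value.strip()
    return cfg


# Usage, run with the Python 3.8 executable:
# print(create_env("venv"))
```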

Works for me.

Because of the new Selenium version, the executable_path argument (already deprecated in earlier 4.x releases) can no longer be passed to the WebDriver; the driver path now has to go in a Service object.
My solution was to change __init__ in the file opt/conda/lib/python3.11/site-packages/scrapy_selenium/middlewares.py:

def __init__(self, driver_name, driver_executable_path, driver_arguments,
             browser_executable_path):
    """Initialize the selenium webdriver

    Parameters
    ----------
    driver_name: str
        The selenium ``WebDriver`` to use
    driver_executable_path: str
        The path of the executable binary of the driver
    driver_arguments: list
        A list of arguments to initialize the driver
    browser_executable_path: str
        The path of the executable binary of the browser
    """
    # import_module comes from importlib (already imported at the top of middlewares.py)
    webdriver_base_path = f'selenium.webdriver.{driver_name}'

    # e.g. 'firefox' -> selenium.webdriver.Firefox
    driver_klass_module = import_module('selenium.webdriver')
    driver_klass = getattr(driver_klass_module, driver_name.capitalize())

    # e.g. selenium.webdriver.firefox.service.Service
    driver_service_module = import_module(f'{webdriver_base_path}.service')
    driver_service_klass = getattr(driver_service_module, 'Service')

    # e.g. selenium.webdriver.FirefoxOptions
    driver_options_module = import_module('selenium.webdriver')
    driver_options_klass = getattr(driver_options_module, driver_name.capitalize() + 'Options')

    driver_options = driver_options_klass()
    if browser_executable_path:
        driver_options.binary_location = browser_executable_path
    for argument in driver_arguments:
        driver_options.add_argument(argument)

    # Selenium 4.10+: the driver path goes into the Service, not the WebDriver constructor
    service_kwargs = {
        'executable_path': driver_executable_path,
    }
    service = driver_service_klass(**service_kwargs)

    driver_kwargs = {
        'options': driver_options,
        'service': service,
    }

    self.driver = driver_klass(**driver_kwargs)
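The patch resolves classes from the driver name at runtime with import_module plus getattr. The sketch below mirrors that lookup with the standard library only, so it runs without Selenium installed. The selenium_names and load_attr helpers are hypothetical, and the dotted names are simply the strings the patch would look up for a given SELENIUM_DRIVER_NAME, not a guarantee that every driver name has matching classes:

```python
from importlib import import_module


def selenium_names(driver_name: str) -> dict:
    """Dotted names the patched __init__ looks up for a given driver name."""
    cap = driver_name.capitalize()  # 'firefox' -> 'Firefox'
    return {
        "driver": f"selenium.webdriver.{cap}",
        "options": f"selenium.webdriver.{cap}Options",
        "service": f"selenium.webdriver.{driver_name}.service.Service",
    }


def load_attr(dotted_name: str):
    """Import the module part of 'pkg.mod.Attr' and return Attr via import_module + getattr."""
    module_path, _, attr_name = dotted_name.rpartition(".")
    return getattr(import_module(module_path), attr_name)


print(selenium_names("firefox"))
# Same lookup on a stdlib class, so this line works without Selenium
print(load_attr("collections.OrderedDict"))
```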

To add to what @turkievicz said: this change happened in Selenium 4.10.0, so one alternative is simply to downgrade Selenium to 4.9.1, which should make it work again.
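To check which side of the 4.10.0 boundary an environment is on before choosing between patching the middleware and downgrading, a small stdlib-only sketch can read the installed Selenium version. The helper names and the simplified version parser are hypothetical (a real project might use packaging.version instead); the 4.10.0 cut-off is taken from this thread:

```python
from importlib.metadata import PackageNotFoundError, version

# Per this thread, Selenium 4.10.0 is where executable_path stopped being accepted
SERVICE_REQUIRED_FROM = (4, 10, 0)


def parse_version(text: str) -> tuple:
    """Simplified parser: '4.9.1' -> (4, 9, 1); stops at non-digits such as 'rc1'."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    parts += [0] * (3 - len(parts))  # pad '4.10' to (4, 10, 0)
    return tuple(parts)


def needs_service_object(selenium_version: str) -> bool:
    """True if this Selenium version no longer accepts executable_path on the driver."""
    return parse_version(selenium_version) >= SERVICE_REQUIRED_FROM


def installed_selenium():
    """Installed Selenium version string, or None if Selenium is not installed."""
    try:
        return version("selenium")
    except PackageNotFoundError:
        return None


current = installed_selenium()
if current is None:
    print("Selenium is not installed")
elif needs_service_object(current):
    print(f"Selenium {current}: patch the middleware or pin selenium==4.9.1")
else:
    print(f"Selenium {current}: per this thread, scrapy-selenium 0.0.7 should work as-is")
```

The downgrade itself is a single command: pip install selenium==4.9.1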