clemfromspace / scrapy-selenium

Scrapy middleware to handle javascript pages using selenium


No module named 'selenium.settings'

landowark opened this issue · comments

I have been trying to get started with the tutorial at https://www.geeksforgeeks.org/scraping-javascript-enabled-websites-using-scrapy-selenium/, and as far as I can tell I've followed the examples correctly, but when I run scrapy crawl integratedspider I get the following error:

Traceback (most recent call last):
  File "/opt/anaconda/envs/scrapy_selenium/bin/scrapy", line 8, in <module>
    sys.exit(execute())
  File "/opt/anaconda/envs/scrapy_selenium/lib/python3.7/site-packages/scrapy/cmdline.py", line 114, in execute
    settings = get_project_settings()
  File "/opt/anaconda/envs/scrapy_selenium/lib/python3.7/site-packages/scrapy/utils/project.py", line 69, in get_project_settings
    settings.setmodule(settings_module_path, priority='project')
  File "/opt/anaconda/envs/scrapy_selenium/lib/python3.7/site-packages/scrapy/settings/__init__.py", line 287, in setmodule
    module = import_module(module)
  File "/opt/anaconda/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'selenium.settings'

Added to the bottom of settings.py:

from shutil import which

SELENIUM_DRIVER_NAME = 'chrome'
SELENIUM_DRIVER_EXECUTABLE_PATH = which('chromedriver')
SELENIUM_DRIVER_ARGUMENTS = ['--headless']

DOWNLOADER_MIDDLEWARES = {
    'scrapy_selenium.SeleniumMiddleware': 800
}

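One thing worth double-checking in a config like the one above: shutil.which returns None when the named binary is not on PATH, in which case SELENIUM_DRIVER_EXECUTABLE_PATH is silently set to None and the middleware cannot launch the browser. A quick stdlib-only sanity check (assuming the usual binary name chromedriver; adjust if yours differs):

```python
from shutil import which

# which() returns the absolute path of the executable, or None when
# it cannot be found on PATH -- None here means the Selenium driver
# would never start, with no error visible in settings.py itself.
driver_path = which("chromedriver")
if driver_path is None:
    print("chromedriver is not on PATH; fix PATH or hardcode the path")
else:
    print(f"chromedriver found at {driver_path}")
```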
integratedspider.py

import scrapy
from scrapy_selenium import SeleniumRequest


class IntegratedspiderSpider(scrapy.Spider):
    name = 'integratedspider'

    def start_requests(self):
        yield SeleniumRequest(
            url="https://practice.geeksforgeeks.org/courses/online",
            wait_time=3,
            screenshot=True,
            callback=self.parse,
            dont_filter=True
        )

    def parse(self, response):
        # courses is a list of all elements matched by this xpath,
        # which selects the cards containing the course details
        courses = response.xpath('//*[@id ="active-courses-content"]/div/div/div')

        # course is each course in the courses list
        for course in courses:
            # the relative xpath selects the h4 tag holding the
            # course name; text() extracts its text content
            course_name = course.xpath('.//a/div[2]/div/div[2]/h4/text()').get()

            # course_name comes back with surrounding \n and extra
            # spaces; skip empty matches and strip the whitespace

            if course_name is None:
                continue
            course_name = course_name.strip()

            yield {
                'course Name': course_name
            }

Any help would be appreciated. Thanks

It looks like Scrapy cannot load your project's settings module because it can't find it on the import path.

What did you name your project module?
You should not name it selenium, since a module with that name already exists (the installed Selenium package).

Is the settings.py file there?

In the GeeksforGeeks example, the project module is named scrapyselenium, so the settings module is scrapyselenium.settings. A settings module is mandatory for Scrapy projects.
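To illustrate why the name clash breaks things: Scrapy resolves the settings module path (here selenium.settings) with importlib.import_module, and Python finds the already-installed selenium package first, which has no settings submodule. A minimal sketch of the same failure, using the stdlib json package as a stand-in so it runs without Scrapy or Selenium installed:

```python
from importlib import import_module

# Scrapy does roughly import_module("selenium.settings"). If your
# project module shares its name with an installed package, Python
# imports that package instead, and the .settings submodule is not
# found inside it. Stand-in: the stdlib "json" package also has no
# "settings" submodule, so the same error shape appears.
try:
    import_module("json.settings")
except ModuleNotFoundError as exc:
    print(exc)  # No module named 'json.settings'
```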

Okay. I created a new project and renamed everything and it works. Total noob mistake on my part. Sorry. Thanks for your help.