scrapy / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

Home Page: https://scrapy.org


Issue with running scrapy spider from script.

tituskex opened this issue

Hi, I'm trying to run scrapy from a script like this:

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.loader import ItemLoader

from properties.items import PropertiesItem  # PropertiesItem is defined in my project's items.py

class MySpider(scrapy.Spider):
    name = "basic"
    allowed_domains = ["web"]
    start_urls = ['http://www.example.com']

    def parse(self, response):
        l = ItemLoader(item=PropertiesItem(), response=response)
        l.add_xpath('title', '//h1[1]/text()')
        return l.load_item()

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(MySpider)
process.start()

However, when I run this script I get the following error:

File "/Library/Python/2.7/site-packages/Twisted-16.7.0rc1-py2.7-macosx-10.11-
intel.egg/twisted/internet/_sslverify.py", line 38, in <module>
TLSVersion.TLSv1_1: SSL.OP_NO_TLSv1_1,
AttributeError: 'module' object has no attribute 'OP_NO_TLSv1_1'

Does anyone know how to fix this? Thanks in advance.

I would try to downgrade your twisted version from Twisted==16.7.0rc1 to Twisted==16.4.1. I got some weird errors too on the downloader part when I ran my Scrapy spiders with the same version you are running.

2017-01-02 14:25:00 [scrapy] ERROR: Error downloading <GET http://www.citysearch.com/profile/645344264/jackson_ms/wright_patrick_b_md_patrick_b_wright_md.html>
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1297, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/failure.py", line 393, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
  File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 45, in mustbe_deferred
    result = f(*args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/handlers/__init__.py", line 65, in download_request
    return handler.download_request(request, spider)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/handlers/http11.py", line 60, in download_request
    return agent.download_request(request)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/handlers/http11.py", line 285, in download_request
    method, to_bytes(url, encoding='ascii'), headers, bodyproducer)
  File "/usr/local/lib/python2.7/dist-packages/twisted/web/client.py", line 1631, in request
    parsedURI.originForm)
  File "/usr/local/lib/python2.7/dist-packages/twisted/web/client.py", line 1408, in _requestWithEndpoint
    d = self._pool.getConnection(key, endpoint)
  File "/usr/local/lib/python2.7/dist-packages/twisted/web/client.py", line 1294, in getConnection
    return self._newConnection(key, endpoint)
  File "/usr/local/lib/python2.7/dist-packages/twisted/web/client.py", line 1306, in _newConnection
    return endpoint.connect(factory)
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/endpoints.py", line 779, in connect
    EndpointReceiver, self._hostText, portNumber=self._port
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/_resolver.py", line 174, in resolveHostName
    onAddress = self._simpleResolver.getHostByName(hostName)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/resolver.py", line 21, in getHostByName
    d = super(CachingThreadedResolver, self).getHostByName(name, timeout)
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/base.py", line 276, in getHostByName
    timeoutDelay = sum(timeout)
TypeError: 'float' object is not iterable
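The final TypeError is a separate incompatibility: as the traceback shows, twisted.internet.base.getHostByName computes sum(timeout), expecting a sequence of per-attempt timeouts, while the caller passed it a bare float. A minimal stdlib sketch of that mismatch (get_host_by_name and its default tuple are illustrative, not Twisted's actual code):

```python
# Sketch of the mismatch behind the traceback above: the base
# getHostByName() sums a *sequence* of per-attempt timeouts, so
# passing a single float raises "'float' object is not iterable".
def get_host_by_name(name, timeout=(1, 3, 11, 45)):
    timeout_delay = sum(timeout)  # TypeError if timeout is a bare float
    return name, timeout_delay

print(get_host_by_name("example.com"))           # ('example.com', 60)
print(get_host_by_name("example.com", (60.0,)))  # a tuple is the shape it expects
```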

After downgrading to the version I had (Twisted==16.4.1), things went back to working great again.

Command: pip install Twisted==16.4.1
If you need sudo access, add it to your command.

This is related to #2479 as well.

@tituskex , did you manage to make it work?
did downgrading Twisted work?

Downgrading Twisted worked for me too.

@pembeci what is your Scrapy version?

@kmike The latest from pip install: 1.3.2. I am running on an old machine which has not been upgraded for a while: Ubuntu 12.04 LTS - 32 bit.
So maybe that's why I needed to downgrade Twisted.

@pembeci what was the exception?
Hm, maybe it is caused by Twisted 17+ dropping support for pyOpenSSL < 16.0.

@pembeci I would recommend using (mini)conda to get the latest releases without having to upgrade system libraries on old systems.

+1. Same problem with scrapy (1.3.2) and twisted (17.1.0).

  File "/Library/Python/2.7/site-packages/twisted/protocols/tls.py", line 63, in <module>
    from twisted.internet._sslverify import _setAcceptableProtocols
  File "/Library/Python/2.7/site-packages/twisted/internet/_sslverify.py", line 38, in <module>
    TLSVersion.TLSv1_1: SSL.OP_NO_TLSv1_1,
AttributeError: 'module' object has no attribute 'OP_NO_TLSv1_1'

@wzpan what is your pyOpenSSL version?

Twisted dropped support for pyOpenSSL < 16.0.0 in the Twisted 16.4.0 release (see http://twistedmatrix.com/trac/ticket/8441); in fact it kept working for some time, but they recently removed some of the supporting code as well. Is upgrading it an option? You can check your pyOpenSSL version by running python -c 'import OpenSSL; print(OpenSSL.version.__version__)'
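Note that pyOpenSSL's numbering jumped from the 0.x scheme straight to date-based 16.0.0, so version strings should be compared numerically, not lexically. A small illustrative helper (version_tuple is my own name, not part of any library):

```python
import re

def version_tuple(version):
    """Leading numeric components of a version string: '0.13.1' -> (0, 13, 1)."""
    return tuple(int(n) for n in re.findall(r"\d+", version)[:3])

version_tuple("0.13.1") < (16, 0, 0)   # True: too old for Twisted 16.4+
version_tuple("16.2.0") < (16, 0, 0)   # False: new enough
```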

@kmike awesome! 👍
My pyOpenSSL version is 0.13.1. After upgrading it to 16.2.0, scrapy works like a charm!

I ran into this problem, too.
Here is my stacktrace:

➜  ~ scrapy shell 'http://jbk.39.net/bw_t1/'
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 7, in <module>
    from scrapy.cmdline import execute
  File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 9, in <module>
    from scrapy.crawler import CrawlerProcess
  File "/Library/Python/2.7/site-packages/scrapy/crawler.py", line 7, in <module>
    from twisted.internet import reactor, defer
  File "/Library/Python/2.7/site-packages/twisted/internet/reactor.py", line 38, in <module>
    from twisted.internet import default
  File "/Library/Python/2.7/site-packages/twisted/internet/default.py", line 56, in <module>
    install = _getInstallFunction(platform)
  File "/Library/Python/2.7/site-packages/twisted/internet/default.py", line 50, in _getInstallFunction
    from twisted.internet.selectreactor import install
  File "/Library/Python/2.7/site-packages/twisted/internet/selectreactor.py", line 18, in <module>
    from twisted.internet import posixbase
  File "/Library/Python/2.7/site-packages/twisted/internet/posixbase.py", line 18, in <module>
    from twisted.internet import error, udp, tcp
  File "/Library/Python/2.7/site-packages/twisted/internet/tcp.py", line 28, in <module>
    from twisted.internet._newtls import (
  File "/Library/Python/2.7/site-packages/twisted/internet/_newtls.py", line 21, in <module>
    from twisted.protocols.tls import TLSMemoryBIOFactory, TLSMemoryBIOProtocol
  File "/Library/Python/2.7/site-packages/twisted/protocols/tls.py", line 63, in <module>
    from twisted.internet._sslverify import _setAcceptableProtocols
  File "/Library/Python/2.7/site-packages/twisted/internet/_sslverify.py", line 38, in <module>
    TLSVersion.TLSv1_1: SSL.OP_NO_TLSv1_1,
AttributeError: 'module' object has no attribute 'OP_NO_TLSv1_1'

Version:

➜  ~ python --version
Python 2.7.10
➜  ~ pip list | grep Scrapy
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
Scrapy (1.2.1)

Any help would be appreciated.

@noprom Try running these:

pip install --upgrade scrapy
pip install --upgrade twisted
pip install --upgrade pyopenssl

@wzpan
But another problem occurs:

➜  OS scrapy shell 'http://jbk.39.net/bw_t1/'
2017-03-02 20:31:05 [scrapy.utils.log] INFO: Scrapy 1.3.2 started (bot: scrapybot)
2017-03-02 20:31:05 [scrapy.utils.log] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter'}
2017-03-02 20:31:05 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2017-03-02 20:31:05 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-03-02 20:31:05 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-03-02 20:31:05 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-03-02 20:31:05 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-03-02 20:31:05 [scrapy.core.engine] INFO: Spider opened
2017-03-02 20:31:05 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://jbk.39.net/bw_t1/> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
2017-03-02 20:31:05 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://jbk.39.net/bw_t1/> (failed 2 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
2017-03-02 20:31:05 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET http://jbk.39.net/bw_t1/> (failed 3 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 142, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 88, in _run_print_help
    func(*a, **kw)
  File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 149, in _run_command
    cmd.run(args, opts)
  File "/Library/Python/2.7/site-packages/scrapy/commands/shell.py", line 73, in run
    shell.start(url=url, redirect=not opts.no_redirect)
  File "/Library/Python/2.7/site-packages/scrapy/shell.py", line 48, in start
    self.fetch(url, spider, redirect=redirect)
  File "/Library/Python/2.7/site-packages/scrapy/shell.py", line 115, in fetch
    reactor, self._schedule, request, spider)
  File "/Library/Python/2.7/site-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
    result.raiseException()
  File "<string>", line 2, in raiseException
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]

It seems that there's a problem with twisted.

@noprom The site does not complete the response when you use the default user agent (or the one you are using).

$ scrapy shell 'http://jbk.39.net/bw_t1/' --set USER_AGENT=Mozilla --loglevel INFO
2017-03-02 09:38:49 [scrapy.utils.log] INFO: Scrapy 1.3.2 started (bot: scrapybot)
2017-03-02 09:38:49 [scrapy.utils.log] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter', 'USER_AGENT': 'Mozilla', 'LOG_LEVEL': 'INFO'}
2017-03-02 09:38:49 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.corestats.CoreStats']
2017-03-02 09:38:49 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-03-02 09:38:49 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-03-02 09:38:49 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-03-02 09:38:49 [scrapy.core.engine] INFO: Spider opened
2017-03-02 09:38:50 [traitlets] WARNING: Config option `pager` not recognized by `InteractiveShellEmbed`.
[s] Available Scrapy objects:
[s]   scrapy     scrapy module (contains scrapy.Request, scrapy.Selector, etc)
[s]   crawler    <scrapy.crawler.Crawler object at 0x109100d68>
[s]   item       {}
[s]   request    <GET http://jbk.39.net/bw_t1/>
[s]   response   <200 http://jbk.39.net/bw_t1/>
[s]   settings   <scrapy.settings.Settings object at 0x109100eb8>
[s]   spider     <DefaultSpider 'default' at 0x10bf23dd8>
[s] Useful shortcuts:
[s]   fetch(url[, redirect=True]) Fetch URL and update local objects (by default, redirects are followed)
[s]   fetch(req)                  Fetch a scrapy.Request and update local objects
[s]   shelp()           Shell help (print this help)
[s]   view(response)    View response in a browser
In [1]: response.body[:100]
b'\r\n<!doctype html>\r\n<html>\r\n<head>\r\n    <meta http-equiv="Content-Type" content="text/html; charset=g'

@rolando
Cool! Thanks a lot.😄

@wzpan
Thanks, you solved my problem.

pip install Twisted==16.4.1

also solved mine, but tbh the backwards incompatibility is a shame;
the Twisted folks should really get this fixed

I couldn't even run scrapy by itself without the SSL error until I downgraded Twisted from 17 to 16.4.1 per @rapliandras

For the record, we've released "packaging fix" versions that prevent Twisted>=17 from getting installed, because branches 1.0.x, 1.1.x and 1.2.x only support Twisted<=16.6:

  • v1.0.7
  • v1.1.4
  • v1.2.3

The master branch (and the recent v1.3.3) is compatible with Twisted 17+.

So it seems that the latest Twisted does require pyOpenSSL>=16.0.0, but only if you add the [tls] extra, as in pip install twisted[tls].
Twisted 15.5 required pyOpenSSL>=0.13, but Twisted 16.6 requires pyOpenSSL>=16.0.0.
I think Scrapy should add the [tls] extra to its requirements, even if it shows a warning for Twisted<15 (the extra did not exist then). It should not prevent Scrapy from getting installed.

@redapple I hadn't realized it is just a warning, not an error. If adding [tls] still allows installing Twisted, then +1 to adding it.

It seems that pip < 6.1.0 raises an error when an extra requirement is unknown instead of showing a warning - see pypa/pip#2142. I'm not sure what happens if Twisted < 15.0 is already installed, the user has pip < 6.1.0 (e.g. pip 1.5 is still popular), and runs pip install scrapy - does it work?

Good point @kmike. It does not work if one asks for Twisted<15:

$ pip install --upgrade 'pip<6.1.0'
$ pip install 'twisted<15'
$ pip install --upgrade 'twisted[tls]<15'
Successfully installed twisted-14.0.2
$ pip install --upgrade 'twisted[tls]<15'
You are using pip version 6.0.8, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Requirement already up-to-date: twisted[tls]<15 in /home/paul/.virtualenvs/piptests/lib/python2.7/site-packages
  Exception:
  Traceback (most recent call last):
    File "/home/paul/.virtualenvs/piptests/local/lib/python2.7/site-packages/pip/basecommand.py", line 232, in main
      status = self.run(options, args)
    File "/home/paul/.virtualenvs/piptests/local/lib/python2.7/site-packages/pip/commands/install.py", line 339, in run
      requirement_set.prepare_files(finder)
    File "/home/paul/.virtualenvs/piptests/local/lib/python2.7/site-packages/pip/req/req_set.py", line 436, in prepare_files
      req_to_install.extras):
    File "/home/paul/.virtualenvs/piptests/local/lib/python2.7/site-packages/pip/_vendor/pkg_resources/__init__.py", line 2504, in requires
      "%s has no such extra feature %r" % (self, ext)
  UnknownExtra: Twisted 14.0.2 has no such extra feature 'tls'

If we consider upgrades to latest Twisted, it works though, because latest Twisted has the extra:

$ pip install --upgrade 'twisted[tls]'
You are using pip version 6.0.8, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Collecting twisted[tls] from https://pypi.python.org/packages/d2/5d/ed5071740be94da625535f4333793d6fd238f9012f0fee189d0c5d00bd74/Twisted-17.1.0.tar.bz2#md5=5b4b9ea5a480bec9c1449ffb57b2052a
  Using cached Twisted-17.1.0.tar.bz2
    Installed /tmp/pip-build-RuAoHT/twisted/.eggs/incremental-16.10.1-py2.7.egg
Requirement already up-to-date: zope.interface>=3.6.0 in /home/paul/.virtualenvs/piptests/lib/python2.7/site-packages (from twisted[tls])
Collecting constantly>=15.1 (from twisted[tls])
  Using cached constantly-15.1.0-py2.py3-none-any.whl
Collecting incremental>=16.10.1 (from twisted[tls])
  Using cached incremental-16.10.1-py2.py3-none-any.whl
Collecting Automat>=0.3.0 (from twisted[tls])
  Using cached Automat-0.5.0-py2.py3-none-any.whl
Collecting pyopenssl>=16.0.0 (from twisted[tls])
  Using cached pyOpenSSL-16.2.0-py2.py3-none-any.whl
Collecting service-identity (from twisted[tls])
  Using cached service_identity-16.0.0-py2.py3-none-any.whl
Collecting idna>=0.6 (from twisted[tls])
  Using cached idna-2.5-py2.py3-none-any.whl
Requirement already up-to-date: setuptools in /home/paul/.virtualenvs/piptests/lib/python2.7/site-packages (from zope.interface>=3.6.0->twisted[tls])
Requirement already up-to-date: six in /home/paul/.virtualenvs/piptests/lib/python2.7/site-packages (from Automat>=0.3.0->twisted[tls])
Collecting attrs (from Automat>=0.3.0->twisted[tls])
  Using cached attrs-16.3.0-py2.py3-none-any.whl
Collecting cryptography>=1.3.4 (from pyopenssl>=16.0.0->twisted[tls])
  Using cached cryptography-1.8.1.tar.gz
Collecting pyasn1-modules (from service-identity->twisted[tls])
  Using cached pyasn1_modules-0.0.8-py2.py3-none-any.whl
Collecting pyasn1 (from service-identity->twisted[tls])
  Using cached pyasn1-0.2.3-py2.py3-none-any.whl
Requirement already up-to-date: packaging>=16.8 in /home/paul/.virtualenvs/piptests/lib/python2.7/site-packages (from setuptools->zope.interface>=3.6.0->twisted[tls])
Requirement already up-to-date: appdirs>=1.4.0 in /home/paul/.virtualenvs/piptests/lib/python2.7/site-packages (from setuptools->zope.interface>=3.6.0->twisted[tls])
Collecting asn1crypto>=0.21.0 (from cryptography>=1.3.4->pyopenssl>=16.0.0->twisted[tls])
  Using cached asn1crypto-0.22.0-py2.py3-none-any.whl
Collecting enum34 (from cryptography>=1.3.4->pyopenssl>=16.0.0->twisted[tls])
  Using cached enum34-1.1.6-py2-none-any.whl
Collecting ipaddress (from cryptography>=1.3.4->pyopenssl>=16.0.0->twisted[tls])
  Using cached ipaddress-1.0.18-py2-none-any.whl
Collecting cffi>=1.4.1 (from cryptography>=1.3.4->pyopenssl>=16.0.0->twisted[tls])
  Downloading cffi-1.10.0.tar.gz (418kB)
    100% |################################| 421kB 437kB/s 
Requirement already up-to-date: pyparsing in /home/paul/.virtualenvs/piptests/lib/python2.7/site-packages (from packaging>=16.8->setuptools->zope.interface>=3.6.0->twisted[tls])
Collecting pycparser (from cffi>=1.4.1->cryptography>=1.3.4->pyopenssl>=16.0.0->twisted[tls])
  Using cached pycparser-2.17.tar.gz
Installing collected packages: pycparser, cffi, ipaddress, enum34, asn1crypto, pyasn1, pyasn1-modules, cryptography, attrs, idna, service-identity, pyopenssl, Automat, incremental, constantly, twisted
(...)
Successfully installed Automat-0.5.0 asn1crypto-0.22.0 attrs-16.3.0 cffi-1.10.0 constantly-15.1.0 cryptography-1.8.1 enum34-1.1.6 idna-2.5 incremental-16.10.1 ipaddress-1.0.18 pyasn1-0.2.3 pyasn1-modules-0.0.8 pycparser-2.17 pyopenssl-16.2.0 service-identity-16.0.0 twisted-17.1.0

Is it fair to say that installing and upgrading via pip with twisted[tls] in the dependencies would work in this case (assuming Twisted>=15 is available from the package index being used)?
I may be missing something.

I was asking about a different case:

  1. User already has Twisted < 15 installed (e.g. from system packages), but doesn't have Scrapy installed.
  2. Then user runs pip install scrapy, without --upgrade or specifying a version.

It seems it can fail (I've executed this in a clean virtualenv):

> pip install 'pip < 6.1.0'
..snip..
> pip install 'twisted<15'
..snip..
> pip install twisted[tls]
You are using pip version 6.0.8, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Requirement already satisfied (use --upgrade to upgrade): twisted[tls] in /Users/kmike/envs/tst-scrapy/lib/python2.7/site-packages
  Exception:
  Traceback (most recent call last):
    File "/Users/kmike/envs/tst-scrapy/lib/python2.7/site-packages/pip/basecommand.py", line 232, in main
      status = self.run(options, args)
    File "/Users/kmike/envs/tst-scrapy/lib/python2.7/site-packages/pip/commands/install.py", line 339, in run
      requirement_set.prepare_files(finder)
    File "/Users/kmike/envs/tst-scrapy/lib/python2.7/site-packages/pip/req/req_set.py", line 436, in prepare_files
      req_to_install.extras):
    File "/Users/kmike/envs/tst-scrapy/lib/python2.7/site-packages/pip/_vendor/pkg_resources/__init__.py", line 2504, in requires
      "%s has no such extra feature %r" % (self, ext)
  UnknownExtra: Twisted 14.0.2 has no such extra feature 'tls'

For the record, both Debian jessie and Ubuntu 14.04 ship pip 1.5 and Twisted < 15.0, so these baselines are affected.

Suggesting pip install -U scrapy is ok, but not always - this will upgrade requirements like pyOpenSSL or cryptography or lxml, and installation could fail (compiling may require too much RAM, or build dependencies may be absent). It may also fail after installation, at runtime - I recall upgrading scrapy this way on Ubuntu 14.04 without using virtualenv (with pip3 install --user); installation was successful, but then cryptography failed to load, seemingly because pyOpenSSL was not able to use OpenSSL version installed on Ubuntu 14.04.

What do you think about providing a scrapy[tls] extra? After bumping requirements to Twisted[tls] >= 15.0 we can make it a no-op, and before that users can run pip install scrapy[tls]. I'm not sure it is possible to have the same package both in install_requires and in extras_require, but with a different version and extras (twisted) - it needs to be checked.
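For illustration, the setup.py side of that idea might look like this (a hypothetical fragment, not Scrapy's actual packaging; the version bounds are examples):

```python
# Hypothetical setup() keyword arguments sketching a scrapy[tls] extra:
# a plain Twisted requirement by default, plus an opt-in extra that pulls
# in Twisted's own [tls] extra (which only exists in Twisted >= 15.0).
setup_kwargs = {
    "install_requires": ["Twisted>=13.1.0"],
    "extras_require": {"tls": ["Twisted[tls]>=15.0"]},
}
```

pip install scrapy would then keep today's behavior, while pip install scrapy[tls] would opt into the stricter TLS dependency set.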

I am not very fond of introducing a "tls" extra at the Scrapy level as well, as I think it could be hard to explain that it does not mean TLS support ON or OFF, when to use it, etc. It's just a shame we cannot say something like twisted<15,twisted[tls]>=15 in the dependencies.

Fair enough. I'm fine with documenting this in the FAQ, or maybe in a new Troubleshooting section in the install docs ("Got an AttributeError: 'module' object has no attribute 'OP_NO_TLSv1_1' exception? This happens because Twisted dropped support for older pyOpenSSL versions. Either downgrade Twisted to ... or upgrade pyOpenSSL to 16.0+.").

I had the same problem. Just following these steps worked fine:
pip install -U pip
pip install --upgrade scrapy
pip install --upgrade twisted
pip install --upgrade pyopenssl

+1 for a new Troubleshooting section in the install docs. It could be hard to keep updated, but I believe we have some common cases on StackOverflow and here.

RHEL 7 / CentOS 7: this works for me:

pip install Twisted==16.4.1
Uninstalling Twisted-17.1.0:
Successfully uninstalled Twisted-17.1.0

@IAlwaysBeCoding you are a programming god. I just signed up to github, only to give a thumbs up. Your suggestion worked perfectly.

I'm using
Python 3.6.3 |Anaconda, Inc.|
Scrapy 1.4.0
Twisted 16.4.1 (downgraded from 17.9.0)
pyOpenSSL 17.4.0

when I run pip install twisted[tls] it shows Requirement already satisfied, but I'm still getting the
AttributeError: 'module' object has no attribute 'OP_NO_TLSv1_1' error when trying to run a spider. Does anyone know what to do?

EDIT: just thought I'd mention that I've also tried putting from OpenSSL import SSL in my main.py file.

@wzpan Cool! You solved my problem, thanks!

This problem still exists in Scrapy 1.5.0; I needed to install Twisted==16.4.1.

Hi,
uninstall scrapy, twisted etc. from pip2 and install them with pip3.
It works for me with Twisted 18.9 and Scrapy 1.6, using pip3.6 on CentOS.
Give it a try.
You may need to adjust your PATH (environment) from /usr/bin to /usr/local/bin.

(Quoting the original post: the script and the AttributeError: 'module' object has no attribute 'OP_NO_TLSv1_1' traceback.)

Yes, I think your URL name creates the problem. I made some changes and also uncommented the USER_AGENT setting in settings.py, and it's working well:

import scrapy
from scrapy.crawler import CrawlerProcess
from ..items import BasicItem

class MySpider(scrapy.Spider):
    name = "basic"
    allowed_domains = ["web"]
    start_urls = ['https://www.rd.com/funny-stuff/short-jokes/']

    def parse(self, response):
        item = BasicItem()

        title = response.css('.listicle-h2').extract()
        item['title'] = title

        yield item

#4391

(Quoting @rolando's earlier reply showing that the site responds once a USER_AGENT is set.)

Then how do I use another user agent? I'm trying to scrape a real estate website where I'm just a guest.
https://www.residentialpeople.com/za/property-for-sale/cape-town/?limit=10&offset=0&latitude=-33.9248685&longitude=18.4240553&radius=53.45541417432696&_location=Cape%20Town,%20South%20Africa&_radius_expansion=0.402
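One way (a sketch, not from this thread) is a per-spider override through the spider's custom_settings class attribute; the same USER_AGENT key can also go into settings.py, or into the dict passed to CrawlerProcess as in the first post. The browser string below is only an example:

```python
# Assign this dict to a spider's custom_settings class attribute to
# override USER_AGENT for that spider only (the UA string is illustrative).
custom_settings = {
    "USER_AGENT": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
}
```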

I wonder if it's still needed with modern Scrapy?

We actually covered this in the documentation as part of #3517

But now I wonder if we should remove that from the documentation, if it is no longer needed.