jamesturk / scrapelib

⛏ a library for scraping unreliable pages

Home Page: https://jamesturk.github.io/scrapelib/

May want to consider requiring requests security extras

gregoryfoster opened this issue

Hello, and thank you for contributing and maintaining such a helpful project.

I found out about scrapelib via the unitedstates/congress project, which I'm just beginning to explore. That project requires scrapelib>=0.1.0,<1.0.0 and my virtual environment picked up scrapelib-0.10.1. The first command I attempted to run was to pull down the fdsys sitemaps, which resulted in the following SSL error emerging out of scrapelib:

  File "unitedstates-congress/.env/lib/python2.7/site-packages/scrapelib/__init__.py", line 201, in request
    raise exception_raised
SSLError: [SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:590)

I found this StackOverflow post which recommends installing the requests package's extra security packages:

pip install requests[security]

That worked for my immediate use case. Since this wasn't an issue with unitedstates/congress, I thought I'd recommend that scrapelib update its requests requirement to include the optional security packages. If my understanding of the issue is correct, it could crop up in other downstream projects for users with similar configurations (macOS Sierra 10.12.3 with stock OpenSSL 0.9.8zh, running Python 2.7.10 with requests 2.13.0).
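For anyone trying to tell whether they have a similar configuration, here is a minimal diagnostic sketch. It assumes the handshake failure comes from the interpreter being linked against an old OpenSSL (TLS 1.2 support only arrived in OpenSSL 1.0.1), which fits the setup described above but is not confirmed. The names `openssl_is_outdated` and `MIN_OPENSSL` are hypothetical helpers for illustration and are not part of scrapelib or requests.

```python
import ssl

# Assumption: the server needs TLS 1.2, which OpenSSL gained in 1.0.1.
# Older builds (such as 0.9.8zh) cannot negotiate it.
MIN_OPENSSL = (1, 0, 1)


def openssl_is_outdated(version_info=ssl.OPENSSL_VERSION_INFO, minimum=MIN_OPENSSL):
    """Return True if the (major, minor, fix) version is older than `minimum`."""
    return tuple(version_info[:3]) < minimum


# Report which OpenSSL build this Python interpreter is linked against.
print(ssl.OPENSSL_VERSION)
if openssl_is_outdated():
    print("OpenSSL predates {0}.{1}.{2}; 'pip install requests[security]' "
          "may help.".format(*MIN_OPENSSL))
else:
    print("OpenSSL looks recent enough; the failure may have another cause.")
```

If this reports an old version, installing the security extras as described above is a reasonable first thing to try.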

(Separately, there seems to be a mismatch between the project README's requirement for requests > 2.0 and the requirements.txt requirement for requests >= 1.2.2.)

Thank you again, and I hope this is helpful!

thanks, looking at doing this now 👍

1.0.2 depends upon requests[security], thanks for the idea