django-oscar / django-oscar-api

RESTful JSON API for django-oscar

LazyRemoteFile sometimes raises 403 forbidden error because of urlretrieve headers

joeyjurjens opened this issue

First of all, it does not raise a 403 every time, but lately I've run into it quite a few times.

LazyRemoteFile uses urlretrieve to download an image from a given URL and save it to a file:

local_filename, _ = urlretrieve(self.url, self.name)

However, the default User-Agent it sends seems to get blocked by quite a few websites.
Unfortunately, urlretrieve doesn't let us set request headers.

If we want to pass headers with urllib, we could do so as follows:

import urllib.request
req = urllib.request.Request('http://www.example.com/')
req.add_header('User-Agent', 'Mozilla/5.0')
r = urllib.request.urlopen(req)
# Now we need to read the response content and save to a file
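
To make the "read the response and save to a file" step concrete, here is a minimal sketch using only the standard library. The helper names `build_request` and `download_to_file` are hypothetical and not part of django-oscar-api; the `Mozilla/5.0` default is just the value from the snippet above.

```python
import shutil
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    # Hypothetical helper: attach a custom User-Agent to the request.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download_to_file(url, filename, user_agent="Mozilla/5.0"):
    # Hypothetical helper: stream the response body into a local file,
    # roughly what urlretrieve(url, filename) does, but with our own headers.
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request) as response, open(filename, "wb") as out:
        shutil.copyfileobj(response, out)
    return filename
```

Streaming with `shutil.copyfileobj` avoids holding the whole image in memory, which is one reason to prefer it over `response.read()` for larger files.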

We could also use the requests library which would look a bit cleaner (in my opinion):

import requests
r = requests.get(self.url, headers={'User-Agent': 'Mozilla/5.0'})
# Now we need to read the response content and save to a file

Is this something I can make a PR for, and if so what method would be preferred?

Please use just urllib. We don't make many requests, and keeping dependencies minimal is a goal of this project. Please make sure the User-Agent has a sane default but can be overridden by a setting. Provide some example settings that emulate common browsers in the documentation.
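
A rough sketch of what "sane default, overridable by a setting" could look like. The setting name `EXAMPLE_DOWNLOAD_USER_AGENT`, the default string, and the function `get_user_agent` are all made up for illustration; the names actually used in the project may differ. A `SimpleNamespace` stands in for `django.conf.settings` so the snippet runs on its own.

```python
from types import SimpleNamespace

# Hypothetical default; not the value used by django-oscar-api.
DEFAULT_USER_AGENT = "python-urllib-example/1.0"


def get_user_agent(settings):
    # Fall back to the default when the (hypothetical) setting is absent.
    return getattr(settings, "EXAMPLE_DOWNLOAD_USER_AGENT", DEFAULT_USER_AGENT)


# Stand-ins for django.conf.settings, with and without an override.
settings_without_override = SimpleNamespace()
settings_with_override = SimpleNamespace(
    # Illustrative browser-like string, not copied from a real browser.
    EXAMPLE_DOWNLOAD_USER_AGENT="Mozilla/5.0 (X11; Linux x86_64)",
)

print(get_user_agent(settings_without_override))
print(get_user_agent(settings_with_override))
```

Using `getattr` with a default keeps existing installations working without any settings change, while letting a site that hits 403s swap in a browser-like string.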

☝️

Fixed in #288