seomoz / reppy

Modern robots.txt Parser for Python

reppy's cache doesn't handle redirects gracefully

b4hand opened this issue · comments

If you check cache.allowed() for a URL whose robots.txt is redirected via a 301 to a different FQDN, the cache entry is stored under the pre-redirect URL. That is reasonable on its own, but if you later check cache.allowed() for a URL on the redirected domain, it triggers an additional robots.txt fetch for that domain, even though the same file was just retrieved. Ideally, the cache entry would be stored under both the pre-redirect and post-redirect URLs to avoid these extra fetches.
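A minimal sketch of the behavior, assuming reppy's RobotsCache API (the exact constructor arguments may vary by version) and hypothetical domains where example.com's robots.txt 301-redirects to www.example.com:

```python
from reppy.cache import RobotsCache

cache = RobotsCache()

# Fetches http://example.com/robots.txt, which (in this hypothetical
# setup) 301-redirects to http://www.example.com/robots.txt. The parsed
# rules are cached under the pre-redirect domain only.
cache.allowed('http://example.com/page', 'my-agent')

# Cache miss: even though the same robots.txt was just fetched via the
# redirect, this line triggers a second fetch for the post-redirect domain.
cache.allowed('http://www.example.com/page', 'my-agent')
```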