nfinit / WaybackProxy

HTTP proxy for tunneling requests through the Internet Archive Wayback Machine

WaybackProxy

WaybackProxy is a retro-friendly HTTP proxy which retrieves pages from the Internet Archive Wayback Machine or OoCities and delivers them in their original form, without toolbars, scripts and other extraneous content that may confuse retro browsers.

[Screenshot: 1999 Google viewed on Internet Explorer 4.0 on Windows 95]

Setup

Python 3.5 or newer is required.

  1. Edit config.json to your liking
  2. Optionally exclude domains from being proxied by adding them to whitelist.txt
  3. Install dependencies: pip install --user -r requirements.txt
  4. Start waybackproxy.py
  5. Set up your retro browser:
    • If your browser supports proxy auto-configuration, set the auto-configuration URL to http://ip:port/proxy.pac where ip is the IP of the system running WaybackProxy and port is the proxy's port (8888 by default).
    • If proxy auto-configuration is not supported or fails to work, set the browser to use an HTTP proxy at that IP and port instead.
    • Transparent proxying is also supported for advanced users and requires no changes to WaybackProxy's own configuration.
      • The easiest way to set up a transparent WaybackProxy is to run it on port 80 (on Linux, binding to port 80 normally requires elevated privileges, which has security implications), set up a fake DNS server that redirects all requests to the proxy, such as dnsmasq -A "/#/ip" where ip is the IP of the system running WaybackProxy, and point client machines at that DNS server.
  6. Try it out! While on the proxy, you can change most config.json settings by browsing to http://web.archive.org. Changes made there are not saved; edit config.json to make them permanent.
  7. Press Ctrl+C to stop the proxy
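
To check from a modern machine that the proxy is reachable before configuring a retro browser, a short script can help. The following is a minimal sketch, not part of WaybackProxy: the helper names `proxy_urls` and `fetch_via_proxy` are hypothetical, and the address 192.0.2.10 is a placeholder for the system running the proxy. It avoids f-strings so that it runs on Python 3.5.

```python
import urllib.request


def proxy_urls(ip, port=8888):
    """Return the PAC URL and the plain HTTP proxy URL for a WaybackProxy host."""
    base = "http://{}:{}".format(ip, port)
    return {"pac": base + "/proxy.pac", "http_proxy": base}


def fetch_via_proxy(url, ip, port=8888, timeout=30):
    """Fetch a plain-HTTP URL through the proxy and return the raw body bytes."""
    handler = urllib.request.ProxyHandler(
        {"http": proxy_urls(ip, port)["http_proxy"]}
    )
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Placeholder address (assumption): replace with the proxy's real IP.
    print(proxy_urls("192.0.2.10"))
```

With the proxy running, calling `fetch_via_proxy("http://www.google.com/", "192.0.2.10")` should return the archived page body. Only plain http:// URLs are routed through the proxy here, which matches the CONNECT limitation listed under known issues.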

Docker Container

A Dockerfile is included so that you can run WaybackProxy in a Docker container.

Environment Variables

When deploying via Docker, the config.json can be customized by specifying environment variables when creating the docker container. The environment variables match the example config.json in this repository. Below is a complete list:

| Parameter | Default | Description |
|---|---|---|
| LISTEN_PORT | 8888 | Listen port for the HTTP proxy. |
| DATE | 20011025 | Date to get pages from Wayback. YYYYMMDD, YYYYMM and YYYY formats are accepted; the more specific, the better. |
| DATE_TOLERANCE | 365 | Allow the client to load pages and assets up to X days after DATE. Set to None to disable this restriction. |
| GEOCITIES_FIX | True | Send Geocities requests to oocities.org if set to True. |
| QUICK_IMAGES | True | Use the original Wayback Machine URL as a shortcut when loading images. |
| WAYBACK_API | True | Use the Wayback Machine Availability API to find the closest available snapshot to the desired date, instead of directly requesting that date. |
| CONTENT_TYPE_ENCODING | True | Allow the Content-Type header to contain an encoding. |
| SILENT | True | Disables logging to STDOUT if set to True. |
| SETTINGS_PAGE | True | Enables the settings page on http://web.archive.org if set to True. |
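
To illustrate how DATE and DATE_TOLERANCE interact, the sketch below validates a DATE value and computes the latest date the tolerance would allow. It is an illustration only, not WaybackProxy's own code: `parse_date` and `latest_allowed` are hypothetical helpers, and treating a partial date (YYYY or YYYYMM) as the first day of that period is an assumption.

```python
from datetime import datetime, timedelta

# Accepted DATE lengths mapped to their strptime formats.
_FORMATS = {4: "%Y", 6: "%Y%m", 8: "%Y%m%d"}


def parse_date(value):
    """Parse a YYYY, YYYYMM or YYYYMMDD string into a datetime.

    Assumption: missing month/day fields default to the start of the period.
    """
    if not value.isdigit() or len(value) not in _FORMATS:
        raise ValueError("expected YYYY, YYYYMM or YYYYMMDD, got %r" % value)
    return datetime.strptime(value, _FORMATS[len(value)])


def latest_allowed(date, tolerance_days):
    """Return the last acceptable snapshot date, or None if unrestricted."""
    if tolerance_days is None:
        return None
    return parse_date(date) + timedelta(days=tolerance_days)


if __name__ == "__main__":
    # Defaults from the table: DATE=20011025, DATE_TOLERANCE=365
    print(latest_allowed("20011025", 365).date())
```

With the defaults, the window closes on 2002-10-25. A wider DATE such as 2001 would, under the assumption above, start the window on 2001-01-01 instead.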

How to run in Docker

Using Docker Registry

To pull:

docker pull cttynul/waybackproxy:latest

To run:

docker run -d -e DATE=20011025 -p 8888:8888 cttynul/waybackproxy

Build locally

To build:

docker build --no-cache -f Dockerfile -t waybackproxy .

To run:

docker run -d -e DATE=20011025 -p 8888:8888 waybackproxy
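
When several environment variables need to be set, it can be easier to assemble the command in a script. The following is a minimal sketch under the assumption that the image is tagged waybackproxy as in the build step above; `docker_run_args` is a hypothetical helper, and it only uses the `-d`, `-e` and `-p` flags already shown.

```python
import shlex


def docker_run_args(image, env=None, host_port=8888, container_port=8888):
    """Build the argument list for a detached `docker run` of WaybackProxy.

    container_port should match LISTEN_PORT if that variable is overridden.
    """
    args = ["docker", "run", "-d"]
    # Sort the variables so the generated command is deterministic.
    for name, value in sorted((env or {}).items()):
        args += ["-e", "{}={}".format(name, value)]
    args += ["-p", "{}:{}".format(host_port, container_port), image]
    return args


if __name__ == "__main__":
    args = docker_run_args(
        "waybackproxy", {"DATE": "19990101", "DATE_TOLERANCE": "180"}
    )
    # Print a shell-safe command line; pass `args` to subprocess.run to execute.
    print(" ".join(shlex.quote(a) for a in args))
```

Returning a list rather than a string keeps the values safe to hand to `subprocess.run` without shell quoting concerns.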

Known issues and limitations

  • The Wayback Machine itself is not 100% reliable. Known issues include:
    • Pages newer than the specified date (setting a specific YYYYMMDD date instead of a wider YYYYMM or YYYY helps with that);
    • Random broken images;
    • Strange 404 errors caused by bad server responses or incorrect URL capitalization at archival time;
    • Infinite redirect loops;
    • Server errors when it's having a bad day.
  • WaybackProxy will work around some redirection scripts (example: http://example.com/redirect?to=http://...) which are not archived by the Wayback Machine, but the destination URLs are sometimes not archived either.
  • WaybackProxy is not a generic proxy. The POST and CONNECT methods are not implemented.
  • Transparent proxying mode requires HTTP/1.1 and therefore cannot be used with some really old (pre-1996) browsers. Use standard mode with such browsers.
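
The redirection-script workaround mentioned above can be pictured as pulling the destination URL out of the query string. The sketch below is one possible way to do that, not WaybackProxy's actual implementation: `extract_redirect_target` is a hypothetical helper, and it assumes the destination appears as a query parameter value beginning with http:// or https://.

```python
from urllib.parse import parse_qsl, urlsplit


def extract_redirect_target(url):
    """Return the first query parameter value that looks like a URL, or None."""
    query = urlsplit(url).query
    for _name, value in parse_qsl(query, keep_blank_values=True):
        if value.startswith(("http://", "https://")):
            return value
    return None


if __name__ == "__main__":
    print(extract_redirect_target(
        "http://example.com/redirect?to=http://target.example/page"
    ))
```

An unescaped destination that carries its own `&`-separated parameters would be cut at the first `&` by this approach, so a real implementation would need extra handling for that case.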

License: GNU General Public License v3.0