geetansh / redirects-generator

A Python script for generating Apache Rewrite rules from a CSV

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Apache Redirect Rules Generator

This project is a Python script for generating Apache rewrite/redirect rules from a CSV of URL's.

Compatible Platforms

The script has been tested on Mac OS X and Ubuntu 20.04 and should work ok on those platforms

Limitations

It is nowhere near ready for use in a production/working environment yet and should only be considered a proof-of-concept only at this point!

The script will currently overwrite any file specified as the destination for the redirect/rewrite rules with its own contents every time the script is run so any previous contents in that file will be lost and overwritten.

At present, the redirects generated will rewrite the whole path of the source URL to a destination URL. Nothing more complex is supported at present. Nginx redirects/rewrites are not supported by this tool

What It Can Do

The script will generate either RedirectMatch lines or Rewrite Rules for a given list of redirects supplied in a CSV file. It can automatically escape URL-encoded characters (e.g %20) in source URL paths to be redirected and produce rewrite rules for URL's that have a query string in them ,with either one parameter or multiple parameters

Filename extensions in URL's are also accounted for in some small way.

For source URL paths that include a trailing slash, a rule will be generated that matches on both the version with a trailing slash and a version without one. For example, a source URL with the value https://www.example.com/peter/ would generate a rule that matches on the path /peter/ or /peter

Redirects with different HTTP response codes are catered for (currently only 301, 302 and 307 HTTP response codes are supported for generated redirects) and each individual redirect can have a different response code configured for it.

Getting Data Into It and How it is Processed

The CSV file itself should have the following 3 field headings at the top of the CSV file:

  • source
  • destination
  • redirect_code

Description of each field required and how it is processed can be found below:

  • source: the source url for the redirect (only path, query strings and fragments are preserved). So that, for example, https://www.example.com/peter becomes /peter)

  • destination: the destination url for the redirect (full url is preserved)

  • redirect_code: the response code for the redirect (e.g, 301, 302)

Information about the repository

There are 2 additional sub-directories in the repository apart from the scripts sub-directory where the script itself resides. Their purpose is described below:

  • redirects: This sub-directory can be used for storing the redirects/rewrite rules the script outputs. At present, any file named .htaccess or with a .txt extension is not tracked in the repository

  • urls: This sub-directory can be used for storing the csv files containing the redirects you want to generate and/or test. At present, any file with a .csv or .bak extension is not tracked in the repository. There is an example CSV file in this directory called urls.csv.example that you can use as test input data for the script.

Getting started: Installation

You'll need Python 3 on your local system to use it. Its best to run it in a python virtual environment using the instructions below. You'll need the venv python module installed on your local system to do this.

More information on using virtual environments with Python can be found here

If you are running this on Ubuntu, you may need to install the python3-venv, python3-dev and build-essential packages using the command sudo apt install python3-venv python3-dev build-essential before starting with the steps below. These steps assume you are running this on Ubuntu/Mac OS X but WSL running on Windows should work ok as well.

  • Checkout/clone the git repository to a directory on your local machine
  • cd into the scripts sub-directory of the repository
  • Give the script execute permissions with the command: chmod +x redirectgen.py
  • On your local system, run the command: python3 -m venv env
  • On your local system, run the command: source env/bin/activate
  • run the command: pip install -r requirements.txt to install the dependencies to the virtual environment

Usage

For more information, see the examples further down.

usage: redirectgen.py [-h] -u URLS -o OUTFILE

optional arguments:
  -h, --help            show this help message and exit
  -u URLS, --urls URLS  path to csv file containting the redirect urls we want to process
  -o OUTFILE, --outfile OUTFILE
                        path for the file we want to output the finished redirects to

Example: Using the Script (Just Generating Redirects Using Example Data)

Example of how to do this is below:

  • cd into the top-level directory of the repository
  • Copy the file urls.csv.example in the urls sub-directory to urls.csv
  • Setup the script and Python Virtual Environment using the instructions above.
  • Run the command: ./scripts/redirectgen.py -u urls/urls.csv -o redirects/.htaccesson your local machine

The example above will take in URLs from a CSV file located at urls/urls.csv and output redirects/rewrite rules to the file located at redirects/.htaccess

Using the urls.example.csv in the repository will generate the redirects below in the redirects output file:

RedirectMatch 301 ^/peter\x20lavelle\.pdf\.exe$ https://www.google.com
RedirectMatch 302 ^/peter\x20lavelle\.pdf\.exe\.doc$ https://www.google.com
RedirectMatch 301 ^/peter\x20lavelle\.pdf\.exe\.doc2$ https://www.google.com
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/peterlavelle\.jpg$
RewriteCond %{QUERY_STRING} ^myquery$
RewriteRule ^(.*)$ https://www.yahoo.com/thisisatest [R=301,L,QSD]
</IfModule>
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/peter\x20lavelle\.jpg$
RewriteCond %{QUERY_STRING} ^myquery$
RewriteRule ^(.*)$ https://www.yahoo.com/thisisatest [R=302,L,QSD]
</IfModule>

Note that, at present, the script will overwrite the redirects file each time it is run, so any contents in the redirects file will be lost and replaced with the newly-generated redirects/rewrites.

Example: Using the Script (Generating and Testing Redirects with Lando Using Example Data)

There is a .lando.yml file in the repository that you can use to bring up an Apache container to serve as a local instance to test the redirects generated by the script. You will need Lando installed on your local machine to make use of this.

You can copy the urls.csv.example file in the urls sub-directory to urls.csv to follow along with this example:

If you are new to Lando, more information on getting started with installing it can be found here

To use the Lando instance to test your redirects, follow the steps below:

  • cd into the top-level directory of the repository on your local machine

  • Start the Python Virtual Environment on your local system for the script using the instructions above.

  • run the command lando startto bring up the lando instance.

  • When lando has finished setting everything up you should get something similar to this in your terminal:

       NAME            redirectstester                
       LOCATION        /home/user/redirects-generator 
       SERVICES        webserver                      
       WEBSERVER URLS  http://localhost:49154         
      			      http://redirects.lndo.site/
    

    The .lando.yml file is configured to bring up the Apache container with the URL: http://redirects.lndo.site We will need this URL for the next step.

  • The lando instance is configured to use the redirects sub-directory of the repository as its Document Root. We will take account of this in the example command in the next step.

  • Run the following command from the top level directory of the repository on your local machine to generate a set of redirects/rewrites and output them to a .htaccess file so they can be picked up by the Lando Docker container: ./scripts/redirectgen.py -u urls/urls.csv -o redirects/.htaccess

  • On Mac OS X or Linux Use the command curl -I -L http://redirects.lndo.site/your/redirect/source/path to test your redirects. Replace /your/redirect/source/path with the source path of the particular redirect you are testing. More information on the curl utility can be found here

About

A Python script for generating Apache Rewrite rules from a CSV


Languages

Language:Python 100.0%