URL Shortener

This web app was written over a few days in a couple of several-hour marathons. This was an academic exercise to demonstrate a few of my technology choices for rapidly prototyping a basic but functioning (and hopefully useful) web app.

Technology Solutions

Django 1.9 on Python 3.5
- SQL DB migrations are platform agnostic for ease of deployment
- all strings are wrapped and gettext-ready for translation
- multiple environment-specific settings files
- some configurations overridable by environment variables
- probably works on Python 2.7; I haven't manually tested that, but I have tox run with envlist = py35, py27 and it looks ok so far!
tox (with virtualenv) runs automated tests
- using pytest instead of the stock Django testing framework
- also reports code coverage
- also syntax/style checking with flake8
Heroku for deployment
- includes Heroku-friendly requirements and settings files
- just needs a few environment variables to be set in the Heroku app
New Relic for performance monitoring
Bootstrap for the basic look and feel
Font Awesome for the little GitHub icon in the navbar
GitHub for SCM hosting, of course :)

Also, Twelve-Factor is a great idea. I might not hit every single tenet precisely, but I certainly strive to get them in spirit!

Demo

Check it out in action here:

http://1lnk.xyz

This demo site is configured to use emoji as the URL characters, showing off that you can easily change how short links are generated by setting a single environment variable for the app (ALLOWED_DIGITS). I figured most humans aren't actually typing out their shortened URLs; they're just copying and pasting them around. So, why not make something a bit more fun-looking?

Yes, emoji are only good for folks on modern browsers and devices. So, the default behavior in the configuration is to use the more traditional ASCII alpha-numeric set of characters (0-9, a-z, A-Z).
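As a rough illustration of how a configurable digit set can drive token generation, here is a minimal sketch. It assumes that a short token is simply a database row's integer ID written in a positional number system whose "digits" come from ALLOWED_DIGITS; the README does not show the app's actual algorithm, and the names encode, decode, and DEFAULT_DIGITS are hypothetical.

```python
import string

# Assumed default alphabet, matching the README's description: 0-9, a-z, A-Z.
DEFAULT_DIGITS = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode(number, digits=DEFAULT_DIGITS):
    """Write a non-negative integer using the given digit alphabet."""
    digits = list(digits)  # one entry per character (code point)
    base = len(digits)
    if base < 2:
        raise ValueError("need at least two distinct digits")
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return digits[0]
    out = []
    while number:
        number, remainder = divmod(number, base)
        out.append(digits[remainder])
    return "".join(reversed(out))


def decode(token, digits=DEFAULT_DIGITS):
    """Inverse of encode: turn a token back into its integer."""
    digits = list(digits)
    base = len(digits)
    number = 0
    for character in token:
        number = number * base + digits.index(character)
    return number
```

With the default alphabet, encode(62) gives "10"; with a three-emoji alphabet such as "🐶🐱🐭", encode(5) gives "🐱🐭". Note that this sketch treats every code point as one digit, so emoji built from several code points (skin-tone or joined sequences) would need extra handling.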

Development Setup

I strongly encourage you to use virtualenv if you aren't already. Bonus points if you're also using virtualenvwrapper. This makes dependency management super easy and cleanly isolated from any other Python projects you're working on. Once your environment is activated, starting a local instance of this shortener web app should be as simple as:

pip install -r requirements/local.txt
python manage.py migrate
python manage.py collectstatic
python manage.py runserver

This will serve on the default 127.0.0.1:8000.

To run the unit tests, use tox:

tox

If you are actively making changes to the app that modify dependencies, you may need to occasionally use the -r flag to tell tox to recreate its environment.

tox -r

Environment Variables

Several environment variables are read at startup and used in the app's configuration. These include:

ALLOWED_DIGITS: list of characters allowed to make the short URL tokens
DJANGO_SECRET_KEY: the Django secret key; don't commit production keys!
DATABASE_URL: SQL database URL/connection string (e.g. postgres://user:pass@host:5432/name)
DJANGO_ALLOWED_HOSTS: comma-separated list of hostnames the server will be allowed to respond to
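
For readers who want a picture of how variables like these might be consumed, here is a hedged sketch of a settings loader. The function name load_settings and every fallback value (including the SQLite URL) are assumptions for illustration only; the project's real settings files may read and default these differently.

```python
import os
import string


def load_settings(environ=os.environ):
    """Build a dict of settings from environment variables (hypothetical helper)."""
    # Assumed fallback alphabet: 0-9, a-z, A-Z.
    default_digits = string.digits + string.ascii_lowercase + string.ascii_uppercase
    hosts = environ.get("DJANGO_ALLOWED_HOSTS", "")
    return {
        "ALLOWED_DIGITS": environ.get("ALLOWED_DIGITS", default_digits),
        # No fallback on purpose: a missing secret key raises KeyError at startup.
        "SECRET_KEY": environ["DJANGO_SECRET_KEY"],
        # Assumed local fallback; production would set DATABASE_URL explicitly.
        "DATABASE_URL": environ.get("DATABASE_URL", "sqlite:///db.sqlite3"),
        # Split the comma-separated list and drop empty entries.
        "ALLOWED_HOSTS": [h.strip() for h in hosts.split(",") if h.strip()],
    }
```

Failing fast on a missing DJANGO_SECRET_KEY is a design choice in this sketch: it keeps a deployment from silently running with a guessable key.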

If using a Heroku deployment with New Relic, these should be automatically configured when you set up the New Relic addon:

NEW_RELIC_APP_NAME
NEW_RELIC_LICENSE_KEY
NEW_RELIC_LOG

Discussion Topics

Scalability

The app's current design is relatively simple and should scale well both in terms of number of recorded entries and in number of concurrent requests, as long as it is sitting on reasonable hardware. It uses a SQL database (vendor unspecified, but PostgreSQL is great!) with a very simple schema.

Disclaimer: The current running demo resides on a free Heroku instance, and its performance can be pretty pitiful. Dynos quickly go to sleep when idle, and new requests take a few seconds to wake them up. Don't judge it too harshly!

Because the app is so simple right now, the likely scaling concerns are performance-related: a very large set of data or a very large number of concurrent users. Both are discussed in the later Performance section.

Availability

Heroku has pretty reliable uptime, but regardless of where you deploy, apps like this need integrations with external monitoring services to provide data to you (and, ideally, to the general public).

The current implementation has integration with New Relic APM. Here's a screenshot of how New Relic looked when I was doing some minor load-testing of this app on the free Heroku instance:

There are lots of other options to consider for monitoring site availability and performance. I would have liked to try to integrate with a few more of these, but I stopped with the easy New Relic integration given the time constraints. Here's a brief list of options to consider:

Deployment automation

Like site availability and performance monitoring, deployment automation is one area that I didn't allow much time for investigation, but I got a basic implementation working. I went with a simple Heroku app because I'd used it before and knew it would be super easy to get running quickly, and deployments are (generally) a breeze. In this case, I simply commit changes and push, and Heroku's toolbelt takes care of the rest:

git commit -m 'implemented mind-blowing new feature'
git push heroku master
heroku logs --tail

Given more time, other tools I would have liked to play with include:

I also would have liked to investigate best practices for Amazon AWS, as that's a world of deployment and configuration management I don't know much about.

Security

Since this URL shortener app is pretty small, there aren't many special vectors of attack to consider for security. There is no customer data, nothing financially relevant, and no user accounts. That said, there is some "private" data configured, like database credentials, and because this is a Django app, it includes the standard Django admin interface that potentially gives a user full read/write control over the data model. SQL injection is very unlikely because no raw SQL statements are built or executed.

In general, I'm relying on and trusting in the stability and security around Django, Heroku, and the open source stack to do a fair job. Anyone managing or deploying this software would just need to follow common sense practices like running software on the OS with appropriate users and permission modes, applying regular security patch updates, not checking in credentials to public repos, firewalling and configuring appropriate network topology, etc.

There are heavy-duty intrusion-detection products that could be installed with networking equipment that could be leveraged if security is a serious concern. For example, I know F5's BIG-IP has an Application Security Module that does deep-packet inspection looking for suspicious activity around HTTP requests.

Coding style

I try to follow the PEP 8 guidelines and best practices. I have my editors configured to bark at me when I deviate, and if I continue to ignore those notifications, the next time I run tox, it will scream at me with blocking red error text when the flake8 check fails.

Testing

I'm using tox to run just a few basic pytest unit tests. Given more time, I would like to have given complete coverage (or near-complete) with unit tests and written a few Selenium tests to ensure appropriate page layout and interaction.

Also, even though it's probably not applicable for this particular project, I like writing doctests for relatively small and isolated functions.
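
To show what I mean by a doctest, here is a tiny sketch. The helper normalize_url is hypothetical and not part of this project; it only demonstrates the style of embedding runnable examples in a docstring.

```python
def normalize_url(url):
    """Prepend a scheme when the URL has none (hypothetical helper).

    >>> normalize_url("example.com/page")
    'http://example.com/page'
    >>> normalize_url("https://example.com")
    'https://example.com'
    """
    if "://" in url:
        return url
    return "http://" + url


if __name__ == "__main__":
    import doctest

    # Runs every example in this module's docstrings and reports mismatches.
    doctest.testmod()
```

Running the file directly executes the examples; pytest can also collect them when invoked with its --doctest-modules option.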

I would also have liked to get this project running in a continuous integration tool like Jenkins or Travis so I could see the tests running automatically/periodically after commits.

Internal doc

As with testing, internal docs are pretty sparse right now due to lack of time. Normally, I would like to fill out docstrings (again, sometimes with doctests embedded) for at least all of the "public" methods and functions.

User doc

You're looking at them! READMEs aren't great, but they're better than nothing. A web app like this is so incredibly simple, I wouldn't imagine putting much effort into documenting its functionality for users. If this were to be expanded with an API or other robust functionality, then I would see generating user docs to be a worthwhile exercise.

In the case of APIs, I would start with autogenerated docs using a framework like Swagger. If I were to build a RESTful API, I would probably start with Django REST Framework, and I know there's a working integration between DRF and Swagger. Beyond that, I'd probably use some kind of lightweight markup language like Markdown or RST because they're easy to write and they have tools for producing great formatted output.

Performance

Being a Django app, the obvious simple way to improve performance is to add more workers. If running on Heroku, that could mean more dynos. If running on another WSGI server under Apache or Nginx, that could mean configuring to spin up more threads or processes. That could address issues with performance in the web app itself.

If performance analysis suggests the bottleneck is in the database, one could upgrade to more dedicated or performant software/hardware solutions. You could build out DB replication with something like a master/slave setup. If upgrading the DB infrastructure isn't appealing, then we could reconsider the behavior of the app itself and how it queries the database.

The current design tries to be user-friendly by checking the database for an entry first before saving a new one. When checking for an existing entry, it also looks for the earliest possible entry (in case one managed to be duplicated) in order to return the shortest possible URL. If we need to optimize for faster writes (i.e. short link creations), one could argue to change that behavior to either find any entry instead of the first or not to look for existing entries at all. The tradeoff here is that the table will grow faster and possibly hamper read performance in the long term.
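
The lookup-before-insert behavior described above can be sketched roughly as follows. This is not the app's code: the app goes through the Django ORM, while this sketch uses the standard library's sqlite3 module purely so it runs on its own, and the table name link, its columns, and the functions make_db and shorten are assumptions.

```python
import sqlite3


def make_db():
    """Create an in-memory database with an assumed, minimal schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE link (id INTEGER PRIMARY KEY, url TEXT NOT NULL)")
    return conn


def shorten(conn, url):
    """Return the id of the earliest row for url, inserting one if none exists."""
    # ORDER BY id LIMIT 1 picks the earliest entry, which has the smallest id
    # and therefore the shortest token if duplicates ever crept in.
    row = conn.execute(
        "SELECT id FROM link WHERE url = ? ORDER BY id LIMIT 1", (url,)
    ).fetchone()
    if row is not None:
        return row[0]
    cursor = conn.execute("INSERT INTO link (url) VALUES (?)", (url,))
    conn.commit()
    return cursor.lastrowid
```

Dropping the SELECT entirely is the "optimize for writes" variant: every request inserts a row, at the cost of duplicate URLs and a faster-growing table.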

The Django DB migrations don't have any special indexes specified. If DB reads are slow, adding indexes is easy, but the tradeoff is that each index must be updated on every write, which can slow writes as the table grows very large.
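
To make the effect of an index concrete, here is a small sketch that asks SQLite how it plans to run a lookup by URL, before and after an index exists. It uses sqlite3 from the standard library for self-containment; the table link, the index name idx_link_url, and the helper functions are assumptions, and in the Django app the equivalent change would be declared on the model field and shipped as a migration.

```python
import sqlite3

LOOKUP = "SELECT id FROM link WHERE url = ?"


def make_db():
    """Create an in-memory database with an assumed, minimal schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE link (id INTEGER PRIMARY KEY, url TEXT NOT NULL)")
    return conn


def query_plan(conn, sql, params=()):
    """Return SQLite's human-readable plan lines for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


def plans_before_and_after_index():
    """Show the lookup plan without an index, then with one on url."""
    conn = make_db()
    before = query_plan(conn, LOOKUP, ("http://example.com",))
    conn.execute("CREATE INDEX idx_link_url ON link (url)")
    after = query_plan(conn, LOOKUP, ("http://example.com",))
    return before, after
```

The exact wording of the plan text varies between SQLite versions, but the "before" plan describes a full table scan while the "after" plan names idx_link_url.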

The current implementation does not try to do anything in the way of caching. There are several places caching could be implemented if reads need to perform better. Functions could be decorated with a memoizing decorator (for example, functools.lru_cache on Python 3). Django's cache framework could be enabled for caching whole page responses. Outside of the app itself, a caching proxy like Varnish could run in front and handle simple GETs.
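
As one possible shape for function-level caching, here is a sketch using functools.lru_cache from the Python 3 standard library. The names FAKE_TABLE, LOOKUPS, and resolve are hypothetical stand-ins; in the real app the function body would be a database query.

```python
from functools import lru_cache

# Stand-in for the database table; a real app would query the DB instead.
FAKE_TABLE = {"abc": "http://example.com"}
# Counts how often the "database" is actually consulted.
LOOKUPS = {"count": 0}


@lru_cache(maxsize=1024)
def resolve(token):
    """Map a short token to its long URL, remembering recent answers."""
    LOOKUPS["count"] += 1
    return FAKE_TABLE.get(token)
```

Repeated calls with the same token hit the cache and skip the lookup. Two caveats: the cache lives inside a single worker process, so several workers each keep their own copy, and a miss (None) is cached too, so a token created after a failed lookup stays invisible to that process until resolve.cache_clear() is called or the entry is evicted.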
