shangsuru / url-shortener

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

URL Shortener Service

On a high level, a URL shortener is able to generate a unique short URL. The server persists the short URLs in a data store and is able to redirect clients to the original long URL against the short URL.

API

POST api/v1/data/shorten?longUrl=xxx: return shortUrl

GET api/v1/shortUrl: return longUrl for HTTP redirection (302 redirect to enable analytics)

URL Generation Method

To create shortened URLS, we will use UUIDs with base62 encoding. This approach is simpler than the approach using hash functions, as we do not have to account for collisions.

Database

We will store short/long URL pairs together with creation timestamp and last_visited timestamp into DynamoDB. A NoSQL datastore is chosen, because it is non-relational data where we do not require joins, so we can benefit from a dynamic schema and easy horizontal scaling. Also we need utilize caching.

Security

We make use of rate limiting, to prevent a malicious user from consuming too much of our system's capacity.

Analytics

HTTP headers of the URL redirection request are used to collect data for analytics. The IP address of the client identifies the location.

The server will put a message on the message queue, while an archive service will execute a batch operation to move messages from the message queue to an analytics database, e.g. Amazon Redshift.

For simplicity, we will collect only the number of clicks over a certain timeframe, and the country of the user.

Database Cleanup

A dedicated cleanup service is executed during non-peak hours to remove expired URLs. Otherwise, the system falls back on a lazy removal approach, where when the client hits an expired URL, it gets deleted.

Monitoring

Monitoring is necessary to identify system failures. We implement it by installing an agent on each of the servers, which collect and aggregate metrics and publish them to a central monitoring service. Dashboards are configured to visualize the data.

We can collect the following metrics:

  • CPU load, memory, disk I/O
  • queries per second, cache hit ratio
  • latency, failure rate

For example, we can use datadog, a popular log aggregation and monitoring service.

About


Languages

Language:Ruby 56.0%Language:TypeScript 38.2%Language:Dockerfile 3.7%Language:CSS 0.8%Language:HTML 0.5%Language:Shell 0.4%Language:JavaScript 0.4%