fredZen / random-py


The assignment

Build a clone of $ cat /dev/random in the scripting language of your choice, generating your own entropy. Do not take more than three hours.

Analysis and assumptions

At first sight, the assignment is pretty straightforward: /dev/random is a source of random bytes. cat dumps them into the standard output.
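The "dump bytes to standard output" half of the job can be sketched as below. This is only an illustration: `stream_bytes`, `source`, `chunk_size` and `max_bytes` are hypothetical names that do not come from the project, and `source` stands for any callable that returns the requested number of bytes.

```python
from typing import BinaryIO, Callable, Optional


def stream_bytes(
    source: Callable[[int], bytes],
    out: BinaryIO,
    chunk_size: int = 4096,
    max_bytes: Optional[int] = None,
) -> int:
    """Copy bytes from `source` to `out` and return how many were requested.

    With max_bytes=None the loop only stops when the reader goes away,
    which is how `cat /dev/random | head -c 16` terminates.
    """
    written = 0
    try:
        while max_bytes is None or written < max_bytes:
            # Never ask for more than what is left of the byte budget.
            n = chunk_size if max_bytes is None else min(chunk_size, max_bytes - written)
            out.write(source(n))
            written += n
    except BrokenPipeError:
        # The reader closed its end of the pipe: stop quietly.
        pass
    return written
```

On the command line, `out` would be `sys.stdout.buffer`, since random bytes must be written as binary data rather than text.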

While modern computers can have hardware random number generators, they are still mostly deterministic machines. Therefore, most so-called random numbers generated by a computer are actually the output of an algorithm called a pseudo random number generator (PRNG).

A typical algorithms textbook contains one or several such PRNGs. Seminumerical Algorithms, volume 2 of Donald Knuth’s magnum opus The Art of Computer Programming, has a whole chapter dedicated to them. However, the PRNGs Knuth writes about are mostly concerned with producing a convincingly random statistical distribution.

On typical modern Unix systems (Linux, FreeBSD, OpenBSD, macOS, probably others), /dev/random is a bit more than that. It is intended to be a cryptographically secure random number generator. In addition to looking statistically random, such generators aim to make an attacker’s life as difficult as possible by:

  • being hard to predict, even if the attacker has some control over or insider knowledge of the state of the machine generating the random numbers
  • making it hard or impossible to deduce past states or past random numbers from a breach of information about the current state of the random number generator

They achieve these goals by collecting as much real entropy from the world as possible, typically by measuring minute differences in timing between events such as keystrokes, mouse movements and disk accesses, although other physical measurements could make sense too. These sources of entropy are then mixed together and used to feed a cryptographically strong random number generator: a function specifically designed so that, even if some of its sources of randomness are compromised, it keeps generating high-quality randomness, or at least recovers as quickly as possible. Such a function can be, but does not have to be, built from standard cryptographic components such as block ciphers and secure hash functions.

To be credible, such a random number generator needs to have gone through serious peer review and cryptanalysis. Therefore, I chose to base my random number generator on Fortuna, a cryptographically secure random number generator described by Niels Ferguson, Bruce Schneier and Tadayoshi Kohno in Cryptography Engineering. (See also https://www.schneier.com/academic/fortuna/)

I have tried to replicate Fortuna in Python. There are a few caveats:

  • Collecting physical entropy is typically done by device drivers. This would be hard to replicate with a scripting language in userland – it would still be possible to ask the user to press keys and measure the time between the keypresses, but it is not clear how much timing resolution would be left in userland. It would also eat significantly into my time limit for solving the exercise. Therefore I chose not to attempt it.
  • Fortuna goes to great lengths to erase its past states, to eliminate the risk of leaking previously generated random numbers. While this is moderately difficult in a low-level language such as C, C++ or Rust, it is much more difficult in managed languages. I do not have an intimate enough knowledge of the Python VM to make any claims about successfully erasing any remnants of past states.

Fortuna is made of 3 components:

  • The pseudo random number generator per se
  • An entropy pool, responsible for pooling entropy from various sources and feeding it to the PRNG
  • A persistent seed file, to help the PRNG have sufficient randomness at boot time

I implemented the PRNG as the Generator class. This generator can be used standalone, by seeding it with a value. The system clock, while quite weak, can be such a seed.
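To make the idea concrete, here is a minimal sketch of a Fortuna-style generator. It is not the project's Generator class: `SketchGenerator` and `clock_seed` are hypothetical names, and to stay within the standard library the sketch swaps the AES block cipher for SHA-256 of key and counter, truncated to 16 bytes. That substitution has not been analysed and is for illustration only.

```python
import hashlib
import time


class SketchGenerator:
    """Fortuna-style generator; SHA-256 stands in for the AES block cipher."""

    BLOCK_SIZE = 16  # bytes per output block, the same as an AES block

    def __init__(self) -> None:
        self._key = bytes(32)
        self._counter = 0  # 0 means "not seeded yet"

    def reseed(self, seed: bytes) -> None:
        # New key = SHA-256(SHA-256(old key || seed)), then bump the counter.
        inner = hashlib.sha256(self._key + seed).digest()
        self._key = hashlib.sha256(inner).digest()
        self._counter += 1

    def _blocks(self, count: int) -> bytes:
        if self._counter == 0:
            raise RuntimeError("generator has not been seeded")
        out = bytearray()
        for _ in range(count):
            counter_bytes = self._counter.to_bytes(16, "little")
            # Assumption: this hash replaces AES_encrypt(key, counter).
            out += hashlib.sha256(self._key + counter_bytes).digest()[: self.BLOCK_SIZE]
            self._counter += 1
        return bytes(out)

    def read(self, n: int) -> bytes:
        data = self._blocks(-(-n // self.BLOCK_SIZE))[:n]  # ceiling division
        # Rekey after every request so that earlier output cannot be
        # recomputed from the key that stays in memory.
        self._key = self._blocks(2)
        return data


def clock_seed() -> bytes:
    """Weak seed taken from the system clock, in nanoseconds."""
    return time.time_ns().to_bytes(8, "little")
```

Calling `reseed(clock_seed())` on a fresh instance and then `read(n)` yields n bytes. Two instances given the same seed produce the same stream, which is exactly why a clock value alone is a weak seed.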

I also implemented an entropy pool.

The pool does not add any real value here, since I only have one entropy source, the system time (and a poor one it is). Unlike the real Fortuna, there is no replenishment of my entropy pool happening while the script runs: what little entropy I do get from there is injected into the pool as the program starts.
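For reference, the pooling scheme Fortuna describes can be sketched roughly as follows. Events are spread round-robin over 32 pools, and on the r-th reseed pool i contributes only if 2^i divides r, so higher-numbered pools are drained less often and accumulate more entropy between uses. `SketchAccumulator` and its method names are hypothetical, and the sketch leaves out Fortuna's minimum pool size and minimum delay between reseeds.

```python
import hashlib
from typing import Callable, List


class SketchAccumulator:
    """Simplified Fortuna-style accumulator with 32 hash-based pools."""

    NUM_POOLS = 32

    def __init__(self, reseed: Callable[[bytes], None]) -> None:
        self._pools = [hashlib.sha256() for _ in range(self.NUM_POOLS)]
        self._reseed = reseed  # e.g. the reseed method of a generator
        self._reseed_count = 0
        self._next_pool = 0

    def add_event(self, source_id: int, data: bytes) -> None:
        if not 0 <= source_id <= 255 or not 1 <= len(data) <= 32:
            raise ValueError("source_id must fit in a byte, data in 1..32 bytes")
        # Prefix each event with its source and length so that events
        # cannot be confused once concatenated inside the pool.
        self._pools[self._next_pool].update(bytes([source_id, len(data)]) + data)
        self._next_pool = (self._next_pool + 1) % self.NUM_POOLS

    def reseed_generator(self) -> List[int]:
        """Drain the eligible pools into the generator; return their indices."""
        self._reseed_count += 1
        used: List[int] = []
        seed = b""
        for i in range(self.NUM_POOLS):
            # Pool i takes part only if 2**i divides the reseed count.
            if self._reseed_count % (2 ** i) != 0:
                break
            seed += self._pools[i].digest()
            self._pools[i] = hashlib.sha256()  # empty the pool after use
            used.append(i)
        self._reseed(seed)
        return used
```

With a single weak source fed once at startup, as in this exercise, the schedule brings no benefit. It pays off when events keep arriving while the generator runs.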

Still, I found the component interesting so I left it in.

Setup

My solution is based on Python 3, and has a dependency on the cryptography library for its implementation of AES. I recommend using pyenv and virtualenv to set up the environment:

cd project_path
pyenv install
virtualenv .env
source .env/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Usage

There are unit tests that can be run with

cd project_path
source .env/bin/activate
python -m unittest

The clone of cat /dev/random can be run with

cd project_path
source .env/bin/activate
python -m exercise
