RJ / ketama

C library for consistent hashing, and langauge bindings

Home Page:http://www.metabrew.com/article/libketama-consistent-hashing-algo-memcached-clients

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Maybe an incorrect implementation?

YueHonghui opened this issue · comments

Around the line of latest commit:
https://github.com/RJ/ketama/blob/master/libketama/ketama.c#L428

for( i = 0; i < numservers; i++ )
{
        float pct = (float)slist[i].memory / (float)memory;
        unsigned int ks = floorf( pct * 40.0 * (float)numservers );
#ifdef DEBUG
        int hpct = floorf( pct * 100.0 );

        syslog( LOG_INFO, "Server no. %d: %s (mem: %lu = %u%% or %d of %d)\n",
            i, slist[i].addr, slist[i].memory, hpct, ks, numservers * 40 );
#endif

        for( k = 0; k < ks; k++ )
        {

when servers being added into or removed from pool, the ks value of other untouched servers and the points on the circle will be changed. That means hit ratio will be decreased when the pool changed. Why not implement as:

unsigned int ks = 40.0 *  (int)(slist[i].memory);//memory in MB

Do I misunderstand the ketama algorithm or your implementation?

You are correct, when the memory is not constant across all servers, then adding or removing a server will alter the points of other servers. The original algorithm that Karger et al. (1997) constructed and proved to be monotone doesn't know about weights like this and uses a constant κ·log(C) to replicate the buckets (servers).

Also, your suggestion to only multiple by the weight would fix the problem, but it then should be set to small numbers like 1 to 100, and not something like 131072 that slits[i].memory would be now, because that just artificially inflates the number of points on the circle.