glouw / ctl

The C Template Library

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Hashed Containers (or a Better Explanation why not)

polijan opened this issue · comments

The README states that CTL will not have unordered_map and unordered_set. The rationale for this is quite succinct: "ordered containers are preferred, even at the cost of performance".

If CTL won't offer hashed containers, could it at least extend the explanation why red/black containers should be always preferred. It's not obvious why it would always be true. Especially, as:

  1. hash tables are a popular datastructure in C
  2. it's not just about popularity, hash containers retrieval is algorithmically faster (O(1) average) which really can matter.

Yes, I got quite a bit of interest on hackernews for unordered sets, so I started my implementation of std::unordered_set on the staging branch (see ust.h): https://github.com/glouw/ctl/blob/staging/ctl/ust.h

It should be ready in a couple weeks

But very buggy still. And chained, not open addressing. You shalt not make the same mistake as the STL

Rehash, each are broken, insert needs to check the load_factor

@rurban, considering it's the staging branch, I wouldn't consider it production worthy. As for linked list chained hash tables, the main benefit is that pointers to unordered set elements remain valid for the lifetime of the unordered set, which, in practically, is more important than speed. I wouldn't consider this a mistake in the STL.

I've back ported std::unordered_set, named ust.h. It's 2.5x faster than set.h. Here are it's performance characteristics compare to set.h:

image

image

Example hashing of strings (with a relatively uninteresting hash function):

#include <stdio.h>
#include <str.h>

#define T str
#include <ust.h>

size_t
str_hash(str* s)
{
    size_t hash = 5381;
    int c;
    char* temp = s->value;
    while(c = *temp++)
        hash = ((hash << 5) + hash) + c;
    return hash;
}

int
str_equal(str* a, str* b)
{
    return str_key_compare(a, b) == 0;
}

int
main(void)
{
    ust_str strs = ust_str_init(str_hash, str_equal);
    ust_str_insert(&strs, str_init("this"));
    ust_str_insert(&strs, str_init("is"));
    ust_str_insert(&strs, str_init("a"));
    ust_str_insert(&strs, str_init("test"));
    ust_str_insert(&strs, str_init("for"));
    ust_str_insert(&strs, str_init("unordered"));
    ust_str_insert(&strs, str_init("set"));
    ust_str_insert(&strs, str_init("for"));
    ust_str_insert(&strs, str_init("the"));
    ust_str_insert(&strs, str_init("C"));
    ust_str_insert(&strs, str_init("Standard"));
    ust_str_insert(&strs, str_init("Template"));
    ust_str_insert(&strs, str_init("Library"));
    ust_str_insert(&strs, str_init("(CTL)"));
    ust_str_insert(&strs, str_init("..."));
    foreach(ust_str, &strs, it) 
        puts(str_c_str(it.ref));
    for(size_t i = 0; i < strs.bucket_count; i++)
        printf("%2d : %2lu\n", i, ust_str_bucket_size(&strs, i));
    str what = str_init("(CTL)");
    str* found = &ust_str_find(&strs, what)->key;
    puts(str_c_str(found));
    ust_str_erase(&strs, what); // For fun.
    str_free(&what);
    ust_str_free(&strs);
}
gcc test.c -I ctl
this
Template
the
(CTL)
is
unordered
Standard
Library
C
a
...
set
for
test
 0 :  0
 1 :  0
 2 :  1
 3 :  0
 4 :  2
 5 :  1
 6 :  0
 7 :  0
 8 :  1
 9 :  0
10 :  1
11 :  0
12 :  0
13 :  1
14 :  1
15 :  1
16 :  1
17 :  0
18 :  1
19 :  0
20 :  1
21 :  0
22 :  0
23 :  0
24 :  2
25 :  0
26 :  0
27 :  0
28 :  0
(CTL)