rurban / nbperf

Improved NetBSD's Perfect Hash Generation Tool v3

Segmentation fault on Linux with `-pId`

am11 opened this issue

First, thanks for this implementation. 👍

I was experimenting with different combinations of nbperf(1) options and realized that, in order to attach additional data to the keys, we must use the embed option. This differs from, e.g., https://en.wikipedia.org/wiki/Cuckoo_hashing, which preserves the original indices (the values, not the order), so that additional data can be attached at the same location without creating a separate map. Ultimately, I'm looking for the best balance: O(1) lookup time with as little size impact as possible, for inputs below the maximum of uint16_t.
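To illustrate the point about attaching data: a perfect hash assigns each key a slot in an order unrelated to the input order, so payloads have to be laid out in hash order (or reached through a separate map). Below is a minimal sketch of that layout. `toy_hash`, `struct entry`, `table`, and `lookup` are hypothetical names, and the hash is a hand-picked stand-in for keys 1..5, not nbperf output.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a generated perfect hash: maps the keys
 * 1..5 to the distinct slots 3, 1, 4, 2, 0. Not nbperf output. */
static uint32_t toy_hash(uint32_t key)
{
	return (key * 3u) % 5u;
}

struct entry {
	uint32_t key;      /* stored so that lookups can reject non-keys */
	const char *data;  /* payload attached to the key */
};

/* Entries are laid out in hash order, not in input order. */
static const struct entry table[5] = {
	{ 5, "five" },   /* toy_hash(5) == 0 */
	{ 2, "two" },    /* toy_hash(2) == 1 */
	{ 4, "four" },   /* toy_hash(4) == 2 */
	{ 1, "one" },    /* toy_hash(1) == 3 */
	{ 3, "three" },  /* toy_hash(3) == 4 */
};

/* O(1) lookup: hash, then compare the stored key, because a perfect
 * hash still maps unknown keys to some valid slot. */
static const char *lookup(uint32_t key)
{
	const struct entry *e = &table[toy_hash(key)];
	return e->key == key ? e->data : NULL;
}
```

With this layout, `lookup(4)` reaches slot 2 and returns its payload, while `lookup(7)` lands on slot 1, fails the key comparison, and returns `NULL`. Storing the key next to the payload costs space, which is part of the size trade-off described above.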

Nonetheless, I wanted to measure the size impact of the various algorithms on the final binary before making a judgement call, so I gave nbperf -pId a try. 🙂

Repro:

$ printf '%s\n' {1..5} > list2
$ ./nbperf -pId list2
./* generated with rurban/nbperf d878cd8 -Ip list2 */
/* seed[0]: 198677718, seed[1]: 2 */
#include <stdint.h>
#include <string.h>

static inline void _inthash2(const int32_t key, uint32_t *h)
{
	*h = (key * (UINT32_C(0xEB382D69) + UINT32_C(198677718)))
		 + UINT32_C(2);
}

const char * const inthash_keys[5] = {
Segmentation fault (core dumped)

Under the debugger this is what I see:

$ gdb --args ./nbperf -pId list2
...
(gdb) r
Starting program: /home/appveyor/nbperf/nbperf -pId list2
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
./* generated with rurban/nbperf d878cd8 -Ip list2 */
/* seed[0]: 198677718, seed[1]: 2 */
#include <stdint.h>
#include <string.h>

static inline void _inthash2(const int32_t key, uint32_t *h)
{
	*h = (key * (UINT32_C(0xEB382D69) + UINT32_C(198677718)))
		 + UINT32_C(2);
}

const char * const inthash_keys[5] = {

Program received signal SIGSEGV, Segmentation fault.
__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:74
74	../sysdeps/x86_64/multiarch/strlen-avx2.S: No such file or directory.
(gdb) bt
#0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:74
#1  0x00007ffff7c76d31 in __vfprintf_internal (s=0x7ffff7e1b780 <_IO_2_1_stdout_>, format=0x55555555b924 "\"%s\", ", ap=ap@entry=0x7fffffffceb0, mode_flags=mode_flags@entry=2) at ./stdio-common/vfprintf-internal.c:1517
#2  0x00007ffff7d34d13 in ___fprintf_chk (fp=<optimized out>, flag=flag@entry=1, format=format@entry=0x55555555b924 "\"%s\", ") at ./debug/fprintf_chk.c:33
#3  0x000055555555879b in fprintf (__fmt=0x55555555b924 "\"%s\", ", __stream=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/stdio2.h:105
#4  print_hash (state=0x7fffffffcfb0, nbperf=0x7fffffffd0a0) at nbperf-chm.c:177
#5  chm_compute (nbperf=0x7fffffffd0a0) at nbperf-chm.c:413
#6  0x0000555555555a88 in main (argc=<optimized out>, argv=<optimized out>) at nbperf.c:710

And this is from an aarch64 Linux machine:

const char * const inthash_keys[5] = {

Program received signal SIGSEGV, Segmentation fault.
__strlen_mte () at ../sysdeps/aarch64/multiarch/../strlen.S:60
60	../sysdeps/aarch64/multiarch/../strlen.S: No such file or directory.
(gdb) bt
#0  __strlen_mte () at ../sysdeps/aarch64/multiarch/../strlen.S:60
#1  0x0000fffff7dd1f68 in __vfprintf_internal (s=0xfffff7f0c5d8 <_IO_2_1_stdout_>, format=format@entry=0xaaaaaaaa7470 "\"%s\", ", ap=..., mode_flags=mode_flags@entry=2) at ./stdio-common/vfprintf-internal.c:1517
#2  0x0000fffff7e63f1c in ___fprintf_chk (fp=<optimized out>, flag=flag@entry=1, format=format@entry=0xaaaaaaaa7470 "\"%s\", ") at ./debug/fprintf_chk.c:33
#3  0x0000aaaaaaaa42d0 in fprintf (__fmt=0xaaaaaaaa7470 "\"%s\", ", __stream=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/stdio2.h:105
#4  print_hash (state=0xfffffffff468, nbperf=0xfffffffff550) at nbperf-chm.c:177
#5  chm_compute (nbperf=nbperf@entry=0xfffffffff550) at nbperf-chm.c:413
#6  0x0000aaaaaaaa136c in main (argc=<optimized out>, argv=<optimized out>) at nbperf.c:710
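Both backtraces end in strlen, called from fprintf with the format "\"%s\", " at nbperf-chm.c:177, which suggests (this is an inference from the trace, not from reading the source) that a value that is not a valid C string reaches `%s` when integer keys are used. Below is a minimal sketch of the kind of guard that avoids this class of crash. `struct demo_key` and `format_key` are hypothetical names, not nbperf's actual structures or fix.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical key record; nbperf's real data structures may differ. */
struct demo_key {
	int is_int;        /* nonzero when the key is an integer */
	uint32_t ival;     /* valid when is_int is set */
	const char *sval;  /* valid otherwise; may be NULL */
};

/* Format one table element into buf without ever handing a non-string
 * to "%s". Returns snprintf's result. */
static int format_key(char *buf, size_t len, const struct demo_key *k)
{
	if (k->is_int)
		return snprintf(buf, len, "%" PRIu32 ", ", k->ival);
	if (k->sval == NULL)
		return snprintf(buf, len, "NULL, ");
	return snprintf(buf, len, "\"%s\", ", k->sval);
}
```

The sketch branches on the key type before choosing a format string, so integer keys are printed numerically and a missing string becomes a literal `NULL` in the generated table instead of being dereferenced.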

FWIW, I've created a C# class equivalent of nbperf -pI and stripped away all the invariant options: https://gist.github.com/am11/22eaa6584de55483d988d9831899bcd3 (it generates identical output; checked with diff -w). I wanted to attach more data by replacing CHM3PerfectHashGenerator(uint[] keys, TextWriter output) with something like CHM3PerfectHashGenerator(Dictionary<uint, object> data, TextWriter output).

-d (embed_data) did not support intkeys yet.