mischasan / aho-corasick

A-C implementation in "C". Tight-packed (interleaved) state-transition matrix -- as fast as it gets, as small as it gets.


Tests do not pass

changnet opened this issue.

A few issues with the unit tests; nothing serious.

env

  • Debian GNU/Linux 9.9 (stretch)
  • gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)

command

  • make all
  • make test

compile warnings

acism_create.c: In function ‘fill_symv’:
acism_create.c:161:26: warning: left shift of negative value [-Wshift-negative-value]
     psp->sym_mask = ~(-1 << psp->sym_bits);
                          ^~
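Left-shifting a negative signed value is undefined behavior in C, so one way to silence this warning without changing the intended mask is to shift an unsigned all-ones value instead. A minimal sketch, assuming sym_mask is an unsigned integer type and sym_bits is always smaller than the width of unsigned int; low_bits_mask is a hypothetical helper, not part of acism:

```c
/* Hypothetical helper (not in acism_create.c): a mask with the low `bits`
   bits set. Shifting an unsigned all-ones value is well defined, unlike
   -1 << bits. Assumes bits < the width of unsigned int. */
static unsigned low_bits_mask(unsigned bits)
{
    return ~(~0u << bits);
}
```

In fill_symv this would read psp->sym_mask = low_bits_mask(psp->sym_bits); or, inlined, ~(~0u << psp->sym_bits).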

acism_dump.c: In function ‘putb’:
acism_dump.c:33:40: warning: '0' flag ignored with precision and ‘%X’ gnu_printf format [-Wformat=]
     fprintf(out, isprint(ch) ? "'%c" : "%02.2X", ch);
                                        ^~~~~~~~
acism_dump.c: In function ‘acism_dump’:
acism_dump.c:78:24: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
          for (i = 1; i < psp->nsyms; i++) {
                        ^
acism_dump.c:58:27: warning: unused variable ‘sym’ [-Wunused-variable]
     int         i, empty, sym, symdist[257] = { 0 };
                           ^~~
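The acism_dump.c warnings look mechanical: "%02X" already zero-pads to two hex digits, so the precision can go; the loop index can share nsyms's type (assumed here to be unsigned); and the unused sym can be deleted. A sketch of the first two fixes; format_sym and count_nonzero are hypothetical stand-ins that write to a buffer or return a value so the result is easy to check, not the original putb/acism_dump code:

```c
#include <ctype.h>
#include <stdio.h>

/* Stand-in for putb: same choice of format as acism_dump.c, but "%02X"
   instead of "%02.2X", so gcc no longer reports that '0' is ignored. */
static int format_sym(char *dst, size_t size, int ch)
{
    return snprintf(dst, size, isprint(ch) ? "'%c" : "%02X", ch);
}

/* Sign-compare fix: the index has the same (assumed unsigned) type as
   nsyms. Counts nonzero entries from index 1, like the loop in acism_dump. */
static unsigned count_nonzero(const int *symdist, unsigned nsyms)
{
    unsigned i, n = 0;
    for (i = 1; i < nsyms; i++)
        if (symdist[i])
            n++;
    return n;
}
```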

acism_x test fails

Log from acism_t.fail:

      31412 hash collisions
     193928 hash displacements
# state machine saved as acism.tmp
./acism_t: line 7:  4136 Segmentation fault      $acism/acism_x $acism/words 11550973

Take a look at the source code, acism_x.c:main:

// details is already declared as a global; this local declaration shadows it
int details = expected < 0;
if (details) expected = -expected;

// acism_t passes only 2 arguments to acism_x (the words file and the expected
// count), so argv[2] is "11550973"; textfp should open argv[1]
FILE*	textfp = FOPEN(argv[2], "r");		// REUSE PATTERN FILE AS A TARGET

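// should check textfp, not fp: FOPEN(argv[2]) fails because no file is named
// "11550973", so fread() below is handed a NULL stream (the segfault in the log)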
if (!fp) die("cannot open %s", argv[2]);
static char buf[1024*1024];
MEMREF		text = {buf, 0};
int			state = 0;
double		elapsed = 0, start = tick();
while (0 < (text.len = fread(buf, sizeof*buf, sizeof buf, textfp))) {
    t = tick();
    (void)acism_more(psp, text, (ACISM_ACTION*)on_match, pattv, &state);
    elapsed += tick() - t;
    putc('.', stderr);
}
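A minimal sketch of the corrected argument handling, assuming the intended call is acism_x <patterns> <expected> with the pattern file reused as the text. It is standalone: parse_expected and scan_text are hypothetical helpers, plain fopen stands in for FOPEN, and the acism_more call is left as a comment because it needs the real library.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not in acism_x.c: a negative expected count asks for
   detailed output, as in main. Writing through *details (standing in for the
   existing global) avoids declaring a second "int details" that shadows it. */
static long parse_expected(const char *arg, int *details)
{
    long expected = strtol(arg, NULL, 10);
    *details = expected < 0;
    return *details ? -expected : expected;
}

/* Hypothetical helper: scan the file named by path (argv[1], the words file
   reused as the target text) in 1 MiB chunks. The NULL check is on the same
   FILE* that fread() uses. Returns bytes read, or -1 if it cannot be opened. */
static long scan_text(const char *path)
{
    static char buf[1024 * 1024];
    FILE *textfp = fopen(path, "r");
    size_t len;
    long total = 0;

    if (!textfp)
        return -1;                    /* acism_x would call die() here */
    while ((len = fread(buf, sizeof *buf, sizeof buf, textfp)) > 0)
        total += (long)len;           /* the real loop calls acism_more() on buf[0..len) */
    fclose(textfp);
    return total;
}
```

In main this amounts to opening argv[1] and testing textfp before the read loop.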

acism_mmap_x test fails

In acism_file.c:

ACISM*
acism_mmap(FILE *fp)
{
    ACISM *mp = mmap(0, lseek(fileno(fp), 0L, 2), PROT_READ,
                    MAP_SHARED|MAP_NOCORE, fileno(fp), 0);
    // ......
}

void
acism_destroy(ACISM *psp)
{
    if (!psp) return;
#ifdef XXX
    if (psp->flags & IS_MMAP)
        munmap((char*)psp->tranv - sizeof(ACISM), sizeof(ACISM) + p_size(psp));
    else
#endif//XXX
        free(psp->tranv);
    free(psp);
}

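In acism_mmap the machine's memory always comes from mmap, but I cannot find
anywhere that defines XXX, so acism_destroy falls through to free(psp->tranv).
Calling free on mapped memory is invalid, so the acism_mmap_x test fails. Should I
add -DXXX=1 to the compile options?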
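Whatever switch turns it on, the release path has to match how the memory was obtained: munmap for mapped memory, free for heap memory. A sketch of that pattern with a flag recorded at allocation time; the region type and its functions are hypothetical and not acism's ACISM, and anonymous mmap stands in for mapping the saved file:

```c
#define _DEFAULT_SOURCE 1   /* expose MAP_ANONYMOUS on glibc */
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical type: remembers how its storage was obtained. */
typedef struct {
    void   *base;
    size_t  size;
    int     is_mmap;  /* nonzero: base came from mmap() and must go back via munmap() */
} region;

static int region_heap(region *r, size_t size)
{
    r->base = malloc(size);
    r->size = size;
    r->is_mmap = 0;
    return r->base ? 0 : -1;
}

static int region_map(region *r, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    r->base = p;
    r->size = size;
    r->is_mmap = 1;
    return 0;
}

/* Release with the call that matches the allocator: this is the check that
   acism_destroy compiles out when XXX is not defined. */
static void region_release(region *r)
{
    if (!r->base)
        return;
    if (r->is_mmap)
        munmap(r->base, r->size);
    else
        free(r->base);
    r->base = NULL;
}
```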

word count does not match

acism_t expects 11550973 matches, but acism_x reports 11546633, and I don't know
which one is right. I truncated words to 20 words, ran the acism_x test, and
checked the result by hand; the two counts matched.
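To check a larger slice of words than is practical by hand, a brute-force counter can serve as a reference. A sketch, assuming acism_x counts one match per (pattern, position) occurrence with overlaps included; whether duplicate lines in words are counted twice by acism_x is not confirmed here. naive_count is hypothetical and runs in O(text length × total pattern length), so it only suits small inputs:

```c
#include <string.h>

/* Hypothetical reference counter, not part of acism: number of
   (pattern, position) occurrences of any pattern in text, overlaps included. */
static long naive_count(const char *const *patv, size_t npat,
                        const char *text, size_t len)
{
    long count = 0;
    for (size_t p = 0; p < npat; p++) {
        size_t m = strlen(patv[p]);
        if (m == 0 || m > len)
            continue;
        for (size_t i = 0; i + m <= len; i++)
            if (memcmp(text + i, patv[p], m) == 0)
                count++;
    }
    return count;
}
```

For the classic patterns he, she, his, hers and the text ushers, this finds she, he and hers, i.e. 3 matches.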

I have already made some fixes in my fork, https://github.com/changnet/aho-corasick,
but I'm not sure I did them right. I also added a case-sensitivity option to
acism_more; it slows performance a little, so I'm not sure it should be merged.
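If case-insensitive matching is kept, one way to keep the per-byte cost low is to fold the patterns once before the machine is built and fold the text through a 256-entry lookup table during the scan, rather than testing a flag for every byte. A sketch under that assumption; fold_init and fold_buf are hypothetical names, not the fork's code, and where the fold belongs inside acism_more is left open (if the caller's buffer must not be modified, fold_tab[byte] can be applied inside the scan loop instead):

```c
#include <ctype.h>
#include <stddef.h>

/* One table lookup per byte: fold_tab[c] is the lowercase form of c. */
static unsigned char fold_tab[256];

static void fold_init(void)
{
    for (int c = 0; c < 256; c++)
        fold_tab[c] = (unsigned char)tolower(c);
}

/* Fold a buffer in place; patterns would be folded the same way before
   the state machine is built. */
static void fold_buf(unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        s[i] = fold_tab[s[i]];
}
```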