Clang static analyser: "division by zero" warning in HASH_EXPAND_BUCKETS
diabonas opened this issue · comments
Consider this minimal example:
#include "uthash.h"
struct my_struct {
int id;
char name[10];
UT_hash_handle hh;
};
struct my_struct *users = NULL;
int main(int argc, char *argv[]) {
struct my_struct *s;
s = (struct my_struct *)malloc(sizeof *s);
s->id = 1;
HASH_ADD_INT( users, id, s );
}
When running with the Clang 7.0.0 static analyser scan-build
, the following warning is generated:
$ scan-build clang example.c
scan-build: Using '/usr/bin/clang-7' for static analysis
example.c:15:2: warning: Division by zero
HASH_ADD_INT( users, id, s );
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/uthash.h:383:5: note: expanded from macro 'HASH_ADD_INT'
HASH_ADD(hh,head,intfield,sizeof(int),add)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/uthash.h:318:3: note: expanded from macro 'HASH_ADD'
HASH_ADD_KEYPTR(hh, head, &((add)->fieldname), keylen_in, add)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/uthash.h:311:3: note: expanded from macro 'HASH_ADD_KEYPTR'
HASH_ADD_KEYPTR_BYHASHVALUE(hh, head, keyptr, keylen_in, _ha_hashv, add); \
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/uthash.h:301:3: note: expanded from macro 'HASH_ADD_KEYPTR_BYHASHVALUE'
HASH_ADD_TO_BKT((head)->hh.tbl->buckets[_ha_bkt], &(add)->hh); \
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/uthash.h:745:8: note: expanded from macro 'HASH_ADD_TO_BKT'
HASH_EXPAND_BUCKETS((addhh)->tbl); \
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/uthash.h:815:58: note: expanded from macro 'HASH_EXPAND_BUCKETS'
_he_newbkt->expand_mult = _he_newbkt->count / \
~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
scan-build: 1 bug found.
Apparently (tbl)->ideal_chain_maxlen
can be zero, leading to undefined behaviour when dividing through it. This seems undesirable, however I do not understand the code well enough to propose a fix. #129 seems to be related and might be an indicator that additional error handling for this case could be required.
As this happens in a macro in the header file uthash.h
, downstream projects might be affected by this issue when they use scan-build
on a project that includes uthash, see e.g. tpm2-software/tpm2-tss#1229 (comment).
I admit that the existence of #129 does scare me a little.
On the other hand, all previous Clang static-analysis issues have turned out to be false positives: #111, #128.
So I'm not a priori inclined to spend a lot of time digging into this one. If you could provide a compilable test case that actually encounters a division by zero, that would be a very different matter, and I'd consider it a high priority to fix!
The other interesting thing about division (besides that it can have UB) is that it's very slow. I would gladly review a patch that replaced this division with an appropriate right-shift or something like that, as long as it preserved the current high-level asymptotic behaviors.
I agree that static analysis is not always reliable, and I don't immediately see a way to exploit the undefined behaviour. Would using an assertion be an option? Adding something like
assert((tbl)->ideal_chain_maxlen > 0);
after the calculation of ideal_chain_maxlen
(and using #include <assert.h>
, of course) signals to the analyser that this code path does not need to be tested, thus silencing the warning. As a bonus, should this case actually occur, the error is a bit clearer. Or is this piece of code so time-critical that no additional error-handling should be done there, unless absolutely necessary?
The code is definitely not time-critical (it happens only during reallocation). However, uthash currently doesn't use any assert
s, so adding one would mean adding #include <assert.h>
and conceivably breaking some codebase where not-using-<assert.h>
was considered a feature. (OTOH, <utlist.h>
does use assert
already. But the userbases of uthash and utlist don't seem to overlap much, from what I've seen.)
Reading the code comment on ideal_chain_maxlen
reassures me that the calculation is (or ought to be) correct:
* The calculation of tbl->ideal_chain_maxlen below deserves some
* explanation. First, keep in mind that we're calculating the ideal
* maximum chain length based on the *new* (doubled) bucket count.
* In fractions this is just n/b (n=number of items,b=new num buckets).
* Since the ideal chain length is an integer, we want to calculate
* ceil(n/b). We don't depend on floating point arithmetic in this
* hash, so to calculate ceil(n/b) with integers we could write
*
* ceil(n/b) = (n/b) + ((n%b)?1:0)
*
* and in fact a previous version of this hash did just that.
* But now we have improved things a bit by recognizing that b is
* always a power of two. We keep its base 2 log handy (call it lb),
* so now we can write this with a bit shift and logical AND:
*
* ceil(n/b) = (n>>lb) + ( (n & (b-1)) ? 1:0)
I wonder, what if we rewrote this as
(tbl)->ideal_chain_maxlen = \
- ((tbl)->num_items >> ((tbl)->log2_num_buckets+1U)) + \
- ((((tbl)->num_items & (((tbl)->num_buckets*2U)-1U)) != 0U) ? 1U : 0U); \
+ ((tbl)->num_items + 2*(tbl)->num_buckets - 1U) \
+ >> ((tbl)->log2_num_buckets + 1U) \
(tbl)->nonideal_items = 0; \
Would that help the static analyzer at all?
Also, I'm pretty sure that when num_buckets > (1u << 31)
(i.e. "way more buckets than we intend to encounter in practice"), we do have ideal_chain_maxlen == 0
(and perhaps undefined behavior on the shift). That's just a fact of math. Does the static analyzer have some way to avoid reporting on that kind of thing? or how do people generally work around that?
@Quuxplusone WRT how do people generally work around that? - simply by adding an explicit check.
Something like the below would do the job:
diff --git a/src/uthash.h b/src/uthash.h
index f34c1f9..27c4ae0 100644
--- a/src/uthash.h
+++ b/src/uthash.h
@@ -937,7 +937,8 @@ do {
_he_newbkt = &(_he_new_buckets[_he_bkt]); \
if (++(_he_newbkt->count) > (tbl)->ideal_chain_maxlen) { \
(tbl)->nonideal_items++; \
- _he_newbkt->expand_mult = _he_newbkt->count / (tbl)->ideal_chain_maxlen; \
+ if ((tbl)->ideal_chain_maxlen) \
+ _he_newbkt->expand_mult = _he_newbkt->count / (tbl)->ideal_chain_maxlen; \
} \
_he_thh->hh_prev = NULL; \
_he_thh->hh_next = _he_newbkt->hh_head; \
The code is definitely not time-critical (it happens only during reallocation). However, uthash currently doesn't use any
assert
s, so adding one would mean adding#include <assert.h>
and conceivably breaking some codebase where not-using-<assert.h>
was considered a feature. (OTOH,<utlist.h>
does useassert
already. But the userbases of uthash and utlist don't seem to overlap much, from what I've seen.)
That makes sense. Without using assertions, I think a simple if
like @tstruk suggested seems to be the best solution.
(tbl)->ideal_chain_maxlen = \ - ((tbl)->num_items >> ((tbl)->log2_num_buckets+1U)) + \ - ((((tbl)->num_items & (((tbl)->num_buckets*2U)-1U)) != 0U) ? 1U : 0U); \ + ((tbl)->num_items + 2*(tbl)->num_buckets - 1U) \ + >> ((tbl)->log2_num_buckets + 1U) \ (tbl)->nonideal_items = 0; \
Would that help the static analyzer at all?
Unfortunately the analyser seems to be pretty dumb when trying to deduce whether ideal_chain_maxlen
can be zero: even something as easy as
tbl->ideal_chain_maxlen = (tbl->num_items >> 1U) + (((tbl->num_items & 1U) != 0U) ? 1U : 0U);
is not recognised to be nonzero for every tbl->num_items>0
. So I think it's going to be difficult to find an expression that Clang is going to be content with.
Actually, now that I look at this code...
if (++(_he_newbkt->count) > (tbl)->ideal_chain_maxlen) { \
(tbl)->nonideal_items++; \
_he_newbkt->expand_mult = _he_newbkt->count / (tbl)->ideal_chain_maxlen; \
}
...surely we could just eliminate the division completely by doing something like this: master...Quuxplusone:assert
Would that satisfy Clang static analysis?
@Quuxplusone Great, that does not produce any warnings with scan-build
anymore!