Wrong results for comparing collections of different sizes

Question

duncand opened this issue 6 years ago · comments

The CQL Engine has a widespread bug: comparisons such as Equivalent() or Equal() on two Tuples, two Iterables, or other collection types give wrong answers when the collections being compared have different sizes.

Bugs exist in at least these files, and most likely in others with similar functionality:

EquivalentEvaluator.java
EqualEvaluator.java
Tuple.java -> equal()

When comparing two Tuples, the logic takes the left-hand argument and iterates through its attributes; for each, it tests that the right-hand argument contains an attribute of the same name and, if so, that the attribute values are the same.

The bug is that when the right-hand argument has any attributes that the left-hand argument does not, this evades detection, so non-equal Tuples can be reported as equal.

A consequence of this is that in the general case, Equivalent(x,y) and Equivalent(y,x) return different answers, and in one ordering the answer is wrong, while in the other it is right by accident.

The simplest fix is for every comparison of two Tuples to start by checking that both have the same number of attributes; only if they do are the existing tests run, and otherwise the comparison returns false.
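
To illustrate the asymmetry and the proposed fix, here is a minimal sketch. It is not the engine's actual Tuple.java code: a `Map<String, Object>` stands in for a Tuple's attributes, the class and method names are made up for this example, and CQL's null-handling rules are ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the Tuple comparison; not the engine's real code.
final class TupleEqualSketch {

    // One-directional check as described in this report: only the left
    // tuple's attributes are visited, so extra attributes on the right
    // go unnoticed.
    static boolean leftOnlyEqual(Map<String, Object> left, Map<String, Object> right) {
        for (Map.Entry<String, Object> e : left.entrySet()) {
            if (!right.containsKey(e.getKey())
                    || !Objects.equals(e.getValue(), right.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }

    // Proposed fix: compare attribute counts first, then run the same loop.
    static boolean sizeCheckedEqual(Map<String, Object> left, Map<String, Object> right) {
        if (left.size() != right.size()) {
            return false;
        }
        return leftOnlyEqual(left, right);
    }

    public static void main(String[] args) {
        Map<String, Object> small = new LinkedHashMap<>();
        small.put("a", 1);

        Map<String, Object> big = new LinkedHashMap<>();
        big.put("a", 1);
        big.put("b", 2);

        System.out.println(leftOnlyEqual(small, big));    // true  (wrong answer)
        System.out.println(leftOnlyEqual(big, small));    // false (right by accident)
        System.out.println(sizeCheckedEqual(small, big)); // false
        System.out.println(sizeCheckedEqual(big, small)); // false
    }
}
```

With equal attribute counts, finding every left-hand attribute name on the right implies the two sets of names are identical, so the existing loop is enough once the count check has passed.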

At first glance there appear to be similar bugs in the Iterable comparator, as there is no test there that the element counts are the same, though a mismatch might be caught in other ways since we're comparing ordered sequences rather than unordered sets; Tuple is definitely buggy though.
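
For the Iterable case, one way to make a length mismatch impossible to miss is to walk both sequences in lockstep and require that they run out together. This is only a sketch under my own assumptions (the class and method names are invented, and it does not reflect how the engine's comparator is actually written):

```java
import java.util.Iterator;
import java.util.Objects;

// Hypothetical helper; not the engine's real Iterable comparator.
final class IterableEqualSketch {

    static boolean iterableEqual(Iterable<?> left, Iterable<?> right) {
        Iterator<?> l = left.iterator();
        Iterator<?> r = right.iterator();

        // Compare element by element while both sides still have elements.
        while (l.hasNext() && r.hasNext()) {
            if (!Objects.equals(l.next(), r.next())) {
                return false;
            }
        }

        // Equal only if both sequences were exhausted at the same time;
        // a leftover element on either side means different lengths.
        return !l.hasNext() && !r.hasNext();
    }
}
```

This avoids needing a size() call, which a plain Iterable does not provide.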

No pure-CQL test case has been written for this yet, since the CQL Translator seems to treat mismatched Tuple headings as a compile-time error when they are given as literals. However, if two external functions each return a Tuple, using Equivalent() etc. on the results manifests the stated bug, since there are no Tuple literals for the Translator to reject at compile time.

I can produce and attach simple test code to this issue on request, partly CQL and partly Java, to demonstrate the problem; however, the bug and fix should be obvious from just looking at the above Java source files.

In the general case though, the CQL Engine should be complete enough to give proper results on all such inputs regardless of whether the Translator might compile-fail them or not.

Darren Duncan · Answer 1 · Thu Feb 22 2018 14:56:29 GMT+0800 (China Standard Time)

A similar and more general bug is that the CQL Engine does not usually test that both arguments to Equivalent() and Equal() are of the same type. Typically just the left-hand argument is type-tested and the right-hand argument is just assumed to be the same. As rare exceptions, both sides ARE tested when the left is DateTime or Time (or when either is null) but for most other types they are not. The routines should be updated to test that both arguments are of the expected type and not just one argument. This is also fairly easy to fix, and loosely speaking it is the same kind of problem as the original report, and they should all be fixed together. The general problem is assuming the Translator will block certain comparisons at compile time, but we can't rely on that, and sometimes there are legitimate reasons it wouldn't.
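
As a rough sketch of what testing both sides could look like: the helper below is hypothetical, and throwing IllegalArgumentException on a mismatch is my own assumption for illustration; what the engine should actually do in that case is a separate decision.

```java
// Hypothetical helper; names and the mismatch behaviour are assumptions.
final class BothSidesTypeCheckSketch {

    // True only when BOTH arguments are instances of the expected type.
    // Class.isInstance returns false for null, so nulls are rejected here too.
    static boolean bothAre(Class<?> expected, Object left, Object right) {
        return expected.isInstance(left) && expected.isInstance(right);
    }

    static boolean equalStrings(Object left, Object right) {
        if (!bothAre(String.class, left, right)) {
            throw new IllegalArgumentException(
                    "Equal: both arguments must be String");
        }
        return left.equals(right);
    }
}
```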

c-schuler · Answer 2 · Sun Feb 25 2018 10:59:38 GMT+0800 (China Standard Time)

Please create another issue for your second comment, as it is a bigger issue of which the Equal and Equivalent evaluators are only a small subset.

Darren Duncan · Answer 3 · Sun Feb 25 2018 12:55:16 GMT+0800 (China Standard Time)

New issue #69 was created to replace my second comment on issue #63.

Darren Duncan · Answer 4 · Sun Feb 25 2018 13:50:51 GMT+0800 (China Standard Time)

Pull request #68 fixed the Tuple comparison issue raised in issue #63.