Incorrect results with short city names
viliam-durina opened this issue
This code in the (currently) best solution will give incorrect results if there happen to be two semicolons in a single word. This is possible since a minimal record is 6 bytes (including the newline), e.g.:
a;0.0
a;0.0
...
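To illustrate how two semicolons can end up in one word, here is a minimal sketch. It is not the code from the solution in question: it assumes the input is read 8 bytes at a time as a little-endian long, and the class and method names are made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any 1brc solution.
class SemicolonWordDemo {
    // ';' (0x3B) repeated in every byte of a 64-bit word.
    private static final long SEMICOLONS = 0x3B3B3B3B3B3B3B3BL;
    private static final long LOW_7_BITS = 0x7F7F7F7F7F7F7F7FL;

    // Returns a mask with 0x80 in every byte position where `word` holds ';'.
    static long semicolonMask(long word) {
        long x = word ^ SEMICOLONS;                 // matching bytes become 0x00
        long t = (x & LOW_7_BITS) + LOW_7_BITS;     // high bit set if low 7 bits are non-zero
        return ~(t | x | LOW_7_BITS);               // 0x80 only where the whole byte was zero
    }

    // Number of ';' bytes in the word.
    static int countSemicolons(long word) {
        return Long.bitCount(semicolonMask(word));
    }

    // Byte offset of the first ';' (little-endian order), or 8 if there is none.
    static int firstSemicolonIndex(long word) {
        return Long.numberOfTrailingZeros(semicolonMask(word)) >>> 3;
    }

    public static void main(String[] args) {
        // Two minimal 6-byte records; the first 8 bytes are "a;0.0\na;".
        byte[] data = "a;0.0\na;0.0\n".getBytes(StandardCharsets.US_ASCII);
        long word = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getLong(0);
        System.out.println(countSemicolons(word));     // prints 2
        System.out.println(firstSemicolonIndex(word)); // prints 1
    }
}
```

Any logic that assumes at most one semicolon per word would have to rely on the first match only (as firstSemicolonIndex does here) to stay correct on such input.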
I've added short test cases (#277), which royvanrijn passes. Maybe you could provide a test case to show the problem?
This is also covered in an existing test case:
1brc/src/test/resources/samples/measurements-complex-utf8.txt
Lines 15 to 16 in fa1ca65
My fault, I didn't actually run the code at all. I only assumed that the hash can be calculated differently for the same city name, and that it must lead to incorrect results.
Today I debugged and indeed two entries can be created for the same city in MeasurementRepository. But it's resolved in the final combine step when the String city field of MeasurementRepository.Entry is used as the key for the TreeMap, and this field is calculated correctly, so the end results are correct.
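For readers following along, a rough sketch of why keying the final step by the city string hides duplicate entries. This is not the actual MeasurementRepository code: the Entry class below is a simplified stand-in, and its fields (values in tenths of a degree) and the combine method are assumptions made for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class, not the code from the solution.
class CombineDemo {
    // Simplified stand-in for an aggregation entry; temperatures are stored in tenths.
    static final class Entry {
        final String city;
        int min;
        int max;
        long sum;
        int count;

        Entry(String city, int min, int max, long sum, int count) {
            this.city = city;
            this.min = min;
            this.max = max;
            this.sum = sum;
            this.count = count;
        }

        // Folds another entry for the same city into this one and returns this.
        Entry merge(Entry other) {
            min = Math.min(min, other.min);
            max = Math.max(max, other.max);
            sum += other.sum;
            count += other.count;
            return this;
        }
    }

    // Combines entries by city name. Two entries that were stored under
    // different hashes but carry the same city string collapse into one.
    static Map<String, Entry> combine(List<Entry> entries) {
        Map<String, Entry> result = new TreeMap<>();
        for (Entry e : entries) {
            result.merge(e.city, e, Entry::merge);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
                new Entry("a", 0, 0, 0, 1),    // first entry for city "a"
                new Entry("a", -5, 12, 7, 2),  // duplicate entry for city "a"
                new Entry("b", 3, 3, 3, 1));
        Map<String, Entry> combined = combine(entries);
        System.out.println(combined.size());         // prints 2
        System.out.println(combined.get("a").count); // prints 3
    }
}
```

As long as the city string itself is correct, a differently calculated hash only costs an extra entry before the combine step; it does not change the output.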