Possible undesired behavior when using query filters
mark-hartmann opened this issue · comments
In the With
/ Without
/ Union
functions it is not checked whether the passed columns are indexes, which might cause unexpected/undesired behavior if used with "normal" columns.
Let's take the following example:
c := column.NewCollection()
c.CreateColumn("name", column.ForString())
c.CreateColumn("some-column", column.ForInt16())
c.Insert(func(row column.Row) error {
row.SetString("name", "john")
row.SetInt16("some-column", 100)
return nil
})
c.Insert(func(row column.Row) error {
row.SetString("name", "jane")
return nil
})
If I now start a query over the rows and only want to have the rows that have a value in this column, it works fine:
c.Query(func(txn *column.Txn) error {
// Prints "1"
fmt.Println(txn.With("some-column").Count())
})
If the attribute for "jane" is set somewhere later in the program, this is of course picked up correctly using With
. The problem is, however, that you can no longer get "jane" out of the With
, because you can't "unset" columns.
By the way, it is not even checked whether the column exists at all. In my opinion, this should cause a panic like it does in other places (e.g. the *ReaderFor
functions).
c.Query(func(txn *column.Txn) error {
// Column does not exist
txn.With("abc").Count()
})
Now I have to ask myself if users should be able to pass any column (if so, there should be a Unset(column string)
in txn
) or if only the WithValue
function should be used for normal columns (as I interpret it). In this case you should somehow check this within Txn
, possibly via the owner or column
, since this struct contains the IsIndex
function.
Just a quick thought; a small check for idx.IsIndex()
in the methods themselves should solve at least the non-index-column problem, if it is a problem at all.
if idx, ok := txn.columnAt(columnName); ok && idx.IsIndex() {
// ...
}
Hello Mark,
I've run into an error with using the package's txn.WithValue
method with a boolean column, and traced the bug back to the distinction between a boolean and index column type.
When WithValue
calls index.Filter
to find matches, it uses c.Value to dictate both the value of the object and if it exists in the bitmap (txn.go +197) -
txn.rangeRead(func(offset uint32, index bitmap.Bitmap) {
index.Filter(func(x uint32) (match bool) {
if v, ok := c.Value(offset + x); ok {
match = predicate(v)
}
return
})
})
In the case of a false boolean value in the map, columnBool.Value
(column.go +222, which call's kelindar/bitmap bitmap.go +33) will return false,false
in the case of a false value, not executing the given predicate call. In the tests, the "active" boolean column is filtered via With
and Without
(txn_test.go +84, 90), making me think that boolean columns should be regarded as 'manually-set-indexes' rather than a normal column.
I can't tell if this is truly a bug, but if it is, I feel like the best route for solving both our problems is to fix up the columnBool.Value method and then add your index check.
Let me know what you think
Hi, first of all I would like to apologize for the long time it took me to reply.
The idea of treating boolean columns as indexes sounds right to me, however the behavior of columnBool.Value
seems strange to me as well. If a value exists - even if it is false - the predicate would have to be called.
I would have to dig a bit more into the column/bitmap library itself, but it seems to me that Bitmap.Contains
might be the actual problem here. When Bitmap.Contains
is called, it returns false
if the value is false
, which is not necessarily correct for boolean columns or any other data type and its null value. I don't know if this is really related to null values or not, I`ll have to take a closer look in the next days. If you have other suspicions, feel free to let me know.