kelindar / column

High-performance, columnar, in-memory store with bitmap indexing in Go

Possible undesired behavior when using query filters

mark-hartmann opened this issue

The With / Without / Union functions do not check whether the columns passed to them are indexes, which can cause unexpected or undesired behavior when they are used with "normal" columns.

Let's take the following example:

c := column.NewCollection()
c.CreateColumn("name", column.ForString())
c.CreateColumn("some-column", column.ForInt16())

c.Insert(func(row column.Row) error {
	row.SetString("name", "john")
	row.SetInt16("some-column", 100)
	return nil
})
c.Insert(func(row column.Row) error {
	row.SetString("name", "jane")
	return nil
})

If I now run a query and keep only the rows that have a value in this column, it works fine:

c.Query(func(txn *column.Txn) error {
	// Prints "1"
	fmt.Println(txn.With("some-column").Count())
	return nil
})

If "some-column" is set for "jane" somewhere later in the program, With of course picks this up correctly. The problem, however, is that "jane" can then never be excluded from the With result again, because you can't "unset" a column value.

By the way, it is not even checked whether the column exists at all. In my opinion, this should cause a panic like it does in other places (e.g. the *ReaderFor functions).

c.Query(func(txn *column.Txn) error {
	// Column does not exist, yet nothing complains
	txn.With("abc").Count()
	return nil
})

Now I have to ask myself whether users should be able to pass any column (if so, there should be an Unset(column string) in txn) or whether only the WithValue function should be used for normal columns (which is how I interpret it). In the latter case, Txn should check this somehow, possibly via the owner or the column, since the column struct provides the IsIndex function.

Just a quick thought; a small check for idx.IsIndex() in the methods themselves should solve at least the non-index-column problem, if it is a problem at all.

if idx, ok := txn.columnAt(columnName); ok && idx.IsIndex() {
	// ...
}
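
To make the proposal concrete, here is a minimal, self-contained sketch of the check on a toy model. None of these types are the library's: toyColumn, toyTxn, newToyTxn and tryWith are hypothetical names made up for illustration, and the choice to panic for non-index columns (instead of skipping them silently, as the snippet above would) is just one possible policy.

```go
package main

import "fmt"

// toyColumn is a hypothetical stand-in for a column, not the library's type.
type toyColumn struct {
	isIndex bool
	filled  map[uint32]bool // rows that have an entry in this column
}

// toyTxn is a hypothetical stand-in for a transaction.
type toyTxn struct {
	columns map[string]*toyColumn
	rows    map[uint32]bool // rows currently selected by the query
}

// With keeps only the rows present in the named index. In this sketch it
// panics for unknown columns and for columns that are not indexes.
func (t *toyTxn) With(name string) *toyTxn {
	col, ok := t.columns[name]
	if !ok {
		panic(fmt.Sprintf("column %q does not exist", name))
	}
	if !col.isIndex {
		panic(fmt.Sprintf("column %q is not an index", name))
	}
	for row := range t.rows {
		if !col.filled[row] {
			delete(t.rows, row) // deleting during range is safe in Go
		}
	}
	return t
}

// Count returns the number of rows still selected.
func (t *toyTxn) Count() int { return len(t.rows) }

// newToyTxn builds two rows, one index ("active") and one normal column.
func newToyTxn() *toyTxn {
	return &toyTxn{
		columns: map[string]*toyColumn{
			"active":      {isIndex: true, filled: map[uint32]bool{0: true}},
			"some-column": {isIndex: false, filled: map[uint32]bool{0: true}},
		},
		rows: map[uint32]bool{0: true, 1: true},
	}
}

// tryWith runs With and converts a panic into an error for demonstration.
func tryWith(t *toyTxn, name string) (count int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	return t.With(name).Count(), nil
}

func main() {
	for _, name := range []string{"active", "some-column", "abc"} {
		count, err := tryWith(newToyTxn(), name)
		fmt.Println(name, count, err)
	}
}
```

With this policy, filtering by "active" succeeds, while "some-column" and "abc" are both rejected loudly instead of producing a misleading count.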

Hello Mark,

I've run into an error when using the package's txn.WithValue method with a boolean column, and traced it back to the distinction between boolean and index column types.

When WithValue calls index.Filter to find matches, it uses c.Value to determine both the value of the object and whether it exists in the bitmap (txn.go +197):

	txn.rangeRead(func(offset uint32, index bitmap.Bitmap) {
		index.Filter(func(x uint32) (match bool) {
			if v, ok := c.Value(offset + x); ok {
				match = predicate(v)
			}
			return
		})
	})

For a false boolean value, columnBool.Value (column.go +222, which calls kelindar/bitmap bitmap.go +33) returns false, false, so the predicate is never executed. In the tests, the "active" boolean column is filtered via With and Without (txn_test.go +84, 90), which makes me think that boolean columns are meant to be treated as 'manually set indexes' rather than normal columns.
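
The effect can be reproduced in isolation with a toy model. This is only a sketch of the behavior described above, not the library's code: sharedBoolColumn, countMatches and isFalse are hypothetical names, and a Go map stands in for the bitmap.

```go
package main

import "fmt"

// sharedBoolColumn is a hypothetical simplification: a row is in the set
// only when its value is true, so "false" and "never set" look identical.
type sharedBoolColumn struct {
	set map[uint32]bool
}

// Set stores true by adding the row to the set and false by removing it.
func (c *sharedBoolColumn) Set(idx uint32, v bool) {
	if v {
		c.set[idx] = true
	} else {
		delete(c.set, idx)
	}
}

// Value derives both the value and the "exists" flag from the same
// membership test, so a stored false comes back as (false, false).
func (c *sharedBoolColumn) Value(idx uint32) (interface{}, bool) {
	v := c.set[idx]
	return v, v
}

// isFalse is the predicate a caller would use to find rows holding false.
func isFalse(v interface{}) bool {
	b, ok := v.(bool)
	return ok && !b
}

// countMatches follows the shape of the filter loop above: the predicate
// only runs when Value reports that the row exists.
func countMatches(c *sharedBoolColumn, rows []uint32, predicate func(interface{}) bool) (matches, calls int) {
	for _, x := range rows {
		if v, ok := c.Value(x); ok {
			calls++
			if predicate(v) {
				matches++
			}
		}
	}
	return matches, calls
}

func main() {
	col := &sharedBoolColumn{set: map[uint32]bool{}}
	col.Set(0, true)
	col.Set(1, false)

	matches, calls := countMatches(col, []uint32{0, 1}, isFalse)
	// Row 1 holds false but is skipped, so nothing matches.
	fmt.Printf("matches=%d calls=%d\n", matches, calls) // matches=0 calls=1
}
```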

I can't tell if this is truly a bug, but if it is, I feel like the best route for solving both our problems is to fix up the columnBool.Value method and then add your index check.

Let me know what you think.

Hi, first of all I would like to apologize for the long time it took me to reply.

The idea of treating boolean columns as indexes sounds right to me; however, the behavior of columnBool.Value seems strange to me as well. If a value exists, even if it is false, the predicate should be called.

I would have to dig a bit more into the column/bitmap library itself, but it seems to me that Bitmap.Contains might be the actual problem here. When Bitmap.Contains is called, it returns false if the value is false, which is not necessarily correct for boolean columns or any other data type and its null value. I don't know if this is really related to null values or not; I'll have to take a closer look in the next few days. If you have other suspicions, feel free to let me know.
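
One way to get the behavior discussed in this reply, where a stored false still reaches the predicate, is to track presence separately from the value. The following is a sketch under that assumption, not a patch for the library: filledBoolColumn, countFalse and its two maps (standing in for two bitmaps) are hypothetical.

```go
package main

import "fmt"

// filledBoolColumn is a hypothetical column that keeps two sets:
// fill records which rows were ever assigned, data records which are true.
type filledBoolColumn struct {
	fill map[uint32]bool
	data map[uint32]bool
}

func newFilledBoolColumn() *filledBoolColumn {
	return &filledBoolColumn{fill: map[uint32]bool{}, data: map[uint32]bool{}}
}

// Set always marks the row as present, whatever the value is.
func (c *filledBoolColumn) Set(idx uint32, v bool) {
	c.fill[idx] = true
	if v {
		c.data[idx] = true
	} else {
		delete(c.data, idx)
	}
}

// Value reports the stored value and, independently, whether one exists.
func (c *filledBoolColumn) Value(idx uint32) (bool, bool) {
	return c.data[idx], c.fill[idx]
}

// countFalse runs the predicate for every row that has a value and counts
// how many of those rows hold false.
func countFalse(c *filledBoolColumn, rows []uint32) (matches, calls int) {
	for _, x := range rows {
		if v, ok := c.Value(x); ok {
			calls++
			if !v {
				matches++
			}
		}
	}
	return matches, calls
}

func main() {
	col := newFilledBoolColumn()
	col.Set(0, true)
	col.Set(1, false)
	// Row 2 is never set and must still be skipped.

	matches, calls := countFalse(col, []uint32{0, 1, 2})
	fmt.Printf("matches=%d calls=%d\n", matches, calls) // matches=1 calls=2
}
```

The cost of this design is a second set per boolean column; whether that trade-off fits the library is a separate question.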