Device.find(camera_id: nil) does not find devices

Question

opened this issue 8 years ago · comments

time-agent(prod)> Device.all.each {|d| p d.camera_id}
nil
nil
nil
"56f290913d1f9207691bab65"
nil
time-agent(prod)> Device.find(camera_id: '56f290913d1f9207691bab65').count
=> 1
time-agent(prod)> Device.find(camera_id: nil).count
=> 0

I'm expecting it to find the 4 devices where camera_id is nil, but it's not finding anything. Any ideas?

Michel Martens · Answer 1 · Mon Nov 14 2016 00:57:40 GMT+0800 (China Standard Time)

Hey @xanview, sorry for not replying earlier. I was on a trip and didn't have time for a proper answer. I think at some point, in earlier versions, it was possible to index nil values. But then we decided against it because if you have a lot of nils (which is a common scenario), the resulting indices are huge. What you can do is force a value so that you can find it later (for example, you could use the string "nil").

Deleted user · Answer 2 · Sat Nov 19 2016 00:43:12 GMT+0800 (China Standard Time)

Hi Soveran,

Thank you for the reply. Sure, it makes sense not to index nil by default, but maybe it could be opt-in, something like:

index :device_id, index_nil: true

This way you get the best of both worlds, without resorting to hacks such as a "nil" string value.

Michel Martens · Answer 3 · Thu Nov 24 2016 17:08:46 GMT+0800 (China Standard Time)

I explored that solution, but in the end I didn't find it satisfying. First, a bit more about my previous proposal: instead of using the string "nil" you can use some relevant value, for example "available", or "unassigned", or anything that makes sense to the problem domain. Using "nil" indeed looks like a hack, but I think using something relevant is the best solution to this problem.

Regarding the index_nil feature: the advantage of such an implementation would be to cover this use case, but sadly there are some disadvantages. It would add a nil check and a special case to all calls, and behind the scenes it would generate a string value for the index, so the final solution is no more efficient than the proposal of using a meaningful value. As an aside, on a subjective level, while playing with this implementation it felt wrong to index nil values. I think it's something we shouldn't do in principle.

I checked how other databases deal with this situation. Postgres recommends creating a partial index for columns with NULL. MySQL creates an index for IS [NOT] NULL only for certain storage backends and certain keys (for example, it never indexes NULL for primary or unique keys). Oracle provides an NVL (NULL value) function that can be used for creating indices with NULL values, but all the function does is translate NULL into the string "NULL", and from reading the docs I got the impression that there's no gain in using that function over storing the "NULL" string directly.

I found this explanation for the mixed support for that kind of index: "By default, relational databases ignore NULL values because the relational model says that NULL means not present". That's why the default behavior when searching for IS [NOT] NULL in many databases is a full table scan.

After reflecting on this issue, I think the best approach would be to use a meaningful value for that field and rely on the existing indexing infrastructure.
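To make the difference concrete, here is a minimal sketch in plain Ruby. It uses a toy in-memory index, not the library's real indexing code, and the names ToyIndex and UNASSIGNED are made up for illustration. The only behavior it assumes is the one described in this thread: nil values are skipped when indexing, so they can only be found once a meaningful value is stored in their place.

```ruby
# Toy in-memory stand-in for a value-to-ids index; NOT the library's implementation.
class ToyIndex
  def initialize
    # Maps an indexed value (as a string) to the ids of the records that hold it.
    @ids_by_value = Hash.new { |hash, key| hash[key] = [] }
  end

  # Records the value for an id; nil values are skipped, so they leave no entry.
  def add(id, value)
    return if value.nil?

    @ids_by_value[value.to_s] << id
  end

  # Returns the ids stored under the value, or an empty array if there are none.
  def find(value)
    @ids_by_value.fetch(value.to_s, [])
  end
end

# Hypothetical domain value standing in for "no camera assigned".
UNASSIGNED = "unassigned"

# Same shape as the data in the question: four nils and one real camera id.
camera_ids = [nil, nil, nil, "56f290913d1f9207691bab65", nil]

# Indexing the raw values: the four nils are never written to the index.
raw_index = ToyIndex.new
camera_ids.each_with_index { |camera_id, id| raw_index.add(id, camera_id) }

# Indexing with the meaningful value substituted for nil.
domain_index = ToyIndex.new
camera_ids.each_with_index { |camera_id, id| domain_index.add(id, camera_id || UNASSIGNED) }

puts raw_index.find(nil).size           # nothing was indexed under nil
puts domain_index.find(UNASSIGNED).size # the four unassigned devices
```

The substitution happens before the value reaches the index, so the lookup path needs no special case for nil.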

Deleted user · Answer 4 · Sun Nov 27 2016 23:36:55 GMT+0800 (China Standard Time)

I can see what you're saying, but in terms of other databases, I'm used to doing this with Mongo (Mongoid):

User.where(:auth_token.exists => false)

You're right, if no index is configured, it will have to scan the entire database, but at least it gives the expected behaviour of running the requested query.

Maybe it should at least throw an exception when searching for nil? It could even link to this issue in the exception so the person has a workaround :)

Michel Martens · Answer 5 · Tue Nov 29 2016 04:15:14 GMT+0800 (China Standard Time)

Two great ideas right there: first, raise an error if nil is used in a filtering operation, and second, the idea of providing a lot of help in the error message. I'll work on it, thanks a lot!
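A rough sketch of what such a guard could look like, in plain Ruby. The error class NilFilterError and the helper assert_no_nil_filters! are hypothetical names used for illustration only; this is not the change that was made to the library, just one way to reject nil filter values with a message that points to the workaround.

```ruby
# Hypothetical error class; the name is illustrative, not part of the library.
class NilFilterError < ArgumentError; end

# Returns the filters unchanged, or raises if any filter value is nil.
# The message names the offending keys and suggests the workaround from this thread.
def assert_no_nil_filters!(filters)
  nil_keys = filters.select { |_key, value| value.nil? }.keys
  return filters if nil_keys.empty?

  raise NilFilterError,
        "Cannot filter by nil (#{nil_keys.join(', ')}): nil values are not indexed. " \
        "Store a meaningful value such as \"unassigned\" and filter by that instead."
end

begin
  assert_no_nil_filters!({ camera_id: nil })
rescue NilFilterError => e
  puts e.message
end
```

Failing loudly here replaces the silent empty result from the original report with an explanation the caller can act on.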