socrata / soda-ruby

A RubyGem for the Socrata Open Data API

Home Page: http://socrata.github.io/soda-ruby

Fields are omitted if they have a `nil` value

notactuallytreyanastasio opened this issue · comments

I am unsure if this is intentional or not. There were a couple of things that led to us noticing this.

We wanted to grab all of the ACRIS Real Property Parties records, so we had a simple call like this:

      # there are ~47 million rows and we didn't want to page; it's a beefy server
      limit = 50_000_000
      data = SOCRATA_CLIENT.get(doc_code, { "$limit" => limit }).body
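
For anyone who cannot pull everything in one request, a paged variant might look roughly like the sketch below. It assumes only what the call above shows (a client whose `get(resource, params)` returns something with a `body` array) plus the SoQL `$limit`, `$offset`, and `$order` parameters. The helper name `each_row` and the page size are made up for illustration, not part of soda-ruby.

```ruby
# Hypothetical helper (not part of soda-ruby): walks a resource page by page.
# `client` must respond to get(resource, params) and return an object whose
# #body is an Array of rows, mirroring the call shown above.
def each_row(client, resource, page_size: 50_000, &block)
  offset = 0
  loop do
    params = {
      "$limit"  => page_size,
      "$offset" => offset,
      "$order"  => ":id" # a stable sort keeps pages from overlapping
    }
    rows = client.get(resource, params).body
    rows.each(&block)
    # A short (or empty) page means we have reached the end of the dataset.
    break if rows.size < page_size
    offset += page_size
  end
end
```

Usage would be something like `each_row(SOCRATA_CLIENT, doc_code) { |row| process(row) }`, where `process` stands in for whatever you do with each record.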

When I ran this I got a ton of errors (47 million-ish to be exact):

You are setting a key that conflicts with a built-in method Hashie::Mash#zip defined in Enumerable. This can cause unexpected behavior when accessing the key as a property. You can still access the key via the #[] method.

Now, a suspect bit: this dataset has a field called zip and it's yelling about the method zip. I think this is a red herring but worth mentioning just in case.

However when I began to sample things out, I noticed something.

When a field has a null value, it ends up being omitted from the hash. For example, Document ID 2019022000195001 only returns this:

      {"document_id"=>"2019022000195001",
       "record_type"=>"P",
       "party_type"=>"2",
       "name"=>"CENTENNIAL BANK",
       "good_through_date"=>"2020-01-31T00:00:00.000"}

I would expect it to return every column that appears in the CSV available online, with `nil` for the empty ones. The reasoning for this is:

  1. We have a dataset that has N records
  2. Of those N records, each one is missing the same field
  3. Now we don't know what the actual CSV online had as a total aggregate of all columns
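
To make the three points above concrete, here is a rough sketch in plain Ruby. The row is the one shown earlier; the `EXPECTED_COLUMNS` list and both helper names are assumptions for illustration (only `zip` and the five returned keys come from this issue), and you would need to get the real column list from somewhere other than the rows themselves.

```ruby
# Illustrative column list: the five keys returned above plus "zip".
# The real dataset may have more columns; this list is an assumption.
EXPECTED_COLUMNS = %w[
  document_id record_type party_type name zip good_through_date
].freeze

# Columns you can infer from the rows alone: the union of their keys.
def observed_columns(rows)
  rows.flat_map(&:keys).uniq
end

# Rebuild a row with every expected column, using nil where it was omitted.
# Keys that are not in `columns` are dropped.
def with_all_columns(row, columns)
  columns.to_h { |col| [col, row[col]] }
end

row = {
  "document_id"       => "2019022000195001",
  "record_type"       => "P",
  "party_type"        => "2",
  "name"              => "CENTENNIAL BANK",
  "good_through_date" => "2020-01-31T00:00:00.000"
}

# "zip" is null for this record, so it never shows up in the observed set.
inferred = observed_columns([row])
full_row = with_all_columns(row, EXPECTED_COLUMNS)
```

If every record is missing `zip`, `inferred` never contains it, which is exactly the situation described in step 3.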

We specifically track additions and deletions of these columns, as it's pretty core to our business. The data providers don't version their API, so there is no simple way to track column changes without building our own layer.

It appears that the Socrata API does not omit fields with nil values. Please tell me if I'm mistaken on this one, because if it does my point is moot.

On the warnings from Hashie: grepping around for zip in the codebase showed nothing, so I'm not sure which call site is causing them. I'd be happy to fix it if someone can point me in the right direction.

On the omitted fields: is this intentional behavior? If not, would you be open to a pull request that makes it configurable? I'm thinking of an argument whose default keeps the current behavior, with an option to include the `nil` fields. That way the public API doesn't change, so there's no need for a major version bump.

Thanks for making and maintaining this!

It appears this is intentional in the normal API itself, and the response headers can tell you the full set of columns in the scenario I described. Apologies for misreading that, gonna close this. Happy hacking and thanks again for making this! 💯
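
For later readers, a hedged sketch of the header approach. It assumes the response carries a header named `X-SODA2-Fields` whose value is a JSON array of column names. That header name and format are my assumption and are not confirmed anywhere in this thread, so check a real response before relying on it. Both helper names are hypothetical.

```ruby
require "json"

# Assumption: `headers` is a Hash of header name => String value, and the
# column list lives in X-SODA2-Fields as a JSON array. Header names are
# matched case-insensitively. Returns [] when the header is absent.
def columns_from_headers(headers)
  raw = headers.find { |name, _value| name.to_s.casecmp?("X-SODA2-Fields") }&.last
  return [] if raw.nil?

  JSON.parse(raw)
end

# Columns the dataset declares but this row omitted (i.e. null values).
def missing_columns(row, columns)
  columns - row.keys
end
```

With the column list in hand, omitted fields can be treated as `nil` on the client side without any change to the gem.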