graphistry / arrow

Prerelease builds of arrow/master for graphistry

Struct should allow accessing child vectors by name/index

trxcllnt opened this issue

Per @TheNeuralBit's feedback, it would be nice if Struct allowed access to its child vectors by name and index. This somewhat mirrors the functionality of Table (though Table is intended to include some notion of indexing), so there may be an opportunity to unify some concepts here.

Proposed APIs:

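declare class Struct {
  // Return the child vector for the named field.
  getColumn<T>(columnName: string): Vector<T>;
  // Return the child vector at the given field position.
  getColumnAt<T>(columnIndex: number): Vector<T>;
}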
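To make the proposal concrete, here is a minimal sketch of how the two lookups could relate, assuming a struct keeps its field names and child vectors as parallel arrays. `ToyStruct` and `ToyVector` are hypothetical stand-ins, not the library's `Struct` or `Vector` classes. Unlike the proposed signatures, the sketch returns `undefined` for an unknown name or out-of-range index; that is an assumption, since the proposal does not say what happens in that case.

```typescript
// Hypothetical stand-in for a column vector: a read-only array of values.
type ToyVector<T> = ReadonlyArray<T>;

// Hypothetical minimal struct: field names and child vectors stored as parallel arrays.
class ToyStruct {
  private readonly names: readonly string[];
  private readonly children: readonly ToyVector<unknown>[];

  constructor(names: readonly string[], children: readonly ToyVector<unknown>[]) {
    if (names.length !== children.length) {
      throw new Error("each child vector needs exactly one field name");
    }
    this.names = names;
    this.children = children;
  }

  // Resolve the field name to its position, then delegate to getColumnAt.
  getColumn<T>(columnName: string): ToyVector<T> | undefined {
    const index = this.names.indexOf(columnName);
    return index === -1 ? undefined : this.getColumnAt<T>(index);
  }

  // Return the child vector at the given position, or undefined if out of range.
  getColumnAt<T>(columnIndex: number): ToyVector<T> | undefined {
    return this.children[columnIndex] as ToyVector<T> | undefined;
  }
}
```

Having `getColumn` delegate to `getColumnAt` keeps a single code path for fetching a child: name lookup is just resolving a field name to its index.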

Good point about mirroring the functionality of a Table. Maybe we could achieve this functionality with a static fromStruct function in Table or a toTable function in Struct?
I experimented a bit with the former last week and it seemed to be working.

@TheNeuralBit, thinking through this further, it seems like providing access to the child vectors (or creating a table from them) works when the Struct has no validity vector (or it's all 1s), but is potentially incorrect otherwise: a child can report a value as valid in a row where the Struct itself is null. Should we AND the Struct's validity with each child's validity when someone calls getColumn (or when synthesizing a Table)?
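The AND described above could look roughly like this, assuming validity is stored as an Arrow-style bitmap (one bit per slot, least-significant bit first) and that the struct and child both start at slot 0 with no offsets. `Bitmap`, `andValidity`, and `isValid` are hypothetical names for illustration, not existing library functions.

```typescript
// Hypothetical model of Arrow-style validity bitmaps, not the library's own classes.
// Bit i set means slot i is valid (non-null); bits are numbered LSB-first within each byte.
// A null bitmap means "no validity buffer", i.e. every slot is valid.
type Bitmap = Uint8Array | null;

// A child slot is valid only if both the struct row and the child value are valid.
// Assumes both bitmaps start at slot 0 (no array offsets). When only one side has a
// bitmap, that bitmap is returned as-is (shared, not copied).
function andValidity(parent: Bitmap, child: Bitmap, length: number): Bitmap {
  if (parent === null) return child;
  if (child === null) return parent;
  const byteLength = Math.ceil(length / 8);
  const out = new Uint8Array(byteLength);
  for (let i = 0; i < byteLength; i++) {
    // Missing bytes (a buffer shorter than expected) are treated as all-null.
    out[i] = (parent[i] ?? 0) & (child[i] ?? 0);
  }
  return out;
}

// Read slot i of a bitmap, treating a null bitmap as all-valid.
function isValid(bitmap: Bitmap, i: number): boolean {
  return bitmap === null || (((bitmap[i >> 3] ?? 0) >> (i & 7)) & 1) === 1;
}
```

For example, a struct bitmap of `0b1011` (slot 2 null) ANDed with a child bitmap of `0b0111` (slot 3 null) over four slots gives `0b0011`, so slots 2 and 3 both read as null through the combined bitmap.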

@TheNeuralBit, I added a Table.fromStruct method in e1c7852. I figure we can circle back if validity becomes an issue.
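A static factory on the table side might take roughly the following shape. `StructLike` and `ToyTable` are hypothetical minimal types, not the library's `Struct` or `Table`, and this is not the code from e1c7852; it only illustrates the idea that each struct field becomes a table column sharing the same child data, with struct-level validity ignored.

```typescript
// Hypothetical minimal struct shape: one child column per field, as parallel arrays.
interface StructLike {
  readonly fieldNames: readonly string[];
  readonly children: readonly (readonly unknown[])[];
}

// Hypothetical minimal table: named columns of equal length.
class ToyTable {
  readonly columnNames: readonly string[];
  readonly columns: readonly (readonly unknown[])[];

  constructor(columnNames: readonly string[], columns: readonly (readonly unknown[])[]) {
    this.columnNames = columnNames;
    this.columns = columns;
  }

  // Reuse the struct's child columns directly (no copy). Struct-level nulls are ignored,
  // so this is only correct when the struct has no validity bitmap or it is all 1s.
  static fromStruct(struct: StructLike): ToyTable {
    return new ToyTable(struct.fieldNames, struct.children);
  }

  // Row count taken from the first column; an empty table has zero rows.
  get numRows(): number {
    return this.columns[0]?.length ?? 0;
  }

  // Look up a column by name; undefined if the name is unknown.
  getColumn(name: string): readonly unknown[] | undefined {
    const i = this.columnNames.indexOf(name);
    return i === -1 ? undefined : this.columns[i];
  }
}
```

Keeping the factory on the table means the struct type never has to reference the table type, which avoids an import cycle between the two modules.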

I tried adding a Struct#toTable method, but closure-compiler complains about the semi-circular dependency tree. We can rearrange the imports later if we really want to add it, but I'm hoping this makes life easier for you guys in the short term. The fix is published on npm in v0.1.2.