nflverse / nflverse-data

Automated nflverse data repository

Home Page: https://www.nflverse.com

[FEATURE REQ] Extend serialization formats that encourage backwards and forwards compatibility

pratikthanki opened this issue · comments

Is there an existing issue for this?

  • I have searched the existing issues

Is your feature request related to a problem? Please describe.

The current file formats are prone to breaking changes in columns and/or data types. Given how widely the data in this repository is used, moving towards backwards and forwards compatibility would let users adopt new fields in their own time.

Describe the solution you'd like

Extend the current process that outputs files to also support other serialization formats, such as Protobuf or FlatBuffers.

There is also the added benefit of better deserialization performance, depending on the language used to read the data.

Describe alternatives you've considered

Maintaining my own mapping based on the CSV or Parquet files, but this would require a fair amount of manual intervention and would not be particularly robust.
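
For illustration, a hand-maintained mapping of this kind might look roughly like the sketch below. It is only a minimal example under stated assumptions: the `PlayerRow` type, the `parse_rows` helper, and the column names are all hypothetical and are not the actual nflverse schema. Unknown columns are ignored and missing columns fall back to defaults, which is the tolerance that Protobuf-style formats provide out of the box.

```python
import csv
import io
from dataclasses import dataclass

@dataclass
class PlayerRow:
    # Hypothetical subset of columns, NOT the real nflverse schema.
    player_id: str = ""
    season: int = 0
    passing_yards: float = 0.0

# Hypothetical mapping from column name to the converter for that column.
# This has to be kept in sync by hand, which is the maintenance cost noted above.
CONVERTERS = {"player_id": str, "season": int, "passing_yards": float}

def parse_rows(text):
    """Map CSV text onto PlayerRow objects.

    Columns not listed in CONVERTERS are ignored (tolerates newly added
    columns); columns missing from the file keep the dataclass default
    (tolerates older files).
    """
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        kwargs = {}
        for name, convert in CONVERTERS.items():
            value = raw.get(name)
            if value not in (None, ""):
                kwargs[name] = convert(value)
        rows.append(PlayerRow(**kwargs))
    return rows

# An older file without passing_yards and a newer file with an extra column
# both parse into the same type.
old_file = "player_id,season\nA1,2023\n"
new_file = "player_id,season,passing_yards,new_col\nA1,2023,250.5,x\n"
print(parse_rows(old_file))
print(parse_rows(new_file))
```

This handles added and missing columns, but a renamed column or a changed data type would still need a manual update to the mapping, which is why it is not particularly robust.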

Additional context

No response

Hey @mrcaseb @tanho63, alternatively, is there a way for me to contribute this change back to the project?

commented

Hmm, I'm not convinced that the benefits of adding another format outweigh the increased complexity of maintaining it. Parquet already covers the majority of analytical use cases in terms of consistent data typing and columnar storage, while Protobuf and FlatBuffers seem to be row-oriented and more useful for write-centric uses where the contents and schemas change more frequently.

I think we will hold off on implementing this but could revisit at some point in the future if a need appears.