dolthub / dolt

Dolt – Git for Data


Get rid of column tags

zachmu opened this issue

In general, Dolt databases are history independent: if two tables have the same schema and data, they have the same hash.

Column tags can violate this property. Two different sequences of ALTER TABLE statements can produce schemas that are identical except for their column tags, so the two tables end up with different hashes. This is especially frustrating because the difference between the two table revisions is invisible to the customer: it cannot be expressed in SQL.
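A toy model makes the problem concrete. This is a minimal sketch, not Dolt's real schema encoding: the `Column` type, the `schema_hash` helper, and the tag values are all hypothetical, and tags are treated as arbitrary integers without modeling how Dolt actually generates them. The only assumption carried over from the issue is that a rename preserves a column's tag. History A creates `name` directly. History B creates `nm` and later renames it to `name`, so its `name` column keeps the tag that was assigned to `nm`.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    sql_type: str
    tag: int  # hypothetical internal identifier, not visible through SQL


def schema_hash(columns, include_tags=True):
    """Content hash of a schema in a toy model (not Dolt's real encoding)."""
    payload = [
        [c.name, c.sql_type, c.tag] if include_tags else [c.name, c.sql_type]
        for c in columns
    ]
    encoded = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# History A: CREATE TABLE t (id INT, name TEXT)
# History B: CREATE TABLE t (id INT, nm TEXT); ALTER TABLE t RENAME COLUMN nm TO name
# In B, "name" keeps the tag originally assigned to "nm". Tag values are made up.
schema_a = [Column("id", "INT", 1001), Column("name", "TEXT", 2002)]
schema_b = [Column("id", "INT", 1001), Column("name", "TEXT", 3003)]

print(schema_hash(schema_a) == schema_hash(schema_b))  # tags included
print(schema_hash(schema_a, include_tags=False) == schema_hash(schema_b, include_tags=False))
```

Both schemas are `(id INT, name TEXT)` as far as SQL can tell. When tags are hashed, the first comparison is `False` and history independence is lost. When tags are left out, the second comparison is `True`.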

After our new storage format shipped, the only remaining product reason to keep tags was the ability to identify column renames. This is useful, but maybe not useful enough to justify violating history independence. There are other ways to handle column renames (e.g. a merge strategy that lets the customer declare a column rename explicitly, or simple heuristics like the ones git uses to detect file renames). Tag issues, when they occur, are so frustrating to understand and debug that it may be worth giving up deterministic column rename detection to get rid of them.
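As a rough illustration of the heuristic option, the sketch below borrows git's idea of similarity-based rename detection (git's `-M` defaults to a 50% similarity threshold) and applies it to columns. It pairs a dropped column with an added column when enough rows, matched by primary key, keep the same value. Everything here is hypothetical: `column_similarity`, `detect_renames`, the row layout (a dict from primary key to a dict of column values), and the threshold are illustrative assumptions, not an existing Dolt API. A real implementation would likely also compare column types and handle large tables more efficiently.

```python
def column_similarity(old_rows, new_rows, old_col, new_col):
    """Fraction of shared primary keys whose value is unchanged between the two columns."""
    shared = old_rows.keys() & new_rows.keys()
    if not shared:
        return 0.0
    same = sum(1 for pk in shared if old_rows[pk][old_col] == new_rows[pk][new_col])
    return same / len(shared)


def detect_renames(old_rows, new_rows, dropped, added, threshold=0.5):
    """Hypothetical git-style heuristic: map dropped columns to added columns by data similarity."""
    candidates = []
    for old_col in dropped:
        for new_col in added:
            score = column_similarity(old_rows, new_rows, old_col, new_col)
            if score >= threshold:
                candidates.append((score, old_col, new_col))
    # Pair the most similar columns first; each column takes part in at most one rename.
    renames, used_new = {}, set()
    for score, old_col, new_col in sorted(candidates, key=lambda c: (-c[0], c[1], c[2])):
        if old_col not in renames and new_col not in used_new:
            renames[old_col] = new_col
            used_new.add(new_col)
    return renames


old_rows = {1: {"nm": "alice", "city": "Oslo"}, 2: {"nm": "bob", "city": "Lima"}}
new_rows = {1: {"name": "alice", "email": "a@x.io"}, 2: {"name": "bob", "email": "b@x.io"}}
print(detect_renames(old_rows, new_rows, dropped=["nm", "city"], added=["name", "email"]))
```

Here `nm` is reported as renamed to `name`, while `city` and `email` stay an ordinary drop and add. The tradeoff the issue describes is visible: the result depends on the data and the threshold rather than on a stored identifier, so it is not deterministic in the way tags are. In exchange, nothing invisible needs to be kept in the schema.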