xitongsys / parquet-go

pure golang library for reading/writing parquet file

Potential conflict for non alphabetic character leading schema

chris920820 opened this issue

Hey, @xitongsys!
Per a7314a1, it seems we are adding a prefix P_ to any schema name that does not start with an alphabetic character.
However, if a parquet file has both P__x and _x in its schema, this results in a conflict, since we can no longer tell whether data came from P__x or _x. It might also be a problem that consumers have to know about this convention. For example, if a consumer expects the column _x to exist and tries to read data using the name _x, the read will fail because the name has been internally converted to P__x.
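
To make the collision concrete, here is a minimal sketch in Go. The function `prefixIfNeeded` is a hypothetical stand-in written for this comment, assuming the rule is "prepend P_ when the first character is not a letter"; it is not the library's actual code.

```go
package main

import (
	"fmt"
	"unicode"
)

// prefixIfNeeded is a hypothetical stand-in for the renaming described above.
// Assumption: "P_" is prepended only when the name does not start with a letter.
func prefixIfNeeded(name string) string {
	if name == "" {
		return name
	}
	first := []rune(name)[0]
	if !unicode.IsLetter(first) {
		return "P_" + name
	}
	return name
}

func main() {
	// Both columns end up with the same internal name.
	for _, col := range []string{"_x", "P__x"} {
		fmt.Printf("%q -> %q\n", col, prefixIfNeeded(col))
	}
}
```

Both inputs map to P__x, so the mapping cannot be reversed.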

Do we have places that enforce this naming convention (no leading non-alphabetic character)? Does the Go compiler enforce it anywhere?

Is there a better way to handle this more gracefully? To avoid non-alphabetic leading characters in variable names, could we add a global prefix to every column instead of adding a prefix only to certain columns?
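
A rough sketch of the global-prefix idea, again in Go. The names `globalPrefix`, `toFieldName`, and `toColumnName` are hypothetical and only illustrate the suggestion; nothing here is taken from the library.

```go
package main

import (
	"fmt"
	"strings"
)

// globalPrefix is a hypothetical prefix applied to every column name.
const globalPrefix = "P_"

// toFieldName always adds the prefix, so distinct columns stay distinct.
func toFieldName(col string) string {
	return globalPrefix + col
}

// toColumnName strips the prefix again; ok is false if it is missing.
func toColumnName(field string) (col string, ok bool) {
	if !strings.HasPrefix(field, globalPrefix) {
		return "", false
	}
	return strings.TrimPrefix(field, globalPrefix), true
}

func main() {
	for _, col := range []string{"_x", "P__x"} {
		field := toFieldName(col)
		back, _ := toColumnName(field)
		fmt.Printf("%q -> %q -> %q\n", col, field, back)
	}
}
```

Because the prefix is added unconditionally, the mapping is one-to-one and the original column name can always be recovered.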

Hi, @chris920820
Sorry for the late response.
For now I have just mitigated this issue in pull request #310 and also added a comment to the README.