read_excel's mangle_dupe_cols parameter is meant to handle duplicate columns but fails when the duplicate column names differ only by case.
saikrishnapujari102087 opened this issue · comments
mangle_dupe_cols defaults to True, so ideally read_excel should have handled the duplicate columns.
However, when the column names differ only by case, it fails as below.
AnalysisException: Reference 'Sheet.col' is ambiguous, could be: Sheet.col, Sheet.col.
Here the two columns are named Col and cOL.
The best practices guide mentions not using case-sensitive column names - https://koalas.readthedocs.io/en/latest/user_guide/best_practices.html#do-not-use-duplicated-column-names
Either the docs for read_excel/mangle_dupe_cols have to be updated to mention this, or the case has to be handled.
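To illustrate what "handled" could look like, here is a minimal sketch of case-insensitive mangling. It is not Koalas or pandas code: mangle_case_insensitive is a hypothetical helper, and it only assumes the 'X', 'X.1', ... suffix convention that mangle_dupe_cols is documented to use.

```python
from collections import Counter
from typing import Iterable, List


def mangle_case_insensitive(names: Iterable[str]) -> List[str]:
    """Hypothetical helper: suffix names that collide when case is ignored."""
    seen = Counter()  # how many collisions each case-folded name has had
    taken = set()     # case-folded names already emitted
    result = []
    for name in names:
        key = name.casefold()
        candidate = name
        # Keep appending the next ".N" suffix until the name is unique
        # under case-insensitive comparison.
        while candidate.casefold() in taken:
            seen[key] += 1
            candidate = f"{name}.{seen[key]}"
        taken.add(candidate.casefold())
        result.append(candidate)
    return result


print(mangle_case_insensitive(["Col", "cOL", "other"]))
```

As a possible workaround under the same assumptions, the sheet could be read with plain pandas, the columns reassigned with a helper like this, and the result converted afterwards.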
Yeah, we should address this.
Would you mind filing the issue in the Apache Spark JIRA?
This repository is in maintenance mode, as Koalas has been moved into PySpark (pandas API on Spark).
@itholic Yes, created - https://issues.apache.org/jira/browse/SPARK-38004