read_excel's mangle_dupe_cols parameter is meant to handle duplicate columns, but it fails when the duplicate column names differ only by case.

saikrishnapujari102087 opened this issue 2 years ago

mangle_dupe_cols defaults to True, so ideally it should handle duplicate columns. However, when the column names differ only by case, it fails with the error below.

AnalysisException: Reference 'Sheet.col' is ambiguous, could be: Sheet.col, Sheet.col.

Here the two columns are named Col and cOL.

The best practices guide mentions that case-sensitive column names should not be used - https://koalas.readthedocs.io/en/latest/user_guide/best_practices.html#do-not-use-duplicated-column-names

Either the docs for read_excel/mangle_dupe_cols should be updated to mention this, or this case should be handled.
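For illustration, here is a minimal sketch of what handling this case could look like. The helper name mangle_case_insensitive is hypothetical and is not part of Koalas or pandas; it only assumes that colliding names should get a ".N" suffix, in the same style pandas uses for exact duplicates.

```python
from collections import Counter


def mangle_case_insensitive(names):
    """Hypothetical helper: suffix names that collide when case is ignored."""
    seen = Counter()   # how many suffixes were tried per lowercased name
    taken = set()      # lowercased names already assigned
    result = []
    for name in names:
        key = name.lower()
        candidate = name
        # Keep bumping the suffix until the lowercased candidate is unused.
        while candidate.lower() in taken:
            seen[key] += 1
            candidate = f"{name}.{seen[key]}"
        taken.add(candidate.lower())
        result.append(candidate)
    return result


print(mangle_case_insensitive(["Col", "cOL", "other"]))
# ['Col', 'cOL.1', 'other']
```

As a possible workaround until this is addressed, something like this could be applied to the column names of a pandas DataFrame before converting it to Koalas.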

Haejoon Lee · Answer 1 · Tue Jan 25 2022 09:51:12 GMT+0800 (China Standard Time)

Yeah, we should address this.

Would you mind filing an issue in the Apache Spark JIRA?

This repository is in maintenance mode, as Koalas has been moved into PySpark (pandas API on Spark).

Saikrishna Pujari · Answer 2 · Tue Jan 25 2022 23:06:31 GMT+0800 (China Standard Time)

@itholic Yes, created - https://issues.apache.org/jira/browse/SPARK-38004