databricks / koalas

Koalas: pandas API on Apache Spark

read_excel's mangle_dupe_cols parameter is meant to handle duplicate columns, but it fails if the duplicate column names differ only by case.

saikrishnapujari102087 opened this issue

mangle_dupe_cols defaults to True, so read_excel should handle duplicate columns.
However, when two column names differ only by case, it fails with the error below.

AnalysisException: Reference 'Sheet.col' is ambiguous, could be: Sheet.col, Sheet.col.

Here the two columns are named Col and cOL.

The best practices guide advises against using case-sensitive column names: https://koalas.readthedocs.io/en/latest/user_guide/best_practices.html#do-not-use-duplicated-column-names

Either the docs for read_excel/mangle_dupe_cols have to be updated to mention this, or this case has to be handled.
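Until this is handled, one possible workaround is to rename the colliding columns before the data reaches Spark. The sketch below is only an illustration: mangle_case_insensitive is a hypothetical helper, not part of Koalas or pandas. It assumes the column names are available as a plain list and borrows the 'X', 'X.1' suffix convention that pandas uses for exact duplicates, applying it to names that match when case is ignored.

```python
from collections import Counter


def mangle_case_insensitive(names):
    """Rename columns that collide when compared case-insensitively.

    Hypothetical helper (not a Koalas or pandas API). It follows the
    'X', 'X.1', ... suffix convention, but treats 'Col' and 'cOL' as
    duplicates of each other.
    """
    seen = Counter()   # lower-cased original name -> suffixes used so far
    taken = set()      # lower-cased names already emitted
    result = []
    for name in names:
        key = str(name).lower()
        candidate = str(name)
        # Keep bumping the suffix until the name is unique ignoring case.
        while candidate.lower() in taken:
            seen[key] += 1
            candidate = f"{name}.{seen[key]}"
        taken.add(candidate.lower())
        result.append(candidate)
    return result


print(mangle_case_insensitive(["Col", "cOL", "other"]))
# → ['Col', 'cOL.1', 'other']
```

Assuming the sheet is first read with pandas, the result could be assigned back with something like pdf.columns = mangle_case_insensitive(pdf.columns) before converting the frame to Koalas.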

Yeah, we should address this.

Would you mind filing the issue in the Apache Spark JIRA?

This repository is in maintenance mode, as Koalas has been moved into PySpark (pandas API on Spark).