databricks / koalas

Koalas: pandas API on Apache Spark

Column names containing "_" raise KeyError in pivot_table

crucis opened this issue

The following example:

import databricks.koalas as ks

# Both value columns ("baz_d", "ar_a") contain an underscore
df = ks.DataFrame({"ui": ['C', 'D', 'D', 'C'],
                   "foo": ['one', 'one', 'two', 'two'],
                   "bar": ['A', 'A', 'B', 'C'],
                   "ar_a": [1, 2, 2, 2],
                   "baz_d": [1, 2, 3, 4]}, columns=['ui', 'foo', 'bar', 'baz_d', 'ar_a'])
df.pivot_table(index=['ui','foo'] , columns='bar', values=['baz_d', 'ar_a'], aggfunc='first')

It raises the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<command-4107535394601475> in <module>
----> 1 df.pivot_table(index=['ui','foo'] , columns='bar', values=['baz_d', 'ar_a'], aggfunc='first')

/databricks/python/lib/python3.8/site-packages/databricks/koalas/usage_logging/__init__.py in wrapper(*args, **kwargs)
    193             start = time.perf_counter()
    194             try:
--> 195                 res = func(*args, **kwargs)
    196                 logger.log_success(
    197                     class_name, function_name, time.perf_counter() - start, signature

/databricks/python/lib/python3.8/site-packages/databricks/koalas/frame.py in pivot_table(self, values, index, columns, aggfunc, fill_value)
   6080                         zip(self._internal.data_spark_column_names, self._internal.column_labels)
   6081                     )
-> 6082                     column_labels = [
   6083                         tuple(list(column_name_to_index[name.split("_")[1]]) + [name.split("_")[0]])
   6084                         for name in data_columns

/databricks/python/lib/python3.8/site-packages/databricks/koalas/frame.py in <listcomp>(.0)
   6081                     )
   6082                     column_labels = [
-> 6083                         tuple(list(column_name_to_index[name.split("_")[1]]) + [name.split("_")[0]])
   6084                         for name in data_columns
   6085                     ]

KeyError: 'ar'

Are snake_case column names not supported in Koalas DataFrames?

Thanks for the report!

This looks like a bug in Koalas; we should fix it.
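The traceback shows where this goes wrong: after pivoting, pivot_table splits each pivoted Spark column name on "_" and uses part [0] as the pivot value and part [1] as the key into column_name_to_index. Assuming those names have the form "<pivot value>_<value column>" (for example "A_ar_a"), as the split logic implies, the value column "ar_a" is cut down to "ar", which is not a key, hence KeyError: 'ar'. The sketch below reproduces that parsing in plain Python and contrasts it with one possible fix that matches the known value-column names as suffixes instead of splitting blindly. The helpers buggy_parse and safer_parse and the sample mapping are hypothetical illustrations, not the actual Koalas code or patch.

```python
# Hypothetical illustration of the label-recovery step from the traceback.
# Assumption: pivoted Spark columns are named "<pivot value>_<value column>",
# e.g. "A_ar_a" for pivot value "A" and value column "ar_a".


def buggy_parse(name):
    # Mirrors name.split("_")[0] / name.split("_")[1] in the traceback:
    # any extra "_" inside the value column name is silently cut off.
    parts = name.split("_")
    return parts[0], parts[1]


def safer_parse(name, value_columns):
    # Match the known value-column names as suffixes instead of splitting.
    # Longest names are tried first so "x_y" is preferred over "y".
    for value in sorted(value_columns, key=len, reverse=True):
        suffix = "_" + value
        if name.endswith(suffix) and len(name) > len(suffix):
            return name[: -len(suffix)], value
    raise KeyError(name)


# Hypothetical mapping, analogous to column_name_to_index in the traceback.
value_columns = ["baz_d", "ar_a"]
column_name_to_index = {"baz_d": ("baz_d",), "ar_a": ("ar_a",)}

print(buggy_parse("A_ar_a"))  # ('A', 'ar'): 'ar' is not a key, hence KeyError

pivot_value, value = safer_parse("A_ar_a", value_columns)
label = tuple(list(column_name_to_index[value]) + [pivot_value])
print(label)  # ('ar_a', 'A')
```

Suffix matching still cannot resolve every case (a pivot value "A_x" with value column "y" produces the same name as pivot value "A" with value column "x_y"), so a more robust fix might avoid encoding both parts into a single "_"-delimited string in the first place.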