Column names with "_" raise KeyError in pivot_table
crucis opened this issue
The following example:
import databricks.koalas as ks

df = ks.DataFrame({"ui": ['C', 'D', 'D', 'C'],
                   "foo": ['one', 'one', 'two', 'two'],
                   "bar": ['A', 'A', 'B', 'C'],
                   "ar_a": [1, 2, 2, 2],
                   "baz_d": [1, 2, 3, 4]},
                  columns=['ui', 'foo', 'bar', 'baz_d', 'ar_a'])
df.pivot_table(index=['ui','foo'] , columns='bar', values=['baz_d', 'ar_a'], aggfunc='first')
Raises the following error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<command-4107535394601475> in <module>
----> 1 df.pivot_table(index=['ui','foo'] , columns='bar', values=['baz_d', 'ar_a'], aggfunc='first')
/databricks/python/lib/python3.8/site-packages/databricks/koalas/usage_logging/__init__.py in wrapper(*args, **kwargs)
193 start = time.perf_counter()
194 try:
--> 195 res = func(*args, **kwargs)
196 logger.log_success(
197 class_name, function_name, time.perf_counter() - start, signature
/databricks/python/lib/python3.8/site-packages/databricks/koalas/frame.py in pivot_table(self, values, index, columns, aggfunc, fill_value)
6080 zip(self._internal.data_spark_column_names, self._internal.column_labels)
6081 )
-> 6082 column_labels = [
6083 tuple(list(column_name_to_index[name.split("_")[1]]) + [name.split("_")[0]])
6084 for name in data_columns
/databricks/python/lib/python3.8/site-packages/databricks/koalas/frame.py in <listcomp>(.0)
6081 )
6082 column_labels = [
-> 6083 tuple(list(column_name_to_index[name.split("_")[1]]) + [name.split("_")[0]])
6084 for name in data_columns
6085 ]
KeyError: 'ar'
Is snake_case not supported in Koalas DataFrames?
Thanks for the report!!
Seems like a bug in Koalas; we should fix this.
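
For anyone reading the traceback: the failure looks like it comes from `name.split("_")` at frame.py line 6083, which splits on every underscore. The plain-Python sketch below reproduces that behaviour without Koalas or Spark. It assumes the internal data column names have the form `<pivot value>_<value column>` (for example `A_ar_a`); that format is inferred from the traceback, not confirmed from the Koalas source. `value_columns`, `parse_naive`, and `parse_maxsplit` are hypothetical names used only for illustration, and the `maxsplit` variant is one possible direction, not the actual fix.

```python
# Sketch only: mimics the label-parsing step shown in the traceback.
# ASSUMPTION: internal names look like "<pivot value>_<value column>".

# Stand-in for column_name_to_index: value column name -> column label tuple.
value_columns = {"baz_d": ("baz_d",), "ar_a": ("ar_a",)}

# Assumed internal data column names for pivot values 'A' and 'B'.
data_columns = ["A_ar_a", "A_baz_d", "B_ar_a", "B_baz_d"]


def parse_naive(name):
    """Same indexing as the traceback: split on every underscore."""
    parts = name.split("_")
    # For "A_ar_a", parts is ["A", "ar", "a"], so the lookup key is "ar".
    return tuple(list(value_columns[parts[1]]) + [parts[0]])


def parse_maxsplit(name):
    """Hypothetical alternative: split on the first underscore only."""
    pivot_value, value_name = name.split("_", 1)
    return tuple(list(value_columns[value_name]) + [pivot_value])


try:
    parse_naive("A_ar_a")
except KeyError as exc:
    print("naive split failed with KeyError:", exc)

print([parse_maxsplit(name) for name in data_columns])
```

Under the same assumption, splitting on the first underscore would still break if the values in the `columns` argument (here `bar`) contained underscores themselves, so a robust fix probably needs to avoid parsing the joined name at all.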