databricks / koalas

Koalas: pandas API on Apache Spark

Supporting allows_duplicate_labels for Series and DataFrame

itholic opened this issue

Since pandas 1.2, pandas has experimentally supported an allows_duplicate_labels flag on Series and DataFrame, which controls whether the index or columns may contain duplicate labels.

In [1]: pd.Series([1, 2], index=['a', 'a'])
Out[1]:
a    1
a    2
Length: 2, dtype: int64

In [2]: pd.Series([1, 2], index=['a', 'a']).set_flags(allows_duplicate_labels=False)
...
DuplicateLabelError: Index has duplicates.
      positions
label
a        [0, 1]
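
The behavior above can be reproduced as a small script. This is a minimal sketch, assuming pandas 1.2 or later; the helper name `disallow_duplicates` is hypothetical and used only for illustration, while `set_flags`, `flags.allows_duplicate_labels`, and `pd.errors.DuplicateLabelError` are the pandas names involved.

```python
import pandas as pd


def disallow_duplicates(obj):
    """Hypothetical helper: return a copy of obj with allows_duplicate_labels=False,
    or None if obj already has duplicate labels."""
    try:
        # set_flags returns a new object; the original keeps its flags.
        return obj.set_flags(allows_duplicate_labels=False)
    except pd.errors.DuplicateLabelError:
        return None


unique = pd.Series([1, 2], index=["a", "b"])
dup = pd.Series([1, 2], index=["a", "a"])

# Duplicate labels are allowed unless the flag is switched off.
print(unique.flags.allows_duplicate_labels)

strict = disallow_duplicates(unique)
print(strict.flags.allows_duplicate_labels)

# A Series whose index already has duplicates cannot have the flag disabled.
print(disallow_duplicates(dup) is None)
```

The same flag applies to DataFrame, where it covers both the index and the columns.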

The pandas documentation says:

This is an experimental feature. Currently, many methods fail to propagate the allows_duplicate_labels value. In future versions it is expected that every method taking or returning one or more DataFrame or Series objects will propagate allows_duplicate_labels.

Thus, I think Koalas should also prepare to support this feature.