databricks / koalas

Koalas: pandas API on Apache Spark

DataFrame.append causes unexpected dtype change in output DataFrame

thehomebrewnerd opened this issue

When one DataFrame is appended to another with DataFrame.append, the column dtypes sometimes change unexpectedly. The example below shows the nullable boolean column changed to bool in the result. The same thing happens if the original DataFrames use Int64: those columns are changed to int64 in the new DataFrame.

I would expect the output dtype of a column to stay unchanged when both input DataFrames have the same dtype for that column.
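For comparison, here is a minimal sketch of that expectation in plain pandas, without Koalas or Spark. It uses pd.concat because DataFrame.append was removed in pandas 2.0. The helper dtype_changes and the Int64 column num are hypothetical names added for illustration; they are not part of the original report.

```python
import pandas as pd

# Two frames whose columns share dtypes, including nullable extension dtypes.
# The "num" column is a hypothetical addition covering the Int64 case.
df1 = pd.DataFrame({
    "id": [0],
    "val": pd.Series([True], dtype="boolean"),
    "num": pd.Series([10], dtype="Int64"),
})
df2 = pd.DataFrame({
    "id": [1],
    "val": pd.Series([False], dtype="boolean"),
    "num": pd.Series([20], dtype="Int64"),
})


def dtype_changes(before, after):
    """Return {column: (old_dtype, new_dtype)} for columns whose dtype differs."""
    return {
        col: (str(before[col].dtype), str(after[col].dtype))
        for col in before.columns
        if before[col].dtype != after[col].dtype
    }


# pd.concat stands in for the removed DataFrame.append.
combined = pd.concat([df1, df2])
changes = dtype_changes(df1, combined)
print(changes)  # an empty dict means every column kept its dtype
```

The same helper could in principle be pointed at the Koalas result to list which columns were downcast, assuming Koalas columns expose a comparable dtype attribute.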

>>> import pandas as pd
>>> import databricks.koalas as ks
>>>
>>> df1 = pd.DataFrame({'id': [0], 'val': pd.Series([True], dtype='boolean')})
>>> df2 = pd.DataFrame({'id': [1], 'val': pd.Series([False], dtype='boolean')})
>>> ks1 = ks.from_pandas(df1)
>>> ks2 = ks.from_pandas(df2)
>>> ks1.dtypes
id       int64
val    boolean
dtype: object
>>> ks2.dtypes
id       int64
val    boolean
dtype: object
>>> new_ks = ks1.append(ks2)
>>> new_ks.dtypes
id     int64
val     bool
dtype: object