DataFrame.append causes unexpected dtype change in output DataFrame
thehomebrewnerd opened this issue · comments
Nate Parsons commented
When appending one dataframe to another with df.append, the column dtypes sometimes change unexpectedly. The example below shows boolean columns being changed to bool. The same issue happens if the original columns are Int64: the new dataframe will have those columns changed to int64.
I would expect the output dtype for a given column to stay unchanged when both input dataframes have the same dtype for that column.
import pandas as pd
import databricks.koalas as ks
df1 = pd.DataFrame({'id': [0], 'val': pd.Series([True], dtype='boolean')})
df2 = pd.DataFrame({'id': [1], 'val': pd.Series([False], dtype='boolean')})
ks1 = ks.from_pandas(df1)
ks2 = ks.from_pandas(df2)
print(ks1.dtypes)
# id       int64
# val    boolean
# dtype: object
print(ks2.dtypes)
# id       int64
# val    boolean
# dtype: object
new_ks = ks1.append(ks2)
print(new_ks.dtypes)
# id     int64
# val     bool
# dtype: object
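For comparison, here is a minimal sketch of the behavior I would expect, using plain pandas only (no Koalas). It uses pd.concat rather than DataFrame.append, since append was removed in pandas 2.0. The changed_dtypes helper is hypothetical, written just for this illustration, and is not part of pandas or Koalas.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [0], 'val': pd.Series([True], dtype='boolean')})
df2 = pd.DataFrame({'id': [1], 'val': pd.Series([False], dtype='boolean')})

# pd.concat stands in for DataFrame.append, which was removed in pandas 2.0.
combined = pd.concat([df1, df2], ignore_index=True)


def changed_dtypes(before, after):
    """Hypothetical helper: map each shared column whose dtype differs
    between the two frames to an (old dtype, new dtype) pair."""
    return {
        col: (str(before[col].dtype), str(after[col].dtype))
        for col in before.columns
        if col in after.columns and before[col].dtype != after[col].dtype
    }


print(combined.dtypes)
print(changed_dtypes(df1, combined))
```

In plain pandas the val column stays boolean and the helper reports no changed columns. A similar check against ks1 and new_ks would presumably flag val as going from boolean to bool, though I have not run the helper against Koalas frames.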