Support datetime for std
itholic opened this issue
Haejoon Lee commented
pandas supports the datetime64 and datetime64tz dtypes for std
as of pandas 1.2 (pandas-dev/pandas#37436).
It returns a Timedelta Series, which Koalas currently cannot support.
>>> pdf = pd.DataFrame(
...     {
...         "A": pd.date_range("2020-01-01", periods=3),
...         "B": pd.date_range("2021-01-01", periods=3),
...     }
... )
>>> kdf = ks.from_pandas(pdf)
>>> pdf.std()
A   1 days
B   1 days
dtype: timedelta64[ns]
>>> kdf.std()
Series([], dtype: float64)
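For reference, the "1 days" that pandas reports can be reproduced by hand with only the standard library. This is a minimal sketch, not how pandas or Koalas implement it: the helper name `datetime_std` is hypothetical, and it assumes the sample standard deviation (ddof=1), which is the pandas default for std.

```python
from datetime import datetime, timedelta
from statistics import stdev


def datetime_std(values):
    """Sample standard deviation (ddof=1) of datetimes, returned as a timedelta.

    Hypothetical helper for illustration only; not part of pandas or Koalas.
    """
    origin = values[0]
    # Express every value as a numeric offset (in seconds) from the first one,
    # so the standard deviation is computed on plain numbers.
    offsets = [(v - origin).total_seconds() for v in values]
    # Convert the numeric result back into a duration.
    return timedelta(seconds=stdev(offsets))


# Same values as columns "A" and "B" in the example above.
column_a = [datetime(2020, 1, 1) + timedelta(days=i) for i in range(3)]
column_b = [datetime(2021, 1, 1) + timedelta(days=i) for i in range(3)]

print(datetime_std(column_a))
print(datetime_std(column_b))
```

Both columns hold three consecutive days, so each standard deviation comes out as a one-day duration, matching the pandas output. The point of the sketch is that the result is a duration rather than a float, which is why a Timedelta type is needed on the Koalas side.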
ljluestc commented
import databricks.koalas as ks
import pandas as pd
pdf = pd.DataFrame(
    {
        "A": pd.date_range("2020-01-01", periods=3),
        "B": pd.date_range("2021-01-01", periods=3),
    }
)
kdf = ks.from_pandas(pdf)
# Calculate the standard deviation after converting Timedelta to numeric
std_result = kdf.select_dtypes(include=["timedelta"]).apply(lambda x: x.dt.total_seconds()).std()
print(std_result)
Haejoon Lee commented
>>> std_result = kdf.select_dtypes(include=["timedelta"]).apply(lambda x: x.dt.total_seconds()).std()
>>> print(std_result)
Series([], dtype: float64)
It seems the suggested method still returns an empty Series.
By the way, switching from Koalas to the pandas API on Spark is recommended, since Koalas has been migrated into PySpark.