`.last()` can't be used on LazyGroupBy
jmakov opened this issue
Checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of Polars.
Reproducible example
import datetime
import polars

lf = polars.LazyFrame({
    "time": polars.datetime_range(
        start=datetime.datetime(2021, 12, 16),
        end=datetime.datetime(2021, 12, 16, 3),
        interval="30m",
        eager=True),
    "n": range(7),
    "m": range(7)})
lf.group_by_dynamic("time", every="1h", closed="right").last().collect()
Log output
No response
Issue description
`.last()` can't be used on LazyGroupBy
Expected behavior
According to the docs, it should work.
Installed versions
--------Version info---------
Polars: 0.20.31
Index type: UInt32
Platform: Linux-6.6.32-1-MANJARO-x86_64-with-glibc2.39
Python: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0]
----Optional dependencies----
adbc_driver_manager: <not installed>
cloudpickle: 3.0.0
connectorx: 0.3.3
deltalake: <not installed>
fastexcel: <not installed>
fsspec: 2024.6.0
gevent: <not installed>
hvplot: 0.10.0
matplotlib: 3.7.3
nest_asyncio: 1.6.0
numpy: 1.25.2
openpyxl: <not installed>
pandas: 2.0.3
pyarrow: 14.0.2
pydantic: 1.10.16
pyiceberg: <not installed>
pyxlsb: <not installed>
sqlalchemy: <not installed>
torch: 2.1.2.post300
xlsx2csv: <not installed>
xlsxwriter: <not installed>
Just to expand a bit, this is specific to `group_by_dynamic` and the problem is `index_column` being duplicated.
(lf.group_by_dynamic("time", every="1h", closed="right")
   .last()
   .collect()
)
# DuplicateError: column with name 'time' has more than one occurrences
`pl.all()` in this case includes the `index_column` (which differs from how `.group_by()` and `by=` behave).
(lf.group_by_dynamic("time", every="1h", closed="right")
   .agg(pl.all().last())
   .collect()
)
# DuplicateError: column with name 'time' has more than one occurrences
In this case `index_column` needs to be excluded.
(lf.group_by_dynamic("time", every="1h", closed="right")
   .agg(pl.exclude("time").last())
   .collect()
)
# shape: (4, 3)
# ┌─────────────────────┬─────┬─────┐
# │ time                ┆ n   ┆ m   │
# │ ---                 ┆ --- ┆ --- │
# │ datetime[μs]        ┆ i64 ┆ i64 │
# ╞═════════════════════╪═════╪═════╡
# │ 2021-12-15 23:00:00 ┆ 0   ┆ 0   │
# │ 2021-12-16 00:00:00 ┆ 2   ┆ 2   │
# │ 2021-12-16 01:00:00 ┆ 4   ┆ 4   │
# │ 2021-12-16 02:00:00 ┆ 6   ┆ 6   │
# └─────────────────────┴─────┴─────┘
(I'm not entirely sure if these `GroupBy.foo()` shorthand methods are supposed to be allowed for `group_by_dynamic`.)
Thanks for that. The type is `polars.lazyframe.group_by.LazyGroupBy`, so I assumed it should work (according to the docs). It would help if, e.g., your example were part of the docs, to make it clear where the `DuplicateError` comes from.