HazyResearch / meerkat

Creative interactive views of any dataset.

[FEATURE] Avoid recomputing values when chaining `LambdaColumn` in one DataPanel

seyuboglu opened this issue

If a single DataPanel contains a chain of LambdaColumns, like so:

dp["a_b"] = LambdaColumn(dp["a"], fn)
dp["a_b_c"] = LambdaColumn(dp["a_b"], fn_2)

then indexing the DataPanel with dp[0] materializes dp["a_b"] twice: once for the "a_b" column itself and once as the input to dp["a_b_c"].

Ideally, the DataPanel should be aware of these dependencies and only materialize things once.
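To make the idea concrete, here is a rough sketch of one way to do it: share a per-row cache across all columns during a single row lookup. `LazyColumn` and `Panel` are hypothetical toy stand-ins written for this example, not meerkat's actual `LambdaColumn` or `DataPanel`, and the real fix may look quite different.

```python
from typing import Any, Callable, Dict, List, Tuple, Union


class LazyColumn:
    """Toy stand-in for a LambdaColumn (hypothetical, not the meerkat class)."""

    def __init__(self, parent: Union["LazyColumn", List[Any]], fn: Callable[[Any], Any]):
        self.parent = parent
        self.fn = fn

    def get(self, index: int, cache: Dict[Tuple[int, int], Any]) -> Any:
        # Key on (column identity, row) so each column is computed once per lookup.
        key = (id(self), index)
        if key not in cache:
            parent = self.parent
            if isinstance(parent, LazyColumn):
                value = parent.get(index, cache)  # reuses the parent's cached value
            else:
                value = parent[index]
            cache[key] = self.fn(value)
        return cache[key]


class Panel:
    """Toy stand-in for a DataPanel (hypothetical)."""

    def __init__(self) -> None:
        self.columns: Dict[str, Any] = {}

    def __setitem__(self, name: str, column: Any) -> None:
        self.columns[name] = column

    def __getitem__(self, key: Union[str, int]) -> Any:
        if isinstance(key, str):
            return self.columns[key]
        cache: Dict[Tuple[int, int], Any] = {}  # shared by all columns for this row
        return {
            name: col.get(key, cache) if isinstance(col, LazyColumn) else col[key]
            for name, col in self.columns.items()
        }


calls = []  # records every invocation of fn so we can count materializations


def fn(x):
    calls.append(x)
    return x + 1


def fn_2(x):
    return x * 10


dp = Panel()
dp["a"] = [1, 2, 3]
dp["a_b"] = LazyColumn(dp["a"], fn)
dp["a_b_c"] = LazyColumn(dp["a_b"], fn_2)

row = dp[0]
```

With the shared cache, `fn` runs once for `dp[0]` even though both "a_b" and "a_b_c" depend on it. The cache lives only for the duration of one row lookup, so nothing is kept in memory afterwards.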

In the same vein, we should support linked LambdaColumns, where two columns are both outputs of the same operation.

A use case for this comes up in #222, where an audio loader returns both the time series and the sample rate, which should go in separate columns.
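A possible shape for linked columns, again only as a sketch: one shared operation object runs the function at most once per row, and each column picks out one element of the result. `SharedOp`, `LinkedColumn`, and `fake_audio_loader` are all made-up names for illustration and do not exist in meerkat; the loader returns dummy data rather than reading a file.

```python
from typing import Any, Callable, Dict, Sequence


class SharedOp:
    """Hypothetical helper: runs `fn` at most once per row and keeps the result."""

    def __init__(self, source: Sequence[Any], fn: Callable[[Any], tuple]):
        self.source = source
        self.fn = fn
        self._results: Dict[int, tuple] = {}

    def result(self, index: int) -> tuple:
        if index not in self._results:
            self._results[index] = self.fn(self.source[index])
        return self._results[index]


class LinkedColumn:
    """Hypothetical column exposing one output of a SharedOp."""

    def __init__(self, op: SharedOp, output: int):
        self.op = op
        self.output = output

    def __getitem__(self, index: int) -> Any:
        return self.op.result(index)[self.output]


load_calls = []  # records each call to the loader


def fake_audio_loader(path):
    # Stand-in for a real audio loader: returns (time series, sample rate).
    load_calls.append(path)
    return [0.0, 0.5, -0.5], 16000


paths = ["clip_0.wav", "clip_1.wav"]
op = SharedOp(paths, fake_audio_loader)
waveform = LinkedColumn(op, 0)
sample_rate = LinkedColumn(op, 1)

first = (waveform[0], sample_rate[0])
```

Here the loader is called once for row 0 even though two columns read from it. Note that this sketch keeps every result forever, so a real version would probably need a bounded or per-lookup cache.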