Feature Improvement - from_dataframe() -> 'TimeseriesDataRaw':
JotaBlanco opened this issue · comments
I understand that in streaming it is quite uncommon for big chunks (batches) of data to be published, so a long dataframe with lots of rows is unlikely.
Anyhow, right now the dataframe to TimeseriesDataRaw conversion is done row by row and column by column, which is inefficient. I'll be proposing a new vectorized version to improve speed (especially with big dataframes).
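To make the difference concrete, here is a minimal sketch of the two approaches. It is not the actual quixstreams code: the plain dict standing in for TimeseriesDataRaw, the `time` column name, and both function names are assumptions made for illustration only.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

TIME_COL = "time"  # hypothetical name of the timestamp column


def from_dataframe_rowwise(df: pd.DataFrame) -> dict:
    """Cell-by-cell conversion: one Python-level lookup per row and column."""
    value_cols = [c for c in df.columns if c != TIME_COL]
    numeric_cols = [c for c in value_cols if is_numeric_dtype(df[c])]
    string_cols = [c for c in value_cols if c not in numeric_cols]
    raw = {
        "timestamps": [],
        "numeric_values": {c: [] for c in numeric_cols},
        "string_values": {c: [] for c in string_cols},
    }
    for i in range(len(df)):
        raw["timestamps"].append(int(df[TIME_COL].iloc[i]))
        for c in numeric_cols:
            raw["numeric_values"][c].append(float(df[c].iloc[i]))
        for c in string_cols:
            raw["string_values"][c].append(str(df[c].iloc[i]))
    return raw


def from_dataframe_vectorized(df: pd.DataFrame) -> dict:
    """Column-at-a-time conversion: each column is converted in one call."""
    value_cols = [c for c in df.columns if c != TIME_COL]
    numeric_cols = [c for c in value_cols if is_numeric_dtype(df[c])]
    string_cols = [c for c in value_cols if c not in numeric_cols]
    return {
        "timestamps": df[TIME_COL].tolist(),
        "numeric_values": {c: df[c].tolist() for c in numeric_cols},
        "string_values": {c: df[c].tolist() for c in string_cols},
    }


# Small made-up frame used only to show that both versions agree.
example_df = pd.DataFrame(
    {"time": [1, 2, 3], "speed": [10.5, 11.0, 12.25], "gear": ["a", "b", "c"]}
)
rowwise_result = from_dataframe_rowwise(example_df)
vectorized_result = from_dataframe_vectorized(example_df)
```

Both functions produce the same structure for this frame. Which one is faster for a given row count would need to be measured, for example with `timeit`.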
Ok, interesting, I've been playing with a vectorized version of the existing from_dataframe() and this is what I've found:
- My new code is much faster with big dataframes (above ~250 rows) and much slower with smaller dataframes.
- Given that most messages will have a low number of rows (streaming data, not batches of data), I think the current code should stay.
- I'd still like to share my vectorized version for future reference. What's the best way? @peter-quix
- Whilst doing this exercise, I found two bugs that need solving:
Btw, here are both the new and old to_dataframe() and how they compare: https://github.com/quixio/quix-streams-to_dataframe_tests/blob/main/src/PythonClient/tests/quixstreams/manual/from_dataframe.ipynb
No longer relevant: the new Python code does not work the same way.