quixio / quix-streams

Quix Streams - A library for data streaming and Python Stream Processing

Feature Improvement - from_dataframe() -> 'TimeseriesDataRaw':

JotaBlanco opened this issue · comments

I understand that in streaming it is quite uncommon to have big chunks (batches) of data being published, so a long dataframe with lots of rows is unlikely.
Anyhow, right now the dataframe to TimeseriesDataRaw conversion is done by row and column, which is inefficient. I'll be proposing a new vectorized version to improve speed (especially with big dataframes).
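
To illustrate the difference between a by-row-and-column conversion and a vectorized one, here is a minimal sketch. It is not the library's actual `from_dataframe()` code: the helper names `to_columns_rowwise` and `to_columns_vectorized` are hypothetical, and a plain dict of column lists stands in for `TimeseriesDataRaw`.

```python
import pandas as pd


def to_columns_rowwise(df: pd.DataFrame) -> dict:
    """Hypothetical stand-in for a cell-by-cell conversion."""
    out = {col: [] for col in df.columns}
    for idx in df.index:
        for col in df.columns:
            # One scalar lookup per cell: rows * columns Python-level calls.
            out[col].append(df.at[idx, col])
    return out


def to_columns_vectorized(df: pd.DataFrame) -> dict:
    """Hypothetical vectorized equivalent: one bulk call per column."""
    return {col: df[col].tolist() for col in df.columns}


# Assumed example data; the column names are illustrative only.
df = pd.DataFrame({"time": [1, 2, 3], "speed": [10.5, 11.0, 12.25]})
assert to_columns_rowwise(df) == to_columns_vectorized(df)
```

Both helpers produce the same values. The vectorized one pays a fixed cost per column rather than per cell, so which one wins depends on the number of rows.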

Ok, interesting, I've been playing with a vectorized version of the existing from_dataframe() and this is what I've found:

  • My new code is much faster with big dataframes (above ~250 rows) and much slower with smaller dataframes.

  • Given that most messages will have a low number of rows (streaming data, not batches of data), I think the current code should stay.

  • I'd still like to share my vectorized version for future reference. What's the best way? @peter-quix

  • Whilst doing this exercise, I found two bugs that need solving:

    • bug 1: from_dataframe breaks if there are nulls in the time column. I've raised a bug: #54
    • bug 2: from_dataframe breaks if there are complex numbers in the dataframe. I've raised a bug: #55
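
The two failure conditions above can be detected up front with plain pandas checks. This is only a sketch: `check_frame` is a hypothetical helper, not part of the library, and the time column name `"time"` is an assumption.

```python
import pandas as pd


def check_frame(df: pd.DataFrame, time_col: str = "time") -> list:
    """Hypothetical pre-check: list the problems described in the two bugs."""
    problems = []
    # Bug 1 condition: nulls (None/NaN/NaT) in the time column.
    if df[time_col].isna().any():
        problems.append(f"null values in '{time_col}'")
    # Bug 2 condition: columns holding complex numbers.
    complex_cols = [c for c in df.columns if pd.api.types.is_complex_dtype(df[c])]
    if complex_cols:
        problems.append(f"complex columns: {complex_cols}")
    return problems


# Assumed example data that triggers both conditions.
bad = pd.DataFrame({"time": [1, None, 3], "z": [1 + 2j, 2 + 0j, 3 - 1j]})
print(check_frame(bad))
```

A check like this could run before the conversion so that the caller gets a clear message instead of a failure partway through.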

No longer relevant: the new Python code does not work the same way.