quixio / quix-streams

Quix Streams - A library for data streaming and Python Stream Processing

Add more docs to windowing

tomas-quix opened this issue:

Add docs explaining how to use a custom reduce function with windowing to compute a mean. Something like this:

import os
import uuid

from dotenv import load_dotenv
from quixstreams import Application

load_dotenv("./.env")

app = Application.Quix(str(uuid.uuid4()), auto_offset_reset="earliest")
input_topic = app.topic(os.environ["input"], value_deserializer="json")
# output_topic = app.topic(os.environ["output"], value_serializer="json")

sdf = app.dataframe(input_topic)

# Keep only messages that contain a "location-speed" field
sdf = sdf[sdf.contains("location-speed")]

# Keep only the columns we care about
sdf = sdf[["Timestamp", "location-speed"]]


def reduce_mean(state: dict, row: dict):
    # Accumulate a running sum and count for the window
    state["sum_speed"] += row["location-speed"]
    state["count_speed"] += 1
    return state


def init_mean(row: dict):
    # Initialize the window state from the first row in the window
    return {
        "sum_speed": row["location-speed"],
        "count_speed": 1,
    }


# 1-minute tumbling window; emit the aggregated state when the window closes
sdf = sdf.tumbling_window(60000).reduce(reduce_mean, init_mean).final()

# The window result contains "start", "end" and "value" (the reduced state)
sdf = sdf.apply(lambda row: {
    "timestamp": row["start"],
    "mean_speed": row["value"]["sum_speed"] / row["value"]["count_speed"],
})

sdf = sdf.update(lambda row: print(row))
# sdf = sdf.to_topic(output_topic)

if __name__ == "__main__":
    app.run(sdf)

Also explain how to use an aggregation function such as standard deviation:

import os
import statistics
import uuid

from dotenv import load_dotenv
from quixstreams import Application

load_dotenv("./.env")

app = Application.Quix(str(uuid.uuid4()), auto_offset_reset="earliest")
input_topic = app.topic(os.environ["input"], value_deserializer="json")
# output_topic = app.topic(os.environ["output"], value_serializer="json")

sdf = app.dataframe(input_topic)

# Keep only messages that contain a "location-speed" field
sdf = sdf[sdf.contains("location-speed")]

# Keep only the columns we care about
sdf = sdf[["Timestamp", "location-speed"]]


def reduce_speed(state: dict, row: dict):
    # Collect every speed value seen in the window
    state["speed_values"].append(row["location-speed"])
    return state


def init_speed(row: dict):
    # Initialize the window state with the first speed value
    return {
        "speed_values": [row["location-speed"]],
    }


# 1-minute tumbling window; emit the collected values when the window closes
sdf = sdf.tumbling_window(60000).reduce(reduce_speed, init_speed).final()

# The window result contains "start", "end" and "value" (the reduced state).
# Note: statistics.stdev() needs at least two values per window.
sdf = sdf.apply(lambda row: {
    "timestamp": row["start"],
    "stdev_speed": statistics.stdev(row["value"]["speed_values"]),
})

sdf = sdf.update(lambda row: print(row))
# sdf = sdf.to_topic(output_topic)

if __name__ == "__main__":
    app.run(sdf)

Please use an end-to-end use case scenario to explain this concept, e.g. "suppose you have a sensor that produces a temperature reading every X seconds and you want to Y; here's how you would do it."
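
For illustration, a rough sketch of that kind of end-to-end example could look like the following (the "temperature" field name, the topic wiring and the per-minute average are only illustrative assumptions, not something taken from the current docs):

import os
import uuid

from dotenv import load_dotenv
from quixstreams import Application

load_dotenv("./.env")

# Hypothetical scenario: a sensor publishes {"temperature": <float>} readings
# every few seconds and we want the average temperature per minute.
app = Application.Quix(str(uuid.uuid4()), auto_offset_reset="earliest")
input_topic = app.topic(os.environ["input"], value_deserializer="json")

sdf = app.dataframe(input_topic)
sdf = sdf[sdf.contains("temperature")]


def reduce_temperature(state: dict, row: dict):
    # Accumulate sum and count of readings inside the window
    state["sum"] += row["temperature"]
    state["count"] += 1
    return state


def init_temperature(row: dict):
    return {"sum": row["temperature"], "count": 1}


# Average temperature per 1-minute tumbling window
sdf = sdf.tumbling_window(60000).reduce(reduce_temperature, init_temperature).final()
sdf = sdf.apply(lambda row: {
    "window_start": row["start"],
    "avg_temperature": row["value"]["sum"] / row["value"]["count"],
})
sdf = sdf.update(lambda row: print(row))

if __name__ == "__main__":
    app.run(sdf)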

Hey @tomas-quix, the current docs already have an example for reduce() here.

Do you want to replace it with the provided one?

Hey @tomas-quix

The new docs are released in #315

I'm going to close this issue.
If you think they still need more examples, please reopen it.