Add more docs to windowing
tomas-quix opened this issue · comments
tomas-quix commented
Add docs explaining how to use a custom reduce function to compute a mean. Something like this:
import os
import uuid

from dotenv import load_dotenv
from quixstreams import Application

load_dotenv("./.env")

# A fresh random consumer group ID, so each run starts from the earliest offset.
app = Application.Quix(str(uuid.uuid4()), auto_offset_reset="earliest")
input_topic = app.topic(os.environ["input"], value_deserializer="json")
# output_topic = app.topic(os.environ["output"], value_serializer="json")

sdf = app.dataframe(input_topic)
# Keep only rows that carry a speed reading, then only the columns we need.
sdf = sdf[sdf.contains("location-speed")]
sdf = sdf[["Timestamp", "location-speed"]]


def reduce_mean(state: dict, row: dict) -> dict:
    # Called for every row after the first one in a window: update the running totals.
    state["sum_speed"] += row["location-speed"]
    state["count_speed"] += 1
    return state


def init_mean(row: dict) -> dict:
    # Called for the first row in a window: create the initial state.
    return {
        "sum_speed": row["location-speed"],
        "count_speed": 1,
    }


# 60-second tumbling window; final() emits a result once a window is closed.
sdf = sdf.tumbling_window(60000).reduce(reduce_mean, init_mean).final()
sdf = sdf.apply(lambda row: {
    "timestamp": row["start"],
    "mean_speed": row["value"]["sum_speed"] / row["value"]["count_speed"],
})
sdf = sdf.update(lambda row: print(row))
# sdf = sdf.to_topic(output_topic)

if __name__ == "__main__":
    app.run(sdf)
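To check the initializer and reducer without a running broker, the window logic can be simulated in plain Python. This is only a sketch under assumptions: `simulate_tumbling_window` is a hypothetical helper, not part of quixstreams, and it assumes `Timestamp` is an integer in milliseconds. It mimics the idea that the first row of a window goes to the initializer and every later row goes to the reducer.

```python
def init_mean(row: dict) -> dict:
    # First row of a window: create the initial state.
    return {"sum_speed": row["location-speed"], "count_speed": 1}


def reduce_mean(state: dict, row: dict) -> dict:
    # Later rows of the same window: update the running totals.
    state["sum_speed"] += row["location-speed"]
    state["count_speed"] += 1
    return state


def simulate_tumbling_window(rows, duration_ms, reducer, initializer):
    """Hypothetical stand-in for a tumbling window (not the quixstreams API).

    Assumes each row has an integer millisecond "Timestamp".
    """
    windows = {}
    for row in rows:
        # Align the timestamp to the start of its fixed-size window.
        start = row["Timestamp"] - row["Timestamp"] % duration_ms
        if start in windows:
            windows[start] = reducer(windows[start], row)
        else:
            windows[start] = initializer(row)
    return [
        {"start": start, "end": start + duration_ms, "value": value}
        for start, value in sorted(windows.items())
    ]


rows = [
    {"Timestamp": 0, "location-speed": 10.0},
    {"Timestamp": 30_000, "location-speed": 20.0},
    {"Timestamp": 61_000, "location-speed": 30.0},
]

results = simulate_tumbling_window(rows, 60_000, reduce_mean, init_mean)
means = [
    {
        "timestamp": r["start"],
        "mean_speed": r["value"]["sum_speed"] / r["value"]["count_speed"],
    }
    for r in results
]
print(means)
```

The first two rows fall into the window starting at 0 and the third into the window starting at 60000, so the sketch produces one mean per window.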
Also explain how to use an aggregate function, for example standard deviation:
import os
import statistics
import uuid

from dotenv import load_dotenv
from quixstreams import Application

load_dotenv("./.env")

app = Application.Quix(str(uuid.uuid4()), auto_offset_reset="earliest")
input_topic = app.topic(os.environ["input"], value_deserializer="json")
# output_topic = app.topic(os.environ["output"], value_serializer="json")

sdf = app.dataframe(input_topic)
sdf = sdf[sdf.contains("location-speed")]
sdf = sdf[["Timestamp", "location-speed"]]


def reduce_values(state: dict, row: dict) -> dict:
    # Collect every speed reading seen in the window.
    state["speed_values"].append(row["location-speed"])
    return state


def init_values(row: dict) -> dict:
    # Start the window state with the first reading.
    return {
        "speed_values": [row["location-speed"]],
    }


def to_stdev(row: dict) -> dict:
    # The reduced state lives under row["value"].
    values = row["value"]["speed_values"]
    return {
        "timestamp": row["start"],
        # statistics.stdev() needs at least two data points.
        "stdev_speed": statistics.stdev(values) if len(values) > 1 else None,
    }


sdf = sdf.tumbling_window(60000).reduce(reduce_values, init_values).final()
sdf = sdf.apply(to_stdev)
sdf = sdf.update(lambda row: print(row))
# sdf = sdf.to_topic(output_topic)

if __name__ == "__main__":
    app.run(sdf)
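Storing every value in the window state, as the example above does, can make the state large on busy topics. One alternative, which is not part of the original proposal, is Welford's online algorithm: it keeps only a count, a running mean, and a running sum of squared differences. The sketch below is a plain-Python illustration; `init_stdev`, `reduce_stdev`, and `finalize_stdev` are hypothetical names, and the functions are shaped so they could be passed to `reduce()` in the same way as the functions above.

```python
import math
import statistics


def init_stdev(row: dict) -> dict:
    # First row of a window: one sample, no spread yet.
    return {"count": 1, "mean": row["location-speed"], "m2": 0.0}


def reduce_stdev(state: dict, row: dict) -> dict:
    # Welford update: constant-size state, no list of values kept.
    x = row["location-speed"]
    state["count"] += 1
    delta = x - state["mean"]
    state["mean"] += delta / state["count"]
    state["m2"] += delta * (x - state["mean"])
    return state


def finalize_stdev(state: dict):
    # Sample standard deviation is undefined for fewer than two points.
    if state["count"] < 2:
        return None
    return math.sqrt(state["m2"] / (state["count"] - 1))


speeds = [10.0, 20.0, 30.0, 40.0]
state = init_stdev({"location-speed": speeds[0]})
for speed in speeds[1:]:
    state = reduce_stdev(state, {"location-speed": speed})

stdev_speed = finalize_stdev(state)
# Compare against the batch computation over the same values.
print(stdev_speed, statistics.stdev(speeds))
```

The trade-off is that the raw values are no longer available in the window result, so this only fits cases where the standard deviation (and mean) are all that is needed.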
Merlin commented
Please use an end-to-end use case scenario to explain this concept, e.g. "Suppose you have a sensor that produces a temperature reading every X and you want to Y. Here's how you would do it."
Daniil Gusev commented
Hey @tomas-quix, the current docs already have an example for reduce() here.
Do you want to replace it with the provided one?
Daniil Gusev commented
Hey @tomas-quix,
The new docs are released in #315.
I'm going to close this issue.
If you think they still need more examples, please reopen it.