flock-lab / flock

Flock: A Low-Cost Streaming Query Engine on FaaS Platforms

Home Page: https://flock-lab.github.io/flock/

Error Handling for Async Function Invocation

gangliao opened this issue

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

The scatter-gather model shows that we can aggregate data packets received from different preceding function instances if the aggregate function's concurrency is 1. If the function doesn't have enough capacity to handle all incoming requests, events might wait in the event queue.

Even if your function doesn't return an error, it's possible for it to receive the same event from Lambda multiple times because the queue itself is eventually consistent. In Flock, we use a bitmap in the aggregate function to ensure that data is reassembled and processed exactly once.
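The bitmap idea can be illustrated with a minimal Python sketch. It is not Flock's actual implementation: the names `PacketBitmap` and `reassemble`, and the assumption that each packet carries a sequence number in `0..total-1`, are hypothetical and used only for illustration.

```python
from typing import Any, Iterable, List, Optional, Tuple


class PacketBitmap:
    """Tracks which of `total` packets have arrived, one bit per packet."""

    def __init__(self, total: int) -> None:
        if total <= 0:
            raise ValueError("total must be positive")
        self.total = total
        self.bits = 0  # a Python int is arbitrary precision, so it can serve as the bitmap

    def mark(self, seq: int) -> bool:
        """Record packet `seq`. Return True if it is new, False if it is a duplicate."""
        if not 0 <= seq < self.total:
            raise IndexError(f"sequence number {seq} out of range")
        mask = 1 << seq
        if self.bits & mask:
            return False  # already seen: a redelivered event
        self.bits |= mask
        return True

    def complete(self) -> bool:
        """True once every packet has been seen at least once."""
        return self.bits == (1 << self.total) - 1


def reassemble(events: Iterable[Tuple[int, Any]], total: int) -> Optional[List[Any]]:
    """Collect payloads in sequence order, ignoring duplicates.

    Returns the ordered payloads once all packets are present, or None
    if some packet never arrived.
    """
    bitmap = PacketBitmap(total)
    payloads: List[Any] = [None] * total
    for seq, payload in events:
        if bitmap.mark(seq):  # only the first delivery of each packet is stored
            payloads[seq] = payload
    return payloads if bitmap.complete() else None
```

A duplicate delivery leaves the result unchanged, while a missing packet keeps `complete()` false, which is the situation the dead-letter queue proposed in this issue is meant to address.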

However, if the function can't keep up with incoming events, events might also be deleted from the queue without being sent to the function. Or, when the queue is very long, new events might age out before Lambda has a chance to send them to your function. In rare cases, we may therefore lose data packets or events.

Describe the solution you'd like

We can configure the aggregate function with a dead-letter queue to save discarded events for further processing. A dead-letter queue is used when an event fails all processing attempts or expires without being processed. A dead-letter queue is part of a function's version-specific configuration, so it is locked in when you publish a version.
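As a rough sketch of what this configuration could look like with boto3, the helper below builds the request for `update_function_configuration`, whose `DeadLetterConfig` parameter takes a `TargetArn`. The helper names and the function name and ARN in the usage note are hypothetical placeholders, not values from Flock.

```python
from typing import Any, Dict


def dead_letter_config_request(function_name: str, target_arn: str) -> Dict[str, Any]:
    """Build the keyword arguments for Lambda's UpdateFunctionConfiguration call."""
    if not function_name or not target_arn:
        raise ValueError("function_name and target_arn are required")
    return {
        "FunctionName": function_name,
        "DeadLetterConfig": {"TargetArn": target_arn},
    }


def attach_dead_letter_queue(lambda_client: Any, function_name: str, target_arn: str) -> Any:
    """Apply the dead-letter configuration using a boto3 Lambda client.

    `lambda_client` is expected to be `boto3.client("lambda")`; it is passed
    in so this sketch can be exercised without AWS credentials.
    """
    request = dead_letter_config_request(function_name, target_arn)
    return lambda_client.update_function_configuration(**request)
```

With real credentials this might be called as `attach_dead_letter_queue(boto3.client("lambda"), "flock-aggregate", "arn:aws:sqs:us-east-1:123456789012:flock-dlq")`. Because the setting is version-specific, it would need to be applied before the version is published.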

We can configure an Amazon SQS queue or Amazon SNS topic as a dead-letter queue for discarded events. For dead-letter queues, Lambda only sends the content of the event, without details about the response. To reprocess events in a dead-letter queue, we can set it as an event source for our Lambda function. Or, we can manually retrieve the events from the aggregate function.
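If the dead-letter queue is an SQS queue wired up as an event source, the reprocessing function receives batches of records whose `body` holds the content of the discarded event. The sketch below assumes that body is a JSON document; `replay_dead_letters` and the `process` callback are hypothetical names for illustration, not part of Flock.

```python
import json
from typing import Any, Callable, Dict


def replay_dead_letters(event: Dict[str, Any], process: Callable[[Any], None]) -> int:
    """Feed each discarded event from an SQS batch back into `process`.

    `event` follows the shape Lambda uses for SQS event sources: a dict
    with a "Records" list, each record carrying the message text in "body".
    Returns the number of events replayed.
    """
    replayed = 0
    for record in event.get("Records", []):
        original = json.loads(record["body"])  # assumption: the payload is JSON
        process(original)
        replayed += 1
    return replayed
```

Since the dead-letter message carries only the event content, any decision about why the event was dropped has to be made from the payload itself or from logs.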

Describe alternatives you've considered

Additional context