Shopify Inventory Manager
A simple proof of concept microservice for handling Shopify inventory management.
Design
The Happy Path
If fully implemented, the way this microservice would work is:
- A service sends a request to the Work Queue containing an inventory update
- The Coordinator dequeues messages and holds and releases them to SNS at a rate that prevents exceeding the API limit, using ElastiCache to store throttle state
- The worker Lambdas pick up the messages from the SNS notification and call the Shopify API
Error Handling
A dead letter queue in SQS connected to an Error Notifier in SNS takes care of notifying services of errors.
- If the coordinator fails, the work queue will send error state to the DLQ
- If the Lambdas fail, they will send their error state to the dead letter queue
- The dead letter queue is connected to an SNS topic which will broadcast the error to all subscribers
This service explicitly does not handle its own errors, beyond a set number of retry attempts. Should a job fail repeatedly, the services are notified and it is up to them if they wish to keep trying or revert to their initial state and alert a client or another service.
Data Structures
Message Operations
Messages must be of a type defined in the `Operations` interface, and contain a `data` key.
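The message contract can be sketched as a discriminated union, assuming the `Operations` interface maps operation names to their payload shapes. The operation names and payload fields below are illustrative, not the real interface:

```typescript
// Illustrative stand-in for the real Operations interface: each key is
// an operation name, each value is that operation's payload shape.
interface Operations {
  setInventoryLevel: { inventoryItemId: string; locationId: string; available: number };
  adjustInventoryLevel: { inventoryItemId: string; locationId: string; delta: number };
}

// A message's type must be a key of Operations, and its `data` key
// must match that operation's payload shape.
type InventoryMessage = {
  [K in keyof Operations]: { type: K; data: Operations[K] };
}[keyof Operations];

// Example message (illustrative values):
const msg: InventoryMessage = {
  type: "adjustInventoryLevel",
  data: { inventoryItemId: "item-123", locationId: "loc-456", delta: -2 },
};
```

The mapped-type union means the compiler rejects a message whose `data` does not match its `type`.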
Throttle State
API limits typically take the form of: "Limited to `m` requests every `n` [seconds/minutes/etc.]".
Throttle state contains just two values:
- `timerStart`: Timestamp, with milliseconds
- `sentQty`: Quantity of messages sent since `timerStart`
The way the throttle works is:
- The Coordinator dequeues a message
- If the current time is `n` seconds/etc. greater than `timerStart`:
  - Set `timerStart` to the current time
  - Clear `sentQty` and set to 0
  - Send message to worker
- If the current time is not `n` seconds/etc. greater than `timerStart`:
  - If `sentQty < m`:
    - Increment `sentQty`
    - Send message to worker
  - If `sentQty >= m`:
    - Hold the message until the current time is `n` seconds/etc. greater than `timerStart`
    - Set `timerStart` to the current time
    - Clear `sentQty` and set to 0
    - Send message to worker
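The steps above can be sketched as a function over the throttle state. This is a minimal in-memory sketch with illustrative names; in the real design `timerStart` and `sentQty` would live in ElastiCache:

```typescript
// In-memory stand-in for the throttle state held in ElastiCache.
interface ThrottleState {
  timerStart: number; // epoch millis when the current window opened
  sentQty: number;    // messages sent since timerStart
}

// Returns the delay in ms before the message may be sent, updating
// the state in place. m = max requests, nMs = window length in ms.
function throttle(state: ThrottleState, m: number, nMs: number, now: number): number {
  if (now - state.timerStart >= nMs) {
    // Window has elapsed: open a new one and send immediately.
    state.timerStart = now;
    state.sentQty = 1;
    return 0;
  }
  if (state.sentQty < m) {
    // Still inside the window with budget left: send immediately.
    state.sentQty += 1;
    return 0;
  }
  // Budget exhausted: hold until the window elapses, then open a new one.
  const delay = state.timerStart + nMs - now;
  state.timerStart = now + delay;
  state.sentQty = 1;
  return delay;
}
```

With `m = 2` and `n = 1000` ms, messages arriving at t = 0, 100, and 200 ms are delayed by 0, 0, and 800 ms respectively.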
Throttling Design Comparisons
Pros and Cons of Current Design
Pros:
- Maintains good order, with Coordinator sitting at the mouth of the queue
- Throttling can be configured to work exactly within API limits
- Behaviour is identical during bursty and calm periods
- Never waits longer than the API limit time length (minus one millisecond) to send a message
Cons:
- Adds latency to requests
- Adds some complexity to error handling: workers must send errors to the dead letter queue themselves
- Different operations take different amounts of time, so requests could still end up clustered even when released at a steady rate
An Alternative Design: Distributed Locks in Elasticache
This design functions by and large the same as the first, except that there is no coordinator. When a lambda dequeues a message, it will check the Throttle State. If it needs to update it, it will acquire a lock on it, write to it, then release the lock.
Pros:
- Lower latency as the message goes directly from the queue to the worker
- Throttling can be configured to work exactly within API limits
Cons:
- During bursty traffic FIFO order will be lost, as all sleeping workers wake up to try to write Throttle State at once
- It is nondeterministic which worker will end up gaining the lock
- It is hypothetically possible for a worker to time out while waiting, although waiting the full 15 minutes is extremely unlikely
- Added complexity in ensuring the correctness of the distributed lock
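The lock-and-update step a worker would perform can be sketched as follows. This is a simulation using an in-memory store standing in for ElastiCache (Redis-style `SET NX` semantics); all names are illustrative, and a production lock would also need a TTL to survive worker crashes:

```typescript
// In-memory stand-in for a Redis-style store, used only to illustrate
// the lock-and-update flow. A real implementation would use ElastiCache
// and set the lock with an expiry (SET key value NX PX ttl).
class FakeStore {
  private data = new Map<string, string>();

  // SET key value NX: succeeds only if the key is absent.
  setNx(key: string, value: string): boolean {
    if (this.data.has(key)) return false;
    this.data.set(key, value);
    return true;
  }

  set(key: string, value: string): void { this.data.set(key, value); }
  get(key: string): string | undefined { return this.data.get(key); }
  del(key: string): void { this.data.delete(key); }
}

// A worker acquires the lock, updates the throttle state, then releases.
// Returns false if another worker holds the lock and the caller should retry.
function updateThrottleState(
  store: FakeStore,
  workerId: string,
  update: (raw: string | undefined) => string,
): boolean {
  if (!store.setNx("throttle:lock", workerId)) return false; // lock held elsewhere
  try {
    store.set("throttle:state", update(store.get("throttle:state")));
    return true;
  } finally {
    // Release only a lock we still own, to avoid freeing another
    // worker's lock if ours was ever taken over.
    if (store.get("throttle:lock") === workerId) store.del("throttle:lock");
  }
}
```

The check-before-delete on release is what "ensuring the correctness of the distributed lock" refers to: without it, a slow worker could delete a lock that has since passed to someone else.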
What is Incomplete
The microservice is not done. The following components are missing or not functioning properly.
The Deadletter Queue
When errors are thrown, the error metadata, such as the error name, stack trace, etc., is not being added by SQS or Lambda.
Messages are ending up in the dead letter queue inconsistently, and very slowly.
Not implemented:
- Throttle State
- KMS to hide API secrets
- Worker connection to dead letter queue (possible fifo queue limitation)