zalando-stups / senza

Deploy immutable application stacks and create and execute AWS CloudFormation templates in a sane way

Home Page:https://pypi.python.org/pypi/stups-senza

Stacks may end up with weight of 0 in Route 53

musiKk opened this issue · comments

Closely related to #449.

We somehow got a stack with a weight of 0. This is no problem if this is the only stack, but when deploying a new stack with senza create, it will also get a weight of 0, which will cause both stacks to immediately get 50% traffic each, which might not be desired. Additionally, any senza scale will lead to an immediate 100% traffic for the stack that's being scaled, even if specifying a lower percentage.

$ senza traffic my-service
Stack Name│Version│Identifier        │Weight%
my-service 1x16b11 my-service-1x16b11     0.0 
my-service 1x17b13 my-service-1x17b13     0.0 
$ senza traffic my-service 1x17b13 15
Calculating new weights.. OK
Stack Name│Version│Identifier         │Old Weight%│Delta│Compensation│New Weight%│Current
my-service 1x16b11 my-service-1x16b11         0.0                            0.0
my-service 1x17b13 my-service-1x17b13         0.0 100.0         85.0       100.0 <
Setting weights for my-service.my-team.my-domain... OK
$ senza traffic my-service
Stack Name│Version│Identifier        │Weight%
my-service 1x16b11 my-service-1x16b11     0.0 
my-service 1x17b13 my-service-1x17b13   100.0

I could understand this happening when there are two stacks with a 100/0% distribution and the stack with 100% is forcefully deleted, but we definitely did NOT do this. We only delete stacks that don't receive any traffic.

As of yet I have no idea how we ended up in this situation. It is extremely undesirable.

Weights attached to senza stacks usually reflect the percentage of traffic. The behaviour in this corner case can indeed be surprising. We are going to add a warning similar to https://github.com/zalando-stups/senza/blob/master/senza/traffic.py#L122 to notify people, in addition to the compensation detail you already saw in the output you describe.

When I'm already in the situation of having two stacks with a weight of 0 each, I find the resulting behavior quite understandable. I'm more concerned with getting into this situation in the first place which should be avoided at all costs. I'm looking forward to any improvement there. :)

By the way, when I speak of weight, I'm actually referring to the weight in Route 53, which only indirectly maps to the weight given in senza. I guess I should be clearer about that in the future. So a DNS weight of 0/0 leads to an actual distribution of 50%/50%, and any subsequent scaling changes the weights to 200/0 and the distribution to 100%/0%. This creates a very rough and surprising scaling of 0 -> 50 -> 100%.
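To make the mapping from Route 53 weights to actual traffic concrete, here is a minimal sketch. It assumes the documented Route 53 weighted-routing rule: each record receives its weight divided by the sum of all weights, and if every weight is 0, all records are served with equal probability. The function name `effective_shares` is made up for illustration and is not part of senza.

```python
def effective_shares(weights):
    """Return the fraction of traffic each record receives.

    `weights` maps a stack version to its Route 53 weight.
    Hypothetical helper for illustration, not senza code.
    """
    total = sum(weights.values())
    if total == 0:
        # All weights are zero: traffic is split evenly across all records.
        return {name: 1 / len(weights) for name in weights}
    return {name: weight / total for name, weight in weights.items()}


# The 0/0 situation from this issue: both stacks get half of the traffic.
print(effective_shares({"1x16b11": 0, "1x17b13": 0}))
# After the traffic change shown above: weights 0/200 mean 0%/100%.
print(effective_shares({"1x16b11": 0, "1x17b13": 200}))
```

This is why a stack created with weight 0 next to another stack at weight 0 immediately receives half of the requests.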

it will also get a weight of 0 which will cause both stacks to immediately get 50% traffic each which might not be desired.

This is a result of AWS Route 53 behavior that we cannot control (unless we redo the entire routing approach in the STUPS ecosystem): two stacks with the same weight receive the same share of requests, no matter what that weight is. Stacks are created with a weight of 0 because in a "normal" situation this means the new stack will not get any traffic.

I also believe we can't change traffic increases to also touch the other stacks at 0%, because until now people could always rely on stacks with 0% of traffic not getting traffic by accident, except in the specific corner case where everything has weight 0.

Do you have any suggestion to improve the behavior in a way that will not break deployment setups that are currently working?

Additionally, any senza scale will lead to an immediate 100% traffic for the stack that's being scaled, even if specifying a lower percentage.

One potential fix for this specific issue could be to treat the case where both/all stacks have 0 traffic weight as a special case and not filter them out when redistributing the traffic weight. Would that solve this part of the issue for you?

One potential fix for this specific issue could be to treat the case where both/all stacks have 0 traffic weight as a special case and not filter them out when redistributing the traffic weight. Would that solve this part of the issue for you?

In some sense the setting of weights from senza is just buggy. Say I want to give 10% to some stack: that stack has to get a weight of 20, and the rest (180) has to be distributed among all other stacks. Right now, in the "all weights zero" scenario, the stack would get 200 and that's it.

Of course there is still one special case where the stack is the only stack available in which case it should of course receive the 200 regardless of what is provided on the command line (or reject anything but 100% with an appropriate error message).
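The redistribution described in the two paragraphs above could be sketched roughly as follows. This is not senza's actual implementation: the function `redistribute`, the constant `TOTAL_WEIGHT`, and the choice to reject anything but 100% for a single stack are assumptions made for illustration. The only figure taken from the discussion is the total weight of 200.

```python
TOTAL_WEIGHT = 200  # sum of weights used in the discussion above


def redistribute(weights, target, percentage):
    """Give `target` the requested percentage and spread the rest.

    Hypothetical sketch: `weights` maps stack versions to their
    current weights, `percentage` is the desired share for `target`.
    """
    if target not in weights:
        raise KeyError(target)
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")

    others = {name: w for name, w in weights.items() if name != target}
    if not others:
        # Only one stack exists: it has to receive all traffic.
        if percentage != 100:
            raise ValueError("a single stack must receive 100% of traffic")
        return {target: TOTAL_WEIGHT}

    target_weight = round(TOTAL_WEIGHT * percentage / 100)
    remaining = TOTAL_WEIGHT - target_weight
    others_total = sum(others.values())
    new_weights = {target: target_weight}

    if others_total == 0:
        # Special case: every other stack is at 0, so they are not
        # filtered out but share the remainder evenly.
        share, extra = divmod(remaining, len(others))
        for index, name in enumerate(sorted(others)):
            new_weights[name] = share + (1 if index < extra else 0)
    else:
        # Keep the existing proportions among the other stacks.
        # Rounding may make the sum differ slightly from TOTAL_WEIGHT.
        for name, w in others.items():
            new_weights[name] = round(remaining * w / others_total)
    return new_weights


# The "all weights zero" scenario with a request for 10%:
print(redistribute({"1x16b11": 0, "1x17b13": 0}, "1x17b13", 10))
```

With this sketch the 10% request yields a weight of 20 for the target and 180 for the other stack, instead of 200/0.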

I think my two paragraphs above translate into a "yes", but I'm not 100% sure. 😉 I hope it's clear what I mean.

This would solve the scaling, but creation of new stacks would still be affected by the "existing stack has 0 weight and a new stack will also get a 0, so 50/50% is the result" problem. I don't know how/if this can be fixed. Maybe check for other stacks during creation and fix weights as necessary? That way an existing 0 weight could be fixed to a 200/0 when adding a new stack, or even a 0/0 could be fixed to a 100/100/0, and so on. This would also open the door to fixing a, say, 1/0 to a 200/0/0.

I think the low-hanging fruit here is to add a check in senza create to stop if all currently running stack versions have zero traffic.
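Such a pre-flight check might look like the sketch below. It is only an illustration of the idea: the exception `AllStacksAtZeroWeight` and the function `ensure_safe_to_create` are hypothetical names, and how senza would actually obtain the current weights is left out.

```python
class AllStacksAtZeroWeight(Exception):
    """Raised when every running stack version has a weight of 0."""


def ensure_safe_to_create(existing_weights):
    """Refuse to create a new stack next to stacks that are all at weight 0.

    `existing_weights` maps running stack versions to their current
    weights. Hypothetical helper, not part of senza.
    """
    if existing_weights and all(w == 0 for w in existing_weights.values()):
        versions = ", ".join(sorted(existing_weights))
        raise AllStacksAtZeroWeight(
            "all running versions have zero traffic weight: " + versions
        )


ensure_safe_to_create({})                   # first deployment: fine
ensure_safe_to_create({"1x16b11": 200})     # normal situation: fine
try:
    ensure_safe_to_create({"1x16b11": 0})   # the situation from this issue
except AllStacksAtZeroWeight as error:
    print("refusing to create:", error)
```

Stopping early avoids the silent 50/50 split, at the cost of requiring a manual fix of the existing weights first.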

It's still not clear how the bug behaves, but recently we triggered it repeatedly after issuing a senza update to various stacks. The only changes compared to the running stack were to the Minimum and Maximum of the AutoScaling object. In all cases there was only one version of the stack running, so having traffic at zero only created some small issues. But it's unclear whether having multiple versions would have triggered the bug as well.

For all intents and purposes I deem the update command as unusable unless only a single version of a particular application is running.