cloudfoundry / diego-release

BOSH Release for Diego

[BBS PR REVIEW]: PoC for Security group filtering

Keyli0Iliev opened this issue · comments

Add Security group filtering to DesiredLRP

Summary

We propose modifying the DesiredLRPFilter and the DesiredLRPs response sent to the route-emitter to exclude security group rules, and modifying the route-emitter to request only this lighter version of the DesiredLRPs.

Detailed description

When a single space has a large number of security groups and multiple apps deployed in it, we noticed that network traffic consistently increases as more apps are added to the space.

After investigating, we found that this was due to the large number of security group rules being sent by the BBS to components that do not need them.

Below are screenshots of average network activity from AWS before and after the proposed fix.

As the screenshots show, without the fix the network traffic grew by 500% over the course of the 8-hour test, from an average of 15 MB to an average of 90 MB.
With security group filtering, the increase was only 13% over the same time period.

Without security group filtering

(screenshot: testwithout)

With security group filtering

(screenshot: testwithpatchnew)

To reproduce

Create 3000 security group rules and apply them to a space
Deploy 50 applications with 100 routes each in that space
Monitor IaaS network traffic from the diego-api VMs and network traffic to the Diego cells

Diego repository PRs

bbs link
route-emitter link

Additional Text Output, Screenshots, or contextual information (optional)

None.

[DoNotMerge]
[PoC]

Thank you for this data! We've been thinking about going down the path of removing ASG data from diego completely, but didn't have enough justification that it would be worth the effort.

Instead of this, how would y'all feel about modifying the garden-external-networker to pull ASG data from policy-server, rather than accepting it from diego, and then start modifying diego components to not make use of ASG data? Eventually CAPI could be told to stop sending it to diego in the first place.

It would, however, result in a new network dependency + increased latency for container creation due to the lookup (this dependency already exists with dynamic ASGs' silk-cni -> vxlan-policy-agent -> policy-server). It's also a bit more work than what you've proposed, and no one has started it, compared to your PoC.

On the other hand, I get nervous when deleting info from the protobuf spec, and trying to ensure forwards/backwards compatibility with other diego releases during upgrade or mixed-environment scenarios.

Hi @geofffranks,
It is good that you also think the egress rules should not be sent around like this.

What you are proposing relies on ASG syncing being enabled for the Policy Server. While we are slowly moving toward enabling this, it is not yet a given. And even afterwards, the path for ASGs to become iptables rules (CC -> policy-server -> vxlan-policy-agent -> silk-cni) is quite long. Having the backup path via Diego as something tested and working feels quite safe. This way, we know that whatever happens, we are just a quick config change away (disabling dASG) from reverting to functionality that ensures ASGs work 100%.

I fear the dASG implementation has not matured long enough for us to assume that components like the policy-server should become a prerequisite for a working CF.

Also, per our investigation, we were not quite sure whether there are use cases that rely on those egress rules being sent via events, and adding a backwards-incompatible change, or a new API version with migrators, did not seem reasonable or pragmatic.

On the other hand, the change we propose fixes a real issue that blocked one of our production systems last year.
Of course, it can also be regarded as a first step toward getting rid of egress rule usage in one part of Diego.

And while it would be interesting to get rid of all usages, currently they are not causing us harm, compared to other parts of Diego that are still not robust enough, keep causing outages, and are thus higher on our priority list for fixing.

@vlast3k That's fair. I'll see if we can get someone to do an official review on the PRs linked above.

Hi @geofffranks,
Please keep in mind that this is just a PoC, not something production-ready. We implemented it to serve as a proposal and starting point for discussion. If we agree that this is the right way to address the issue, we will implement the required tests, etc., and prepare new PRs.

👍I'm on board with this, and it seems like a reasonable approach. Maybe add this to the Feb WG meeting for a broader consensus?

Hi @mariash,

Let's move the communication related to the proposed bbs PoC PR here, so that it is visible to everyone.

In general I agree that having a separate API call is a good idea. I also agree that SkipEgressRules is not really a filtering parameter.

The problem I see with a separate/new call is that the EgressRules are stored in the run_info column of the desired_lrps table. The run_info column holds a complex type (models.DesiredLRPRunInfo) rather than a simple value, so we can't execute precise SQL queries against it :( . On top of that, the run_info column is encrypted, and it takes a lot of CPU resources every time it needs to be encrypted/decrypted. (BTW: this is another issue we spotted recently, but better to think about that later.) Basically, from my point of view the desired_lrps table does not comply with 1NF, which is a problem for executing precise SQL queries. I doubt that without a DB migration we will be able to extract the data without EgressRules at the SQL level. This is why we found and propose the filtering option. Of course, it has the disadvantage that the EgressRules data is already read/decrypted/stored in memory before being nil-ed.
Regarding the filtering, I see another option if we would like to keep the filtering API as it is: simply introduce a new optional parameter to the bbs client DesiredLRPs function. It could be either a simple or a complex type that contains the SkipEgressRules parameter.

Hi @PlamenDoychev,

I think we should separate the API schema from how it is represented in the database. What is in the database is an internal implementation detail. Basically, we want to pull some data that the route-emitter wants, and we have an endpoint that returns more data than we need. The most straightforward way to express this in the API would be an object presenter that only exposes certain parts of the object. We already have a similar endpoint for desired LRP scheduling info, /v1/desired_lrp_scheduling_infos/list, and I actually wonder if we can use it?

  1. I don't see that the route-emitter pulls anything from run_info; it looks like it only needs sched_info. It gets things like the domain and desired routes from sched_info, then it gets the address from the corresponding actual LRP. If this is true, can we maybe get away with using the /v1/desired_lrp_scheduling_infos/list endpoint in route-emitter?

  2. If route-emitter does need something from run_info, I would still argue for a separate endpoint, /v1/desired_lrp_routing_infos/list, instead of overloading the existing desired LRPs endpoint and returning partial objects. It can still use the same database request internally. The API response will contain DesiredLrpRoutingInfos objects, similar to how we have DesiredLrpSchedulingInfos. These will only contain the fields needed by the route-emitter and will be populated from the DesiredLrps pulled from the database. For extra points we can have a separate sqldb method that skips deserializing egress rules, but that can be done as a separate effort. We can focus on traffic optimization now and on BBS CPU consumption later.

Hello @mariash,

Unfortunately, we have exactly one place where a run_info field is used in the route-emitter, and it is in the code you mentioned; because of that, /v1/desired_lrp_scheduling_infos/list is not an option.
Once you get into SetRoutes, a routeGenerator function is called. If you follow the definitions, you will see that HTTP requests using the routeGenerator call a function named httpRoutesFrom, which uses MetricTags, and MetricTags lives in run_info.

Your idea for an additional endpoint that carries only the information the route-emitter needs makes sense to us. Our idea would be to create a dedicated presenter object that represents the contract between the bbs and the route-emitter.
This way we solve the bandwidth problem. The high CPU usage issue will remain, since we still need to decrypt run_info for the MetricTags, but it would be a good step toward improving the bbs in the future.

Should we proceed with this course of action and start a new implementation for the endpoint?

Kind regards,
@Keyli0Iliev

Hi @mariash,

Do you have any additional comments, or do we have agreement so we can start working on the issue? :)

Regards,
Plamen Doychev

@PlamenDoychev @Keyli0Iliev have these PRs been tested in a forward/backward compatibility scenario?

For example, BBS with #66 against route-emitter without #23, and route-emitter with #23 against BBS without #66

@geofffranks If #66 is present, the route-emitter can use both endpoints, but without the new endpoint this change won't work. During an update, if the BBS instances are redeployed first, everything will be fine, but if the cells are redeployed first, we could have an issue. Maybe we should introduce the BBS change first and the route-emitter change in a separate release, or make it fully backwards compatible.

I think we'll need to make it fully backwards compatible to handle cases where people use additional deployments of cells for Windows or isolation segment support, managed with releases different from the BBS on the core CF deployment.

@geofffranks Updated the route-emitter with a fallback to the old endpoint. We first try to get the DesiredLRPs from the routing_info endpoint; if we get a route-not-found error, we use the old one. This should ensure that we are backwards compatible.

I ran uptimer against a CF deployment through the following scenarios:

Initial state: BBS + Route-emitter both without the proposed changes

Deploy BBS with the change + route-emitter without the change
sleep 60
Deploy BBS with the change + route-emitter with the change
sleep 60
Deploy BBS without the change + route emitter with the change
sleep 60
Deploy BBS without the change + route emitter without the change
sleep 60

And got these results:

{
  "summaries": [
    {
      "name": "HTTP availability",
      "failed": 0,
      "summaryPhrase": "perform get requests",
      "allowedFailures": 5,
      "total": 1476
    },
    {
      "name": "App pushability",
      "failed": 2,
      "summaryPhrase": "push and delete an app",
      "allowedFailures": 2,
      "total": 17
    },
    {
      "name": "Recent logs",
      "failed": 0,
      "summaryPhrase": "fetch recent logs",
      "allowedFailures": 2,
      "total": 148
    },
    {
      "name": "Streaming logs",
      "failed": 0,
      "summaryPhrase": "stream logs",
      "allowedFailures": 2,
      "total": 49
    }
  ],
  "commandExitCode": 0
}

I think we're good on backwards/forwards compatibility.

Hello @mariash and @geofffranks,

Here is a follow-up PR for the routing_info endpoint in the route-emitter. With this, we change the endpoint used when we need to fetch all the DesiredLRPs because of a missing ActualLRP. This should again bring some improvement in network usage.

@Keyli0Iliev @klapkov looks like everything here is merged + released. Are we ok to close this now?