carvel-dev / kapp

kapp is a simple deployment tool focused on the concept of "Kubernetes application" — a set of resources with the same label

Home Page: https://carvel.dev/kapp

Performance enhancements

praveenrewar opened this issue

Describe the problem/challenge you have
We rely on list API calls to get information from the cluster, which can put significant load on the API server when the number of objects returned is high. As the number of apps deployed with kapp grows (for example, with kapp-controller packages), this becomes a problem: beyond a certain point deployment times increase even though the CPU and memory of the cluster nodes are not under pressure. Symptoms we have observed:

  • Client-side throttling warnings when multiple kapp apps are being used at the same time.
  • "socket: too many open files" errors when ulimit is set to a low number (e.g. 256).

Describe the solution you'd like
We need to minimise list calls as much as possible; replacing them with get or watch calls is also an option.
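
As a rough illustration of the trade-off, here is a minimal sketch using client-go's dynamic client (the GVR, label selector, namespace, and name below are placeholders, not kapp's actual code):

```go
package example

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

// fetchResources contrasts the two access patterns: the labeled list touches
// every matching object in the cluster, while the get touches exactly one.
func fetchResources(ctx context.Context, dyn dynamic.Interface, appLabel, ns, name string) error {
	gvr := schema.GroupVersionResource{Group: "apps", Version: "v1", Resource: "deployments"}

	// Cluster-wide labeled list: cost grows with the number of matching
	// objects, and the call counts against API Priority and Fairness.
	if _, err := dyn.Resource(gvr).List(ctx, metav1.ListOptions{LabelSelector: appLabel}); err != nil {
		return err
	}

	// Targeted get: roughly constant cost once the resource identity is
	// already known (e.g. from the app's previously recorded state).
	_, err := dyn.Resource(gvr).Namespace(ns).Get(ctx, name, metav1.GetOptions{})
	return err
}
```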

Tasks

  • Instead of fetching all server resources, fetch only the ones related to the available GKs (we discard the others later anyway, so they are never used). - Cancelled for the time being

    • Spike: This requires more code changes and makes the code less readable, so we will revisit it later if required.
  • When we deploy an app, we first list the labelled resources (GVs) and then get, one by one, the non-labelled resources that were not found in the first step. When an app is deployed for the first time, the first step always returns nothing, so maybe we could skip it?

    • Spike: How would this fit into the kapp codebase? -> PR
  • Use watch instead of get and list while waiting for resources to reconcile (see the watch-based sketch after this list). Watch will be helpful for resources that take longer to reconcile (for example, Deployments), but for resources that reconcile almost immediately (for example, ConfigMaps) it might add some overhead.

    • Spike: Test whether adjusting the wait-check interval helps with this.
      We increased wait-check-interval to 3s, as it reduces API calls without noticeably affecting deployment time.
      The PR for this change is here, and the data collected during the spike can be found here.
    • Spike: Test whether using watch improves performance. -> Since the increased wait-check-interval is already giving better results, we can look at watch later. Prioritising the remaining items for now.
  • When a CRD and its CR are present in the same manifest, we fetch the server resources again to find the CRD (since it wasn't present in the cached server resources). We should avoid doing this, as we won't find the CRD this time either. (No need to work on this if we already work on the first item.)

  • Now that we add the resource namespaces to fallbackAllowedNamespaces, should we always use fallbackAllowedNamespaces instead of checking resources cluster-wide? (See the namespace-scoped listing sketch after this list.)

    • Spike: Figure out if scoping to fallbackAllowedNamespaces could have any side effects (testing).
  • Currently we store the unique GKs in the meta ConfigMap and do a list per GK. Since list calls are more expensive, we could check whether issuing get calls for all the resources is cheaper than list calls for the unique GKs.

  • Improve performance specifically during the diff stage. Go profiling showed that there are too many calls to deepCopy and AsYAMLBytes. PR
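
For the watch-based waiting item above, here is a minimal sketch (not kapp's implementation) of waiting for a single Deployment with one watch instead of a get per wait-check interval; the namespace and name are placeholders:

```go
package main

import (
	"context"
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// waitForDeployment blocks until the named Deployment reports all replicas
// available, using a single watch instead of polling get on an interval.
func waitForDeployment(ctx context.Context, client kubernetes.Interface, ns, name string) error {
	w, err := client.AppsV1().Deployments(ns).Watch(ctx, metav1.ListOptions{
		// Watching a single object by name keeps the watch cheap; the
		// current state arrives as an initial synthetic ADDED event.
		FieldSelector: "metadata.name=" + name,
	})
	if err != nil {
		return err
	}
	defer w.Stop()

	for ev := range w.ResultChan() {
		if ev.Type == watch.Error {
			return fmt.Errorf("watch error: %v", ev.Object)
		}
		d, ok := ev.Object.(*appsv1.Deployment)
		if !ok {
			continue
		}
		if d.Spec.Replicas != nil && d.Status.AvailableReplicas >= *d.Spec.Replicas {
			return nil
		}
	}
	return fmt.Errorf("watch closed before %s/%s became available", ns, name)
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	if err := waitForDeployment(context.Background(), client, "default", "my-app"); err != nil {
		panic(err)
	}
	fmt.Println("deployment available")
}
```

As noted in the task, a watch like this pays off for slow-reconciling resources; for resources that are ready almost immediately, establishing the watch may cost more than a single get.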
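
For the fallbackAllowedNamespaces item, here is a sketch of the difference between a cluster-wide list and lists scoped to the namespaces an app is known to touch (the label selector and namespaces are illustrative, not kapp's actual values):

```go
package example

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// listAppConfigMaps shows both scopes for the same labeled lookup.
func listAppConfigMaps(ctx context.Context, client kubernetes.Interface, namespaces []string, appLabel string) error {
	// Cluster-wide: one call, but the apiserver has to consider every
	// namespace and the response can span many objects.
	if _, err := client.CoreV1().ConfigMaps(metav1.NamespaceAll).List(ctx,
		metav1.ListOptions{LabelSelector: appLabel}); err != nil {
		return err
	}

	// Namespace-scoped: one call per namespace the app is known to touch;
	// each call is bounded by that namespace's object count.
	for _, ns := range namespaces {
		if _, err := client.CoreV1().ConfigMaps(ns).List(ctx,
			metav1.ListOptions{LabelSelector: appLabel}); err != nil {
			return err
		}
	}
	return nil
}
```

Whether the per-namespace calls end up cheaper depends on how many namespaces an app spans, which is part of what the spike above would need to confirm.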

Anything else you would like to add:
It might be worth understanding API Priority and Fairness.


Vote on this request

This is an invitation to the community to vote on issues to help us prioritize our backlog. Use the "smiley face" reaction at the top right of this comment to vote.

👍 "I would like to see this addressed as soon as possible"
👎 "There are other more important things to focus on right now"

We are also happy to receive and review Pull Requests if you want to help work on this issue.

Do we have reason to believe that it is the list calls adding to the burden rather than the get calls in the wait stage?
I believe the latter would be higher in number.

Not sure if it will be helpful, but this KEP elaborates on the thought process and goals of API fairness and priority in detail.

In particular, both list and get calls will be counted against the API fairness and priority budget in a way that watch calls are not (there's a separate budget for those, but the assumption is that they are long-running and the cost of the initial population is amortized over the duration of the watch, possibly in conjunction with the golang informer cache).
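
For reference, here is a minimal sketch of the shared informer pattern mentioned above: one list plus a long-lived watch populate a local cache, and subsequent reads are served from memory instead of hitting the apiserver (the namespace and name are placeholders):

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// One list + watch per resource type, shared by every reader.
	factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
	deployLister := factory.Apps().V1().Deployments().Lister()

	stopCh := make(chan struct{})
	defer close(stopCh)
	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)

	// Served from the in-memory cache: no additional API call.
	d, err := deployLister.Deployments("default").Get("my-app")
	if err != nil {
		panic(err)
	}
	fmt.Println("observed generation:", d.Status.ObservedGeneration)
}
```

This amortizes well for a long-running controller; for a one-shot CLI invocation the initial list + watch may not pay for itself, which is the same trade-off called out for watch in the task list above.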

Is the one change listed here for the initial list the only performance change needed?

Do you need help setting up a test environment?

Hi @evankanderson, I didn't mean to close it, but it got closed along with the PR. We are still working on some of the items from the list (although we are not able to spend many cycles on it). Thank you so much for the help :)

This issue is being marked as stale due to a long period of inactivity and will be closed in 5 days if there is no response.