Add Affinity and Tolerations to odiglet and odigos-data-collection Daemonset
clavinjune opened this issue · comments
Is your feature request related to a problem? Please describe.
We need to be able to set Affinity and Tolerations attributes on odiglet and odigos-data-collection.
Describe the solution you'd like
Need advice on this one
Describe alternatives you've considered
Additional context
Hi @clavinjune,
Thanks for reaching out. Can you please share more about what you are trying to achieve and what your motivation is?
Hi @blumamir, in our organization we use Kubernetes affinity and tolerations to keep workloads separated. To deploy odiglet and odigos data collection on all of our nodes, we need to be able to configure their affinity and tolerations.
I hope that makes sense.
I guess we can try adding Affinity and Tolerations configuration in the code below:
https://github.com/keyval-dev/odigos/blob/206ef74eccff945a582571186365d2410d76347d/autoscaler/controllers/datacollection/daemonset.go#L98-L100
Thanks @clavinjune
What values would you like to add there? Is it something specific to your organization that needs to be injected via config?
Hi @blumamir, it's not specific to my organization. Affinity and Tolerations are common Kubernetes configuration. Exposing them would help users manage their workloads: since odiglet and odigos-data-collection are Daemonsets, users could choose which nodes the daemonsets are or are not deployed on.
If odiglet and data-collection are not deployed on some nodes, odigos will be unable to auto-instrument the pods running on those nodes.
How can we guarantee that the nodes that are filtered out don't run pods we need to instrument?
> If odiglet and data-collection are not deployed on some nodes, odigos will be unable to auto-instrument the pods running on those nodes.
That's the point, actually. In our case, some nodes have taints that prevent the odigos daemonsets from being scheduled on them. We need to be able to configure Tolerations in order to deploy the odigos daemonsets on those nodes.
Can you share an example of the values you would like to configure for these settings?
Is it something that we can add to the odigos install CLI command as flags?
It would be similar to the code below, but we should give users more flexibility to set the values.
> Is it something that we can add to the odigos install CLI command as flags?
I'm still open to advice on this one. Maybe we can create a configuration file that can be read by the odigos install command?
What config values do you need in your organization for this setting?
Is it a big list of complex statements or do you need a specific rule just to filter out a problematic case?
The only config values I need are something like an array of toleration objects, as below:
tolerations:
- effect: NoSchedule
  operator: Exists
But usually, in a helm chart, these three attributes are configurable to solve this kind of problem:
nodeSelector: {}
tolerations: []
affinity: {}
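For readers less familiar with tolerations, here is a minimal Go sketch of how a toleration like the one above is matched against node taints. It follows the documented Kubernetes rule: an empty key with operator Exists matches every taint key, and an empty effect matches every effect. The Taint and Toleration types are simplified stand-ins for the Kubernetes core/v1 types, and the taint keys dedicated and gpu are hypothetical examples, not values from this issue.

```go
package main

import "fmt"

// Taint and Toleration are simplified stand-ins for the Kubernetes
// core/v1 types; only the fields needed for matching are modeled.
type Taint struct {
	Key, Value, Effect string
}

type Toleration struct {
	Key, Operator, Value, Effect string
}

// Tolerates reports whether toleration t matches the given taint.
func (t Toleration) Tolerates(taint Taint) bool {
	// An empty effect on the toleration matches every taint effect.
	if t.Effect != "" && t.Effect != taint.Effect {
		return false
	}
	// An empty key (used together with Exists) matches every taint key.
	if t.Key != "" && t.Key != taint.Key {
		return false
	}
	switch t.Operator {
	case "Exists":
		return true
	case "", "Equal": // Equal is the default operator
		return t.Value == taint.Value
	}
	return false
}

func main() {
	// The toleration from the comment above: no key, operator Exists,
	// effect NoSchedule.
	tol := Toleration{Operator: "Exists", Effect: "NoSchedule"}

	// Hypothetical node taints, for illustration only.
	taints := []Taint{
		{Key: "dedicated", Value: "batch", Effect: "NoSchedule"},
		{Key: "gpu", Value: "true", Effect: "NoExecute"},
	}
	for _, taint := range taints {
		fmt.Printf("%s=%s:%s tolerated: %v\n",
			taint.Key, taint.Value, taint.Effect, tol.Tolerates(taint))
	}
}
```

Under these rules, the toleration above lets the daemonset pods schedule onto any node whose taints all have the NoSchedule effect, regardless of taint key, while taints with other effects (such as NoExecute) would still need their own toleration.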
We have odigos helm charts in this repo maintained by https://github.com/esara
While we recommend that users use the CLI, the helm chart might be a good option for this need. Would you be interested in contributing a PR to add these options to the odigos helm chart?
The odiglet manifest is generated directly by the helm chart, but the data-collection manifest is generated by the auto-scaler controller, which would also need to be aware of this configuration.
Let me know if you think this is something that can work for you. Feel free to ping me on Slack with any questions or thoughts.
I'll take a look at that one @blumamir! Meanwhile, can we keep this issue open?
> I'll take a look at that one @blumamir! Meanwhile, can we keep this issue open?
Yes sure, let me know how it goes
just created the issue & PR, please help to assign this issue to me, thank you 🙇
For the data-collection daemonset, should we try to fetch the odiglet nodeSelector, tolerations, and affinity configuration, then apply it to the data-collection daemonset? Both configurations will most likely be the same.
wdyt @blumamir?
Thanks again @clavinjune, I merged the PR for the helm chart.
Please let me know if it solved your issue and if you need further assistance with anything
Yes, thank you for the support. It solves part of the issue, but we still need to configure the data-collection daemonset. Any advice on this, @blumamir?
I think we can simply copy these values from the odiglet daemonset manifest.
Would you like to contribute this feature? I can help if you need more info or guidance
sure, I could work on this, but I'm new to operator-sdk so I might need guidance on this feature.
Also, do we need to be able to configure odiglet from the CLI?
> sure, I could work on this, but I'm new to operator-sdk so I might need guidance on this feature.
I can help you. Please write here or in the odigos Slack about any issue.
> Also, do we need to be able to configure odiglet from the CLI?
I think we can make this advanced feature configurable only from the chart, and keep the CLI for mainstream installations where users are OK with the basic affinity and tolerations.
WDYT?
> I think we can make this advanced feature configurable only from the chart, and keep the CLI for mainstream installations where users are OK with the basic affinity and tolerations.
Sure, we'll go with this first then.
Please confirm this step: the operator will fetch the odiglet affinity & tolerations and watch them for changes; if anything changes, we need to update the data-collection daemonset and roll it out. Am I correct?
Yes, that sounds right. Since we don't expect the odiglet affinity & tolerations values to change after installation, we can choose to simplify a bit and only read the values in syncDaemonSet (but the "right" way to do that would be to add a reconciler on the odiglet daemonset in case it is changed on the fly).
When you work on the auto-scaler code, you first need to disable the auto-scaler deployment that is running in your cluster with:
kubectl scale deployment odigos-autoscaler --replicas=0 -n odigos-system
Then, you can run the auto-scaler from code or from a debugger:
cd autoscaler
go run .
This way you can develop and test your code instantly.
At this point in the code you have the c client.Client variable, which you can use to c.Get(...) the odiglet daemonset manifest, extract the relevant fields, and send them to getDesiredDaemonSet to be embedded into the data-collection manifest.
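The "extract the relevant fields" step could look roughly like the Go sketch below. It is only an illustration under stated assumptions: PodSpec, Toleration, and Affinity are simplified stand-ins for the Kubernetes core/v1 types (whose pod spec does carry NodeSelector, Tolerations, and Affinity fields), and copyScheduling is a hypothetical helper name, not an existing odigos function. In the real controller the source would be the pod template spec of the odiglet daemonset fetched with c.Get(...), and the generated DeepCopy methods on the Kubernetes types could replace the manual copies.

```go
package main

import "fmt"

// Simplified stand-ins for the Kubernetes core/v1 types.
type Toleration struct {
	Key, Operator, Value, Effect string
}

// Affinity is kept opaque here; the real type is a nested struct.
type Affinity struct {
	Description string
}

type PodSpec struct {
	NodeSelector map[string]string
	Tolerations  []Toleration
	Affinity     *Affinity
}

// copyScheduling (hypothetical helper) copies the scheduling constraints
// from src into dst. Values are deep-copied so that later changes to src
// do not leak into dst.
func copyScheduling(src PodSpec, dst *PodSpec) {
	dst.NodeSelector = nil
	if src.NodeSelector != nil {
		dst.NodeSelector = make(map[string]string, len(src.NodeSelector))
		for k, v := range src.NodeSelector {
			dst.NodeSelector[k] = v
		}
	}

	// Appending to a nil slice allocates a fresh backing array.
	dst.Tolerations = append([]Toleration(nil), src.Tolerations...)

	dst.Affinity = nil
	if src.Affinity != nil {
		a := *src.Affinity
		dst.Affinity = &a
	}
}

func main() {
	// Stand-in for the pod spec read from the odiglet daemonset.
	odiglet := PodSpec{
		NodeSelector: map[string]string{"kubernetes.io/os": "linux"},
		Tolerations:  []Toleration{{Operator: "Exists", Effect: "NoSchedule"}},
	}

	// Stand-in for the pod spec of the desired data-collection daemonset.
	var dataCollection PodSpec
	copyScheduling(odiglet, &dataCollection)

	fmt.Printf("%+v\n", dataCollection)
}
```

Copying rather than sharing the map, slice, and pointer keeps the data-collection manifest independent of the object that was read from the cluster.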
sure, thank you @blumamir. Lemme take a look and try to raise the PR ASAP
> but the "right" way to do that would be to add a reconciler on the odiglet daemonset in case it is changed on the fly
Can we create a reconciler for the odiglet daemonset? Aren't we only able to create reconcilers for CRDs, @blumamir?