odigos-io / odigos

Distributed tracing without code changes. 🚀 Instantly monitor any application using OpenTelemetry and eBPF

Home Page: https://odigos.io

Add Affinity and Tolerations to odiglet and odigos-data-collection Daemonset

clavinjune opened this issue · comments

Is your feature request related to a problem? Please describe.
We need to be able to set the Affinity and Tolerations attributes on the odiglet and odigos-data-collection DaemonSets.

Describe the solution you'd like
Need advice on this one

Describe alternatives you've considered

Additional context

Hi @clavinjune ,

Thanks for reaching out. Can you please share more about what you are trying to achieve and what your motivation is?

Hi @blumamir, in our organization we use Kubernetes affinity and tolerations to keep workloads separated. To deploy odiglet and the odigos data collection on all of our nodes, we need to be able to configure their affinity & tolerations.

I hope that makes sense.

Thanks @clavinjune

What values would you like to add there? Is it something specific to your organization that needs to be injected via config?

Hi @blumamir, it's not specific to my organization. Affinity and Tolerations are a common Kubernetes configuration. It would help users manage their workloads: since odiglet and odigos-data-collection are DaemonSets, users could choose which nodes they do or do not want the DaemonSets deployed on.

If odiglet and data-collection are not deployed on some nodes, odigos will be unable to auto-instrument the pods running on those nodes.

How can we guarantee that there are no pods we need to instrument on the nodes that are filtered out?

If odiglet and data-collection are not deployed on some nodes, odigos will be unable to auto-instrument the pods running on those nodes.

That's the point, actually. In our case, some nodes have taints that prevent the odigos DaemonSets from being scheduled on them. We need to configure tolerations in order to deploy the odigos DaemonSets on those nodes.

Can you share an example of what values you would like to configure for these settings?
Is it something that we can add to the odigos install cli command as flags?

It would be similar to the code below, but we should give users more flexibility to set the values.

https://github.com/keyval-dev/odigos-charts/blob/41ff0b0d440d49892bfed0281feeb83ce0ccbbc5/charts/odigos/templates/odiglet/daemonset.yaml#L55-L59

Is it something that we can add to the odigos install cli command as flags?

I'm still open to advice on this one. Maybe we can create a configuration file that is read by the odigos install command?

What will be the config values you need in your organization for this setting?
Is it a big list of complex statements or do you need a specific rule just to filter out a problematic case?

The config values I need are just an array of toleration objects, like below:

tolerations:
- effect: NoSchedule
  operator: Exists

But usually, in a Helm chart, these three attributes are configurable in order to solve this kind of problem:

nodeSelector: {}
tolerations: []
affinity: {}

https://github.com/prometheus-community/helm-charts/blob/5bd2f828f67aea8861f31433edb9c5c503fe5fcc/charts/prometheus-redis-exporter/values.yaml#L47-L51
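To make concrete why the toleration above (effect: NoSchedule, operator: Exists, no key) is enough to get a DaemonSet onto tainted nodes, here is a minimal Go sketch of the taint/toleration matching rules as described in the Kubernetes documentation. The `Taint` and `Toleration` structs are simplified stand-ins for the real core/v1 types, `Tolerates` is a hypothetical helper written for this illustration, and the `dedicated=batch` taint is a made-up example, not a value from this issue.

```go
package main

import "fmt"

// Taint and Toleration are simplified stand-ins for the Kubernetes
// core/v1 types; only the fields needed for matching are kept.
type Taint struct {
	Key, Value, Effect string
}

type Toleration struct {
	Key, Operator, Value, Effect string
}

// Tolerates is a hypothetical helper that reports whether tol matches taint.
func Tolerates(tol Toleration, taint Taint) bool {
	// An empty effect on the toleration matches every taint effect.
	if tol.Effect != "" && tol.Effect != taint.Effect {
		return false
	}
	// An empty key matches every taint key (used together with Exists).
	if tol.Key != "" && tol.Key != taint.Key {
		return false
	}
	switch tol.Operator {
	case "Exists":
		// Exists ignores the taint value entirely.
		return true
	case "", "Equal":
		// Equal is the default operator and compares values.
		return tol.Value == taint.Value
	default:
		return false
	}
}

func main() {
	// The toleration from the comment above.
	tol := Toleration{Operator: "Exists", Effect: "NoSchedule"}

	// Made-up taints, for illustration only.
	taints := []Taint{
		{Key: "dedicated", Value: "batch", Effect: "NoSchedule"},
		{Key: "dedicated", Value: "batch", Effect: "NoExecute"},
	}
	for _, t := range taints {
		fmt.Printf("%s=%s:%s tolerated: %v\n", t.Key, t.Value, t.Effect, Tolerates(tol, t))
	}
}
```

Under these rules the toleration covers every NoSchedule taint regardless of key or value, but not NoExecute taints, which would need their own toleration entry.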

We have odigos helm charts in this repo maintained by https://github.com/esara

While we recommend that users use the CLI, the helm chart might be a good option for this need. Would you be interested in contributing a PR to add these options to the odigos helm chart?
The odiglet manifest is generated directly by the helm chart, but the data-collection manifest is generated by the auto-scaler controller, which would also need to be aware of this configuration.

Let me know if you think this is something that can work for you. Feel free to ping me in slack for any question or thought.

I'll take a look at that one, @blumamir! Meanwhile, can we keep this issue open?

Yes sure, let me know how it goes

Just created the issue & PR. Please assign this issue to me, thank you 🙇

For the data-collection DaemonSet, should we fetch the odiglet nodeSelector, tolerations, and affinity configuration and apply it to the data-collection DaemonSet? Both configurations will most likely be the same.

wdyt @blumamir?

Thanks again @clavinjune, I merged the PR for the helm chart.
Please let me know if it solved your issue and if you need further assistance with anything

Yes, thank you for the support. It solves part of the issue, but we still need to configure the data-collection DaemonSet. Any advice on this, @blumamir?

I think we can simply copy these values from the odiglet daemonset manifest.

Would you like to contribute this feature? I can help if you need more info or guidance

sure, I could work on this, but I'm new to operator-sdk so I might need guidance on this feature.

Also, do we need to be able to configure odiglet from the CLI?

sure, I could work on this, but I'm new to operator-sdk so I might need guidance on this feature.

I can help you. Please write here or in odigos slack for any issue.

Also, do we need to be able to configure odiglet from the CLI?

I think we can make this advanced feature configurable only from the chart, and keep the CLI for mainstream installations where users are OK with the basic affinity and tolerations.
WDYT?

I think we can make this advanced feature configurable only from the chart, and keep the CLI for mainstream installations where users are OK with the basic affinity and tolerations.

sure, we'll go with this first then.

Please confirm this step: the operator will fetch the odiglet affinity & tolerations and watch for every change to them. If there are any changes, we need to update the data-collection DaemonSet and roll it out. Am I correct?

Yes, that sounds right. Since we don't expect the odiglet affinity & tolerations values to change after installation, we can choose to simplify a bit and only read the values in syncDaemonSet (but the "right" way to do that would be to add a reconciler on the odiglet DaemonSet in case it is changed on the fly).

When you work on the auto-scaler code, you first need to disable the auto-scaler deployment that is running in your cluster with:

kubectl scale deployment odigos-autoscaler --replicas=0 -n odigos-system

Then, you can run the auto scaler from code or from debugger:

cd autoscaler
go run .

This way you can develop and test your code instantly.

At this point in the code you have the c client.Client variable, which you can use to c.Get(...) the odiglet DaemonSet manifest, extract the relevant fields, and pass them to getDesiredDaemonSet to be embedded into the data-collection manifest.
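The "extract the relevant fields and embed them" step suggested above could look roughly like the Go sketch below. It is stdlib-only so it can be run in isolation: `PodSpec`, `Toleration`, and `Affinity` are simplified stand-ins for the Kubernetes core/v1 types (the real pod spec exposes NodeSelector, Tolerations, and Affinity fields), and `SyncScheduling` is a hypothetical helper, not a function from the odigos code base. In the autoscaler, the source values would presumably come from the odiglet DaemonSet's pod template after fetching it with c.Get(...).

```go
package main

import (
	"fmt"
	"reflect"
)

// Simplified stand-ins for the Kubernetes core/v1 types.
type Toleration struct {
	Key, Operator, Value, Effect string
}

// Affinity is reduced to one opaque field; the real type is a nested struct.
type Affinity struct {
	Rule string
}

type PodSpec struct {
	NodeSelector map[string]string
	Tolerations  []Toleration
	Affinity     *Affinity
}

func cloneSelector(m map[string]string) map[string]string {
	if m == nil {
		return nil
	}
	out := make(map[string]string, len(m))
	for k, v := range m {
		out[k] = v
	}
	return out
}

func cloneTolerations(ts []Toleration) []Toleration {
	if ts == nil {
		return nil
	}
	out := make([]Toleration, len(ts))
	copy(out, ts)
	return out
}

// SyncScheduling (hypothetical) copies the scheduling fields from src to dst
// and reports whether dst was modified, so the caller can skip a needless
// update and rollout when nothing changed.
func SyncScheduling(src, dst *PodSpec) bool {
	if reflect.DeepEqual(src.NodeSelector, dst.NodeSelector) &&
		reflect.DeepEqual(src.Tolerations, dst.Tolerations) &&
		reflect.DeepEqual(src.Affinity, dst.Affinity) {
		return false
	}
	// Copy rather than share, so later edits to src do not leak into dst.
	dst.NodeSelector = cloneSelector(src.NodeSelector)
	dst.Tolerations = cloneTolerations(src.Tolerations)
	dst.Affinity = nil
	if src.Affinity != nil {
		a := *src.Affinity
		dst.Affinity = &a
	}
	return true
}

func main() {
	// Stand-in for the odiglet pod spec.
	odiglet := PodSpec{
		Tolerations: []Toleration{{Operator: "Exists", Effect: "NoSchedule"}},
	}
	// Stand-in for the generated data-collection pod spec.
	collector := PodSpec{}

	fmt.Println("first sync changed:", SyncScheduling(&odiglet, &collector))
	fmt.Println("second sync changed:", SyncScheduling(&odiglet, &collector))
}
```

Returning a "changed" flag keeps the simplified approach (read the values once in syncDaemonSet) compatible with a later reconciler: either caller can use it to decide whether the data-collection DaemonSet needs to be updated.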

sure, thank you @blumamir. Lemme take a look and try to raise the PR ASAP

but the "right" way to do that would be to add a reconciler on the odiglet DaemonSet in case it is changed on the fly

Can we create a reconciler for the odiglet DaemonSet? Aren't we only able to create reconcilers for CRDs, @blumamir?