A dedicated blog about how to write a scheduler plugin

Question

kerthcet opened this issue a year ago · comments

Describe the issue

Following the discussion at https://kubernetes.slack.com/archives/C09TP78DV/p1701098974893019 : we have several posts about the scheduler framework and how the scheduler works, but we still lack an article about how to write a decent scheduler plugin, covering topics such as how preemption works and how to activate a group of pods after a scheduling cycle.

This will benefit plugin developers in the long term, and also the scheduler maintainers, who will no longer need to explain the same things again and again.

/sig scheduling
/assign

Kante Yin · Answer 1 · Wed Dec 13 2023 10:21:59 GMT+0800 (China Standard Time)

/kind documentation

Kensei Nakada · Answer 2 · Wed Dec 13 2023 16:12:38 GMT+0800 (China Standard Time)

I wouldn’t object to having a more user-friendly doc for plugin developers.
But I’m not sure that a website or blog is the most suitable place to describe how plugins should be written. I would much prefer to enrich the comments in the implementation.
Basically my take is kubernetes/website#43686 (comment):

Not only QueueingHint, but the scheduling framework in general is on the border between internal and user-facing feature. It’s a very fundamental design and it could be important not only for custom-scheduler developers but also for everyone to understand how K8s scheduling works in general. So, we have a doc explaining an overview of the scheduling framework and each interface: https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/
But, OTOH, we don’t explain how to actually implement each interface in detail (e.g., which statuses Filter can return), while the actual interfaces have comments explaining such detail (as does QueueingHint). This is how the scheduling framework is explained currently, and IMO it’s a good balance: if we explained everything about each interface in the K8s docs, they would be too large to maintain and could go stale very easily. Comments in source code can easily be maintained along with the implementation.

So, having a blog or an article is OK, but it should be written so that it isn't too specific and doesn't become a maintenance burden.
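To make the "Filter can return these statuses" point concrete, here is a minimal, self-contained Go sketch. Everything in it is a simplified stand-in defined for illustration (`Code`, `Status`, `Pod`, `Node`, and the `filter` function are hypothetical); it does not use the real scheduler framework package, whose Filter interface has a different signature. The code names mirror ones used in the framework, but the comments on the actual interfaces remain the authoritative description of what each status means.

```go
package main

import "fmt"

// Code is a simplified stand-in for a scheduler status code.
// It is NOT the real framework type.
type Code int

const (
	Success                      Code = iota // the node is feasible for the pod
	Error                                    // an internal error occurred
	Unschedulable                            // does not fit now; preemption might help
	UnschedulableAndUnresolvable             // does not fit; preemption would not help
)

// Status is a hypothetical result type returned by filter.
type Status struct {
	Code   Code
	Reason string
}

// Pod and Node are hypothetical, heavily reduced models.
type Pod struct {
	Name         string
	CPURequest   int64 // millicores
	RequiredZone string
}

type Node struct {
	Name    string
	FreeCPU int64 // millicores
	Zone    string
}

// filter decides whether pod can run on node and reports why not.
func filter(pod *Pod, node *Node) Status {
	if pod == nil || node == nil {
		return Status{Error, "nil pod or node"}
	}
	if pod.RequiredZone != "" && pod.RequiredZone != node.Zone {
		// Evicting other pods cannot change the node's zone.
		return Status{UnschedulableAndUnresolvable, "node is in the wrong zone"}
	}
	if pod.CPURequest > node.FreeCPU {
		// Evicting lower-priority pods could free CPU.
		return Status{Unschedulable, "insufficient CPU"}
	}
	return Status{Success, ""}
}

func main() {
	pod := &Pod{Name: "web", CPURequest: 500, RequiredZone: "zone-a"}
	nodes := []*Node{
		{Name: "n1", FreeCPU: 1000, Zone: "zone-a"},
		{Name: "n2", FreeCPU: 100, Zone: "zone-a"},
		{Name: "n3", FreeCPU: 1000, Zone: "zone-b"},
	}
	for _, n := range nodes {
		s := filter(pod, n)
		fmt.Printf("%s: code=%d reason=%q\n", n.Name, s.Code, s.Reason)
	}
}
```

The distinction between the two "unschedulable" codes is the kind of detail a plugin author needs: in this sketch it signals whether preemption is worth attempting on that node.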

Kante Yin · Answer 3 · Thu Dec 14 2023 11:56:47 GMT+0800 (China Standard Time)

This will be maintained under contributors/devel/sig-scheduling as a devel doc.
Everything carries a maintenance burden: the comments, the docs, and of course the code. But comments should be about the what and the why, with precise descriptions and short summaries. For the how, meaning a walk through all the details and the best practices, we need a doc. My two cents.

Kensei Nakada · Answer 4 · Thu Dec 14 2023 14:39:03 GMT+0800 (China Standard Time)

Once we write something specific, I can easily imagine those docs soon becoming stale and unusable. It would be very tough to keep checking whether the docs need updating every time we merge something.
On the other hand, if we write something near the implementation (of course, we still have to keep the comments updated), it would be very easy to keep it up to date. That is the basic reason why Go has the go doc feature: we keep the doc as comments near the implementation, and the comments then become the documentation.
But, again, I do not object to writing the doc somewhere, nor having an article as a starting point of the development of plugins, if it is not too specific.
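As a small illustration of the "comments become the documentation" idea, the Go sketch below puts the contract of an interface in its doc comment. The `Scorer` interface, the score bounds, and `lengthScorer` are all hypothetical names invented for this example; they are not part of the scheduler.

```go
// Package main sketches how doc comments sit next to the code they describe.
package main

import "fmt"

// Score bounds used by this sketch (hypothetical values).
const (
	MinScore = 0
	MaxScore = 100
)

// Scorer ranks a node for a pod.
//
// Score must return a value in [MinScore, MaxScore]; a higher value
// means the node is preferred. Implementations must not block.
type Scorer interface {
	Score(nodeName string) int
}

// lengthScorer is a toy Scorer that prefers longer node names.
// It exists only to show an implementation living beside its contract.
type lengthScorer struct{}

// Score implements Scorer. It returns 10 points per character of the
// node name, clamped to MaxScore.
func (lengthScorer) Score(nodeName string) int {
	s := len(nodeName) * 10
	if s > MaxScore {
		return MaxScore
	}
	return s
}

func main() {
	var s Scorer = lengthScorer{}
	fmt.Println(s.Score("node-a"))
}
```

Because the contract is written directly above the interface, a change to the allowed score range and the comment describing it can land in the same commit, and tools such as `go doc` read the comment straight from the source.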

Kubernetes Triage Robot · Answer 5 · Wed Mar 13 2024 14:40:30 GMT+0800 (China Standard Time)

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Kubernetes Triage Robot · Answer 6 · Fri Apr 12 2024 14:58:10 GMT+0800 (China Standard Time)

/lifecycle rotten

Kante Yin · Answer 7 · Fri Apr 12 2024 15:39:40 GMT+0800 (China Standard Time)

/remove-lifecycle rotten

This is still planned.

Adeel · Answer 8 · Wed Apr 24 2024 22:47:37 GMT+0800 (China Standard Time)

@kerthcet could you please share a good step-by-step guide to plugin development for the Kubernetes scheduler? I am running into problems with both development and deployment.

Kante Yin · Answer 9 · Thu Apr 25 2024 11:54:50 GMT+0800 (China Standard Time)

You can reach me on Slack and I may be able to help. Right now, though, I can't guarantee that I'll have time to finish this; I may need to take a holiday to work on it.

Vanessasaurus · Answer 10 · Thu Apr 25 2024 13:56:26 GMT+0800 (China Standard Time)

We have a pretty good design going, but won't have something good to share until the latest PR is merged (with quite a lot of changes). I'm following this issue so I can come back if/when we do.

Ping @cmisale and @milroy, would be good to get our recent work merged so we can share the design and automation approach.

Kubernetes Triage Robot · Answer 11 · Wed Jul 24 2024 14:19:32 GMT+0800 (China Standard Time)

/lifecycle stale

Kante Yin · Answer 12 · Mon Aug 05 2024 16:00:23 GMT+0800 (China Standard Time)

/remove-lifecycle stale