kubernetes / community

Kubernetes community content

A dedicated blog about how to write a scheduler plugin

kerthcet opened this issue · comments

Describe the issue

Following the discussion on https://kubernetes.slack.com/archives/C09TP78DV/p1701098974893019 , we do have several posts about the scheduler framework and how the scheduler works. However, we still lack an article about how to write a decent scheduler plugin, covering topics like how preemption works and how to activate a group of pods after a scheduling cycle.

This will benefit plugin developers in the long term, and also the scheduler maintainers, who will no longer need to explain the same things again and again.

/sig scheduling
/assign

/kind documentation

I wouldn’t object to having a more user-friendly doc for plugin developers.
But I’m not sure a website or blog is the most suitable place to describe how plugins should be written. I would much prefer to enrich the comments in the implementation.
Basically my take is kubernetes/website#43686 (comment):

Not only QueueingHint, but the scheduling framework in general is on the border between internal and user-facing feature. It’s a very fundamental design and it could be important not only for custom-scheduler developers but also for everyone to understand how K8s scheduling works in general. So, we have a doc explaining an overview of the scheduling framework and each interface: https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/
But, OTOH, we don’t explain how to actually implement each interface in detail (e.g., Filter can return these statuses, etc.), while the actual interfaces have comments explaining such detail (so does QueueingHint). This is how the scheduling framework is explained currently, and IMO it’s a good balance: if we explained everything about each interface in the K8s docs, they’d be too huge to maintain and could rot very easily. Comments in source code can easily be maintained along with the implementation.

So, having a blog or an article is OK, but it should be written so that it isn't too specific and doesn't become a maintenance burden.
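To make the "Filter can return these statuses" point concrete, here is a minimal sketch of the kind of decision a Filter plugin makes. It deliberately does not import the real scheduler framework: `Code`, `Status`, `Pod`, `Node`, and `filter` are simplified, hypothetical stand-ins defined only for this example, and the zone and CPU checks are invented for illustration. The status names mirror ones the framework defines, but the authoritative semantics are in the comments on the real interfaces.

```go
package main

import "fmt"

// Code is a simplified stand-in for the framework's status code.
// Only three codes are modelled in this sketch.
type Code int

const (
	Success Code = iota
	Unschedulable
	UnschedulableAndUnresolvable
)

// Status is a hypothetical, pared-down result type for this example.
type Status struct {
	Code   Code
	Reason string
}

// Pod and Node are toy types, not the real API objects.
type Pod struct {
	Name         string
	CPUMilli     int64
	RequiredZone string // empty means any zone is acceptable
}

type Node struct {
	Name         string
	Zone         string
	FreeCPUMilli int64
}

// filter reports whether pod may run on node.
func filter(pod Pod, node Node) Status {
	if pod.RequiredZone != "" && pod.RequiredZone != node.Zone {
		// Evicting other pods cannot change the node's zone, so
		// preemption would not help here.
		return Status{Code: UnschedulableAndUnresolvable, Reason: "node is in the wrong zone"}
	}
	if pod.CPUMilli > node.FreeCPUMilli {
		// Freeing CPU (for example through preemption) could make
		// this node feasible later.
		return Status{Code: Unschedulable, Reason: "insufficient free CPU"}
	}
	return Status{Code: Success}
}

func main() {
	pod := Pod{Name: "web", CPUMilli: 500, RequiredZone: "zone-a"}
	nodes := []Node{
		{Name: "n1", Zone: "zone-b", FreeCPUMilli: 4000},
		{Name: "n2", Zone: "zone-a", FreeCPUMilli: 200},
		{Name: "n3", Zone: "zone-a", FreeCPUMilli: 1000},
	}
	for _, n := range nodes {
		st := filter(pod, n)
		fmt.Printf("%s: code=%d reason=%q\n", n.Name, st.Code, st.Reason)
	}
}
```

The distinction between the two failure codes is the sort of detail that lives in interface comments today: one says "not now", the other says "not on this node, even after preemption".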

This will be maintained under contributors/devel/sig-scheduling as a devel doc.
Everything carries a maintenance burden: the comments, the docs, and of course the code. But comments should be about the what and the why, with precise descriptions and short summaries. For the how, meaning a walk through all the details and the best practices, we need a doc. My two cents.

Once we write something specific, I can easily imagine those docs soon becoming stale and unusable. It'd be very tough to keep checking whether the docs need updating every time we merge something.
On the other hand, if we write something near the implementation (of course, we still have to keep updating the comments), it'd be very easy to keep it up to date. That's the basic reason Go has its go doc feature: we keep the doc as comments near the implementation, and the comments become the documentation.
But, again, I do not object to writing the doc somewhere, nor having an article as a starting point of the development of plugins, if it is not too specific.
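As an illustration of the "comments become the documentation" argument, here is a small sketch of Go doc comments attached to an interface and its implementation. `Scorer`, `MaxScore`, and `leastRequested` are hypothetical names made up for this example and are not the scheduler framework's API; the point is only that the contract (valid range, edge cases) sits directly above the code that has to honor it, where tools such as `go doc` can pick it up.

```go
// Package main is a toy package showing documentation kept as comments
// next to the code it describes.
package main

import "fmt"

// MaxScore is the highest value a Scorer may return.
const MaxScore = 100

// Scorer ranks how well a request fits on a node.
//
// Implementations must return a value in [0, MaxScore]. A return value
// of 0 means the node is the least preferred choice.
type Scorer interface {
	// Score returns the rank for placing requestedCPUMilli on a node
	// that currently has freeCPUMilli available.
	Score(freeCPUMilli, requestedCPUMilli int64) int64
}

// leastRequested is a hypothetical Scorer that prefers nodes with the
// largest share of CPU left over after the request is placed.
type leastRequested struct{}

// Score returns the percentage of free CPU that remains after placing
// the request. It returns 0 when the request is negative, does not fit,
// or the node has no free CPU.
func (leastRequested) Score(freeCPUMilli, requestedCPUMilli int64) int64 {
	if freeCPUMilli <= 0 || requestedCPUMilli < 0 || requestedCPUMilli > freeCPUMilli {
		return 0
	}
	return (freeCPUMilli - requestedCPUMilli) * MaxScore / freeCPUMilli
}

func main() {
	var s Scorer = leastRequested{}
	fmt.Println(s.Score(1000, 250))
}
```

A change to the scoring formula and the matching update to its comment land in the same diff, which is the maintenance property argued for above. What such comments do not give a newcomer is the end-to-end walk-through the issue asks for.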

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
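For readers unfamiliar with the triage bot, the three rules quoted in the message above amount to a small state machine. The sketch below is only an illustration of those rules as stated: the `Lifecycle` type, the `next` function, the "active" and "closed" state names, and the choice to treat "after 90d" as "90 or more days" are all assumptions made here, not the bot's actual implementation.

```go
package main

import "fmt"

// Lifecycle is a hypothetical label for where an issue sits in triage.
type Lifecycle string

const (
	Active Lifecycle = "active"
	Stale  Lifecycle = "lifecycle/stale"
	Rotten Lifecycle = "lifecycle/rotten"
	Closed Lifecycle = "closed"
)

// next returns the state an issue moves to after inactiveDays days
// without activity since it entered cur. Thresholds follow the bot
// message: 90 days to stale, then 30 to rotten, then 30 to closed.
// Treating the thresholds as inclusive is an assumption.
func next(cur Lifecycle, inactiveDays int) Lifecycle {
	switch {
	case cur == Active && inactiveDays >= 90:
		return Stale
	case cur == Stale && inactiveDays >= 30:
		return Rotten
	case cur == Rotten && inactiveDays >= 30:
		return Closed
	}
	return cur
}

func main() {
	state := Active
	for _, days := range []int{45, 90, 30, 30} {
		state = next(state, days)
		fmt.Printf("after %d inactive days: %s\n", days, state)
	}
}
```

Commands such as /remove-lifecycle stale reset the issue to the active state in this model, which is why they appear further down the thread.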

/lifecycle rotten

/remove-lifecycle rotten

This is still planned.

@kerthcet could you please share a good resource, such as a step-by-step guide, for plugin development in the Kubernetes scheduler? I am facing problems with both development and deployment.

You can reach me via Slack and maybe I can help you. Right now, though, I can't guarantee that I'll have time to finish this; I may need to take a holiday for it.

We have a pretty good design going, but won't have something good to share until the latest PR is merged (with quite a lot of changes). I'm following this issue so I can come back if/when we do.

Ping @cmisale and @milroy, would be good to get our recent work merged so we can share the design and automation approach.

/lifecycle stale

/remove-lifecycle stale