kubernetes / enhancements

Enhancements tracking repo for Kubernetes

Kubernetes Metrics Overhaul

brancz opened this issue

Enhancement Description

This is a cleanup, so there are no stability milestones involved. However, to avoid breaking users abruptly, SIG Instrumentation is making a best effort to communicate these changes in stages, as follows:

  1. Alpha release target 1.16
    • Stability framework is in place with metric verification/validation running in CI.
    • Metrics which are deprecated in the metrics overhaul are marked as deprecated, which can be overridden in a binary through a command line flag.
    • No metrics can be marked as stable.
  2. Beta release target 1.17
    • All previously marked deprecated metrics will be removed from the codebase.
    • Metrics can be marked as stable.
  3. Stable release target 1.18
    • First release cycle in which stable metrics may be deprecated as per the new stability guidelines.

@logicalhan @serathius @piosz @ehashman

@brancz the graduation criteria in the KEP need to be more detailed about what makes it move between stages. The current wording is very vague.

@lachie83 @mrbobbytables @rbitia @mariantalla @evillgenius75

@kacole2 graduation criteria section is vague because this KEP is basically a collection of individual tasks that are graduated once completed.

As @brancz summarized in the first comment, everything has landed except for the full deprecation of the inconsistent labels, which is proposed for the 1.16 release, and removal of deprecated metrics targeted for 1.17. Once those are complete the KEP will be fully implemented. Does it need to be updated to say as much?

/milestone v1.16
/stage alpha

@brancz @ehashman is there a certain stage this should be labeled? I added alpha, but not sure if this is beta or moving to GA.

This is “just” a large scale cleanup that spans multiple releases, there’s not really any stability level around it. Not sure how to answer that question.

Talked to some folks, we're merging #1209 into this issue as the umbrella issue (they're actually already related as the current issue description says, we'll be updating the issue and comment here again).

This is somewhat of a personal opinion but I'd consider this "beta" with the plan to fully deprecate in 1.17 as "stable".

The alpha/beta/stable designation doesn't always align with the effort being done =/ For these I tend to think of it like this:
Alpha == work just beginning, long road ahead -- possibly several releases.
Beta == work in progress, most effort being done here and may span multiple cycles.
Stable == work wrapping up.

@brancz it looks like there is some disconnect over at #1209. Can we find a time to gather all relevant parties to get this sorted out?

@lachie83

Sorry for taking a bit. I edited the issue to reflect alpha/beta/stable timelines and tasks.

@brancz I am the 1.16 Docs Lead. We need a placeholder PR against k/website (dev-1.16 branch) for this enhancement before Friday, Aug 23rd. Let me know how I can help to make this happen, or if docs are not required.

I'm happy to jump on the doc PR if needed this week, where do the docs need to be updated @simplytunde? Do they just need to reflect the timeline/information included in the description of this issue?

@ehashman I do not have enough context on this to decide where/what docs need to be updated. Let's bring it up on sig-instrumentation.

@ehashman @brancz code freeze for 1.16 is on Thursday 8/29. Looks like all PRs listed have been merged for Alpha. If there are any more that need to be tracked, please let me know!

@kacole2 we have one more PR coming for showing hidden metrics (as defined in the metrics stability KEP), per

Metrics which are deprecated in the metrics overhaul are marked as deprecated, which can be overridden in a binary through a command line flag

I'm working on that right now, should be able to have it up before code freeze. I think everything else is merged.

Edit: WIP PR link: kubernetes/kubernetes#81970

@brancz @ehashman @logicalhan
I'd like to join the task (metric overhaul, stability, validation, etc).

Kubernetes 1.17 will remove the metrics marked as deprecated in 1.14. As a stretch goal, if the metrics stability framework is in place, then in Kubernetes 1.17 the metrics will only be turned off by default through the stability framework. Should this not be available, then the metrics will be removed.

I guess we can start this task after 1.16 release. Where can I find the list of deprecated metrics?

/assign

There won’t be removals as the framework components landed in 1.16 and flags are in progress. That means they’ll just be turned off by default for 1.17 and only truly removed in 1.18.

I would recommend joining the SIG Instrumentation Slack channel and/or SIG meetings to get involved! :)

Yeah, I know the metrics stability framework is in place now.
Thanks @brancz, I will try to join the SIG meeting.

Hey there @brancz @ehashman -- 1.17 Enhancements lead here. I know it's still kind of fuzzy what each stage defines 😬 but I wanted to check in and see if you think this Enhancement will be graduating to alpha/beta/stable in 1.17?

The current release schedule is:

  • Monday, September 23 - Release Cycle Begins
  • Tuesday, October 15, EOD PST - Enhancements Freeze
  • Thursday, November 14, EOD PST - Code Freeze
  • Tuesday, November 19 - Docs must be completed and reviewed
  • Monday, December 9 - Kubernetes 1.17.0 Released

Thanks!

/milestone clear

Target continues to be as described in the original comment:

  • Beta release target 1.17
    • All previously marked deprecated metrics will be removed from the codebase.
    • Metrics can be marked as stable.

The concrete deliverables, other than refinements to the framework, are flags on each component enabling hidden metrics (as in those that were deprecated "3 releases ago and will be completely dropped in the next").
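
The lifecycle discussed in this thread (metrics deprecated in 1.14 are turned off by default in 1.17 and only truly removed in 1.18) can be sketched roughly as follows. This is only an illustrative model in Python, not the stability framework's actual code: the function name `metric_state`, the state labels, and the fixed three-release window are assumptions drawn from the comments here.

```python
# Illustrative model only: the names and the fixed 3-release window are
# assumptions based on the timeline discussed in this thread, not the
# real metrics stability framework code.
HIDE_AFTER_RELEASES = 3  # assumed: hidden 3 minor releases after deprecation


def metric_state(deprecated_minor: int, current_minor: int) -> str:
    """Return the assumed lifecycle state of a metric in a given 1.x release."""
    age = current_minor - deprecated_minor
    if age < 0:
        return "active"       # not yet deprecated
    if age < HIDE_AFTER_RELEASES:
        return "deprecated"   # still exposed, marked as deprecated
    if age == HIDE_AFTER_RELEASES:
        return "hidden"       # off by default; a flag can re-enable it
    return "removed"          # gone from the codebase


if __name__ == "__main__":
    # Walk a metric deprecated in 1.14 through releases 1.13 to 1.18.
    for minor in range(13, 19):
        print(f"1.{minor}: {metric_state(14, minor)}")
```

Under these assumptions, a metric deprecated in 1.14 comes out as hidden in 1.17 and removed in 1.18, which matches the timeline given earlier in the thread.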

Awesome, thank you for the quick response!

/milestone v1.17
/stage beta

Hello, @brancz I'm 1.17 docs lead.

Does this enhancement (or the work planned for v1.17) require any new docs (or modifications to existing docs)? If not, can you please update the 1.17 Enhancement Tracker Sheet (or let me know and I'll do so)

If so, just a friendly reminder we're looking for a PR against k/website (branch dev-1.17) due by Friday, November 8th, it can just be a placeholder PR at this time. Let me know if you have any questions!

No new docs necessary to my knowledge, but just want @logicalhan and @RainbowMango to confirm.

I think we should update docs as we will add new flags for kube-binary.
The first PR is kubernetes/kubernetes#84292. Is this in the scope of 1.17?

@daminisatya
Yes, I think we should modify docs for the new flags.
As we want to deprecate metrics in v1.17, we need to provide the flags (as an escape mechanism).
I will present the PRs before Friday, November 8th, and let you know.

Awesome! Thank you @RainbowMango I will update the tracking sheet as well

Hello @RainbowMango

Just a friendly reminder: we're hoping to have a placeholder Docs PR against k/website (branch dev-1.17) by Friday, Nov 8th (4 more days left).

@daminisatya

Thanks for the reminder. We re-discussed the scope of 1.17, and we haven't made the final decision.
Once we make the decision, I will get back to you and present a draft Docs PR.

The deadline for docs update is Tuesday, Nov 19th, right?

The deadline for first Docs update is Nov 8th (We are just looking for a placeholder PR for this deadline) @RainbowMango

Hey there @RainbowMango, 1.17 Enhancement lead here 👋 I don't want to dog-pile you with the notifications for deadlines, 🙈 but code freeze is also fast approaching (November 14th). It looks like you're descoping a bit -- when you're done discussing what should be included, would you mind mentioning which PRs should be tracked? Only release-blocking issues and PRs will be allowed in the milestone after the freeze 😬

Thanks!

@mrbobbytables

when you're done discussing what should be included, would you mind mentioning which PRs should be tracked?

Do you mean I should tag the PRs with the label milestone v1.17?

Only release-blocking issues and PRs will be allowed in the milestone after the freeze

I'm not sure what kinds of issues/PRs count as release-blocking. How about PRs with the label milestone v1.17?

Thanks!

Do you mean I should tag the PRs with the label milestone v1.17?

If you could link them here in the issue that would be great 👍 right now I'm going by the open PRs that reference this issue.

I'm not sure what kinds of issues/PRs count as release-blocking. How about PRs with the label milestone v1.17?

Release blocking jobs are usually only fixes that impact the stability of the release, resolve a performance regression, etc. For a good definition of release blocking, check out this section from the docs in the sig-release repo:
https://github.com/kubernetes/sig-release/blob/master/release-blocking-jobs.md#release-blocking-criteria-and-dashboard

@RainbowMango Do you happen to have an update on which PRs should be tracked? There are still several that are not yet merged that are linking to this issue. 😬

With code freeze for the 1.17 release tomorrow (November 14th, 5pm PT) and a good chunk not yet merged I'm going to flag this as At Risk in the Enhancement Tracking sheet for now.

@mrbobbytables
I'd like to put these PRs in v1.17:

If they can not be merged tomorrow, can I cherry-pick them to branch release-1.17?

@RainbowMango unfortunately no. The only thing that's being allowed in after freeze are items that are release blocking.

@RainbowMango Code freeze is now in effect for the 1.17 release. It doesn't look like all the PRs were approved / added to a merge pool in time :(

I'm going to go ahead and remove this from the milestone, if you feel it is release blocking or urgent that it gets merged, please file an exception request.

/milestone v1.18

I think what's super weird is that we have this flag on some binaries now but not all.

@brancz agreed. It looks like kubernetes/kubernetes#83837 and kubernetes/kubernetes#83841 were the two that didn't make it in, and clayton said he'd approve once you lgtm'ed. I'd go ahead and file an exception request, and things should be able to get back on track 👍

@brancz agreed. It looks like kubernetes/kubernetes#83837 and kubernetes/kubernetes#83841 were the two that didn't make it in, and clayton said he'd approve once you lgtm'ed. I'd go ahead and file an exception request, and things should be able to get back on track 👍

Awesome, thanks!

I think what's super weird is that we have this flag on some binaries now but not all.

@brancz Do you also mean --show-hidden-metrics-for-version?
I filed an issue, kubernetes/kubernetes#85270, and am tracking it there.
Since there is a bug, as mentioned in kubernetes/kubernetes#85402 (comment), I'm wondering whether we should revert the flag in kube-apiserver on branch release-1.17 and handle it properly in v1.18.

@mrbobbytables
Should we cherry-pick kubernetes/kubernetes#83837 to branch release-1.17 now since it has been merged?

[edit]: I'm really sorry that I couldn't make it into v1.17 and failed to figure out the underlying bugs.

I'm really sorry that I couldn't make it into v1.17 and failed to figure out the underlying bugs.

No worries at all, this is software, things happen.

We have both tracked/no and tracked/yes labels on here, someone want to fix? 🤔 @mrbobbytables I think we wanna remove the latter?

@ehashman woops -- fixed, sorry about that. 🤦‍♂

Hey @RainbowMango,

1.18 enhancements team reaching out 👋 Are you planning on getting this into 1.18? It looks like there are a few PRs open. Code Freeze for 1.18 will be March 5th.

Are you planning on getting this into 1.18?

Yes. Strictly speaking, not all PRs linked to this issue are in the scope of 1.18.
I think we can close this issue after:

I will try my best to make it happen before 1.18 Code Freeze, thanks for the reminder.

We like to leave the issues open until they are "stable" and done, and the KEP is updated to implemented. I've got you tracked in the sheet and we'll check back around code freeze!

@RainbowMango @brancz could we please have the KEP for this updated with Test Plan info? It looks like we didn't do that in the 1.17 time frame and we should have. I'm going to remove this from the milestone for now, but you can file an exception request and we can add this back in. The KEP just needs to have test data added to it.

/milestone clear

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

Hi @RainbowMango -- 1.19 Enhancements Lead here, I wanted to check in if you think this enhancement would graduate in 1.19?

In order to have this part of the release:

  1. The KEP PR must be merged in an implementable state
  2. The KEP must have test plans
  3. The KEP must have graduation criteria.

The current release schedule is:

  • Monday, April 13: Week 1 - Release cycle begins
  • Tuesday, May 19: Week 6 - Enhancements Freeze
  • Thursday, June 25: Week 11 - Code Freeze
  • Thursday, July 9: Week 14 - Docs must be completed and reviewed
  • Tuesday, August 4: Week 17 - Kubernetes v1.19.0 released

@palnabarun It seems the enhancement needs to supply test plans. What exactly are they? Is there any documentation about this?

Hi @RainbowMango, thank you for the update.

For the test plans, you can have a look at this KEP template for the exact requirement: https://raw.githubusercontent.com/kubernetes/enhancements/master/keps/NNNN-kep-template/README.md

Also, one quick question, which graduation stage would you be targeting in 1.19?

@RainbowMango -- pinging back as a reminder of the above. 🙂

Hi @RainbowMango,

Tomorrow, Tuesday May 19 EOD Pacific Time is Enhancements Freeze

Will this enhancement be part of the 1.19 release cycle?

Will this enhancement be part of the 1.19 release cycle?

The legacy changes of this KEP will not introduce a user-facing change, so I guess you can ignore this KEP.

@RainbowMango -- Thanks for the update. I have updated the tracking sheet accordingly. 👍

/assign

This work was basically completed in the 1.17 release. I'll update the KEP as needed in order to close out this issue.