collectd / collectd

The system statistics collection daemon. Please send Pull Requests here!

Home Page:http://collectd.org

Any release plans? 5.12.1 or 5.13.0 or 6.0.0 ?

hnyman opened this issue

It is now over 3 years since the last release, 5.12.0.

Looking at the current branches, I am totally confused about how development is going and whether there are any release plans.

Having followed collectd from the "use for OpenWrt" perspective for over a decade, my impression is that development is currently somewhat stuck, and that the "big change"(?) for 6.0 will likely take quite a long time.

There are currently four branches that have seen commits this year:

  • main branch. Last commit 6 December 2023.
  • 6.0 branch: 341 commits ahead and 242 commits behind main. Last commit 6 December 2023.
  • 5.13 branch: 2 commits ahead and 240 commits behind main. Last commit 24 March 2023.
  • 5.12 branch: 6 commits ahead and 258 commits behind main. Last commit on 23 March 2023.
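The ahead/behind counts in this list can be reproduced in a local clone. The following is a minimal sketch, assuming the branches are available under the names used in this thread (`main`, `collectd-6.0`, and so on); the helper names `parse_ahead_behind` and `ahead_behind` are made up for illustration. It relies on `git rev-list --left-right --count A...B`, which prints how many commits are reachable only from A and only from B.

```python
import subprocess


def parse_ahead_behind(output):
    """Parse the two tab-separated counts printed by
    `git rev-list --left-right --count main...branch`.

    first count  = commits only on main   -> branch is that many commits behind
    second count = commits only on branch -> branch is that many commits ahead
    """
    behind, ahead = (int(field) for field in output.split())
    return ahead, behind


def ahead_behind(branch, base="main", repo="."):
    """Return (ahead, behind) of `branch` relative to `base` in the clone at `repo`."""
    result = subprocess.run(
        ["git", "-C", repo, "rev-list", "--left-right", "--count",
         f"{base}...{branch}"],
        check=True, capture_output=True, text=True,
    )
    return parse_ahead_behind(result.stdout)
```

In a clone that only has remote-tracking branches, the names would need an `origin/` prefix, e.g. `ahead_behind("origin/collectd-6.0", base="origin/main")`.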

Four differing branches are a lot.
Especially when only 5.12 has been released and both 5.13 and 6.0 are unreleased release branches.

Differences between 5.12 and 5.13 are minimal, so the role of 5.13 branch is a puzzle.

Also, main and 6.0 have diverged quite a lot. Why? Is main going to be 5.14?

Could there be a 5.12.1 maintenance release or a 5.13 release while waiting for 6.0?
Are both 5.12 and 5.13 really needed, if 5.13 is never going to be released?

Hi @hnyman,

that's an excellent question and I think we should have a community discussion about next steps and where we focus our energy.

  • collectd-5.12 branch

    Historically, we have committed bug fixes to "version branches" and merged them back into main to avoid cherry picking and duplicate commits. My suspicion is that recently bug fixes have simply been committed to main instead. My suggestion would be to package up a version 5.12.1 with the few fixes we have in there, since it's not much work. I wouldn't expect much more development to happen here.

  • collectd-5.13 branch

    Sounds like a release branch that got abandoned. I think we should rebase the few commits on there onto main (unless they are there already) and delete the collectd-5.13 branch.

  • main branch

    This is the current development branch for the collectd-5.* line. I think it contains numerous bug fixes and new features that we haven't released yet, and we should publish that as 5.13.0 at some point. I don't have a time line for that in mind at the moment. What do others think?

  • collectd-6.0 branch

    This is the development branch for the collectd-6.* line. I think it is a bit of a never ending story at this point, which I would love to change. We should discuss what has to happen before we can publish a "release candidate" and later a proper release. For example, I think that we should have the data structures locked in (e.g. #4187) and vet the metric names of "converted" plugins (e.g. #4181). It would be fine for me to publish a version 6 MVP that does not ship all of version 5's plugins, e.g. the Python plugin is probably very hard to migrate.

@mrunge What do you think?

IMHO:

  • collectd-5.12: at least the CI fixes for the latest distros should be included before a .1 release, otherwise it's fairly useless
  • collectd-5.13: agree
  • main => collectd-5.13: Earlier there were about 2 releases per year, but it's been >2 years since the last release, so having a new one sooner rather than later would be good to show that the project is alive again... Here are a few things that may be good to handle before that though:
  • collectd-6.0: main thing before release candidate, after structure stuff is locked down, would be syncing all 5.x changes to plugins to v6, as it's a long time since v6 was forked

Syncing all deviating changes is quite a tall order, given that collectd-6.0 is ~250 commits behind main. I could try to just cherry pick all non-merge commits and see how many of them apply cleanly. If that leaves <10 commits to manually merge, that should be feasible. But if it is more than that, I doubt we have the capacity to do this anytime soon and would prefer to just get out something, even if sub-optimal.
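The trial run described here could be scripted roughly as follows. This is only a sketch, not what was actually run: the function names are invented, and it assumes it is started in a scratch checkout of the target branch that may be thrown away afterwards. Every non-merge commit that is on `main` but not on `collectd-6.0` is cherry-picked in order; a failed pick is aborted and recorded.

```python
import subprocess


def git(*args, repo="."):
    """Run a git command and return (exit code, stdout)."""
    proc = subprocess.run(["git", "-C", repo, *args],
                          capture_output=True, text=True)
    return proc.returncode, proc.stdout


def trial_cherry_pick(source="main", target="collectd-6.0", run=git):
    """Try to cherry-pick every non-merge commit in `source` but not in `target`.

    Must be run with `target` (or a scratch copy of it) checked out.
    Returns (applied, conflicted) lists of commit hashes, oldest first.
    """
    _, out = run("rev-list", "--reverse", "--no-merges", f"{target}..{source}")
    applied, conflicted = [], []
    for commit in out.split():
        code, _ = run("cherry-pick", "-x", commit)
        if code == 0:
            applied.append(commit)
        else:
            # Undo the failed pick so the next commit starts from a clean tree.
            run("cherry-pick", "--abort")
            conflicted.append(commit)
    return applied, conflicted
```

The length of `conflicted` is the number to compare against the "<10 commits" threshold. Note that a skipped commit can make later commits fail that depend on it, so the count is an upper estimate rather than an exact measure of manual work.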

I would suggest merging the main branch into the 5.13 branch. It was created earlier and then abandoned. I also agree with @eero-t that we should sync the CI fixes into the 5.12 branch to allow distributions to pick up the changes.

Tbh, I have no good overview of the 6.0 status at this time.

If collectd-5.13 has just two commits, I say we cherry-pick those into main and delete collectd-5.13. These merges over 200+ commits make the commit history incomprehensible.

What are your feelings about reinstating the collectd community call? It would give us a place where we can discuss these things and give status updates.

Syncing all deviating changes is quite a tall order, given that collectd-6.0 is ~250 commits behind main. I could try to just cherry pick all non-merge commits and see how many of them apply cleanly. If that leaves <10 commits to manually merge, that should be feasible.

Looking at individual commits is relevant only for plugins that have already been ported to the 6.0 API. I think the majority still have not been?

For ones that still have not been migrated to 6.0 API, their latest code in main can be just copied again to 6.0 branch. Preferably before applying any (several years old) "migrate to 6.0" PRs to them... :-)

(I think hnez did that for the plugins that he ported to 6.0 API.)

I looked into merging plugin changes from main to collectd-6.0. I looked at src/*.c as a proxy for plugins, though this list also contains unit tests. This is the summary of my results:

| Status | Count |
|---|---|
| No op; unchanged in main and collectd-6.0 | 105 |
| No op; changes only in collectd-6.0 | 53 |
| Diverged; changes in both main and collectd-6.0 | 30 |
| Easily updatable; changes only in main | 14 |
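A classification like the one in this summary can be sketched as below. This is an illustration under assumptions, not the script that produced the numbers: the function names are invented, and it compares each branch against the merge base with `git diff A...B` instead of counting commits per file.

```python
import subprocess


def classify(all_files, changed_in_main, changed_in_v6):
    """Sort each file into one of the four statuses used in the summary."""
    result = {"unchanged": [], "only_v6": [], "diverged": [], "only_main": []}
    for path in sorted(all_files):
        in_main = path in changed_in_main
        in_v6 = path in changed_in_v6
        if in_main and in_v6:
            result["diverged"].append(path)
        elif in_main:
            result["only_main"].append(path)  # easily updatable
        elif in_v6:
            result["only_v6"].append(path)
        else:
            result["unchanged"].append(path)
    return result


def changed_since_fork(branch, other, repo="."):
    """C files under src/ that `branch` changed since its merge base with `other`.

    `git diff other...branch` compares the merge base with `branch`.
    The pathspec glob may also match files in subdirectories of src/.
    """
    out = subprocess.run(
        ["git", "-C", repo, "diff", "--name-only", f"{other}...{branch}",
         "--", "src/*.c"],
        check=True, capture_output=True, text=True,
    ).stdout
    return set(out.split())
```

With a clone at hand, `classify(files, changed_since_fork("main", "collectd-6.0"), changed_since_fork("collectd-6.0", "main"))` would give the four buckets, where `files` is the list of source files to consider.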

The plugins that can easily be updated are:

  • bind

  • collectd-nagios

  • cpufreq

  • hugepages

  • intel_pmu

    This plugin has seen significant work; it might be better to import commits rather than copying the file. // CC: @kwiatrox

  • mysql

  • nginx

  • ovs_stats

  • processes

  • smart_test

  • swap (may cause merge conflict with #4190)

  • ubi

  • virt

  • vmem

I have created a PR for these: #4198

The diverged plugins are:

| Plugin | Changes in main | Changes in collectd-6.0 | Status |
|---|---|---|---|
| amqp1 | 1 | 2 | diverged |
| amqp | 3 | 2 | diverged |
| apple_sensors | 1 | 1 | diverged |
| battery | 2 | 2 | diverged |
| capabilities | 1 | 2 | diverged |
| csv | 1 | 2 | diverged |
| disk | 2 | 6 | diverged |
| gmond | 2 | 3 | diverged |
| intel_rdt | 8 | 2 | diverged |
| memory | 3 | 7 | diverged |
| mmc | 3 | 4 | diverged |
| modbus | 1 | 1 | diverged |
| mqtt | 1 | 3 | diverged |
| netlink_test | 1 | 1 | diverged |
| network | 3 | 1 | diverged |
| nut | 2 | 1 | diverged |
| perl | 1 | 4 | diverged |
| procevent | 6 | 5 | diverged |
| python | 2 | 1 | diverged |
| redfish | 1 | 1 | diverged |
| smart | 2 | 1 | diverged |
| snmp | 4 | 2 | diverged |
| statsd | 1 | 2 | diverged |
| write_graphite | 1 | 3 | diverged |
| write_http | 2 | 8 | diverged |
| write_influxdb_udp | 5 | 6 | diverged |
| write_log | 1 | 6 | diverged |
| write_mongodb | 1 | 2 | diverged |
| write_prometheus | 2 | 5 | diverged |
| write_stackdriver | 3 | 4 | diverged |

I haven't looked into this yet, but I suspect that these will run into a fair amount of merge conflicts.

Ouch, I hadn't realized v6 was branched from main such a long time ago, it has v6 API changes that are already 3.5 years old!

For the plugins with just 1 new commit in main, you could try scripting cherry-picking their HEAD commits to v6. If those apply, it would halve the list (I looked at a couple of them, and they seemed ok).

write_prometheus is already migrated to the v6 API, but the other commit is already in, and the cherry-pick for the HEAD one (951faba) should apply fine.
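As a starting point for such a script, the candidates can be picked from the table of diverged plugins. The sketch below copies the two count columns from that table; the function names and the assumption that each plugin lives in `src/<plugin>.c` are mine.

```python
# plugin: (changes in main, changes in collectd-6.0), copied from the table above.
DIVERGED = {
    "amqp1": (1, 2), "amqp": (3, 2), "apple_sensors": (1, 1),
    "battery": (2, 2), "capabilities": (1, 2), "csv": (1, 2),
    "disk": (2, 6), "gmond": (2, 3), "intel_rdt": (8, 2),
    "memory": (3, 7), "mmc": (3, 4), "modbus": (1, 1),
    "mqtt": (1, 3), "netlink_test": (1, 1), "network": (3, 1),
    "nut": (2, 1), "perl": (1, 4), "procevent": (6, 5),
    "python": (2, 1), "redfish": (1, 1), "smart": (2, 1),
    "snmp": (4, 2), "statsd": (1, 2), "write_graphite": (1, 3),
    "write_http": (2, 8), "write_influxdb_udp": (5, 6),
    "write_log": (1, 6), "write_mongodb": (1, 2),
    "write_prometheus": (2, 5), "write_stackdriver": (3, 4),
}


def single_commit_candidates(table):
    """Plugins with exactly one new commit in main."""
    return sorted(name for name, (in_main, _) in table.items() if in_main == 1)


def head_commit_command(plugin, branch="main"):
    """git command printing the newest commit on `branch` that touches the plugin.

    Assumes the plugin source is src/<plugin>.c, which may not hold everywhere.
    """
    return ["git", "log", "-n", "1", "--format=%H", branch,
            "--", f"src/{plugin}.c"]
```

`single_commit_candidates(DIVERGED)` yields 13 of the 30 plugins; the hash printed by each command is what would be passed to `git cherry-pick` on the collectd-6.0 branch.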

Links to some of the patches distros are using on top of current 5.12 release:

@mrunge, maybe you can comment on whether any of the Fedora patches are still needed with CI fixes in main?

(I.e. will CI fixes be enough for 5.12.1 release, or is something else also missing.)

On a quick look at the Debian patches, they are fixed in main, except for the ones that change config & include file paths, and the CGI one: #3840

I just merged the Perl CGI fix.

@octo Early this year Leonard checked and updated several plugins in v6 branch to latest changes from main, before finishing migrating subset of them to v6 APIs, see #4026 (comment).

I created the 6 MVP Milestone to track the things I think we need to get done before packaging our first collectd 6 release. Please take a look.

I started drafting a 6.0 release candidate at https://github.com/collectd/collectd/releases/edit/untagged-54315dd568937fe38d17

@octo I assume this is a list of changes compared to the latest (v5) release before it.

Of the significant v5->v6 changes that above list is missing, I can right away name at least:

(The last one is based on @manuelluis' work, but @hnez did a lot of checking before those old PRs were merged, so I think it's fair to attribute that to him: #4026 (comment))

As to plugin changes, while there have been quite a few PRs to Sysman plugin after I added it, IMHO those latter ones do not need to be explicitly listed in the changes, as plugin was not in any release.

PS. Of the significant fixes in v5 that should be added to v6 before any -rc release, I would propose at least your fix to missing riemann_set_event() terminators in riemann plugin. :-)

This is based on me grep'ing through the commit history to find PRs:

git log --pretty=oneline "main..collectd-6.0" | egrep 'Merge pull request #([1-9][0-9]*)'

The PRs you mentioned have been merged manually, probably because the automatic builders were broken.

Unfortunately I don't think we can get a list of PRs since the last release from the GitHub API. We can get a list of PRs using the collectd-6.0 base branch, so that should work for the initial collectd 6 release. Let me re-generate that.
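For turning the `git log` output above into a list of PR numbers, a small parser is enough. A minimal sketch, with an invented function name, that uses the same pattern as the `egrep` call:

```python
import re

# Same pattern as in the egrep call above.
MERGE_RE = re.compile(r"Merge pull request #([1-9][0-9]*)")


def merged_pull_requests(oneline_log):
    """Return PR numbers found in `git log --pretty=oneline` output.

    git log prints the newest commit first, so the lines are reversed to get
    the PRs in the order they were merged. Duplicates are dropped.
    """
    numbers = []
    for line in reversed(oneline_log.splitlines()):
        match = MERGE_RE.search(line)
        if match and int(match.group(1)) not in numbers:
            numbers.append(int(match.group(1)))
    return numbers
```

Only merge commits created through the GitHub merge button carry this subject line, so PRs that were merged by hand or squashed do not show up in the result.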

PS. Of the significant fixes in v5 that should be added to v6 before any -rc release, I would propose at least your fix to missing riemann_set_event() terminators in riemann plugin. :-)

Cherry-picked that into collectd-6.0 as f603668

Unfortunately I don't think we can get a list of PRs since the last release from the GitHub API.

Merged ones with "[collectd 6]" in title:
https://github.com/collectd/collectd/pulls?q=is%3Apr+is%3Aclosed+is%3Amerged+%22%5Bcollectd+6%5D%22
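The search link above is just a URL-encoded GitHub search query, so variants of it can be generated. The sketch below rebuilds that link and, as an assumption on my side, a second query using GitHub's `base:` search qualifier to select PRs that targeted the collectd-6.0 branch; the helper name is invented.

```python
from urllib.parse import urlencode

PULLS_URL = "https://github.com/collectd/collectd/pulls"


def pr_search_url(*terms):
    """Build a pull-request search URL from GitHub search terms."""
    return f"{PULLS_URL}?{urlencode({'q': ' '.join(terms)})}"


# Merged PRs with "[collectd 6]" in the title (same query as the link above).
title_search = pr_search_url("is:pr", "is:closed", "is:merged", '"[collectd 6]"')

# Merged PRs whose base branch was collectd-6.0.
base_search = pr_search_url("is:pr", "is:merged", "base:collectd-6.0")
```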

https://github.com/octo/collectd/releases/tag/untagged-e88ba30fd34c7c8e5dce

Not found?

List content looks better to me now.

Please sort the list and split it at least to "Core" & "Plugin changes" sections, so that related changes are easier to find.

Note: if you add PR number after the component name (at the beginning) instead of at end, then the list will naturally sort both by the component, and the order in which changes to it were done.
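The sorting and splitting suggested here could be done with a few lines of script. This is a sketch under assumptions: the entry format `component #PR: description` and the set of "core" component names are hypothetical, chosen only to show the idea. Comparing the PR number as an integer avoids the pitfall of plain text sorting, where `#999` would end up after `#4000`.

```python
import re

# Hypothetical entry format: "<component> #<PR number>: <description>"
ENTRY_RE = re.compile(r"^(?P<component>[^#]+?) #(?P<pr>\d+): (?P<text>.+)$")

# Hypothetical set of non-plugin components.
CORE_COMPONENTS = frozenset({"daemon", "build system", "documentation"})


def group_and_sort(lines, core_components=CORE_COMPONENTS):
    """Split entries into "Core" and "Plugin changes", sorted by component and PR."""
    sections = {"Core": [], "Plugin changes": []}
    for line in lines:
        entry = line.strip()
        match = ENTRY_RE.match(entry)
        if match is None:
            raise ValueError(f"unexpected entry format: {entry!r}")
        component = match["component"]
        section = "Core" if component in core_components else "Plugin changes"
        sections[section].append((component.lower(), int(match["pr"]), entry))
    return {name: [entry for _, _, entry in sorted(items)]
            for name, items in sections.items()}
```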

Overview of the major user visible changes in the release would be nice at the start. Maybe something like this:

  • Metric names are changed to conform to OpenTelemetry standard: (link to Wiki doc)
  • Several plugins were refactored to use new "metric families" and labels, to support this. To build plugins that have not been ported over, new --enable-compatibility-mode option needs to be used
  • Write plugins (Prometheus...?) are changed to convert new metric & label names for their own formats: (link to Wiki doc)
  • Sysman (Intel GPU) input + OTEL output plugins were added, XXX plugins removed

=> It's better if Wiki metrics table lists in separate column(s) the new metrics names also for each write plugin which cannot output OTEL names as-is.

Good news everyone! All PRs in the "6.0 MVP" milestone have been merged! That means my next goal is to get collectd 6 into some users' hands.

I created new draft release notes. Please take a look.
(The URL appears to change with each update: if the link doesn't work anymore, navigate to https://github.com/collectd/collectd/releases and look for the release.)

The latest listed release is 5.12.0
I guess you need to tag it

Okay, I'll tag it once #4251 is in.

I created **[new draft release notes](https://github.com/collectd/collectd/releases/tag/untagged-

Looks good!

IMHO following would still be good items to highlight in overviews for all releases up to final one:

  • Metric names are changed to conform to OpenTelemetry standard: (link to OTEL Wiki doc)
  • Plugins are refactored to use "metric families" and labels, to support this: (link to changes Wiki table)
  • Use one write thread per write plugin, to prevent stalling plugins blocking other write plugins from working

I.e. have direct link to metric changes (updates one needs to do in Grafana dashboard etc), and reason for such breaking changes.

I have created the 6.0.0.rc0 release, uploaded new tarballs, updated the website, and sent an announcement email.

IMHO following would still be good items to highlight in overviews for all releases up to final one:

  • Metric names are changed to conform to OpenTelemetry standard: (link to OTEL Wiki doc)
  • Plugins are refactored to use "metric families" and labels, to support this: (link to changes Wiki table)

No need for overview section, "Read plugins" section has link to Wiki, which has link to OTEL info.

  • Use one write thread per write plugin, to prevent stalling plugins blocking other write plugins from working

But this info could be added to the "Write plugins" section, as it's a major behavior change for all of them.
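To illustrate why one write thread per write plugin matters, here is a toy model of the idea. It is not collectd's implementation (which is in C); the class and function names are invented. Each writer owns a queue and a worker thread, so a writer that stalls only backs up its own queue.

```python
import queue
import threading


class WriterThread:
    """One queue and one worker thread for a single write callback."""

    def __init__(self, name, write):
        self.name = name
        self._write = write
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, name=name, daemon=True)
        self._thread.start()

    def enqueue(self, metric):
        self._queue.put(metric)

    def wait_idle(self):
        """Block until every queued metric has been written."""
        self._queue.join()

    def _run(self):
        while True:
            metric = self._queue.get()
            try:
                self._write(metric)
            finally:
                self._queue.task_done()


def dispatch(writers, metric):
    """Hand the metric to every writer; never waits for a write to finish."""
    for writer in writers:
        writer.enqueue(metric)
```

With a single shared write thread, a callback that blocks (for example on an unreachable server) would hold up every other write plugin; in this model the other writers keep draining their own queues.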

Wiki is missing some of the changes: https://github.com/collectd/collectd/wiki/collectd-6

Config option names were changed for all plugins with changed metrics, not just memory plugin (I still disagree with the Usage/Utilization option names, and would much prefer Absolute/Ratio instead though).

As to new 5.x release. What if 5.13 is released first, and then if that seems fine, pick few most relevant fixes from that to a 5.12.1 release (at least CI fixes)?

  • main => collectd-5.13: Earlier there were about 2 releases per year, but it's been >2 years since the last release, so having a new one sooner rather than later would be good to show that the project is alive again... Here are a few things that may be good to handle before that though:

Updated list of what IMHO would be good to handle before 5.13 release:

There are also couple of older issues that IMHO might be good to look at, if somebody has time:

(Especially #2465 would IMHO be nice to untangle, as there are several related tickets and still-open old PRs, that could be closed along with it.)

I've pinged some of those and some other old bugs. IMHO near decade old tickets with "Pending feedback" and "Waiting for response" labels could be just closed with a comment asking for them to be re-opened if they're still relevant with collectd 5.12.

EDIT: added couple of other relevant looking (easy) old bugs.

I'd like to manage expectations: My primary focus is collectd 6, at least for the time being. I'm happy to review changes going into collectd 5, and/or to create and publish a 5.12 or 5.13 release, but I'm most likely not going to invest engineering effort myself.

I'm also really interested only in v6, but I'm fairly sure (most of) those v5 issues are also in v6. While most plugins still have not been ported to v6 APIs / have v6-specific changes, IMHO it makes more sense to fix the issue in v5 and then merge the fix to v6.

As I do not think any of those to be a regression from previous release, I think it's more important just to get 5.13 release out though.


The bug fixes since 5.12 (released 3.5 years ago!) indicate that many users could have trouble continuing to use it with the latest compilers and dependencies. They would then face a choice between building main, switching to the (non-compatible / incomplete) v6-rc, or switching to another collector. If they choose the last option, they're unlikely to come back as long as that alternative keeps working => having a new 5.x (option + metric name compatible) collectd release would help keep the existing users while v6 is being worked on.

Btw. @octo do you have some time target, or a list of items that you think should be solved for "final" 6.0 release, e.g. subset of most important plugins that should be ported to the new v6 APIs and OTEL metric naming?

(I have some time still to help with the release during this quarter, but unlikely to have much after that, so I'm wondering whether there's any way v6.0 can happen before that...)

Btw. @octo do you have some time target, or a list of items that you think should be solved for "final" 6.0 release, e.g. subset of most important plugins that should be ported to the new v6 APIs and OTEL metric naming?

Not really. I think #4273 (processes plugin) should be in a "final" 6.0 release.

One idea that has been floating around in my head was to create a user survey asking which plugins people relied on. We could then try to ensure that the new version supports some percentage of these use-cases. I'm not really sure how to structure the survey though: listing all 176 plugins may be hard to navigate for users, free text fields are very hard to evaluate, and mixing the two (list what we think to be the most popular plugins + provide an "other" text box) introduces bias. I'd be happy to hear more perspectives on this.

There are a few things that would be really nice to have:

  • The network plugin (receive-only) so collectd 6 could receive metrics from a collectd 5 client. This would be a huge help in upgrading a fleet.
  • The write_tsdb plugin – OpenTSDB is one of the most popular time-series databases.

I'm torn what to do with the "disabled" plugins (list below). On the one hand, I don't think we should ship with dead code and it would be quite easy to git revert the deletion. On the other hand, keeping them in the code base would be a lower barrier to entry for new contributors that care about these plugins.

(I have some time still to help with the release during this quarter, but unlikely to have much after that, so I'm wondering whether there's any way v6.0 can happen before that...)

Okay, so waiting for collectd 6 to have feature parity with collectd 5 is a recipe for failure. I think we have to define a "good enough" state and just run with it. My gut feeling is that we're almost there. I assume by "this quarter" you mean January through March. I think that it's definitely possible to arrive at a "good enough" state by then.

List of disabled plugins

  • aggregation
  • amqp
  • amqp1
  • barometer
  • check_uptime
  • csv
  • curl_xml
  • gmond
  • grpc
  • java
  • match_empty_counter
  • match_hashed
  • match_regex
  • match_timediff
  • match_value
  • modbus
  • mqtt
  • network
  • openldap
  • perl
  • postgresql
  • python
  • redis
  • rrdcached
  • rrdtool
  • snmp
  • snmp_agent
  • statsd
  • target_notification
  • target_replace
  • target_scale
  • target_set
  • target_v5upgrade
  • threshold
  • write_graphite
  • write_kafka
  • write_mongodb
  • write_riemann
  • write_sensu
  • write_syslog
  • write_tsdb

One idea that has been floating around in my head was to create a user survey asking which plugins people relied on. We could then try to ensure that the new version supports some percentage of these use-cases

I think a check box list of all plugins is fine, as long as they're presented reasonably so that users can easily find the ones that are relevant for them (in alphabetical order, preferably in a few columns so that one sees all at a glance).

What / how many questions should be asked about the plugins?

  • most important plugins for the user
  • additional ones user considers also relevant
  • plugins user has stopped using, or otherwise thinks least relevant / obsolete

It may also be important to ask something like this...

Collectd v6 transitions plugins to OpenMetrics / OpenTelemetry compliant metric names, for following benefits:
[description]

How much impact changed metric names have on me:

  • I've been eagerly waiting for OpenMetrics / OpenTelemetry support
  • Updating collectd metric consumers is some work, but the benefits look useful in the long run, so it's OK
  • Updating is hard, I'd prefer that none of the indicated plugins be migrated to new names / features
    • (In this case please make sure your indicated plugin set is the minimum you need)

Last one is about whether there's some subset of plugins which would be useful to still support, but not to migrate them to new APIs, at least yet.

I'm torn what to do with the "disabled" plugins (list below). On the one hand, I don't think we should ship with dead code and it would be quite easy to git revert the deletion. On the other hand, keeping them in the code base would be a lower barrier to entry for new contributors that care about these plugins.

On quick look at the issue & PR titles, match_* and target_* plugins gave almost no hits.

E.g. write_graphite, write_kafka, rrd* and python plugins gave a lot of hits, but I'm not sure whether that's just an indication that they are more complex and therefore likely to have more issues....

(Python plugin may be important one for potential contributors to test things.)

(I have some time still to help with the release during this quarter, but unlikely to have much after that, so I'm wondering whether there's any way v6.0 can happen before that...)

... My gut feeling is that we're almost there. I assume by "this quarter" you mean January through March.

Yes, but I'll need to start winding things down already, as there are other things I need to finish before the end of the quarter (which I've been postponing).

Discussed with my manager, and while I don't have much time for collectd in March, I should have some time again after it => release happening after Q1 should be fine.

(There are also some internal processes that I may need to go through before collectd release, so release coming a bit later is actually better from my point of view.)

Is the possible release process progressing?
6.0 has seen just a few commits (gpu_sysman, processes) since rc3 in February, and "main" has seen just a few commits since then (mainly intel_rdt), and 5.12 has one commit.

Ps.
Regarding the above discussion list of "Disabled plugins in 6.0", one is important for OpenWrt router OS:

  • rrdtool is used for the stats graphing, so it has quite wide usage.
  • additionally, network provides stats connectivity e.g. for collecting stats from several routers to one. (But that is more rarely used feature.)

Is the possible release process progressing?

@hnyman collectd releases (and collectd progress in general) depend on @octo, but he was last active here in (early) April... :-/

  • rrdtool is used for the stats graphing, so it has quite wide usage.

There are 35 open tickets mentioning rrd: https://github.com/collectd/collectd/issues?q=is%3Aissue+rrd+is%3Aopen+

And last update to it was 5 years ago:

Could any of those users consider:

  • Checking whether those issues are still valid (in current distros) & updating the tickets,
  • Contributing fixes to the issues (PRs to main branch), and
  • Migrating that plugin to v6 APIs (using other v6 plugins and their PRs as examples)?
  • additionally, network provides stats connectivity e.g. for collecting stats from several routers to one. (But that is more rarely used feature.)

network plugin name is too generic to be able to easily find all open issues related to it, but at least last updates to it were within last year: https://github.com/collectd/collectd/commits/main/src/network.c

Any news regarding a release? At the beginning of the year it looked a little bit like conference driven development might accelerate things... (No offense, just asking.)

I'd like to manage expectations: My primary focus is collectd 6, at least for the time being. I'm happy to review changes going into collectd 5, and/or to create and publish a 5.12 or 5.13 release, but I'm most likely not going to invest engineering effort myself.

Could you please bless somebody to handle the stable branch and make some releases? Some Linux distros, for instance openSUSE, have started to package the git main branch due to the lack of release activity.