openzim / libzim

Reference implementation of the ZIM specification

Home Page:https://download.openzim.org/release/libzim/

Move libzim CD from kiwix-build to this repo

kelson42 opened this issue · comments

commented

The need for meta-compilation tooling for our C++ repositories led to the creation of kiwix-build. Unfortunately, the need for compilation tooling and the place where it is executed (which are two slightly different topics) have been “mixed”, and now the execution of kiwix-build for the CD of various repos happens in the kiwix-build repo.

This leads to various problems, or at least oddities. This ticket is here to identify what is missing to allow kiwix-build to be executed in this repo to run the CD (first of all nightlies and releases) and maybe even help with the CI.

A few things have led to the current situation. They can be grouped into a single category: Dependencies.

Inter dependencies

We have dependencies between our projects.
A change in libzim can introduce a break in libkiwix or zim-tools. So we need a way to check that our whole stack is actually compiling.
Ideally, we should check that before merging a PR. But it means that when we have a PR in libzim, we have to run a build of all the other projects using it (libkiwix, zim-tools, but also kiwix-desktop, kiwix-tools or java-libkiwix). Most of the time we have to run the build on the main branch, but some projects may have a PR adapting them to the libzim PR, so we would have to build that specific branch.
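
The branch-selection rule described here can be sketched in a few lines of Python. This is an illustration only: the function name `select_branches`, the `adapting_branches` mapping and the example branch name are hypothetical, and nothing here is part of kiwix-build. The project list is the one named above.

```python
# Hypothetical sketch: which branch of each downstream project to build
# when testing a libzim PR. Names are illustrative, not a real API.
DOWNSTREAM_PROJECTS = [
    "libkiwix", "zim-tools", "kiwix-desktop", "kiwix-tools", "java-libkiwix",
]


def select_branches(adapting_branches, default="main"):
    """Map every downstream project to the branch that should be built.

    adapting_branches: dict of project name -> branch that adapts the
    project to the libzim PR under test (may be empty).
    """
    unknown = set(adapting_branches) - set(DOWNSTREAM_PROJECTS)
    if unknown:
        raise ValueError(f"unknown projects: {sorted(unknown)}")
    # Use the adapting branch when one exists, the default branch otherwise.
    return {p: adapting_branches.get(p, default) for p in DOWNSTREAM_PROJECTS}


# Example: only libkiwix has a (made-up) branch adapting to the libzim PR.
plan = select_branches({"libkiwix": "adapt-to-new-libzim-api"})
```

Even in this small form, the rule needs to know about every downstream repository and about their open PRs, which is the part that was hard to express in the CI at the time.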

At the time (Travis CI), it was simply not possible. With GitHub Actions, this would involve a set of workflow triggers (which we would have to define carefully).

So the solution was to have kiwix-build build all the projects (main branch) every night. It means that we may merge a PR that breaks another project, but we would catch it pretty quickly.

External dependencies

(This is not really the subject of this issue, but while I am explaining "everything", let's speak about it also).
Our projects are using external dependencies. The CI of libzim should ideally be about the compilation of libzim only. We don't want the CI to spend (almost) hours to compile dependencies which don't change a lot.

So we need a way to compile all these dependencies once and reuse the precompiled result.
As kiwix-build builds everything, it was a good place to create the dependencies archive.

The original role of kiwix-build.

kiwix-build is a meta tool to help build other projects.
We don't want the compilation of libzim to become dependent on kiwix-build.
So it feels a bit odd to me (at least it may open a door we don't want to open) to have kiwix-build run in the libzim CI to compile libzim.

[You could argue that the CI is already dependent on kiwix-build, as it downloads the dependencies built by kiwix-build.
And I would partly agree with you. But there is a difference between using something generated by a tool and launching this tool. This last point makes me kind of OK with that.]

The current CI in libzim is:

  • From a set of dependencies already compiled (by kiwix-build but 🤫 , it could be by anything else)
  • Compile libzim using the build system of libzim (meson) and no other tool.
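
The two steps above can be written down as a plain command list. This is a minimal sketch under stated assumptions: the archive URL, the `.tar.gz` format and the directory names are placeholders, not the project's real ones, and how meson locates the unpacked dependencies (for example through `PKG_CONFIG_PATH`) is left out.

```python
# Hypothetical sketch of the two CI steps: fetch prebuilt dependencies,
# then build libzim with meson alone. URL, archive format and directory
# names are placeholders, not the real ones used by the project.
def ci_commands(deps_archive_url, deps_dir="deps", build_dir="build"):
    archive = "deps.tar.gz"
    return [
        # Step 1: download and unpack dependencies compiled elsewhere.
        ["curl", "-fsSL", "-o", archive, deps_archive_url],
        ["mkdir", "-p", deps_dir],
        ["tar", "-xzf", archive, "-C", deps_dir],
        # Step 2: configure, compile and test libzim using meson only.
        ["meson", "setup", build_dir],
        ["meson", "compile", "-C", build_dir],
        ["meson", "test", "-C", build_dir],
    ]


commands = ci_commands("https://example.invalid/deps.tar.gz")
```

The point of the sketch is that no command invokes kiwix-build: the CI only consumes an archive, whoever produced it.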

The current role of the CI.

The current CI is used to test that the PR is ok.
To do so, we compile libzim on a subset of the platforms we support. We assume that buggy code would be detected when compiling on this subset. If there are compilation errors on another platform, it is most probably a "configuration" problem. (kiwix-build compiles on all platforms.)

Release and publication

We want to publish our releases only when we are sure that all our projects are ok.
kiwix-build is also the best place to do so as it builds everything. It would be difficult to delay the publication of libzim until all the other CIs pass.

(kiwix-build is buggy on this point, as it publishes libzim as soon as it has made the archive, without compiling the other projects. This is why we published a broken libkiwix recently. But it is a bug in kiwix-build.)
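
The intended behaviour (publish nothing until the whole stack has built) could look like the following. It is a sketch, not how kiwix-build is implemented: the function `build_then_publish`, the `build` and `publish` callables and the reduced dependency graph are assumptions made for illustration. It relies on `graphlib`, available in the Python standard library since 3.9.

```python
from graphlib import TopologicalSorter

# Assumed, reduced dependency graph: project -> projects it depends on.
# Only the libzim edges mentioned in this discussion are listed.
DEPENDS_ON = {
    "libzim": set(),
    "libkiwix": {"libzim"},
    "zim-tools": {"libzim"},
}


def build_then_publish(build, publish, graph=DEPENDS_ON):
    """Build every project in dependency order; publish only if all pass.

    build(name) -> bool and publish(name) -> None are caller-supplied
    callables. Returns the list of published projects (empty on failure).
    """
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        if not build(name):
            return []  # one failed build blocks every publication
    for name in order:
        publish(name)
    return order
```

With this ordering, a libkiwix build failure prevents libzim from being published, which is the guarantee described above and also the coupling that makes releasing libzim on its own impossible.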

Conclusion

It should be possible to move to a full release workflow using individual CIs, but it is not an easy task. It would be necessary to explicitly state what we want to do before starting anything.


This leads to various problems, or at least oddities.

Do you have a list of them ?

[...] to allow execution of kiwix-build in this repo to run CD (first row nightlies and releases) and maybe even help with the CI.

As said above, I don't think we should run kiwix-build in the repo CI.

commented

Inter dependencies

We have dependencies between our projects.

Yes, they should be as little as possible and only upstream not downstream.

A change in libzim can introduce a break in libkiwix or zim-tools.

Yes, this is normal and should be wanted. If this is unwanted, then it is a sign of a lack of control over the dev process. Downstream software is not there to test; that is the role of automated testing.

So we need a way to check that our whole stack is actually compiling.

No, this is the role of the automated tests.

Ideally, we should check that before merging a PR.

This is why the CI runs on each PR.

But it means that when we have a PR in libzim, we have to run a build of all the other projects using it (libkiwix, zim-tools, but also kiwix-desktop, kiwix-tools or java-libkiwix).

I think I don't need to repeat myself on this.

Most of the time we have to run the build on the main branch, but some projects may have a PR adapting them to the libzim PR, so we would have to build that specific branch.

The CI should run on any branch and all the dependencies should be pinned.

At the time (Travis CI), it was simply not possible. With GitHub Actions, this would involve a set of workflow triggers (which we would have to define carefully).

I'm against this; it should be avoided as far as possible because (1) it is really complex to understand and follow, and (2) it is not robust because it is not really resilient (too many things can go wrong because too many repos are involved).

So the solution was to have kiwix-build build all the projects (main branch) every night. It means that we may merge a PR that breaks another project, but we would catch it pretty quickly.

I have never seen things that way; we just do nightlies... which, in my opinion, is unrelated to the fact that kiwix-build is not versatile.

External dependencies

(This is not really the subject of this issue, but while I am explaining "everything", let's speak about it also). Our projects are using external dependencies. The CI of libzim should ideally be about the compilation of libzim only. We don't want the CI to spend (almost) hours to compile dependencies which don't change a lot.

Yes, and the current optimisation system is efficient. We need to keep compiling these dependencies when needed and making them available for download to whoever needs them. At first glance, compiling these files could remain the responsibility of the kiwix-build repository.

So we need a way to compile all these dependencies once and reuse the precompiled result. As kiwix-build builds everything, it was a good place to create the dependencies archive.

Yes. even if this is needed only for external dependencies and internal ones (basically only libkiwix and libkiwix I believe).

The original role of kiwix-build.

kiwix-build is a meta tool to help build other projects. We don't want the compilation of libzim to become dependent on kiwix-build. So it feels a bit odd to me (at least it may open a door we don't want to open) to have kiwix-build run in the libzim CI to compile libzim.

This is just a build dependency like gtest or meson, nothing special. $pip install kiwix-build and then I should be able to use it.

[You could argue that the CI is already dependent on kiwix-build, as it downloads the dependencies built by kiwix-build. And I would partly agree with you. But there is a difference between using something generated by a tool and launching this tool. This last point makes me kind of OK with that.]

The current CI in libzim is:

* From a set of dependencies already compiled (by kiwix-build but 🤫 , it could be by anything else)

* Compile libzim using the build system of libzim (meson) and no other tool.

The current role of the CI.

The current CI is used to test that the PR is ok. To do so, we compile libzim on a subset of the platforms we support. We assume that buggy code would be detected when compiling on this subset. If there are compilation errors on another platform, it is most probably a "configuration" problem. (kiwix-build compiles on all platforms.)

Yes, CI is not really the topic of this ticket.

Release and publication

We want to publish our releases only when we are sure that all our projects are ok. kiwix-build is also the best place to do so as it builds everything. It would be difficult to delay the publication of libzim until all the other CIs pass.

If your CI does not tell you whether your software is OK, then your CI is weak... and I definitely want to be able to release libzim even if libkiwix is broken. Actually, I should be able to release libzim without caring about anything outside libzim.

(kiwix-build is buggy on this point, as it publishes libzim as soon as it has made the archive, without compiling the other projects. This is why we published a broken libkiwix recently. But it is a bug in kiwix-build.)

Conclusion

It should be possible to move to a full release workflow using individual CIs, but it is not an easy task. It would be necessary to explicitly state what we want to do before starting anything.

This leads to various problems, or at least oddities.

Do you have a list of them ?

The fact that I have to run the CD in kiwix-build to release libzim. It's actually so intricate that only you have been doing it for years. Doing a release once all the lights are green should no longer be technical work. Like for any other software in our portfolio, I want to fully control what is going on from the repository in which I do the release, just by clicking on the "release" button.

I will try to use kiwix-build to do the CD and start opening tickets on kiwix-build so that it can deliver what is needed to do so.

Can you give your definitions of CI, CD and "automated test", and what you expect from them? I think we don't have exactly the same definitions, and so we are not talking about the same thing.

@mgautierfr I don't really know what more to say beyond what Wikipedia explains (just an example). The CI runs tests to assess PRs, and the CD publishes the builds/sources (officially at release time or not; see nightlies for example).