elastic / package-storage

Package storage for packages served through the package registry service

[APM] APM uses non-standard var types

jen-huang opened this issue

(Not sure where to file this bug, as I didn't see the APM package in the integrations repo.)

This came up in PR elastic/kibana#93585 (comment): it seems that APM uses var types that are not part of the package spec:
https://github.com/elastic/package-storage/blob/snapshot/packages/apm/0.1.0-dev.7/manifest.yml#L44-L136:

type: string
type: int

The package spec defines these types: https://github.com/elastic/package-spec/blob/master/versions/1/data_stream/manifest.spec.yml#L50-L55

              - integer
              - bool
              - password
              - text
              - yaml

Kibana has conditional logic in a few places depending on the var type, so it is very important that Kibana knows about all the possible var types that a package can contain.

It seems the APM package needs to change string to text and int to integer.
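
A minimal Python sketch of that rename, assuming the var types appear as plain "type: <value>" lines like the ones quoted from the APM manifest above. It edits lines textually instead of round-tripping through a YAML parser, so comments and formatting survive. The function name migrate_var_types is hypothetical and not part of any Elastic tool.

```python
import re

# Non-standard var types used in the APM manifest, mapped to the
# package-spec types suggested in this issue.
TYPE_RENAMES = {"string": "text", "int": "integer"}

# Matches lines such as "    type: string" or "  - type: int",
# capturing the prefix so indentation and list markers are preserved.
TYPE_LINE = re.compile(r"^(\s*(?:-\s+)?type:\s*)(\S+)\s*$")


def migrate_var_types(manifest_text: str) -> str:
    """Rewrite type: string / type: int lines to their spec equivalents."""
    out = []
    for line in manifest_text.splitlines():
        match = TYPE_LINE.match(line)
        if match and match.group(2) in TYPE_RENAMES:
            line = match.group(1) + TYPE_RENAMES[match.group(2)]
        out.append(line)
    # Keep a trailing newline if the input had one.
    trailing = "\n" if manifest_text.endswith("\n") else ""
    return "\n".join(out) + trailing
```

Only lines whose value is exactly string or int are touched, so other type keys in the manifest (for example type: integration) are left alone.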

@simitt are you the best POC to tag for this?

If we make this change in Kibana, I think we should check all packages and maybe add some package validation so that no new unsupported types can be added.
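
A sketch of the kind of validation proposed here, assuming the manifest has already been parsed into Python dicts and lists (for example with a YAML library). The allowed set is copied from the package-spec v1 list quoted above. The real check lives in elastic-package via the package spec; find_invalid_var_types is a hypothetical helper for illustration only.

```python
from typing import Any, Dict, List

# Var types listed in package-spec v1 data_stream/manifest.spec.yml (linked above).
ALLOWED_VAR_TYPES = frozenset({"integer", "bool", "password", "text", "yaml"})


def find_invalid_var_types(manifest: Dict[str, Any]) -> List[str]:
    """Return "<path>[<var name>]: <type>" for every var whose type is not allowed."""
    problems: List[str] = []

    def walk(node: Any, path: str) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                child = f"{path}.{key}" if path else str(key)
                if key == "vars" and isinstance(value, list):
                    for index, var in enumerate(value):
                        if not isinstance(var, dict):
                            problems.append(f"{child}[{index}]: not a mapping")
                            continue
                        var_type = var.get("type")
                        if not isinstance(var_type, str) or var_type not in ALLOWED_VAR_TYPES:
                            problems.append(f"{child}[{var.get('name', index)}]: {var_type}")
                # Recurse so vars nested at any depth are found.
                walk(value, child)
        elif isinstance(node, list):
            for index, item in enumerate(node):
                walk(item, f"{path}[{index}]")

    walk(manifest, "")
    return problems
```

Walking the whole tree, rather than hard-coding where vars may appear, keeps the check working if vars show up at several levels of the manifest.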

Closing the loop re: validation (thanks @ycombinator!): there is validation for var types in elastic-package, and all packages in integrations run through it, but APM doesn't, since it's not stored there. Based on this, @skh, I think it's safe to say that this issue affects only the APM package.

@simitt Has elastic-package check been run as part of APM package PRs?

Thanks for catching this @jen-huang! I don't believe that this check has ever been run on APM packages.
cc @elastic/apm-server

@simitt Could you please elaborate on the decision not to use elastic-package for package development? I'm afraid it may lead to further inconsistencies in the future. I wonder whether it's because the tool doesn't fit your use case or for a different reason.

@mtojek I defer to @jalvz as he was evaluating the best way forward for APM Server.

We create the package from apm-server because it is much simpler: we can import apm-server packages (I mean Go packages, not Fleet packages) for various things such as fields, server version, ECS, pipelines, docs, etc.
So, for instance, if a new PR in apm-server adds a new field, running make will update the apm package (now I mean the Fleet package, not Go) with the new field, docs, etc. The same applies when adding pipelines (which we still need in 7.x).

This way there is a single source of truth and the apm package is always up to date for each corresponding stack version. Here apm differs again because we intend to follow stack versioning, so the workflow you propose is not really optimal for us...


TBH, I am a bit confused by all the fragmentation in this space, it is not clear what belongs where and it is hard to discover and be up to date with new developments if one is not part of the team.

The package-registry, for instance, runs some validations when building and some more when booting up (afaik). Why isn't elastic-package check part of that? It would be trivial to have a dev flag to pass to a docker-compose file and run it.
Or, alternatively, move it to package-storage and validate in CI...
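
One way the package-storage CI step suggested here could look, as a Python sketch. It assumes package-storage keeps packages under packages/<name>/<version>/ (as in the APM URL above) and that elastic-package check can run unchanged from each version directory; neither has been verified against the real repo. The command runner is injectable so the loop can be exercised without the CLI installed.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence

# A runner takes a command and a working directory and returns an exit code.
Runner = Callable[[Sequence[str], Path], int]


def run_command(cmd: Sequence[str], cwd: Path) -> int:
    """Run cmd in cwd via subprocess and return its exit code."""
    return subprocess.run(list(cmd), cwd=cwd, check=False).returncode


def check_all_packages(storage_root: Path, runner: Runner = run_command) -> List[Path]:
    """Run elastic-package check in each packages/<name>/<version> directory.

    Returns the package directories whose check exited non-zero.
    """
    failed: List[Path] = []
    # Only top-level package manifests match this depth; data stream
    # manifests live deeper and are skipped.
    for manifest in sorted(storage_root.glob("packages/*/*/manifest.yml")):
        package_dir = manifest.parent
        if runner(["elastic-package", "check"], package_dir) != 0:
            failed.append(package_dir)
    return failed
```

A CI job would then fail whenever check_all_packages(Path(".")) returns a non-empty list.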

I am worried that a wrong type can go unnoticed for so long: that means elastic-package is not a mere helper tool, as suggested, but effectively a required step to go through.

The supported (and recommended) workflow starts in integrations (elastic-package runs as part of CI), then packages are synced with package-storage.

You're using package-registry directly, which we don't recommend for integrations developers.

This way there is a single source of truth and the apm package is always up to date for each corresponding stack version.

There is already a single source of truth - elastic-package tool and the package spec it embeds.

Why isn't elastic-package check part of that? It would be trivial to have a dev flag to pass to a docker-compose file and run it.

It's because you shouldn't interact with the package registry directly; it's an internal implementation detail. There is no validation in package-registry, as packages coming from integrations are already verified and stable. For packages pushed directly to package-storage, we plan to enable elastic-package check there (so far we haven't had many teams pushing content directly to package-storage).

cc @ycombinator

When the apm package was started, we had a discussion about where it should live. Already then my opinion was that it should be part of integrations because of all the benefits that come with it. When development of the apm package started, the recommendation was already not to use package-registry directly but to follow the https://github.com/elastic/integrations/blob/master/CONTRIBUTING.md docs on how to build packages. This guide has grown over the past few months, and I strongly recommend having another look.

I understand that there are benefits to having the package directly in the apm-server repo. But in my opinion, the benefits of being in the integrations repo with all its automation outweigh the downsides. At the same time, being part of the integrations repo is not a requirement, and that is a feature: anyone can build packages wherever it makes the most sense for them. In any case, I strongly recommend using elastic-package, as that is what we built it for.

@mtojek @ycombinator Do we run these validations on packages pushed to package-storage?

Do we run these validations on packages pushed to package-storage?

Currently we do not but it's clear that we need to. I've filed an issue to implement it: #1013.

Personally, I think it's a good thing that the apm package is not part of the integrations repo. Sometimes I worry that elastic-package is becoming too tightly coupled with the integrations repo, so having an actual package that is not developed in that repo seems like a good way to keep us honest. It is helping us prepare the elastic-package tool, and even the package-storage repo, for a future where packages may come from a variety of teams/contributors, some of whom may not even be at Elastic.

So I'm okay with the apm package staying in the apm-server repo (again, just my personal opinion). But I do also strongly recommend using elastic-package for linting, testing, promoting packages between registry stages, etc. We've put a fair bit of effort into documentation for this tool (the README is a good entry point) but I'm sure there are places we could improve even more. If there are problems with the developer experience of using elastic-package, we need to fix them — just let us know by filing an issue in the elastic-package repo!

I agree that we should be using elastic-package linting, promotion, etc. We did talk within the APM Server team about doing that earlier, but it slipped from our minds. I've opened an issue to make sure we get this into our CI for 7.13.

As for which repo the package should be in: I still think having it in apm-server makes the most sense, since we're maintaining both an integration package and old-style fields.yml, pipelines, etc. for running apm-server standalone (i.e. as a beat). We regularly add fields and modify our ingest pipeline; these changes need to be made in both places and are typically coupled with changes in the APM Server code. As @jalvz said above, our intention is for the APM package to be aligned with the stack version: changes in the package and code would be versioned together and need to be tested together.

How we're working today is:

  • we update the old-style fields.yml, ingest pipelines etc.
  • we have scripts to transform and copy these across to our integration package, and our CI checks that they're kept in sync
  • we have end-to-end tests which build and run APM Server under Fleet with the in-repo integration package
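
The "CI checks that they're kept in sync" step is commonly implemented by regenerating the output and diffing it against what is committed. A Python sketch of that idea, where generate is a hypothetical stand-in for apm-server's real transform scripts (which are not shown in this thread and are likely driven by make in practice):

```python
import filecmp
import tempfile
from pathlib import Path
from typing import Callable, List, Set


def _files(root: Path) -> Set[Path]:
    """All regular files under root, as paths relative to root."""
    return {p.relative_to(root) for p in root.rglob("*") if p.is_file()}


def out_of_sync(committed: Path, generate: Callable[[Path], None]) -> List[str]:
    """Regenerate into a scratch dir and list files that are missing, extra, or different."""
    with tempfile.TemporaryDirectory() as scratch:
        fresh = Path(scratch)
        generate(fresh)  # hypothetical: writes the transformed package files here
        diffs = []
        for rel in sorted(_files(fresh) | _files(committed)):
            a, b = fresh / rel, committed / rel
            # shallow=False compares file contents, not just stat signatures.
            same = a.is_file() and b.is_file() and filecmp.cmp(a, b, shallow=False)
            if not same:
                diffs.append(rel.as_posix())
        return diffs
```

In a repo that already tracks generated files in git, the same effect is often achieved by running the generator and then failing if git diff --exit-code reports changes.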

Once the old-style standalone apm-server mode is no longer supported, we would move to updating the integration package directly. In theory we could then manage it in the integrations repo, but I think it would still make sense to co-locate it with the APM Server code to simplify development and testing.

This way there is a single source of truth and the apm package is always up to date for each corresponding stack version.

There is already a single source of truth - elastic-package tool and the package spec it embeds.

@mtojek Juan was referring to a single source of truth for the APM fields, ingest pipelines, etc. Maintaining the package externally to apm-server would require a much more concerted effort to keep them in sync. I hope this clarifies things - please reach out on Slack if you still have concerns.

Interesting points, @ycombinator, about having it in a separate repository. I agree it might even benefit us to make sure elastic-package can be used independently of integrations, which must be the case. I think this is also a question of timing. In the beginning it would probably have been easier for both teams to have it in integrations, as we were making breaking changes and would have adjusted the apm package directly. At some point, apm should have full control over the package, as it does today. So maybe it would have been easier to get started in integrations and then move it into the apm repo. Interestingly, @axw suggests bringing it to integrations long term, which is also fine with me.

For me the take aways are:

  • We need to have more / better CI checks on package-storage to make sure it does not matter where a package comes from. We cannot assume it was built in integrations
  • elastic-package must work in any file structure
  • Everyone building packages should use elastic-package
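
For "elastic-package must work in any file structure", a key step is locating the package root from wherever the tool is invoked. A Python sketch under the assumption that the package's top-level manifest.yml is the one declaring format_version (data stream manifests are also named manifest.yml, so matching the filename alone is not enough). This is an illustration, not elastic-package's actual implementation.

```python
from pathlib import Path
from typing import Optional


def is_package_manifest(path: Path) -> bool:
    """Assumption: only the top-level package manifest declares format_version."""
    if not path.is_file():
        return False
    lines = path.read_text(encoding="utf-8").splitlines()
    return any(line.startswith("format_version:") for line in lines)


def find_package_root(start: Path) -> Optional[Path]:
    """Walk up from start to the nearest directory holding a package manifest.yml."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if is_package_manifest(candidate / "manifest.yml"):
            return candidate
    return None
```

Because the search only walks upward from the working directory, it does not care whether the package sits in integrations, package-storage, or a nested directory of another repo.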

So maybe it would have been easier to get started in integrations and then move it into the apm repo. Interestingly, @axw suggests bringing it to integrations long term, which is also fine with me.

Just a small clarification: I'm not suggesting we do that; I was just saying it would be a little less painful once we no longer have to maintain both standalone and Fleet-managed apm-server. I think it would still be a pain to have the package and code in separate repos, since that would make it harder to develop and test changes that involve both.

Anyway, we agree on the key takeaways.

Fixed in #1012