opensearch-project / opensearch-catalog

The OpenSearch Catalog is designed to make it easier for developers and the community to contribute, search for, and install artifacts such as plugins, visualization dashboards, and ingestion-to-visualization content packs (data pipeline configurations, normalization, ingestion, dashboards).

[PROPOSAL] [RFC] Integration Catalog Version-Sets

Swiddis opened this issue

Date: 23 June, 2023

What/Why

What are you proposing?

This proposal is to introduce Version-Sets to the catalog project. Version-Sets will allow integrations to specify a set of required component versions within a catalog, and query the catalog for the most powerful available version set that adheres to the specified versions. This will empower integration developers with the flexibility to select the desired functionality for their integrations, while ensuring cross-compatibility between integrations for more complex correlation.

What users have asked for this feature?

Informed second-hand by @YANG-DB: Users have expressed the need for the ability to combine multiple integrations and apply filters to disjoint fields from each integration. Currently, it is not possible to perform joined queries across integrations while filtering fields from different integrations simultaneously, unless both integrations share that field. Users have requested a solution that allows for such queries while maintaining ease of use and backward compatibility.

What problems are you trying to solve?

The existing versioning system for integrations is limiting when performing joined queries for correlations. Integrations currently specify a subset of fields that they use, and setting up an integration creates indices that include only these fields. Queries within OpenSearch that span multiple indices can only query fields that are shared by every index. If one integration has fields A and another integration has fields A+B, then no query can run on both indices while filtering only on field B. This makes more complex correlation use cases with many outer joins challenging and typically inefficient. It also restricts the flexibility of integrations, unless the user preemptively configures the index for integration 1 to include A+B, even though only fields A are actually used.
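The limitation can be modeled with plain sets. The sketch below only illustrates the claim above; it is not OpenSearch code, and the index names, field names, and helper functions are hypothetical.

```python
# Hypothetical field sets for the indices created by two integrations.
index_fields = {
    "integration-1": {"A"},
    "integration-2": {"A", "B"},
}


def shared_fields(indices):
    """Return the fields present in every one of the given indices."""
    field_sets = [index_fields[name] for name in indices]
    return set.intersection(*field_sets)


def can_filter_on(field, indices):
    """Model the constraint described above: a field is usable in a
    cross-index query only if every index includes it."""
    return field in shared_fields(indices)


both = ["integration-1", "integration-2"]
print(can_filter_on("A", both))  # True
print(can_filter_on("B", both))  # False

# Preemptively widening integration 1's index to A+B removes the limitation,
# at the cost of storing a field that integration 1 never uses.
index_fields["integration-1"] = {"A", "B"}
print(can_filter_on("B", both))  # True
```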

What is the developer experience going to be?

Integration developers will have the option to specify a set of component versions for each integration. During setup, the integration plugin will query the catalog for the most powerful version set that adheres to the specified versions. The catalog will return an updated version set that includes the fields the integration requires. Developers can rely on Version-Sets to select the minimal functionality their integrations need, without compromising cross-compatibility with other integrations.

Developer Experience (Example)

To illustrate the developer experience, we can walk through a specific example. Suppose that the observability catalog contains a Version-Set with three schemas:

[
  { "name": "logs", "version": "1.0.0" },
  { "name": "http", "version": "1.0.0" },
  { "name": "communication", "version": "1.0.0" }
]

For this example, suppose there are two integrations, the Alice integration and the Bob integration. The Alice integration involves communication logs, so it specifies a dependency on two components: logs-1.0.0 and communication-1.0.0. The Bob integration uses HTTP logs, so it specifies two different components: logs-1.0.0 and http-1.0.0. As they have no dependency on further components, their developers' work is done.

When a user sets up the Alice integration, they are presented with a choice: they can set up a minimal index for just the Alice integration, or they can set up a super-index for observability integrations.

  • If they choose to set up a minimal index, it will only include fields corresponding to the logs and communication components. This will be more efficient for storage, but will not have the same functionality for correlations.
  • If they choose to set up a super-index, the plugin will query the catalog with the two component versions. The catalog will perform some resolution logic, and eventually determine and return the aforementioned version set with three components. The plugin can then freely set up an index using all three components, which can be used by the Bob integration in the future.
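The resolution step for the super-index case might look roughly like the sketch below. The proposal does not define the resolution logic, so several things here are assumptions: "most powerful" is taken to mean "the matching version set with the most components", matching is exact on version strings, and the names `CATALOG_VERSION_SETS`, `adheres`, and `resolve` are hypothetical. The second, smaller version set in the catalog is also invented to show that the larger one wins.

```python
# Hypothetical catalog contents: each version set maps component name -> version.
CATALOG_VERSION_SETS = [
    {"logs": "1.0.0", "http": "1.0.0", "communication": "1.0.0"},
    {"logs": "1.0.0", "communication": "1.0.0"},
]


def adheres(version_set, required):
    """True if the version set contains every required component at the exact version."""
    return all(version_set.get(name) == version for name, version in required.items())


def resolve(required, version_sets=CATALOG_VERSION_SETS):
    """Return the matching version set with the most components, or None.

    Assumption: "most powerful" means "largest number of components".
    """
    candidates = [vs for vs in version_sets if adheres(vs, required)]
    return max(candidates, key=len, default=None)


alice = {"logs": "1.0.0", "communication": "1.0.0"}
bob = {"logs": "1.0.0", "http": "1.0.0"}

print(resolve(alice))
print(resolve(bob))
print(resolve(alice) == resolve(bob))  # True: both resolve to the three-component set
```

Because both integrations resolve to the same version set under these assumptions, an index built from it for the Alice integration already contains the fields the Bob integration needs.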

Are there any security considerations?

The introduction of Version-Sets does not introduce any additional security concerns. The security measures and validations currently in place for integrations will continue to apply to the new versioning system.

Are there any breaking changes to the API?

The API will require modifications to support the Version-Sets feature. The changes will primarily involve the integration metadata and the catalog querying mechanism. Existing integrations using the previous versioning system will need to be migrated to the new format.

What is the user experience going to be?

Users will benefit from the enhanced functionality provided by Version-Sets. The UI will be updated to allow users to preemptively select the desired functionality (e.g., combining fields from multiple integrations) when configuring their integrations. The UI will provide clear options and instructions for users to choose the appropriate Version-Set.

Are there breaking changes to the User Experience?

The introduction of Version-Sets will not result in any breaking changes to the user experience. Users will still configure integrations as before, but with the added option to select a Version-Set that suits their requirements.

Why should it be built? Any reason not to?

Version-Sets should be built to address the limitations of the current versioning system and provide users with the ability to perform complex queries across integrations while filtering fields from different integrations. By allowing integrations to specify a set of versions and querying the catalog for the most powerful version set, users can customize their integrations without sacrificing backward compatibility. This feature will enhance the functionality and flexibility of the integrations project.

What will it take to execute?

To execute this feature, the following steps are proposed:

  1. Modify the integration metadata to include a Version-Set field that can be used by the catalog.
  2. Update the catalog querying mechanism to support Version-Set queries.
  3. Implement basic Version-Set functionality by maintaining a list of version sets with hardcoded values.
  4. Allow integrations to query the catalog and retrieve the most powerful version set adhering to the specified versions.
  5. Add support for fuzzy-matching and range dependencies in version sets to provide more flexibility in selecting versions.
  6. Perform thorough testing to ensure the correctness and reliability of the Version-Sets feature.
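As a rough illustration of step 5, a range dependency could be checked as in the sketch below. The proposal does not specify a constraint syntax; the comma-separated form (`>=1.0.0,<2.0.0`), the plain `MAJOR.MINOR.PATCH` version format, and the function names are all assumptions made for this example.

```python
def parse_version(text):
    """Parse "MAJOR.MINOR.PATCH" into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, spec):
    """Check a version against a comma-separated list of constraints.

    The constraint syntax (">=1.0.0,<2.0.0") is an assumption for this sketch.
    """
    # Two-character operators come first so ">=" is not read as ">".
    ops = {
        ">=": lambda a, b: a >= b,
        "<=": lambda a, b: a <= b,
        "==": lambda a, b: a == b,
        ">": lambda a, b: a > b,
        "<": lambda a, b: a < b,
    }
    current = parse_version(version)
    for constraint in spec.split(","):
        constraint = constraint.strip()
        for symbol, compare in ops.items():
            if constraint.startswith(symbol):
                bound = parse_version(constraint[len(symbol):])
                if not compare(current, bound):
                    return False
                break
        else:
            raise ValueError(f"unrecognized constraint: {constraint!r}")
    return True


print(satisfies("1.2.0", ">=1.0.0,<2.0.0"))  # True
print(satisfies("2.0.0", ">=1.0.0,<2.0.0"))  # False
```

A check like this could replace exact version equality when the catalog decides whether a version set adheres to an integration's requirements.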

Any remaining open questions?

  • How should the UI be designed to clearly convey the option for users to select a Version-Set and choose their desired functionality?
  • Are there any additional considerations or requirements for the implementation of fuzzy-matching and range dependencies in integration versions?