aws / aws-cdk-rfcs

RFCs for the AWS CDK

CDK post-deployment experience

rix0rrr opened this issue · comments

Description

We want to support operators of CDK applications during their common tasks. What are the biggest problems/frustrations you would like us to address?

  • Getting an overview of the application?
  • Creating dashboards and alarms?
  • Ticketing?
  • Operational tasks?
  • Log inspection?

Let us know in the discussion below.

Roles

Role User
Proposed by @rix0rrr
Author(s)
API Bar Raiser
Stakeholders

See RFC Process for details

Workflow

  • Tracking issue created (label: status/proposed)
  • API bar raiser assigned (ping us at #aws-cdk-rfcs if needed)
  • Kick off meeting
  • RFC pull request submitted (label: status/review)
  • Community reach out (via Slack and/or Twitter)
  • API signed-off (label api-approved applied to pull request)
  • Final comments period (label: status/final-comments-period)
  • Approved and merged (label: status/approved)
  • Execution plan submitted (label: status/planning)
  • Plan approved and merged (label: status/implementing)
  • Implementation complete (label: status/done)

The author is responsible for progressing the RFC according to this checklist,
and for applying the relevant labels to this issue so that the RFC table in the
README gets updated.

Yes! This would be fantastic.

I have questions about the boundary between the DevOps and SysOps post-deployment experience.

  1. Is this focused on a "per-application single-pane of glass" post-deployment experience for DevOps?
  2. What about the "multi-account, multi-app single-pane of glass" post-deployment experience for SysOps?
  • There may not be clear Team/Enterprise boundaries for DevOps & SysOps, but regardless, it would be nice to have patterns/workflows that give a clear pathway to the multi-account, multi-app post-deployment experience on the SysOps side.

  • Perhaps an EventBridge "cross-account event backbone" could be used as an enabler for SysOps monitoring dashboards?
    https://dev.to/eoinsha/how-to-use-eventbridge-as-a-cross-account-event-backbone-5fik

  • Integration with OpenSearch/Grafana/Prometheus/aws-discovery-agent going back to prerequisites for custom managed infra.
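
To make the cross-account event backbone idea above a bit more concrete: a central event bus needs a resource policy that lets the workload accounts call `events:PutEvents` on it. Here is a minimal sketch of generating such a policy document. The helper name, account IDs, and bus ARN are all made up for illustration, and this is not something the CDK provides in this form.

```typescript
// Hypothetical helper (not a CDK API): builds the resource policy for a
// central EventBridge bus so that the listed accounts may send events to it.
interface PolicyStatement {
  Sid: string;
  Effect: "Allow";
  Principal: { AWS: string };
  Action: string;
  Resource: string;
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

function buildBackbonePolicy(busArn: string, sourceAccountIds: string[]): PolicyDocument {
  return {
    Version: "2012-10-17",
    // One statement per source account keeps it easy to revoke a single account.
    Statement: sourceAccountIds.map((accountId) => ({
      Sid: `AllowPutEventsFrom${accountId}`,
      Effect: "Allow" as const,
      Principal: { AWS: `arn:aws:iam::${accountId}:root` },
      Action: "events:PutEvents",
      Resource: busArn,
    })),
  };
}

// Example with placeholder account IDs and a placeholder bus ARN.
const backbonePolicy = buildBackbonePolicy(
  "arn:aws:events:eu-west-1:999999999999:event-bus/ops-backbone",
  ["111111111111", "222222222222"],
);
console.log(JSON.stringify(backbonePolicy, null, 2));
```

With around 100 accounts, a condition on the organization ID would probably be a better fit than one statement per account; the per-account form is only used here because it is the simplest to read.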

We want to support operators of CDK applications during their common tasks. What are the biggest problems/frustrations you would like us to address?

I have come across two personas of operators in this sense, one being the more "DevOps" aligned build and run team, the other being the traditional "ops" team with often little insight into the application.

A frustration I've seen repeatedly with customer teams is the complexity involved in correlating logs (both "system logs" like CloudFormation deployments, custom resources, flow logs; and "application logs"). This issue does not stem from CDK itself, and can e.g. be addressed with appropriately crafted Insights queries. The CDK should be able to improve the user experience here.
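
As an illustration of what an "appropriately crafted" query could look like, here is a rough sketch that builds a CloudWatch Logs Insights query string for pulling every line that mentions one correlation ID, across whichever log groups the query is run against. The function name and the idea that all logs carry a shared correlation ID are assumptions for the example.

```typescript
// Hypothetical helper: builds a Logs Insights query that finds every log line
// mentioning one correlation ID and returns the lines oldest first.
// Assumes the application and system logs all include that ID in the message.
function buildCorrelationQuery(correlationId: string, limit = 200): string {
  // Escape regex metacharacters so the ID is matched literally.
  const escaped = correlationId.replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");
  return [
    // @log and @logStream show which log group and stream each line came from.
    "fields @timestamp, @log, @logStream, @message",
    `| filter @message like /${escaped}/`,
    "| sort @timestamp asc",
    `| limit ${limit}`,
  ].join("\n");
}

console.log(buildCorrelationQuery("req-123", 50));
```

The resulting string can be pasted into the Logs Insights console with several log groups selected, which is where the correlation across "system" and "application" logs actually happens.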

An intuitive "here are all the logs" view would benefit both build and run, as well as run-only teams. Personally I'd love to see an extensible cdk logs --follow capability (see #277).

@kadrach yeah, that's what I'm getting at with a SysOps workbench "single pane of glass" post-deployment experience.
AWS has OpenSearch, Grafana, Prometheus etc. It could develop a CDK Secure Org bootstrap pattern on top of them that bundles all account logs into a searchable OpenSearch interface, with Grafana etc. over the top.

What about the "multi-account, multi-app single-pane of glass" post-deployment experience for SysOps?

I think by default it would be "whatever you say belongs together will go together"... if there are multiple levels of hierarchy would that be sufficient? What is the use case you are thinking of? At the very least we do want to support multiple accounts.

Integration with OpenSearch/Grafana/Prometheus/aws-discovery-agent going back to prerequisites for custom managed infra.

Interesting. I'm not sure what that would look like. The simplest to achieve would be to embed arbitrary pages, I suppose. But that may not be enough integration, in which case the alternative would be having to write adapters that can query various metrics backends.

An intuitive "here are all the logs" view would benefit both build and run, as well as run-only teams

I agree that there are many logs that are widely spread out and they're not always easy to search. I definitely see the benefit here.

(Honestly -- a unified logs viewer that pulls from a bunch of different log groups and other sources is totally something someone could build today)
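
For what it's worth, the core of such a viewer is small. Below is a minimal sketch of just the merging step: given events already fetched from several sources, interleave them by timestamp and tag each line with where it came from. The `LogEvent` shape and function names are invented for the example; fetching from CloudWatch Logs or anywhere else is deliberately left out.

```typescript
// Hypothetical shape for an already-fetched log event (not an AWS SDK type).
interface LogEvent {
  timestamp: number; // epoch milliseconds
  source: string;    // e.g. a log group name
  message: string;
}

// Combine events from any number of sources into one timeline, oldest first.
function mergeLogEvents(...sources: LogEvent[][]): LogEvent[] {
  const all = ([] as LogEvent[]).concat(...sources);
  // Sorts the freshly built copy, so the input arrays are left untouched.
  return all.sort((a, b) => a.timestamp - b.timestamp);
}

// Render one event as a single line, prefixed with time and origin.
function formatLogEvent(e: LogEvent): string {
  return `${new Date(e.timestamp).toISOString()} [${e.source}] ${e.message}`;
}

// Example with made-up events from two made-up log groups.
const deployLogs: LogEvent[] = [
  { timestamp: 1000, source: "deploy", message: "CREATE_IN_PROGRESS" },
  { timestamp: 3000, source: "deploy", message: "CREATE_COMPLETE" },
];
const appLogs: LogEvent[] = [
  { timestamp: 2000, source: "app", message: "cold start" },
];
for (const e of mergeLogEvents(deployLogs, appLogs)) {
  console.log(formatLogEvent(e));
}
```

The hard parts are everything around this: paging through many log groups, following new events as they arrive, and doing it across accounts.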

What is the use case you are thinking of?

BAU SysOps: monitoring and alerting for all the things, aimed at Site Reliability Engineering metrics.

if there are multiple levels of hierarchy would that be sufficient?

If SysOps are having to constantly log into different accounts (maybe 100) to gain any insight or monitoring then no, we don't want account-based hierarchy.

the alternative would be having to write adapters that can query various metrics backends.

Surely AWS already has some of this implemented for SRE monitoring of the AWS Console suite? We don't want to re-invent the wheel, but yes, adapters would be fine.

unified logs viewer

Yes, a unified logs viewer and searcher, plus integration with a unified CMDB (which owner/cost code/system/service created the logs).