Technical Paper Proposal: Cloud Native Infrastructure-Lifecycle #759

Open
rynowak opened this issue Oct 15, 2024 · 4 comments

Comments

@rynowak
Collaborator

rynowak commented Oct 15, 2024

👋 WG-Infrastructure-Lifecycle dropping in to say hello 👋

TAG AppDelivery has adopted a new process for writing and publishing technical papers. This issue is our initial proposal for writing a paper as one of the working group's deliverables.

You can learn more about the working group here. We'd love to have you participate and contribute along with us!


Title: Cloud-native infrastructure lifecycle

Description:

As the cloud native approach matures, the workloads we run have increasingly complex infrastructure needs. While we all strive to control costs, enforce best practices, and ensure secure configurations, the reality is often fragmented.

Despite the complexity and sophistication required, not enough has been done to meet these challenges. We're seeing significant investment in new open source infrastructure projects both inside and outside CNCF, but effective tooling for cloud native infrastructure lifecycle management remains elusive. The Platform Engineering movement emphasizes treating infrastructure as a product, but there's no standardized approach for managing its lifecycle.

While savvy users are embracing cloud native practices, infrastructure requirements are inherently diverse. We see an opportunity to champion technology-agnostic best practices. Infrastructure lifecycle management deserves the same level of attention and planning we dedicate to established areas of cloud native development. Giving it that attention helps ensure security, resilience, manageability, sustainability, and observability.

Audience:

Any end-user involved in or responsible for the management of cloud-native infrastructure - regardless of job title, workload, or chosen technologies.

Impact:

The whitepaper will guide end-users in managing infrastructure to ensure it is secure, resilient, manageable, sustainable, and observable. Any end-user, regardless of role or technology choices, can use the whitepaper’s guidance to implement a mature and stable infrastructure management practice.

Scope:

This whitepaper covers a set of recommended practices and maturity guidelines that are generally applicable and technology-agnostic. There are many complex domain-specific and workload-specific areas of infrastructure management, and we’d like to avoid going deep into any single area so that we can serve the broadest possible audience.

Our initial proposed set of topics (non-exhaustive):

Configuration / Infrastructure as Code

  • Development processes
  • Design and abstractions (a.k.a. componentization / modularization)
  • Testing

Deployment approaches (Infra as Data, etc.)

  • Delivery across environments, artifacts, versioning
  • Application-aligned (spec the infra for an application and deploy it) vs. shared infra (e.g. clusters), vs. horizontal / siloed

State management/backups

  • Disaster recovery
  • Availability (not just DR but also managing scaling, resilience)

Observability

  • KPIs/Metrics
  • Policy enforcement

This sounds like a lot, even for an initial list. It's a really big topic, and we'd rather deliver a small amount of good content than a large amount of questionable content. We need to start the process and see where contributor and user interest lies before refining this list further.

We've got a clear picture right now on some topics that definitely are out-of-scope:

  • Going deep into specific infrastructure-management technologies, e.g. a user guide for OpenTofu
  • Going deep into the management of specific infrastructure, e.g. how to fail over and back up PostgreSQL servers
  • Going deep into the management of specific workloads, e.g. comprehensive guidance for websites + global CDNs
  • Going deep into specific industry verticals, e.g. healthcare
  • Physical infrastructure, e.g. servers, racks, CPUs

Also, there are potentially many overlaps between other CNCF guidance and our proposed whitepaper. For example, many areas of the infrastructure lifecycle are security-critical. We plan to surface the existing guidance created by others where possible, and to collaborate with other TAGs and WGs.

@rynowak
Collaborator Author

rynowak commented Oct 15, 2024

Hi folks, we're trying to assess interest in the topic and find contributors who want to work on this paper with us. We plan to present this at the TAG App-Delivery general meeting on 10/16 (schedule permitting).

The best way to get involved is to comment on this issue, reach out to us on CNCF Slack in the #wg-infrastructure-lifecycle channel, or attend a working group meeting.

You can learn more about the working group here. We'd love to have you participate and contribute along with us!

@kief

kief commented Oct 18, 2024

Great! I've been coming to the calls and am looking forward to helping out with the paper.

@bschaatsbergen
Collaborator

Hey @rynowak, thanks for delivering this outline. IIRC, @elft3r is also working on an outline—maybe we can merge the two and use that as a base to draft an RFC?

@rynowak
Collaborator Author

rynowak commented Dec 3, 2024

That sounds great! I'll follow up with @elft3r
