
[draft] ec2 architecture and guide #4585

Draft
wants to merge 9 commits into base: main
125 changes: 125 additions & 0 deletions docs/self-managed/concepts/reference-architecture.md
@@ -0,0 +1,125 @@
---
id: reference-architecture
title: "Reference Architecture"
description: "Learn about the self-managed reference architectures and how they can help you get started."
---

## Target User

- **Enterprise Architects**: To design and plan the overall system structure.
- **Developers**: To understand the components and their interactions.
- **IT Managers**: To ensure the system meets business requirements and is maintainable.

Reference architectures help these users by providing:

- **Best Practices**: Proven methods and techniques for system design.
- **Consistency**: Standardized approaches that ensure uniformity across projects.
- **Efficiency**: Accelerated development by reusing established patterns.
- **Risk Reduction**: Mitigation of common pitfalls through well-documented guidelines.

## Preface

Reference architectures provide a blueprint for system design and implementation, offering a standardized approach to solving common problems. They serve as a guide for enterprise architects, developers, and IT professionals to build robust and scalable systems. By following a reference architecture, organizations can ensure consistency, reduce risks, and accelerate the development process.

## Customization and Flexibility

It's important to note that reference architectures are not a one-size-fits-all solution. Each organization has unique requirements and constraints that may necessitate modifications to the provided blueprints. While these reference architectures offer a solid foundation and best practices, they should be adapted to fit the specific needs of your project. Use them as a starting point to start your development process, but be prepared to make adjustments to ensure they align with your goals and infrastructure.

## Support Considerations

We recognize that deviations from the reference architecture are unavoidable. However, such changes will introduce additional complexity, making troubleshooting more difficult. When modifications are required, ensure they are well documented to facilitate future maintenance and to speed up support.

## Architecture

<!-- TODO: include overview, Hamza had good pictures on this topic -->

### Orchestration Cluster vs Management Cluster

When designing a reference architecture, it's essential to understand the differences between an orchestration cluster and a management cluster. Both play crucial roles in the deployment and operation of processes, but they serve different purposes and include distinct components.

#### Orchestration Cluster

We refer to the orchestration (or automation) cluster as the core of Camunda.

The included components are:

- [Zeebe](./../../components/zeebe/zeebe-overview.md): A workflow engine for orchestrating microservices and managing stateful, long-running business processes.
- [Operate](./../../components/operate/operate-introduction.md): A monitoring tool for visualizing and troubleshooting workflows running in Zeebe.
- [Tasklist](./../../components/tasklist/introduction-to-tasklist.md): A user interface for managing and completing human tasks within workflows.
- [Optimize](#TODO): An analytics tool for generating reports and insights based on workflow data.
- [Identity](./../identity/what-is-identity.md): A service for managing user authentication and authorization.
- [Connectors](./../../components/connectors/introduction.md): Pre-built integrations for connecting Zeebe with external systems and services.

The orchestration cluster is self-contained, and each of the above components has a 1:1 relationship with it. For example, a single Operate instance can only talk to a single Zeebe instance, because it depends on that instance's data.
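
To make the 1:1 coupling concrete, here is a minimal, hedged sketch of how a web application such as Operate is pointed at exactly one Zeebe cluster and its data. The keys follow the Operate configuration style, but the values are placeholders; consult the component configuration references for the authoritative settings.

```yaml
# Illustrative only: an Operate instance bound to a single Zeebe cluster.
camunda.operate:
  zeebe:
    # The one Zeebe gateway this Operate instance talks to.
    gatewayAddress: localhost:26500
  elasticsearch:
    # The Elasticsearch instance holding this Operate instance's data.
    url: http://localhost:9200
  zeebeElasticsearch:
    # Indices written by the Zeebe exporter of the same cluster.
    url: http://localhost:9200
    prefix: zeebe-record
```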

#### Management Cluster

The management cluster is designed to oversee and manage multiple orchestration clusters. It offers tools and interfaces for administrators and developers to monitor clusters and create BPMN models. The management cluster operates independently from the orchestration cluster and can function without requiring an orchestration cluster.

The included components are:

- [Console](./../../components/console/introduction-to-console.md): A central management interface for monitoring and managing multiple orchestration clusters.
- [Web Modeler](#TODO): A web-based tool for designing and deploying workflow models to any available orchestration cluster.
- [Identity](./../identity/what-is-identity.md): A service for managing user authentication and authorization.

The management cluster supports a 1:many relationship, meaning a single Console instance can manage multiple orchestration clusters, and the Web Modeler can deploy models to any available cluster.

:::note

Identity is listed twice because there are two distinct Identity components: one belonging to the orchestration cluster and another belonging to the management cluster. These components are disjoint from each other. For production setups, it is recommended to use an external identity provider. However, it is possible to use the management Identity as an OIDC provider for the orchestration Identity.

:::

### High Availability (HA)

High availability (HA) ensures that a system remains operational and accessible even in the event of component failures. In general, all components can be run in a highly available manner, though some require extra consideration when operated in HA mode.

Consider the following when choosing high availability:

- **Increased Uptime**: Ensures that services remain available even during hardware or software failures.
- **Fault Tolerance**: Reduces the risk of a single point of failure by distributing workloads across multiple nodes.
- **Increased Performance**: Zeebe scales both vertically and horizontally.
- **Cost**: Higher costs due to the need for additional hardware, software, and maintenance.
- **Complexity**: Requires more sophisticated infrastructure and management, increasing the complexity of the system.

While high availability is one part of increased fault tolerance and resilience, you should also consider the regional or zonal placement of your workloads.

If you run infrastructure on cloud providers, you are typically offered different regions and zones. For ideal high availability, consider a minimum setup of three zones within a region, which guarantees that in case of a zonal failure the workloads in the remaining two zones can still process data. For more information on how Zeebe handles fault tolerance, have a look at the [raft consensus chapter](./../../components/zeebe/technical-concepts/clustering.md#raft-consensus-and-replication-protocol).
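
As a minimal sketch (assuming the Zeebe broker's standard `application.yaml`-style configuration), a three-broker cluster spread across three zones could be configured roughly as follows; node IDs, host names, and partition counts are placeholders and should be derived from your own benchmarks.

```yaml
# Illustrative three-broker, three-zone setup; values are placeholders.
zeebe:
  broker:
    cluster:
      clusterSize: 3         # one broker per availability zone
      replicationFactor: 3   # each partition is replicated to all three zones
      partitionsCount: 3     # size according to your own benchmarks
      nodeId: 0              # unique per broker: 0, 1, 2
      initialContactPoints:
        - zeebe-0.internal:26502
        - zeebe-1.internal:26502
        - zeebe-2.internal:26502
```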

You can also run Camunda as a single instance, but be sure to take [regular backups](./../zeebe-deployment/operations/backups.md), as your resilience is limited in this setup.

In the end, the right choice depends on your uptime requirements, budget, the criticality of the workflow engine, and your performance requirements.

## Use Cases

### Kubernetes

Kubernetes is a powerful orchestration platform for containerized applications. Using a reference architecture for Kubernetes can help organizations deploy and manage their applications more effectively. It provides guidelines for setting up clusters, managing workloads, and ensuring high availability and scalability. This approach is ideal for organizations looking to leverage the benefits of containerization and self-healing capabilities.

### Manual (Bare Metal / VMs)

For organizations that prefer traditional infrastructure, reference architectures for bare metal or virtual machines (VMs) offer a structured approach to system deployment. These architectures provide best practices for setting up physical servers or VMs, configuring networks, and managing storage. They are suitable for environments where control and security are critical, and where containerization may not be feasible or necessary.

### Local Development

While both options are suitable for trying out Camunda 8 locally, you might also consider exploring [Camunda 8 Run](./../setup/deploy/local/c8run.md) for a more developer-focused experience.

## Helping Customers Decide

Choosing the right reference architecture depends on various factors such as the organization's goals, existing infrastructure, and specific requirements. Here are some guidelines to help you decide:

- **Kubernetes**:
  - Ideal for organizations adopting containerization and microservices.
  - Suitable for dynamic scaling and high availability.
  - Best for teams with experience in managing containerized environments.
  - Requires a steeper learning curve and continuous platform investment.

For more information and guides, have a look at the specific reference for [Kubernetes](#TODO).

- **Manual (Bare Metal / VMs)**:
  - Suitable for organizations requiring full control over their infrastructure.
  - Ideal for environments where security and compliance are critical.
  - Supports high availability, but requires more upfront planning.
  - Best for teams with expertise in managing physical servers or virtual machines.

For more information and guides, have a look at the specific reference for [Manual](#TODO).
163 changes: 163 additions & 0 deletions docs/self-managed/concepts/single-jar.md
@@ -0,0 +1,163 @@
---
id: single-jar
title: "Single JAR"
description: "Learn about the self-managed single JAR"
---

<!-- Could also be called manual? -->

<!-- Moving target, may be renamed, different focus, etc. -->

<!-- Day 1 vs Day 2 operations? -->
<!-- Installation vs Operations -->

## Preface

The Single JAR deployment option allows you to run Camunda Platform as a standalone Java application. This method is particularly suited for users who prefer manual deployment on bare metal servers or virtual machines (VMs). It provides full control over the environment and configuration, making it ideal for scenarios where custom setups or specific infrastructure requirements are necessary.

With the Single JAR approach, all necessary components are bundled into a single executable JAR file. This simplifies the deployment process, as you only need to manage one artifact. However, it also means that you are responsible for handling all aspects of the deployment, including installation, configuration, scaling, and maintenance.

Other deployment options, such as containerized deployments or managed services, might offer more convenience and automation. However, the Single JAR method gives you the flexibility to tailor the deployment to your exact needs, which can be beneficial for complex or highly customized environments.

We will go into the details later, but be aware that not everything is part of this Single JAR. Have a look at the documentation on the separation between the orchestration and management clusters. <!-- TODO: add a link reference from reference arch -->

## Before You Start

Before you begin with the self-managed single JAR setup, please consider the complexity and operational overhead involved. Self-managing your deployment requires a good understanding of infrastructure, networking, and application management. If you are looking for a simpler and more managed solution, you might want to explore [our SaaS offerings](https://camunda.com/platform/) first. SaaS can significantly reduce the burden of maintenance and allow you to focus more on your core business needs.

## Limitations

- The focus is on the orchestration cluster, including Connectors, Identity, Operate, Optimize, Tasklist, and Zeebe.

## Target User

<!-- Maybe talk about target users, e.g. facing more mid-size companies for a more sophisticated solution Kubernetes -->

## Architecture

<!-- TODO: include picture when I get access to the draw.io stuff from Hamza. Afterwards describe it
-->

The Single JAR and manual way of deploying Camunda can be used for either simple architectures or high availability setups. Be aware that maintaining such setups requires considerably more work than a solution like Kubernetes.

### Components

<!-- Components and how they interact, could be just a subpart of the Architecture -->

## Requirements

Before implementing a reference architecture, review the requirements and guidance outlined below. We differentiate between `Infrastructure` and `Application` requirements.

### Infrastructure

The following are recommendations for a minimum viable setup; the sizing heavily depends on your use cases and usage. It is recommended to read the documentation on [sizing your environment](https://docs.camunda.io/docs/next/components/best-practices/architecture/sizing-your-environment/) and to run benchmarks to confirm your required sizing.

#### Host

- Variable number of host systems:
  - A minimum of **1**, or **3** for high availability (HA)

Per host:

- Minimum of **4** CPU cores (**amd64** / **arm64**)
- Minimum of **8** GB of Memory
- **32** GB SSD disk (**1,000** IOPS)
- We advise against using "burstable" disk types because of their inconsistent performance.

Example of cloud provider options:

- **Azure**: <!-- TODO: actually don't have a good recommendation atm, probably d series with an external premium v2 disk -->
- **AWS**: The general purpose [m series](https://aws.amazon.com/ec2/instance-types/) with a minimum size of `xlarge`.
- **GCP**: The general purpose [n series](https://cloud.google.com/compute/docs/general-purpose-machines#n1_machines) with a minimum size of `standard-4`.

#### Networking

- Stable and high-speed network connection
- Configured firewall rules to allow necessary traffic:
- **8080**: Web UI / REST endpoint.
- **9090**: Connector port.
- **9600**: Metrics endpoint.
- **26500**: gRPC endpoint.
- **26501**: Gateway-to-broker communication.
- **26502**: Inter-broker communication.
- Load balancer for distributing traffic (if required)

:::note
Some ports can be overridden and are not definitive; consult the [documentation](#TODO) to see how this is done for each component in case you want to use a different port. For example, `Connectors` and the `Web UIs` both default to port 8080, which is why we moved Connectors to a different port in our example (see the sketch below).
:::
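
As a hedged sketch of how the ports above map to configuration, assuming Spring Boot-style keys for the web applications and the Zeebe network settings; exact keys and defaults belong to each component's configuration reference.

```yaml
# Illustrative port settings only; consult each component's configuration reference.
server:
  port: 8080            # web UI / REST endpoint of the single JAR
zeebe:
  gateway:
    network:
      port: 26500       # gRPC endpoint for clients
  broker:
    network:
      commandApi:
        port: 26501     # gateway-to-broker communication
      internalApi:
        port: 26502     # inter-broker communication
# The Connectors runtime is a separate process; its own server.port would be
# set to 9090 here to avoid the overlap with the web UIs mentioned above.
```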

### Application

- Java Virtual Machine, see [supported environments](./../../reference/supported-environments.md) for version details.

### Database

- Elasticsearch / OpenSearch, see [supported environments](./../../reference/supported-environments.md) for version details.

Our recommendation is to use an externally managed offering, as we will not go into detail on how to manage and maintain your database.

## Deployment Model

<!--
Deployment Topology
Describe whether the architecture is single-region, multi-region, or hybrid.
Configuration Guidelines
Best practices for configuring the environment for optimal performance and reliability.
Automation and CI/CD Pipelines
Suggested tooling and workflows for automated deployments and updates.
-->

## Scalability and Performance Considerations

<!--
Maybe we have some information on this in the docs

Scalability Patterns
Recommended patterns for scaling compute, storage, and networking resources.
Load Balancing and Caching
Best practices for distributing traffic and caching data to enhance performance.
Performance Optimization Tips
Tips for optimizing performance across different components.
-->

## Configuration

Configuration for the Single JAR deployment can be managed either through the `application.yml` file or via environment variables. This flexibility allows you to choose the method that best fits your deployment and operational practices.

For a comprehensive list of configuration options, refer to the documentation for each component listed in [self-managed](https://docs.camunda.io/docs/next/self-managed/about-self-managed/). The documentation provides detailed information on all available settings and how to apply them.

The following components comprise the Single JAR and are configured via a single `application.yml`:

- [Identity](#TODO)
- [Operate](./../operate-deployment/operate-configuration.md)
- [Optimize](#TODO)
- [Tasklist](./../tasklist-deployment/tasklist-configuration.md)
- [Zeebe](./../zeebe-deployment/configuration/configuration.md)

The `Connectors` are standalone and can be configured as outlined in their [respective documentation](./../connectors-deployment/connectors-configuration.md).
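
As a hedged example of the two configuration styles, the snippet below shows a few settings in `application.yml` form; because the components are Spring Boot applications, each key can alternatively be supplied as an environment variable via relaxed binding (for example, `camunda.operate.elasticsearch.url` becomes `CAMUNDA_OPERATE_ELASTICSEARCH_URL`). The keys shown are examples only; the component references linked above are authoritative.

```yaml
# Example application.yml fragment; keys and values are illustrative.
camunda:
  operate:
    elasticsearch:
      url: http://localhost:9200   # or CAMUNDA_OPERATE_ELASTICSEARCH_URL
  tasklist:
    elasticsearch:
      url: http://localhost:9200   # or CAMUNDA_TASKLIST_ELASTICSEARCH_URL
zeebe:
  broker:
    exporters:
      elasticsearch:
        className: io.camunda.zeebe.exporter.ElasticsearchExporter
        args:
          url: http://localhost:9200
```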

### Optional: configure license key

Installations of Camunda 8 Self-Managed which require a license can provide their license key to the components as an environment variable:

| Environment variable | Description | Default value |
| --------------------- | -------------------------------------------------------------------- | ------------- |
| `CAMUNDA_LICENSE_KEY` | Your Camunda 8 license key, if your installation requires a license. | None |

:::note
Camunda 8 components without a valid license may display **Non-Production License** in the navigation bar and issue warnings in the logs. These warnings have no impact on startup or functionality, with the exception that Web Modeler has a limitation of five users. To obtain a license, visit the [Camunda Enterprise page](https://camunda.com/platform/camunda-platform-enterprise-contact/).
:::

## Sizing Guidelines

## Upgrades

<!-- TODO: No idea -->

<!-- zero-downtime? -->

## Reference implementations

Designed and tested for default setups with the minimum required sizing in mind, while also supporting high availability.

- AWS EC2
Binary file added docs/self-managed/setup/assets/aws-ec2-arch.png