Add decision page for managing the admin kubeconfig #277

Merged on Sep 28, 2023 (1 commit)

110 changes: 110 additions & 0 deletions docs/modules/ROOT/pages/explanations/decisions/admin-kubeconfig.adoc
@@ -0,0 +1,110 @@
= Admin kubeconfig management

== Problem

For emergency access to clusters, we currently store the kubeconfig for the `system:admin` user, which is generated by the `openshift-install` program, in Passbolt.
The client certificate generated for that kubeconfig has a lifetime of 10 years.
Unfortunately, Kubernetes doesn't support revoking client certificates; see https://github.com/kubernetes/kubernetes/issues/18982[this GitHub issue (kubernetes/kubernetes#18982)].

We would like to have another form of emergency access to OpenShift 4 clusters.
The main reason is that having credentials with a lifetime of 10 years which can't be revoked is less than ideal.

=== Goals

* Define a method to manage emergency access credentials for OpenShift 4 clusters
* The credentials should be relatively short-lived and it must be possible to rotate them

=== Non-Goals

* Replace regular authentication

== Proposals

While writing out the proposals, we identified that any solution to manage admin credentials is composed of two largely independent choices:
First, there are multiple credential types (client certificates or service account tokens) which can be used for the admin credentials.
Second, there are multiple possible implementations for managing the admin credentials on each cluster.

=== Credential type

In this section, we briefly outline the possible credential types that we can use for the admin credentials.

==== Issue short-lived certificates with cluster-admin privileges

The first approach is that we issue client certificates with cluster-admin privileges.
This can be done either through Kubernetes' `CertificateSigningRequest` (CSR) resources, or by manually issuing certificates against a self-signed CA certificate which is installed as a client CA certificate in the cluster.

One point to consider is that Kubernetes doesn't support issuing client certificates for group `system:masters` through CSR signer `kubernetes.io/kube-apiserver-client`.
However, group `system:cluster-admins` is allowed, and functionally equivalent on OpenShift 4.

Note that we can't revoke certificates issued through CSRs or through a self-managed CA certificate.
However, if we use a self-managed CA certificate, we can invalidate any existing certificates by rotating the CA and issuing a new certificate from the new CA.
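
As a rough illustration of the CSR-based variant, the following sketch builds a `CertificateSigningRequest` manifest for the `kubernetes.io/kube-apiserver-client` signer. The function name, the resource name and the default lifetime are assumptions made for this example. The PEM-encoded certificate request (whose subject would carry the `system:cluster-admins` group) is expected to be generated elsewhere, and the resulting CSR would still need to be approved on the cluster.

```python
import base64


def build_client_csr(name: str, csr_pem: bytes, expiration_seconds: int = 86400) -> dict:
    """Build a CertificateSigningRequest manifest for a client certificate.

    `csr_pem` is a PEM-encoded PKCS#10 request created elsewhere; the
    group membership is part of that request's subject, not of this manifest.
    """
    # The API rejects requested lifetimes shorter than 10 minutes.
    if expiration_seconds < 600:
        raise ValueError("expirationSeconds must be at least 600")
    return {
        "apiVersion": "certificates.k8s.io/v1",
        "kind": "CertificateSigningRequest",
        "metadata": {"name": name},  # hypothetical name chosen by the caller
        "spec": {
            # The request field holds the base64-encoded PEM document.
            "request": base64.b64encode(csr_pem).decode("ascii"),
            "signerName": "kubernetes.io/kube-apiserver-client",
            # The signer may still issue a shorter lifetime than requested.
            "expirationSeconds": expiration_seconds,
            "usages": ["client auth"],
        },
    }
```

The signer treats `expirationSeconds` as a request, so the issued certificate's actual lifetime should be checked after approval.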

==== Use service account tokens with cluster-admin privileges
[#sa_tokens]

The second approach is that we set up a Kubernetes service account which is granted `cluster-admin` privileges through a `ClusterRoleBinding` and issue service account tokens for that service account.
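
A minimal sketch of the two resources this approach needs, expressed as plain manifests. The helper name and the naming scheme for the binding are assumptions for illustration only; the decision doesn't prescribe them.

```python
def build_admin_rbac(namespace: str, name: str) -> list:
    """Return a ServiceAccount and a ClusterRoleBinding granting it cluster-admin."""
    service_account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": name, "namespace": namespace},
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        # Hypothetical naming scheme; ClusterRoleBindings are cluster-scoped.
        "metadata": {"name": f"{namespace}-{name}-cluster-admin"},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": "cluster-admin",
        },
        "subjects": [
            {"kind": "ServiceAccount", "name": name, "namespace": namespace},
        ],
    }
    return [service_account, binding]
```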

We've got two options to generate tokens for service accounts:

. The https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#manually-create-an-api-token-for-a-serviceaccount[TokenRequest] API allows us to generate service account tokens which expire after a defined amount of time.
However, tokens which are manually created through the TokenRequest API (for example with `kubectl create token`) can't be invalidated before they expire.

. Non-expiring API tokens are created by defining a secret of type `kubernetes.io/service-account-token`.
As the name suggests, these tokens don't expire.

The only way to permanently invalidate service account tokens (both non-expiring and time-bound) is to delete the service account.
Creating a new service account with the same name in the same namespace doesn't reactivate tokens associated with a previous service account, since the tokens contain the service account's Kubernetes resource UID.
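
To illustrate why a recreated service account doesn't reactivate old tokens, the sketch below reads the service account UID out of a token's payload and compares it with the UID of the current service account. It's for inspection only: it doesn't verify the signature, and the actual check is performed by the API server. The function names are made up for this example. The claim layouts handled are the nested `kubernetes.io` claim of time-bound tokens and the flat `kubernetes.io/serviceaccount/service-account.uid` claim of secret-based tokens.

```python
import base64
import json
from typing import Optional


def _decode_segment(segment: str) -> dict:
    """Decode one base64url-encoded JWT segment (padding is stripped in JWTs)."""
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def service_account_uid(token: str) -> Optional[str]:
    """Return the service account UID embedded in a token, if present."""
    payload = _decode_segment(token.split(".")[1])
    # Time-bound tokens nest the UID under the "kubernetes.io" claim.
    bound = payload.get("kubernetes.io", {}).get("serviceaccount", {}).get("uid")
    # Secret-based tokens use a flat claim name.
    legacy = payload.get("kubernetes.io/serviceaccount/service-account.uid")
    return bound or legacy


def token_matches_account(token: str, current_uid: str) -> bool:
    """True only if the token was issued for the service account with this UID."""
    return service_account_uid(token) == current_uid
```

A token issued before the service account was deleted carries the old UID, so `token_matches_account` returns `False` for the UID of a recreated account with the same name.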

The proposed approach for using service account tokens is to create short-lived API tokens through the TokenRequest API, so that the admin credentials expire by default.
Additionally, introduce a mechanism to force the tool to recreate the service account to invalidate any old tokens that might have leaked.
That mechanism might be as simple as having the tool reconcile the service account and recreate it if it gets deleted.
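
As a sketch of what the tool would send, the helpers below build a `TokenRequest` body and the path of the `token` subresource it's posted to. The helper names are assumptions; how the request is authenticated and sent is left out.

```python
from typing import Iterable, Optional


def build_token_request(expiration_seconds: int,
                        audiences: Optional[Iterable[str]] = None) -> dict:
    """Build the body for a TokenRequest with a bounded lifetime."""
    # The API server rejects requested lifetimes shorter than 10 minutes.
    if expiration_seconds < 600:
        raise ValueError("expirationSeconds must be at least 600")
    spec = {"expirationSeconds": expiration_seconds}
    if audiences:
        # Without audiences, the API server's default audience is used.
        spec["audiences"] = list(audiences)
    return {
        "apiVersion": "authentication.k8s.io/v1",
        "kind": "TokenRequest",
        "spec": spec,
    }


def token_request_path(namespace: str, name: str) -> str:
    """Path of the service account's `token` subresource (target of the POST)."""
    return f"/api/v1/namespaces/{namespace}/serviceaccounts/{name}/token"
```

The response's `status.expirationTimestamp` should be treated as authoritative, since the server may return a token with a different lifetime than requested.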

=== Credential management

In this section, we outline some possible approaches for managing the admin credentials on each cluster.

==== Extend Steward to manage credentials and write them to Vault

We can extend https://syn.tools/steward[Steward] to manage and renew the credentials and store them in Vault.

This allows us to issue relatively short-lived credentials (on the order of days), which limits the attack surface presented by engineers accessing admin credentials in emergency situations.

Optionally, we can also extend Steward to render a full kubeconfig file based on the managed credentials and store that file in Vault in addition to the raw credentials.
If we store a full kubeconfig file in Vault, we can document a single `vault` CLI command which fetches the emergency kubeconfig for a cluster.
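
Rendering a kubeconfig from a token could look roughly like the sketch below. The user and context name `emergency-admin` and the function signature are assumptions for this example. The output is JSON, which is also valid YAML, so the sketch needs no YAML library.

```python
import base64
import json
from typing import Optional


def render_kubeconfig(cluster_name: str, api_url: str, token: str,
                      ca_pem: Optional[bytes] = None) -> str:
    """Render a single-cluster, single-user kubeconfig as a JSON document."""
    cluster = {"server": api_url}
    if ca_pem is not None:
        # The CA bundle is embedded as base64-encoded PEM.
        cluster["certificate-authority-data"] = base64.b64encode(ca_pem).decode("ascii")
    user = "emergency-admin"  # hypothetical user and context name
    config = {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [{"name": cluster_name, "cluster": cluster}],
        "users": [{"name": user, "user": {"token": token}}],
        "contexts": [
            {"name": user, "context": {"cluster": cluster_name, "user": user}},
        ],
        "current-context": user,
    }
    return json.dumps(config, indent=2)
```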

==== Create a new custom controller which manages the credentials on the cluster and writes them to an external secrets store
[#custom_controller]

Instead of extending Steward, we could also create a new controller which manages admin credentials and writes them to an external secrets store.
This would provide some level of separation of concerns, since managing admin credentials isn't necessarily part of the Project Syn bootstrap process.
Additionally, having a separate tool allows us to have releases independent of the fairly complex Steward release process.
Finally, this gives us some freedom, as we're more decoupled from Project Syn and don't necessarily need to write the credentials to the Project Syn Vault.
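
One possible shape for such a controller's reconcile logic is sketched below as a pure planning function: given the observed state, it returns the steps to take. The state fields, action names and the 12-hour renewal window are all assumptions made for illustration, not a description of an existing implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class ObservedState:
    service_account_uid: Optional[str]     # None if the service account is missing
    token_account_uid: Optional[str]       # UID embedded in the stored token, if any
    token_expires_at: Optional[datetime]   # None if no token has been stored yet


def plan_actions(state: ObservedState, now: datetime,
                 renew_before: timedelta = timedelta(hours=12)) -> List[str]:
    """Decide which steps a reconcile run should take (hypothetical action names)."""
    actions = []
    account_missing = state.service_account_uid is None
    if account_missing:
        # Deleting the service account is the revocation mechanism, so the
        # controller recreates it and must issue a fresh token afterwards.
        actions.append("create_service_account")
    actions.append("ensure_cluster_role_binding")  # idempotent, so always planned

    token_stale = (
        account_missing
        or state.token_account_uid != state.service_account_uid
        or state.token_expires_at is None
        or state.token_expires_at - now <= renew_before
    )
    if token_stale:
        actions += ["issue_token", "write_to_secret_store"]
    return actions
```

Keeping the decision separate from the API calls makes the rotation rules easy to test without a cluster.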

==== Manage credentials by hand

Another approach is to manage and renew the admin credentials by hand.

== Decision

Use <<sa_tokens,service account tokens generated through the TokenRequest API>> and implement a <<custom_controller,custom controller>> to manage the service account, cluster role binding and tokens.

== Rationale

We've decided to use service account tokens generated through the TokenRequest API, since that's the approach which needs the least amount of custom work.
By using service account tokens, we've got a simple mechanism to revoke old access credentials (delete the service account).
Additionally, we don't need to manage a custom CA with this approach.

We've decided to implement a custom controller over extending Steward's functionality for multiple reasons:

* By implementing a separate controller, we aren't bound to Steward's release process to implement and improve the admin credential management.
* A separate controller can be tested and developed in isolation without having to worry about a locally executed Steward breaking a cluster's Project Syn setup.
* A tool like this may be useful outside Project Syn.
* By not closely coupling this tool with Steward, we keep our options open in regard to where we save the credentials (Vault or Passbolt).
If this were integrated with Steward, it would be almost mandatory to save the credentials in the Project Syn Vault.

Finally, by storing the credentials in an external service (such as Passbolt), we ensure that we don't store the emergency access credentials on the system itself (for the cluster which hosts the Project Syn Vault).

== References

* https://access.redhat.com/solutions/4845381[Red Hat solution which gives some details on the admin kubeconfig]
* https://access.redhat.com/solutions/6054981[Red Hat solution describing how to replace the CA for the initial admin kubeconfig]
1 change: 1 addition & 0 deletions docs/modules/ROOT/partials/nav.adoc
@@ -225,3 +225,4 @@
** xref:oc4:ROOT:explanations/decisions/shipping-metrics-to-centralized-instance.adoc[]
** xref:oc4:ROOT:explanations/decisions/scheduled-mr-merges.adoc[]
** xref:oc4:ROOT:explanations/decisions/subscription-tracking.adoc[]
** xref:oc4:ROOT:explanations/decisions/admin-kubeconfig.adoc[]