Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kuadrant CR policy support proposal #6

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
164 changes: 164 additions & 0 deletions rfcs/0000-kuadrant_crd.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,164 @@
# RFC Template

- Feature Name: Control Policy management at the Kuadrant CR level
- Start Date: 2022-11-15
- RFC PR: [Kuadrant/architecture#0000](https://github.com/Kuadrant/architecture/pull/0000)
- Issue tracking: [Kuadrant/architecture#0000](https://github.com/Kuadrant/architecture/issues/0000)

# Summary
[summary]: #summary

Not every use cases requires, or even would want to permit usage of all of Kuadrant's capabilities to
be enabled on the target cluster. This proposal aims at giving users control, through the `kuadrant`
custom resource, of what gets enabled on the cluster when deploying Kuadrant.

# Motivation
[motivation]: #motivation

The rational behind this proposal is that not every deployment requires all of Kuadrant features to be
supported; currently `RateLimitPolicy` and `AuthPolicy`. There is no reason to impose the workload to
run on the cluster if it isn't need, but there might be more reasons why explicitly disabling one
feature might be desirable: disabling the users to actually apply one or the other policy to their
traffic. Authorization for instance could always be implemented at the application level as per the
organization desire, while rate limiting is achieved through `RateLimitPolicy` attached to the network
resources. Just as the opposite situation might be applicable to an organization that wants to do
authn/z by leveraging Kuadrant, but have no use cases for rate limiting.

This proposal aims not only at defining a way to disable one of these features at deploy time, but
also clarify what the behavior would be when such a state transitions while some policy are already
existent; what the user experience would be when trying to apply a policy that's been explicitly
disabled by the cluster administrator when Kuadrant was installed.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

When applying the `Kuadrant` resource to the cluster for the first time, you have the option to
disable the support for a policy by specifying the following in the `spec` of the resource:

```yaml
authentication:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
authentication:
auth:

so it leaves room for 'authorization' in the meaning of the flag as well.

policy: disabled
rateLimiting:
policy: disabled
```

The `policy` key of either of the service, `authentication` and `rateLimiting`, defaults to `enabled`.
That value can be specified as `disabled`. The behavior on both services is consistent: it disables
the support for the policy backed by the service, i.e. `AuthPolicy` and `RateLimitPolicy`
respectively.

When disabling `RateLimitPolicy`, Kuadrant will not create the `Limitador` object used by the
`limitador-operator`. Disabling the `AuthPolicy` yields the same result for the `Authorino` object.
Additionally it will avoid registering the [external
authorizer](https://istio.io/latest/docs/tasks/security/authorization/authz-custom/) with Istio. The
both policies API will be exposed and the reconcilers will still be registered. Creating a policy that
is disabled would be denied.

A disabled policy can be enabled by changing the value of the `policy` key. Flipping the value can
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how would we enforce this?

only happen when no policy object exist, e.g. in order to disable the `authentication`, no
`AuthPolicy` can exist or this would fail.

While no default `Limitador` or `Authorino` object would be created, a user can absolutely create one
maleck13 marked this conversation as resolved.
Show resolved Hide resolved
or the other and use their APIs directly. In the case of authentication, the user would need to
register the external authorizer with Istio themselves.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

Changing the definition of the `kuadrant` custom resource is relatively straight forward. It is
intended that the way to enable or disable the support for either policy is not a boolean value. Using
an enumeration, i.e. `disabled` or `enabled`, enables growing the set of values later.

## Installation

On initial installation, honoring one or the other setting would be fairly straight forward as well:

The both CRDs, as well as their reconcilers all get registered whether the `policy` support is
`enabled` or `disabled`.

- `enabled` The Kuadrant controller makes sure the external authorizer gets registered with Istio and
makes sure there is an `Authorino` and/or `Limitador` object without any changes as it does today.
- `disabled` None of the above would happen. Most importantly, any missing part required for the above
shouldn't have the reconciliation fail and prevent further work it requires to do (as reconciling
another policy that's being `enabled`).

Additionally a `ValidatingWebhookConfiguration` gets registered for each CR's `CREATE` operation for
the entire cluster. The webhook will only allow admissions of resources for which the policy is
currently enabled. `UPDATE` and `DELETE` should remain unhindered, as in order to transition the
policy support requires no policies to be present.

If any of the following steps fails, all previously applied changes need to be unrolled and the
`status` block of the CR would clearly reflected what changed and that Kuadrant is currently _not_
operational. This is to be considered an atomic operation, either all gets successfully deployed or
nothing is.

If a policy is disabled but a policy is being added, when disallowing the admission, the webhook would
use a response code `501 Not Implemented` with the additional `message` indicating the policy is
disabled and as such can't be honored.

### In an OSSM cluster

The deployment of `Kuadrant` should fail if `authentication` isn't explicitly disabled.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we make this explicit in this proposal? I assume we will want it to succeed in the future?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this is required after Kuadrant/kuadrant-operator#111

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eguzki's right. This section can now be removed from the RFC.


## State transitions of policy support

If the `Kuadrant` object gets updated to change the support of a policy, the change would only be
considered if no such policy exists in the cluster. No state transitions with regards to policy
support is possible if policies of the given type are present.

While it would be good to be able to enable the support for a policy while such policies already
exist, there is much to consider as to what this would entail - here are a few things to consider:

- can we apply all these policies?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we have this issue anyway? If I remove all the policies and recreate post enabling we will need to be able to reconcile them

- what ordering should be considered to apply them?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure the controller should be concerned about the ordering, unless I have missed something. Is it about conflict? To resolve a conflict do we not need to fall back to the oldest policy wins anyway from policy attachment?

- what would a conflict mean?
- what would disabling `authentication` for a cluster, while existing routes are currently protected by these?

Requiring to first remove all policies and then only safely disable or enable the support for that
kind of policy seems reasonable at this stage and can always be added later, which clearly specified
Copy link
Collaborator

@maleck13 maleck13 Nov 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree so the questions above may still be valid for discussion but I think as a starting point this is simple set of states to deal with initially but also agree that we shouldn't rule this out in the future as likely the flow a user will follow here is kubectl delete -f AuthPolicy.yaml enable auth support kubectl create -f AuthPolicy.yaml which from an end user perspective will feel entirely redundant.

behavior.

### Disabling the support for a policy

When disabling the support for a policy, merely the admission webhook's behavior changes. The
`Authorino` resource or `Limitador` ones remain untouched.

# Drawbacks
[drawbacks]: #drawbacks

The flipside of the flexibility this provides users with, this introduces quite some complexity. Hence
the desire to keep this very rigid as to what are the preconditions to change the support for a
policy.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

Possibly the gain of not running unnecessary services on the cluster isn't big enough for us to
justify adding this complexity. Yet, as of today, we need a way to disable `authentication` in the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

case of running within an OSSM backed cluster.

We could disable `authentication` on a heuristic base approach: running within OSSM? Disable
`authentication`. The user experience seems less than desirable and confusing as best, as in future
version this would probably change. That being said we're still pre-v1. We'd also need to find a way
to know very early on what environment we are being deployed in.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. So this for me comes to the fundamental question. Do we want to support these different system states as feature or do we want to say in v0.x.y authpolicy is not supported in a particular environment add the webhook mentioned earlier during installation and reject any AuthPolicies in that environment.
It also raises an interesting question for me that if we want to offer individual features that perhaps RateLimitPolicy and AuthPolicy should become a part of those components (IE Limitador/Limitador operator and Authorino/Authorino operator) should support AuthPolicy and RateLimitPolicy respectively. Have we given that approach any consideration? This would mean enabling in the kuadrant operator would install or uninstall the operator and in turn would install or remove the CRD, if memory serves removing the CRD will remove any of those resource types also. Doing this way would mean that we would not need the webhook as the CRD is either their or not.
cc @eguzki @guicassolato

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not follow the approach of adding/removing CRDs dynamically based on some Kuadrant CR. CRD's are global cluster objects and that raises potential conflicts. What if some user in the cluster is using the Limitador operator to limit their service by their own (not using kuadrant)? Or how does this idea fit when kuadrant supports multiple kuadrant data plane instances?

Regarding having AuthPolicy part of the Authorino Operator (or RateLimitPolicy part of the Limitador Operator) is interesting to consider. I do not see now a hard stopper.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see Kuadrant's RLP and KAP as definitions of Limitador Operator and Authorino Operator respectively. These CRDs of Kuadrant are committed to GW-API's policy attachment value proposition, which the aforementioned operators aren't. In fact, I'd see such as a violation of the current dependency that exists between Kuadrant and its functional components, where the lower dependencies (Limitador and Authorino) currently can exist without knowing anything about Kuadrant or its purpose.

Furthermore and especially within the scope of this particular RFC, I agree with @eguzki's point of CRDs being global definitions in the cluster and therefore their installation not the way to control whether a particular workload in a particular namespace shall be deployed or not.

For both of these reasons, I believe that installing the CRD is one thing, while choosing which functional component to deploy is another separate one. Therefore, RLP and KAP CRDs should continue living in the Kuadrant Operator IMO.


# Prior art
[prior-art]: #prior-art

While I couldn't find, probably because of lacking knowledge about similar use-cases, anything that
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in general not enabling an API in a cluster or disabling an API is done by making the CRD available or deleting it

contradicts what this approach proposes, I couldn't see how what this would entail goes against any
Kubernetes best practices for Operators.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

- Is this really what we want?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the key question.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want this. However we might want other features before this one to make users/consumers adopt Kuadrant. I know I do not know :)

- Can we get authentication supported in OSSM and possibly pun on this a little longer?
- Should we have a deployment that skips authentication entirely when in an OSSM cluster?

# Future possibilities
[future-possibilities]: #future-possibilities

- relaxing the requirements for state transitions on policy support
- support some kind of lazy support for policy, where limitador and/or authorino are only lazily
deployed to the cluster.