Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] - Project isolation #5204

Closed
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
110 changes: 110 additions & 0 deletions rfc/system/5204-project-isolation.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,110 @@
# Project isolation

**Authors:**

- @robert-ulbrich-mercedes-benz


## 1 Executive Summary

This feature enables admins of Flyte to assign users of Flyte only to certain projects. That way users can only access and submit workflows to Flyte projects they are assigned to. This is a security feature to prevent unauthorized access to projects and their data. This RFC proposes a mechanism to isolate projects from each other.

For the feature to work, authentication needs to be enabled. Flyte will evaluate the identity token issued by the IDP and extract the values of a configured claim. The claim values are mapped to projects in Flyte. If the claim value matches a project, the user is allowed to access the project. If the claim value does not match any project, the user is denied access.

## 2 Motivation

Having Flyte projects isolated from each other is a security feature to prevent unauthorized access to projects and their data. This is especially important when Flyte is used in a multi-tenant environment. This feature is also required to comply with data protection regulations. Especially in enterprise contexts with many different teams each having their own Flyte project, isolation is essential.

## 3 Proposed Implementation

The Flyte operators can enable or disable the project isolation feature. When enabled, Flyte will evaluate the identity token issued by the IDP and extract the values of a configured claim. The user can then
access a project if the claim value matches the project. If the project isolation feature is enabled and the claim value does not match any project, the user is denied access to all projects. So it is a whitelist mechanism.

The only external interfaces of Flyte is the API that is being used by the Flyte console, the flytectl cli and pyflyte. So it is sufficient to add a check if the user interacting with the API for a certain project has the required permission.

Since the API endpoints are very different from one another, the check needs to be implemented in each endpoint. Because of the different formats of the endpoint holding the project it will be hard to define a middleware centrally, which performs the check.

The mapping between claim value and project to authorize can be configured in the following way and put in the Helm chart of the Flyte deployment:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does this work for dynamically registered projects via the CLI? is there a catch-all or default mapping or do deployments now need to be updated when a new project is registered?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For newly registered projects per RFC we will currently need to adapt the configuration in order to map the project to a certain claim value. We can also introduce a default mapping - if you think it is a good idea, I will adapt the RFC

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of a default in that it reduces the friction for creating a new project, but then I guess it breaks from your proposed paradigm of explicitly white-listing access to projects.

My only concern is that since all it currently takes to create a project is a simple api call that any user can make, the change as it's currently proposed introduces more friction for self-service project registration (which is maybe undesirable in a context where you're already managing per-project access!)

I'm curious, in practice what has the overhead been like for your organization when it comes to provisioning new projects? Is that an infrequent occurrence?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Flyte instance in our organization is centrally managed. So it is not desired to have users creating projects on their own. In our organization project creation is put in the hands of the team managing the Flyte instance. So the Flyte projects we have are well defined and there are not too many projects being added or removed.

In companies I assume the Flyte instances will usually be managed by a dedicated team.


```yaml
auth:
project-authorization:
enabled: true
claim: "entitlements"
mapping:
project1: "project1"
project2: "project2"
```

The identity token is evaluated by Flyte already and data can be accessed in the Flyte code. The following example golang code shows how the project isolation can be implemented in the Flyte code:


```go
package auth

import (
"context"
"github.com/flyteorg/flyte/flyteadmin/auth/config"
)

func ProjectAccessPermitted(ctx context.Context, project string) bool {
authConfig := config.GetConfig()

if authConfig.ProjectAuth.isEnabled() {
identityContext := IdentityContextFromContext(ctx)

claimValues := identityContext.UserInfo().GetAdditionalClaims().GetFields()[authConfig.ProjectAuth.Claim].GetListValue().GetValues()

for claimValue := range claimValues {
if claimValue.GetStringValue() == authConfig.ProjectAuth.mapping[project] {
return true
}
}

return false
} else {
return true
}

}
```

The following piece of code also shows how the check is integrated into existing API endpoints:

```go
func (m *AdminService) GetWorkflow(ctx context.Context, request *admin.ObjectGetRequest) (*admin.Workflow, error) {
defer m.interceptPanic(ctx, request)

if request == nil {
return nil, status.Errorf(codes.InvalidArgument, "Incorrect request, nil requests not allowed")
}

if ProjectAccessPermitted(ctx, request.GetId().Project) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is an opt-in mechanism, right? how do you ensure that new endpoints also get the same treatment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is an opt-in mechanism. We have meanwhile implemented it in a private fork of the project. We moved away from changing every single endpoint, but added a central middleware. But since there are no common fields for all of the request objects, actually each and every endpoint and its request object need to be explicitly mentioned, which I consider a weakness because if newly added endpoints are forgotten to be added in that middleware, then those endpoints will not benefit from the project isolation. Unfortunately we could not find a better solution to this issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you like to, I can also show you how we actually implemented it in the fork.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robert-ulbrich-mercedes-benz that all makes sense, I can see where maybe the middleware rejects unhandled endpoints by default but then that still requires manual code changes per new endpoint (and for newer or infrequent endpoints, it may not be obvious that the authorization check was missed)

How do you keep up with new service definition changes in your fork? Or is the process manual currently?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We manually went over the GRPC service definitions that are available. We found out, that only Admin Services are of concern. But I share your concerns regarding new endpoints that might be missed in that central middleware.

return nil, status.Errorf(codes.PermissionDenied, "Access denied for project %s.", request.GetId().Project)
}
```


## 4 Metrics & Dashboards

There are no metrics or dashboards needed for this feature.

## 5 Drawbacks

The feature needs to be thoughtfully be implemented and tested in order to avoid introducing security vulnerabilities which allows a user to access Flyte's project she is not meant to see information about.

## 6 Alternatives

In order to provide tenant isolation it is also possible to have multiple Flyte deployments. This is a more complex setup and requires more resources. It is also more difficult to manage. It will also only be feasible if the number of different projects is relatively small.

## 7 Potential Impact and Dependencies

Existing Flyte deployments will not be affected by this feature. The feature can be disabled by default to not impact existing Flyte deployments when updating to a Flyte version with the feature. To use this feature, Flyte needs to have authentication enabled. The feature will not work without authentication with OIDC.

## 8 Unresolved questions

This feature will not introduce an RBAC concept within Flyte's single projects. So there will not be different roles within a project. Any person with access to a project can do whatever she deserves without restrictions but has no permissions for a project that she is not assigned to.

## 9 Conclusion

With the introduction of project isolation, tenants can be efficiently be isolated from each other. This is a security feature that is essential for enterprise use cases and multi-tenant environments. It is a feature that is required to comply with data protection regulations. The feature is easy to configure and does not require any changes to the Flyte API. It is a feature that can be enabled or disabled by the Flyte operators. The feature will not impact existing Flyte deployments. The feature will not introduce an RBAC concept within Flyte's single projects. So there will not be different roles within a project. Any person with access to a project can do whatever she deserves without restrictions but has no permissions for a project that she is not assigned to.
Loading