Extensibility: Decoupling Portable Resources RP from the Core #6325

rynowak · 2022-09-05T22:37:23Z

Summary

This issue discussed a proposal for decoupling the Portable Resources RP from core RP, and for how we implement extensibility for connectors. It's an explicit goal of this proposal that we define the extensibility mechanics for a connector, and then use this model for our implementation. This will also resolve some really bad coupling that we have in place right now between the two Radius RPs.

Status: This is a first draft, I'm looking for feedback and alternatives for us to debate.

Goals

Define contract at the RP layer to implement a connector type separately from the Radius project (3rd party extensibility to add a connector)
Define contract for a compute resource RP (like containers does) to interact with the connector contract separately from Radius project (3rd party extensibility to interact with connectors).
Define extensibility for connectors in a way that allows our RPs to implement their full feature-set while by interacting through a public contract
- No longer allowed to read each-other's databases or internal data-models.

Current State

The core RP has copied code from the connector RP so that it can read the internal data model of the connector RP. When a container is deployed that has connections, the core RP has to read that connection metadata from the connector RP's database using it's internal code. When a new connector type is added, new code has to be added to both core RP and connector RP - this needs to be resolved for 3rd party extensibility to function.

This is not intentional coupling, and was introduced when we split up the CustomRP implementation into two RPs. We didn't have time to complete the refactor, this post is describing solutions.

Requirements from a compute resource

When deploying a resource, a compute RP needs to interact with the resources named by 'connections' to determine the following:

What values (secret and non-secret) should be injected into the resource's definition (probably env-vars)
What IAM systems are in use, and what role assignments need to be created
(future) other metadata used for diagnostics and visualization experiences

Proposal

Connectors and routes should expose metadata needed for the connector contract as an ARM custom-action. That would look something like this.

POST /planes/radius/..../Applications.Connector/mongoDatabases/listConnectionMetadata

// This is just an example to describe the kinds of data that would be returned, it's not meant to be realistic
{
  iam: {
    azure: {
      roles: [ ... ]
    }
  }
  settings: {
    hostname: {
      value: "foo.bar.com",
      secret: false
    }
  }
}

The Core RP or (any other compute resource) is responsible for resolving the resource IDs of its connections, and calling into the connector RPs (via UCP proxy) to get the data needed for deployment.

For example the Core RP container support today wants to support:

Injecting environments variables
Setting up role assignments

The knowledge of which environment variables and role assignments is moved out of the CoreRP and into an extensible microservice contract.

Here's an example of what the data-flow would look like for a simple webapp + database scenario:

sequenceDiagram
    actor Client
    participant UCP
    participant Core RP
    participant Core RP Worker
    participant Connector RP

    Client->>UCP: PUT .../mongoDatabases/db
    UCP->>Connector RP: Proxy PUT .../mongoDatabases/db
    Connector RP->>UCP: 200 OK
    UCP->>Client: 200 OK

    Client->>UCP: PUT .../containers/todo
    UCP->>Core RP: Proxy PUT .../containers/todo
    Core RP->>+Core RP Worker: Queue Job
    Core RP->>UCP: 202 Accepted
    UCP->>Client: 202 Accepted

    par Client waits for completion
        loop Poll operation result
            Client-->>UCP: GET .../operationResults/{guid}
            UCP-->>Core RP: Proxy GET .../operationResults/{guid}
            Core RP-->>UCP: 202 Accepted
            UCP-->>Client: 202 Accepted
        end
    and Core RP deploys Container
        Core RP Worker->>UCP: POST .../mongoDatabases/db/listConnectionMetadata
        UCP->>Connector RP: Proxy POST .../mongoDatabases/db/listConnectionMetadata
        Connector RP->>UCP: 200 OK { ... }
        UCP->>Core RP Worker: 200 OK { ... }
        Core RP Worker-->>Core RP Worker: Deploy Container
        Core RP Worker-->>-Core RP: Complete Job
    end

    Client-->>UCP: GET .../operationResults/{guid}
    UCP-->>Core RP: Proxy GET .../operationResults/{guid}
    Core RP->>UCP: 200 OK
    UCP->>Client: 200 OK

Alternatives

Can we include the data we need as part of the connector resource and skip the custom action? This would prevent us from needing another separate contract, and would be part of the standard RPC.

Example:

FAQ / Open-Issues

Q: How would this pattern work for an Azure service? RP -> RP contracts and authentication are somewhat common in ARM. Since connectors are a concept in Radius, it makes sense for them to have a common set of capabilities.

Q: What can we do to make this scale better? Right now this makes 1 request per connection, which will not scale well if containers end up being highly connected. For example some containers in eShop have ~6 connections. If we need to make this scale better we can introduce batch operations that accept more than one resource for the same RP. This is already a common pattern in ARM - several of the contracts support a batch variant that reduces the number of requests.

Q: What else can we do to improve efficiency? Adding etag support to the to the connector contract could improve our overall efficiency for no-ops and allow us to support synchronous no-ops for containers. Right now we assume that any PUT to a container (or other resource) should trigger a deployment because we don't have a way to reason about whether connector data has changed. We should consider this part of the design.

Q: What EXACTLY is the schema for listConnectionMetadata? This is a first draft looking for input on the general idea. We'll continue to refine the details if we think this is the right overall idea.

Q: Do we need to permission secret and non-secret access to connection metadata separately? I'm not sure 😆 - looking for input on this. For now I'm assuming we want to assume a high level of trust for something accessing a connection's metadata.

Q: Do we need to permission secret and non-secret access to connection metadata separately? How do we know which types implement the connector protocol and which don't? For now we can use some heuristics, like whether a resource is a Radius type or not. Ultimately we need to implement a discovery API - a way to list the types - we already want this for the CLI so we can be loosely coupled. The discovery API and the registration for a connector type should include this information.

For now it's also fine to "ask for permission". The caller can invoke the custom action and handle the 404 if it's not implemented.

Q: How would we migrate towards this? First we'd define the contract, and start implementing inside connector RP. In parallel we can start consuming the new contract inside core RP with a fallback to the current logic.

AB#10775

The text was updated successfully, but these errors were encountered:

rynowak · 2022-09-05T22:42:21Z

/cc @youngbupark @kachawla - looking for feedback on this, or other ideas if you have them 👍

ytimocin · 2022-09-06T14:05:02Z

For efficiency, can we have an internal cache to hold the result of a listConnectionMetadata call response for some time?
Key: ["etag", "resourceID"] = Result of listConnectionMetadata for the given Resource. We might eliminate going to the DB for the same resource for a given period of time.

rynowak · 2022-09-06T16:41:07Z

Yes, caching would be possible if we support etags, however until we measure it's not clear how much benefit there would be. With an ETag based caching system you still have to make a request to the downstream service, so you're actually offloading the downstream not the upstream.

We still have to plan for the worst case behavior of cache misses.

youngbupark · 2022-09-06T16:49:35Z

It is better to add ARM or UCP component in sequence diagram. Ideally, any control plane operations must go through the ARM/UCP.

Can we include connection metadata as part of the response of GET resource operation instead of introducing new listConnectionMetadata ? Is there an issue to provide connection metadata as part of GET resource API response ?

youngbupark · 2022-09-06T17:05:48Z

Q: Do we need to permission secret and non-secret access to connection metadata separately? I'm not sure 😆 - looking for input on this. For now I'm assuming we want to assume a high level of trust for something accessing a connection's metadata.

Q: Do we need to permission secret and non-secret access to connection metadata separately? How do we know which types implement the connector protocol and which don't? For now we can use some heuristics, like whether a resource is a Radius type or not. Ultimately we need to implement a discovery API - a way to list the types - we already want this for the CLI so we can be loosely coupled. The discovery API and the registration for a connector type should include this information.

In ARM, service level role definition defines such permissions to access each operation via GET /{Namespace}/operations. Each RP must have its own 1P AAD App. When user registers RP, ARM creates 1P service principal for user subscription and assign the service role definition to the 1P service principal. The service role definition defines the several permissions such as listConnectionMetadata and listCredentials. I think we need to think about the similar role based access control for UCP in the long term.

jkotalik · 2022-09-06T17:35:28Z

Can we include connection metadata as part of the response of GET resource operation instead of introducing new listConnectionMetadata ? Is there an issue to provide connection metadata as part of GET resource API response ?

I'd be concerned about secrets, generally in ARM the list* APIs are used for obtaining secrets as they aren't part of the output. https://docs.microsoft.com/en-us/azure/azure-resource-manager/templates/template-functions-resource#list

youngbupark · 2022-09-06T17:39:24Z

Can we include connection metadata as part of the response of GET resource operation instead of introducing new listConnectionMetadata ? Is there an issue to provide connection metadata as part of GET resource API response ?

I'd be concerned about secrets, generally in ARM the list* APIs are used for obtaining secrets as they aren't part of the output. https://docs.microsoft.com/en-us/azure/azure-resource-manager/templates/template-functions-resource#list

As far as I know, listSecret/listCredentials should be remained as the separate API in the typical RP. I would not expect that listConnectionMetadata returns any secrets or credentials.

jkotalik · 2022-09-06T18:05:27Z

Kk, I didn't realize that a call to get secrets of connections would require two calls (listConnectionMetadata -> listSecrets). If we keep listConnectionMetadata, I don't see why that can't return secrets. We could rename listConnectionMetadata that implies "secret" if necessary.

rynowak · 2022-09-06T18:46:05Z

Can we include connection metadata as part of the response of GET resource operation instead of introducing new listConnectionMetadata ? Is there an issue to provide connection metadata as part of GET resource API response ?

This is a good idea. I'll put together an example and we can decide based on that.

kachawla · 2022-09-06T21:17:01Z

Overall problem and proposal makes sense to me except that we need a new custom operation (listConnectionMetadata) to get metadata. Is there a reason why we think a combination of existing APIs (GET and listSecrets) isn't sufficient for core RP to get the required metadata?

rynowak · 2022-09-06T21:28:50Z

Overall problem and proposal makes sense to me except that we need a new custom operation (listConnectionMetadata) to get metadata. Is there a reason why we think a combination of existing APIs (GET and listSecrets) isn't sufficient for core RP to get the required metadata?

This was what @youngbupark suggested also. Since we're all asking the same thing I'll put together an example so we have something realistic to think about.

rynowak · 2022-10-02T23:56:44Z

Coming back to this to try and get a consensus. It hearts my ❤️ to see our RPs so entangled with each-other 💔.

I've modified the Mongo example to match something that would work for this.

Option 1

This option adds a connectorData and connectorData.values property. The connectorData.values is defined as a map[string]string that includes the values to include when making a connection.

The CoreRP would look up connectorData.value as well as call listSecrets to get the full set.

{
    "id": "/.../providers/Applications.Connector/mongoDatabases/mongo0",
    "name": "mongo0",
    "type": "Applications.Connector/mongoDatabases",
    "properties": {
        "provisioningState": "Succeeded",
        "application": "/.../providers/Applications.Core/applications/testApplication",
        "environment": "/.../providers/Applications.Core/environments/env0",
        "resource": "/.../providers/Microsoft.DocumentDB/databaseAccounts/testAccount/mongodbDatabases/db",
        "database": "mongo0",
        "status": {
            // Output resources here
        },
        "connectorData": {
            "values": {
                "database": "mongo0"
            }
        }
    }
}

Why is database = mongo0 present in BOTH properties and properties.connectorData.values?

I assume this is the question you're all asking. The answer is that for connectors to be extensible, the core RP can know nothing about the schema of Applications.Connector/mongoDatabases. We have to provide the keys and the values in an unambiguous way.

Why connectorData.values?

We'll definitely keep adding functionality to connectors. We need the ability to add properties for other purposes without breaking the API.

Option 2

This option adds a connectorMetadata and connectorMetadata.valueProperties property. The connectorMetadata.valueProperties is defined as a []string that includes JSONPointer references to the properties that hold connector values.

The CoreRP would look up connectorMetadata.valueProperties, and the resolve the JSONPointer referrences as well as call listSecrets to get the full set.

{
    "id": "/.../providers/Applications.Connector/mongoDatabases/mongo0",
    "name": "mongo0",
    "type": "Applications.Connector/mongoDatabases",
    "properties": {
        "provisioningState": "Succeeded",
        "application": "/.../providers/Applications.Core/applications/testApplication",
        "environment": "/.../providers/Applications.Core/environments/env0",
        "resource": "/.../providers/Microsoft.DocumentDB/databaseAccounts/testAccount/mongodbDatabases/db",
        "database": "mongo0",
        "status": {
            // Output resources here
        },
        "connectorMetadata": {
            "valueProperties": [
              "/properties/database"
            ]
        }
    }
}

What's up with this wierd JSONPointer thing?

I assume this is the question you're all asking. The answer is that for connectors to be extensible, the core RP can know nothing about the schema of Applications.Connector/mongoDatabases. We have to provide the keys and the values in an unambiguous way.

This is a different technique for providing the keys that doesn't require duplicating data.

rynowak · 2022-10-02T23:56:55Z

/cc @youngbupark @kachawla

kachawla · 2022-10-03T16:30:57Z

How about a modification of option 1 that solves the problem of duplicated values by moving values to its own section in the schema -

{
    "id": "/.../providers/Applications.Connector/mongoDatabases/mongo0",
    "name": "mongo0",
    "type": "Applications.Connector/mongoDatabases",
    "properties": {
        "provisioningState": "Succeeded",
        "application": "/.../providers/Applications.Core/applications/testApplication",
        "environment": "/.../providers/Applications.Core/environments/env0",
        "resource": "/.../providers/Microsoft.DocumentDB/databaseAccounts/testAccount/mongodbDatabases/db",
        "status": {
            // Output resources here
        },
        "resourceMetadata": {
            "database": "mongo0"
        }
    }
}

All the generated values are associated to underlying resource instead to the connector directly, so we can define all the values that are populated during deployment in a separate section in the schema resourceMetadata (or another better name) - compute resource doesn't need to be schema aware here and should read everything in this section as resource values.

This will require updating schema of all connectors, but this is a one time change that we can afford to make at this point and will allow us to add more values while maintaining backwards compatibility in the future.

rynowak · 2022-10-03T16:43:58Z

The reason why I didn't choose this is that it obfuscates the experience for using our resources in Bicep. This trades-off the primary concern (what's the best user experience), for a secondary concern (how extensibility is implemented).

It also can't be as simple as just resourceMetadata because there will be other concepts in the future that are part of the connector contract. I'm sure of it 😁 So it would have to be something like resourceMetadata.values

Today looking up the database name is db.properties.database, it can't get much simpler than that - with that change it becomes db.properties.resourceMetadata.values.database. Worse, many of these properties are read/write, so users will have to set values in resourceMetadata.values.

I don't think that the duplication is a blocker, but it needs to be justified 😆

kachawla · 2022-10-03T17:56:05Z

I see. I think it is personal preference, I'd choose db.properties.resourceMetadata.database/db.properties.resource.database for bicep experience over duplicating these properties, which can be confusing to users on why we have duplicated values unless they read the documentation, and will add additional overhead on response payload which we can't go back from for every new property in the future once we go down this route.

nicolejms · 2023-12-14T19:52:43Z

done

nicolejms · 2023-12-14T19:53:27Z

done and duplicate

rynowak · 2023-12-14T20:21:58Z

Per discussion, the duplicate was in an internal tracking system not in Github. This isn't done :)

radius-triage-bot · 2024-01-04T20:50:36Z

👍 We've reviewed this issue and have agreed to add it to our backlog. Please subscribe to this issue for notifications, we'll provide updates when we pick it up.

We also welcome community contributions! If you would like to pick this item up sooner and submit a pull request, please visit our contribution guidelines and assign this to yourself by commenting "/assign" on this issue.

For more information on our triage process please visit our triage overview

willtsai transferred this issue from radius-project/radius Sep 19, 2023

AaronCrawfis transferred this issue from another repository Sep 19, 2023

rynowak mentioned this issue Oct 6, 2023

Add recipe controller #6438

Merged

nicolejms closed this as completed Dec 14, 2023

rynowak reopened this Dec 14, 2023

shalabhms added the triaged This issue has been reviewed and triaged label Jan 4, 2024

shalabhms changed the title ~~Extensibility: Decoupling Connectors from the Core~~ Extensibility: Decoupling Portable Resources RP from the Core Jan 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extensibility: Decoupling Portable Resources RP from the Core #6325

Extensibility: Decoupling Portable Resources RP from the Core #6325

rynowak commented Sep 5, 2022 •

edited by shalabhms

Loading

rynowak commented Sep 5, 2022

ytimocin commented Sep 6, 2022

rynowak commented Sep 6, 2022

youngbupark commented Sep 6, 2022

youngbupark commented Sep 6, 2022

jkotalik commented Sep 6, 2022

youngbupark commented Sep 6, 2022 •

edited

Loading

jkotalik commented Sep 6, 2022

rynowak commented Sep 6, 2022

kachawla commented Sep 6, 2022 •

edited

Loading

rynowak commented Sep 6, 2022 •

edited

Loading

rynowak commented Oct 2, 2022

rynowak commented Oct 2, 2022

kachawla commented Oct 3, 2022 •

edited

Loading

rynowak commented Oct 3, 2022 •

edited

Loading

kachawla commented Oct 3, 2022

nicolejms commented Dec 14, 2023

nicolejms commented Dec 14, 2023

rynowak commented Dec 14, 2023

radius-triage-bot bot commented Jan 4, 2024

Extensibility: Decoupling Portable Resources RP from the Core #6325

Extensibility: Decoupling Portable Resources RP from the Core #6325

Comments

rynowak commented Sep 5, 2022 • edited by shalabhms Loading

Summary

Goals

Current State

Requirements from a compute resource

Proposal

Alternatives

FAQ / Open-Issues

rynowak commented Sep 5, 2022

ytimocin commented Sep 6, 2022

rynowak commented Sep 6, 2022

youngbupark commented Sep 6, 2022

youngbupark commented Sep 6, 2022

jkotalik commented Sep 6, 2022

youngbupark commented Sep 6, 2022 • edited Loading

jkotalik commented Sep 6, 2022

rynowak commented Sep 6, 2022

kachawla commented Sep 6, 2022 • edited Loading

rynowak commented Sep 6, 2022 • edited Loading

rynowak commented Oct 2, 2022

Option 1

Option 2

rynowak commented Oct 2, 2022

kachawla commented Oct 3, 2022 • edited Loading

rynowak commented Oct 3, 2022 • edited Loading

kachawla commented Oct 3, 2022

nicolejms commented Dec 14, 2023

nicolejms commented Dec 14, 2023

rynowak commented Dec 14, 2023

radius-triage-bot bot commented Jan 4, 2024

rynowak commented Sep 5, 2022 •

edited by shalabhms

Loading

youngbupark commented Sep 6, 2022 •

edited

Loading

kachawla commented Sep 6, 2022 •

edited

Loading

rynowak commented Sep 6, 2022 •

edited

Loading

kachawla commented Oct 3, 2022 •

edited

Loading

rynowak commented Oct 3, 2022 •

edited

Loading