docs: Azure authentication refactoring design #1940

187 changes: 187 additions & 0 deletions docs/design/Refactor Azure authentication.md
@@ -0,0 +1,187 @@
# **Azure Authentication Refactoring in Ratify**

## **Introduction**
Authentication is a critical process in Ratify, ensuring secure access to artifatcs in container registries, and to keys, secrets and certificates from cloud key vaults, and other resources. Azure offers two primary SDKs for authentication in Go:
Collaborator

Suggested change
Authentication is a critical process in Ratify, ensuring secure access to artifatcs in container registries, and to keys, secrets and certificates from cloud key vaults, and other resources. Azure offers two primary SDKs for authentication in Go:
Authentication is a critical process in Ratify, ensuring secure access to artifacts in container registries, and to keys, secrets and certificates from cloud key vaults, and other resources. Azure offers two primary SDKs for authentication in Go:


- **Azure Identity ([azidentity](https://learn.microsoft.com/en-us/azure/developer/go/azure-sdk-authentication?tabs=bash))**: Designed for seamless integration with Azure services.
- **Microsoft Authentication Library ([MSAL](https://learn.microsoft.com/en-us/entra/identity-platform/msal-overview))**: Provides advanced token management capabilities.

Currently, Ratify uses both SDKs across different components, leading to complexity and maintenance overhead. This document proposes a comprehensive refactoring of Azure authentication in Ratify to improve maintainability, reduce duplication, and streamline the user experience.

---

## **Existing Azure Authentication in Ratify**

### **ACR Token Retrieval**
Located in the **ORAS auth providers (`pkg/common/oras/authprovider/azure`)**:
1. **Azure Managed Identity (`azureidentity.go`)**:
   - Uses `azidentity.NewManagedIdentityCredential` to retrieve an access token.
   - Requires only the `clientID`:
     ```go
     id := azidentity.ClientID(clientID)
     opts := azidentity.ManagedIdentityCredentialOptions{ID: id}
     cred, err := azidentity.NewManagedIdentityCredential(&opts)
     ```

2. **Azure Workload Identity (`azureworkloadidentity.go`)**:
   - Uses `confidential.NewCredFromAssertionCallback` from the **MSAL** package.

### **Key Management Provider and Certificate Provider**
Both components recently replaced the deprecated `autorest` SDK with `azidentity` and now use workload identity credentials for authentication.

---

## **Challenges with the Current Design**

### 1. **Multiple SDKs**
Ratify employs both **`MSAL`** and **`azidentity`**, increasing the maintenance burden. Consolidating to a single SDK simplifies dependency management, reduces upgrade complexity, and enhances maintainability.

### 2. **Code Duplication**
Significant code duplication exists across components, particularly between Azure workload identity and managed identity implementations. Consolidating shared logic improves maintainability.

### 3. **Explicit Authentication Selection**
Currently, users must explicitly specify the authentication type. In well-defined environments like Azure Kubernetes Service (AKS), this should be inferred automatically based on environment variables.
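
As a rough illustration of this kind of inference, the sketch below picks an authentication type from the same environment variables that the proposed code snippet in this document reads. The `AuthType` type, its constants, and `InferAuthType` are hypothetical names, not part of Ratify or `azidentity`, and the rule that a client ID alone implies managed identity is an assumption that mirrors the proposal.

```go
package main

// AuthType names the credential flow a process would use (hypothetical type).
type AuthType string

const (
	AuthWorkloadIdentity AuthType = "workloadIdentity"
	AuthManagedIdentity  AuthType = "managedIdentity"
	AuthUnknown          AuthType = "unknown"
)

// InferAuthType guesses the authentication type from environment variables.
// getenv is injected (normally os.Getenv) so the logic can be tested
// without touching the real environment.
func InferAuthType(getenv func(string) string) AuthType {
	tenantID := getenv("AZURE_TENANT_ID")
	clientID := getenv("AZURE_CLIENT_ID")
	tokenFile := getenv("AZURE_FEDERATED_TOKEN_FILE")

	switch {
	case tenantID != "" && clientID != "" && tokenFile != "":
		// All three variables present: treat as workload identity.
		return AuthWorkloadIdentity
	case clientID != "":
		// Assumption: a client ID alone implies a user-assigned managed identity.
		return AuthManagedIdentity
	default:
		return AuthUnknown
	}
}
```

A caller would pass `os.Getenv` as the argument; when the result is `AuthUnknown`, it could fall back to an explicitly configured type.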

---

## **Proposed Refactoring**

### **Goals**
1. Design a **common package** for Azure authentication logic.
2. **Infer authentication type** automatically based on the environment, reducing user configuration overhead.
Collaborator

This is a great add for almost all scenarios, but there are scenarios where an override from the user to specify exactly the cred type might be required. Notation CLI encountered this too. We should consider exposing an override capability that does not use the chained credential.

Contributor Author

That's a good point. As you suggested, we can provide this ability by accepting the override from user input.

3. **Unify implementations** for workload identity and managed identity in ORAS auth providers.
4. Implement a **chained authentication process**:
- Workload Identity → Managed Identity → Azure CLI.
Collaborator

In terms of Azure CLI, does that mean Ratify CLI will also support auth to Azure?

Contributor Author

Yes, one of the goals of this work is to support Azure authentication in the CLI scenario and the azidentity SDK seems to be able to facilitate this through a number of credential types like AzureCLICredential, AzureDeveloperCLICredential, DefaultAzureCredential, and ChainedTokenCredential.
ChainedTokenCredential seems to be the right choice for ratify to consolidate all scenarios in one single place.

5. Use a single SDK (**`azidentity`**) for all authentication workflows to improve maintainability and alignment with Azure best practices.

---

### **Refactoring Plan**

#### **1. Introduce a New Azure Authentication Package**
- A new package, `pkg/common/cloudauthproviders/azure`, will consolidate shared Azure authentication logic.
- Authentication will use `ChainedTokenCredential` to sequentially try:
  - **Workload Identity**
  - **Managed Identity**
  - **Azure CLI**
- If all attempts fail, the process will return an error.

##### **Proposed Code Snippet**
```go
package azure

import (
	"errors"
	"os"

	"github.com/Azure/azure-sdk-for-go/sdk/azcore"
	"github.com/Azure/azure-sdk-for-go/sdk/azidentity"
)

// NewChainedCredential builds a credential chain that tries workload identity,
// then managed identity, then the Azure CLI.
func NewChainedCredential() (*azidentity.ChainedTokenCredential, error) {
	// NewChainedTokenCredential takes a slice of azcore.TokenCredential.
	var creds []azcore.TokenCredential

	// Add workload identity only if all three environment variables are set.
	tenantID := os.Getenv("AZURE_TENANT_ID")
	clientID := os.Getenv("AZURE_CLIENT_ID")
	tokenFile := os.Getenv("AZURE_FEDERATED_TOKEN_FILE")
	if tenantID != "" && clientID != "" && tokenFile != "" {
		wiCred, err := azidentity.NewWorkloadIdentityCredential(&azidentity.WorkloadIdentityCredentialOptions{
			TenantID:      tenantID,
			ClientID:      clientID,
			TokenFilePath: tokenFile,
		})
		if err == nil {
			creds = append(creds, wiCred)
		}
	}

	// Add managed identity (user-assigned, selected by client ID).
	if clientID != "" {
		miCred, err := azidentity.NewManagedIdentityCredential(&azidentity.ManagedIdentityCredentialOptions{
			ID: azidentity.ClientID(clientID),
		})
		if err == nil {
			creds = append(creds, miCred)
		}
	}

	// Add the Azure CLI credential.
	cliCred, err := azidentity.NewAzureCLICredential(nil)
	if err == nil {
		creds = append(creds, cliCred)
	}

	if len(creds) == 0 {
		return nil, errors.New("no valid credentials detected; check environment configuration")
	}

	// Combine the credentials into a chain; they are tried in slice order.
	return azidentity.NewChainedTokenCredential(creds, nil)
}
```

Collaborator

nit: I assume this proposed implementation is going to change now to take into account different client ids not specified via env variables?

In the code sample above, the chained token credential tries workload identity first, then managed identity, and then Azure CLI authentication. The first credential that successfully obtains a token is used, and the remaining credentials are not tried.
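
The ordering behaviour can be pictured with a deliberately simplified, standard-library-only model. This is not how `azidentity` implements `ChainedTokenCredential` (it omits caching of the successful credential, contexts, and token scopes), and `tokenSource`, `firstToken`, `fixedSource`, and `failingSource` are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// tokenSource is a hypothetical stand-in for one credential in the chain.
type tokenSource struct {
	name     string
	getToken func() (string, error)
}

// fixedSource returns a source that always yields tok.
func fixedSource(name, tok string) tokenSource {
	return tokenSource{name: name, getToken: func() (string, error) { return tok, nil }}
}

// failingSource returns a source that always fails, for example workload
// identity when the process is not running in a configured cluster.
func failingSource(name string) tokenSource {
	return tokenSource{name: name, getToken: func() (string, error) {
		return "", errors.New("credential unavailable")
	}}
}

// firstToken tries each source in order and returns the first token obtained
// together with the name of the source that produced it. If every source
// fails, the individual errors are joined so each failure stays visible.
func firstToken(sources []tokenSource) (string, string, error) {
	if len(sources) == 0 {
		return "", "", errors.New("no token sources configured")
	}
	var errs []error
	for _, s := range sources {
		tok, err := s.getToken()
		if err == nil {
			return tok, s.name, nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", s.name, err))
	}
	return "", "", errors.Join(errs...)
}
```

With a failing workload identity source followed by working managed identity and CLI sources, `firstToken` returns the managed identity token and never calls the CLI source.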

There is another option that can be used which is the default azure credential: `azidentity.NewDefaultAzureCredential`. It's an opinionated, preconfigured chain of credentials and is designed to support many environments, along with the most common authentication flows and developer tools. In graphical form, the underlying chain looks like this:
![image](../img/AzureAuthRefactor/image.png)

Howecer, this option is not recommended for the following reasons:
Collaborator

Suggested change
Howecer, this option is not recommended for the following reasons:
However, this option is not recommended for the following reasons:

1. Debugging challenges: When authentication fails, it can be challenging to debug and identify the offending credential. You must enable logging to see the progression from one credential to the next and the success/failure status of each. In contrast, debugging a chained credential is relatively easy.
2. Unpredictable behavior: DefaultAzureCredential checks for the presence of certain environment variables. It's possible that someone could add or modify these environment variables at the system level on the host machine. Those changes apply globally and therefore alter the behavior of DefaultAzureCredential at runtime in any app running on that machine.
3. Ability to provide required parameters: When using the default Azure credential option, we can only rely on the environment variables, meaning that we cannot provide the client_id, tenant_id, or any other parameter explicitly. This is particularly problematic when Ratify is provided with multiple auth providers. With the chained token credential, each credential type in the chain can be provided with the required parameters explicitly if needed.
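
To make the third point concrete, here is a minimal sketch of resolving per-provider parameters, assuming explicit values should win over environment variables. `CredentialParams` and `ResolveParams` are hypothetical names; the environment variable names are the ones used in the proposed code snippet.

```go
package main

// CredentialParams holds the values a credential needs (illustrative type).
type CredentialParams struct {
	TenantID string
	ClientID string
}

// ResolveParams returns the explicit values when they are set and falls back
// to the environment otherwise. getenv is injected (normally os.Getenv).
func ResolveParams(explicit CredentialParams, getenv func(string) string) CredentialParams {
	resolved := explicit
	if resolved.TenantID == "" {
		resolved.TenantID = getenv("AZURE_TENANT_ID")
	}
	if resolved.ClientID == "" {
		resolved.ClientID = getenv("AZURE_CLIENT_ID")
	}
	return resolved
}
```

Two auth providers in the same process could then each call `ResolveParams` with their own client ID, which a purely environment-driven credential cannot express.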

#### **2. Refactor ORAS Auth Providers**
- Combine `azureidentity.go` and `azureworkloadidentity.go` into a single file.
Collaborator

How will we maintain backward compatibility and ensure no breaking changes? We'll need to ensure we can support the existing workload identity and managed identity providers when the user specifies them.

Contributor Author

I think we can ensure this by providing the override ability, as you pointed above.

Collaborator

Does this mean we will introduce a new auth provider or just refactor existing one and add new fields necessary?

Contributor Author

I think we can combine the existing two Oras auth providers into one auth provider. If chained authentication is used, there is no need to have both. We can override the chained credential process based on the user input, to explicitly use workload or managed identity.

Collaborator

Since we are keeping the existing authProviders and introducing a new Azure auth provider, should we just keep the current implementation as is (to reduce risk) and introduce a new implementation/new file for the new auth provider? This is similar to how we deprecated CertProvider and introduced the KMP CR.

Collaborator

+1

- Update the implementation to use the `pkg/common/cloudauthproviders/azure` package for authentication.
- Authentication type will be inferred based on environment variables.

#### **3. Refactor Key Management and Certificate Providers**
- Update the providers to leverage the new `pkg/common/cloudauthproviders/azure` package for authentication.
Collaborator

From a user perspective, will there be any change in how the credentials are configured? Is KMP AKV setup with client id etc. decoupled still from ORAS azure auth providers?

Contributor Author
@shahramk64, Nov 20, 2024

Good point. This requires more thinking. There are a few alternatives here:
1- Decoupled scenario: both can provide their own configurations. I wonder how the chained credential should work in this case. For example if both are defining a client id variable, which one is set in the ENV variable that the chained credential uses.
2- Coupled scenario: the configuration that is common to both is extracted and placed into a separate resource to represent both. This means that both will use the same credential type and the same credential config, (unless overridden explicitly?)
I think a decision needs to be made whether to support different types of credentials for Oras and KMP at the same time or not (for example, workload identity for KMP and managed identity for Oras), and also when using the same credential type for both, to support different identities for them (for example, a different client id for Oras and KMP)

- Remove redundant logic and ensure consistent authentication processes across all providers.

#### **4. Configuration change**
- Introduce a new generic auth provider: `azure`. A sample ORAS store config has an auth provider section that looks like this:
```
authProvider:
  name: azureWorkloadIdentity
  clientID: XYZ
```
We will introduce a new authProvider: `name: azure`. This tells Ratify to use the chained token credential. Required parameters such as `clientID` and `tenantID` can also be provided explicitly, in which case the chained token credential uses them instead of the environment variables. For backward compatibility, the chained token credential can be overridden by specifying an additional attribute named `credential`, which can be set to `managedIdentity`, `workloadIdentity`, or `cli`. This triggers the specified credential instead of the chained token credential.
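
One way to picture the override is as a function from the provider configuration to the ordered list of credential types to try. The sketch below is only an illustration: `AuthProviderConfig` and `CredentialPlan` are hypothetical names, and the Go field names are assumptions about how the proposed `name`, `clientID`, `tenantID`, and `credential` attributes might be represented.

```go
package main

import "fmt"

// AuthProviderConfig mirrors the proposed `azure` auth provider section.
// The field names are assumptions made for this sketch.
type AuthProviderConfig struct {
	Name     string
	ClientID string
	TenantID string
	// Credential is the optional override: managedIdentity,
	// workloadIdentity, or cli. Empty means "use the full chain".
	Credential string
}

// CredentialPlan lists the credential types to try, in order.
func CredentialPlan(conf AuthProviderConfig) ([]string, error) {
	if conf.Name != "azure" {
		return nil, fmt.Errorf("unsupported auth provider %q", conf.Name)
	}
	switch conf.Credential {
	case "":
		// No override: fall back to the chained order from this proposal.
		return []string{"workloadIdentity", "managedIdentity", "cli"}, nil
	case "workloadIdentity", "managedIdentity", "cli":
		// Explicit override: use exactly one credential type.
		return []string{conf.Credential}, nil
	default:
		return nil, fmt.Errorf("unknown credential override %q", conf.Credential)
	}
}
```

Rejecting unknown override values up front, rather than silently falling back to the chain, keeps a misspelled `credential` value from changing which identity is used.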


---

### **Advantages of the Proposed Refactoring**

1. **Improved Maintainability**:
   - A single SDK (`azidentity`) reduces dependencies and simplifies code management.
   - Consolidated authentication logic minimizes duplication and enhances clarity.

2. **Enhanced User Experience**:
   - Automatic detection of authentication type eliminates the need for explicit configuration in most environments.
Collaborator
@FeynmanZhou, Nov 21, 2024

Could you please elaborate on which auth configuration will be simplified after refactoring? Will Ratify detect whether users use an Azure Workload Identity or an Azure Managed Identity? Maybe we could reference this doc to clarify which configuration could be removed: https://ratify.dev/docs/quickstarts/ratify-on-azure#create-a-custom-resource-for-accessing-acr


3. **Extensibility**:
   - Centralized authentication logic makes it easier to extend support for new scenarios or credential types in the future.

4. **Alignment with Azure Best Practices**:
   - `azidentity` provides a Kubernetes-native experience, integrating seamlessly with other Azure SDKs.

---

### **Proposed Tasks**
Collaborator

If we plan to deprecate any existing auth provider/config, we should add this to the V2 tracking issue that @binbin-li made.


1. **Create the New Azure Authentication Package**:
   - Implement shared authentication logic using `azidentity` and `ChainedTokenCredential`.

2. **Refactor ORAS Auth Providers**:
   - Combine `azureidentity.go` and `azureworkloadidentity.go`.
   - Use the new package for authentication.

3. **Refactor Key Management and Certificate Providers**:
   - Update the providers to leverage the common Azure authentication package.

4. **Test and Validate**:
   - Thoroughly test the refactored components across different environments (e.g., AKS, local development) to ensure correctness and reliability.

Binary file added docs/img/AzureAuthRefactor/image.png