Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#5620] feat(fileset): Support credential vending for fileset catalog #5682

Merged
merged 11 commits into from
Dec 26, 2024

Conversation

FANNG1
Copy link
Contributor

@FANNG1 FANNG1 commented Nov 26, 2024

What changes were proposed in this pull request?

Support credential vending for fileset catalog

  1. add credential-providers properties for the fileset catalog, schema, and fileset.
  2. try to get credential-providers from the order of fileset, schema, and catalog.
  3. The user could set multi-credential providers

Why are the changes needed?

Fix: #5620

Does this PR introduce any user-facing change?

will add document after this PR is merged

How was this patch tested?

Add IT and test with local setup Gravitino server

@FANNG1 FANNG1 marked this pull request as draft November 26, 2024 13:42
@FANNG1 FANNG1 force-pushed the gravitino-credential branch 3 times, most recently from bacb98c to cf38676 Compare November 27, 2024 08:25
@FANNG1 FANNG1 force-pushed the gravitino-credential branch from 92a3167 to ba917f4 Compare November 28, 2024 08:32
jerryshao pushed a commit that referenced this pull request Dec 4, 2024
### What changes were proposed in this pull request?

This is the first PR of credential vending for Gravitino server to add
credential API , please refer to #5682 for the overall workflow.
Examples to use the API:

```java
    Catalog catalog = metalake.loadCatalog(catalogName);
    Credential[] credentials =
        catalog.supportsCredentials().getCredentials();

    Credential credential =
        catalog.supportsCredentials().getCredential(“s3-token”);
```

```java
    Fileset fileset = catalog.asFilesetCatalog().loadFileset(filesetIdent);
    Credential[] credentials =
        fileset.supportsCredentials().getCredentials();
    Credential credential =
       fileset.supportsCredentials().getCredential(“s3-token”);
 
```


### Why are the changes needed?

Fix: #5619 

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
using draft PR to do E2E test
jerryshao pushed a commit that referenced this pull request Dec 6, 2024
### What changes were proposed in this pull request?

This is the first PR of credential vending for Gravitino server to add
credential API , please refer to #5682 for the overall workflow.
Examples to use the API:

```java
    Catalog catalog = metalake.loadCatalog(catalogName);
    Credential[] credentials =
        catalog.supportsCredentials().getCredentials();

    Credential credential =
        catalog.supportsCredentials().getCredential(“s3-token”);
```

```java
    Fileset fileset = catalog.asFilesetCatalog().loadFileset(filesetIdent);
    Credential[] credentials =
        fileset.supportsCredentials().getCredentials();
    Credential credential =
       fileset.supportsCredentials().getCredential(“s3-token”);
 
```


### Why are the changes needed?

Fix: #5619 

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
using draft PR to do E2E test
@FANNG1 FANNG1 force-pushed the gravitino-credential branch 3 times, most recently from 6d361c5 to 995c130 Compare December 11, 2024 09:06
@FANNG1 FANNG1 force-pushed the gravitino-credential branch 3 times, most recently from 4594934 to c046392 Compare December 18, 2024 09:15
@FANNG1 FANNG1 changed the title [SIP] POC of gravitino credential vending [#5620] feat(fileset): Support credential vending for fileset catalog Dec 18, 2024
@FANNG1 FANNG1 marked this pull request as ready for review December 18, 2024 09:26
@FANNG1
Copy link
Contributor Author

FANNG1 commented Dec 18, 2024

@jerryshao @yuqi1129 this PR is ready now, please help to review, thanks


public static final Logger LOG = LoggerFactory.getLogger(SecureHadoopCatalogOperations.class);

private final HadoopCatalogOperations hadoopCatalogOperations;
private CatalogCredentialManager catalogCredentialManager;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we achieve this credential operations in the core module, like in operation dispatcher? It's a bit tricky to mix this with the catalog, WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean like tag operation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, my feeling is that the credential may be orthogonal to the catalog implementation, we don't have to embed the code to each catalog implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me think more, what's different for Credential operations is the classloader, it should load the credential provider in catalog classloader.

return getCatalogCredentialContexts();
}
return getFilesetCredentialContexts(nameIdentifier, privilege);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we support credential in the schema level?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The java client only supports Catalog and Fileset, but there is no extra check for the metadata object type in server side, I'll add it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean why don't we support credential in the schema level?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The catalog credentials is mainly used for static credentials like AKSK, and the fileset credential is used for dymatic credentials like S3 tokens, is there any scenarios about schema credentials?

private Map<String, CredentialContext> getFilesetCredentialContexts(
NameIdentifier filesetIdentifier, CredentialPrivilege privilege) {
Fileset fileset = loadFileset(filesetIdentifier);
String location = fileset.storageLocation();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to check if the location and the provider is mapped? For example, what if the storage location is s3 and we don't have s3 credential provider?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard to do the mapping especially for custom credential provider, how about let it throw a runtime exception?

@SuppressWarnings("UnusedVariable")
private CredentialPrivilege getCredentialPrivilege(String user, NameIdentifier identifier)
throws NotAuthorizedException {
return CredentialPrivilege.WRITE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we return write privilege here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides, do we need a CredentialManager, or is the class name proper? Basically, my feeling is that it manages nothing, just a dispatcher.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Gravitino doesn't provide the feature to check which privileges are granted to a user to a specific fileset, we provide a default WRITE privilege to make the access to fileset works well.

/**
* Manages credentials in one catalog, including different credential providers, credential caching.
*/
public class CatalogCredentialManager {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is confusing we have a CredentialManager and CatalogCredentialManager, I would suggest we have a better name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'd better figure out a good name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

return getCatalogCredentialContexts();
}
return getFilesetCredentialContexts(nameIdentifier, privilege);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean why don't we support credential in the schema level?

.filter(Objects::nonNull)
.collect(Collectors.toList());
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this interface should be leveraged by catalog implementor, we should put this into the connector package.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

/**
* Manages credentials in one catalog, including different credential providers, credential caching.
*/
public class CatalogCredentialManager {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'd better figure out a good name.

*
* @return An instance of {@link CatalogCredentialManager}.
*/
CatalogCredentialManager getCatalogCredentialManager();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This interface design is not so good, why do we need to expose this class CatalogCredentialManager to the catalog level? I assume an API to get credential or credential context is enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed CatalogCredentialManager and add getCredentialProviders interface

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored

@FANNG1 FANNG1 force-pushed the gravitino-credential branch from 6c2d66f to 94fd8c0 Compare December 23, 2024 13:00
@FANNG1
Copy link
Contributor Author

FANNG1 commented Dec 23, 2024

@jerryshao , comments are addressed, please review again

*
* @return A map with of the credential provider type {@link CredentialProvider}.
*/
Map<String, CredentialProvider> getCatalogCredentialProviders();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the class CredentialContext and CredentialProvider should also move to connector package.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved CredentialContext to connector package, not movingCredentialProvider for it is not exported to catalog developer in the latest implementation.

.filter(Objects::nonNull)
.collect(Collectors.toList());
}
}
Copy link
Contributor

@jerryshao jerryshao Dec 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This version looks better. I need to take a deep look.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the limitation of the interface, some common logic(like loading credential providers) is implemented in CredentialUtils, another way is to introduce a new class BaseCatalogOperations to enclosure the common logic, but I'm not sure whether it's a good idea.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we achieve this in the BaseCatalog, to mix this interface into BaseCatalog?

Copy link
Contributor Author

@FANNG1 FANNG1 Dec 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, seems a better solution, we could use catalog.supportsCredentials().getCredential(MetadataObject) to get the credential and the specific catalog only provide the getMetadataObjectProperties() getLocations() interface, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using PathBasedCredentialContextInfo to include both location and properites info to avoid load fileset multi times.

@FANNG1 FANNG1 force-pushed the gravitino-credential branch from bfb55b4 to 48ca294 Compare December 24, 2024 08:08
@FANNG1 FANNG1 marked this pull request as draft December 24, 2024 08:16
@FANNG1 FANNG1 force-pushed the gravitino-credential branch from 39504b2 to 4cc944e Compare December 24, 2024 09:19
@FANNG1 FANNG1 marked this pull request as ready for review December 24, 2024 09:21
@@ -78,6 +97,8 @@ public abstract class BaseCatalog<T extends BaseCatalog>

private volatile Map<String, String> properties;

private volatile CatalogCredentialManager catalogCredentialManager;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think we should use a different name, the name of CatalogCredentialManager is too close to CredentialManager, and easily gets misled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about CatalogCredentialProvider or CatalogCredentialDispatcher?

public enum CredentialPrivilege {
READ,
WRITE,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add javadoc for class, and also add @DeveloperApi tag to all the classes in this package.

* @param schemaIdentifier The schema identifier.
* @return The schema properties.
*/
Map<String, String> schemaProperties(NameIdentifier schemaIdentifier);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need schema properties?

import org.slf4j.LoggerFactory;

/** Manages credentials in one catalog. */
public class CatalogCredentialManager implements Closeable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not proper to expose this class to the catalog implementors, catalog implementors don't need know the details of this class. If you want to expose something to let users to leverage, it is better to expose interface, not implementation.

* This interface defines how to get credentials, the catalog operation must implement this
* interface to support credential vending.
*/
public interface SupportsCredentialOperations {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need two SupportsXXX interfaces, one for BaseCatalog, one for CatalogOperations? It's hard to understand, also confuses the users.

@FANNG1 FANNG1 force-pushed the gravitino-credential branch 2 times, most recently from 98af70c to 84a23a8 Compare December 24, 2024 14:08
Comment on lines 296 to 293
public List<Credential> getCredentials(
NameIdentifier nameIdentifier, CredentialPrivilege privilege) {
Map<String, CredentialContext> contexts = getCredentialContexts(nameIdentifier, privilege);
return contexts.entrySet().stream()
.map(entry -> getCatalogCredentialManager().getCredential(entry.getKey(), entry.getValue()))
.filter(Objects::nonNull)
.collect(Collectors.toList());
}

private CatalogCredentialProvider getCatalogCredentialManager() {
if (catalogCredentialProvider == null) {
synchronized (this) {
if (catalogCredentialProvider == null) {
this.catalogCredentialProvider = new CatalogCredentialProvider(name(), properties());
}
}
}
return catalogCredentialProvider;
}

private Map<String, CredentialContext> getCredentialContexts(
NameIdentifier nameIdentifier, CredentialPrivilege privilege) {
if (nameIdentifier.equals(NameIdentifierUtil.getCatalogIdentifier(nameIdentifier))) {
return getCatalogCredentialContexts();
}

if (ops() instanceof SupportsPathBasedCredentials) {
List<PathWithCredentialType> pathWithCredentialTypes =
((SupportsPathBasedCredentials) ops()).getPathWithCredentialTypes(nameIdentifier);
return CredentialUtils.getPathBasedCredentialContexts(privilege, pathWithCredentialTypes);
}
throw new NotSupportedException(
String.format("Catalog %s doesn't support generate credentials", name()));
}

private Map<String, CredentialContext> getCatalogCredentialContexts() {
CatalogCredentialContext context =
new CatalogCredentialContext(PrincipalUtils.getCurrentUserName());
Set<String> providers = CredentialUtils.getCredentialProviders(() -> properties());
return providers.stream().collect(Collectors.toMap(provider -> provider, provider -> context));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this logic to CredentialManager? So from my understanding, each catalog can provide the list of PathWithCredentialType to the CredentialManager, credential manager can get credentials base on the provided list. We don't have to 1) introduce so manage concepts in the catalog level; 2) The credential management mechanism can all be achieved in one place. In the meantime, we don't have to introduce a concept of CatalogCredentialProvider to manage the credentials per catalog.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CredentialOperationDispatcher is used to use catalog specific resource to generate credentials.
CatalogCredentialOperationDispatcher manages the life of catalog credential provider and dispatch request to corresponding credential provider.

WDYT of the naming?

* The {@code PathWithCredentialType} class represents a combination of a path and its associated
* credential type.
*/
@Evolving
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to @DeveloperApi.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated, What's the difference of DeveloperApi and Evolving, the other classes in connector package use Evolving

Comment on lines +87 to +88
* <p>If there are multiple properties suppliers, will try to get the credential providers in the
* input order.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does user guarantee the input order if he doesn't know the background, can we have a better solution?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about rename to getCredentialProvidersByOrder or add order information to the arguments? like getCredentialProviders(firstPropertiesSupplier, secondPropertiesSupplier, thirdPropertiesSupplier)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using getCredentialProvidersByOrder

@FANNG1 FANNG1 force-pushed the gravitino-credential branch from 674f0d4 to 785664c Compare December 25, 2024 06:51
@FANNG1
Copy link
Contributor Author

FANNG1 commented Dec 25, 2024

@jerryshao comments addressed, please help to review again

@FANNG1
Copy link
Contributor Author

FANNG1 commented Dec 25, 2024

rename catalogCredentialOperationDispatcher to catalogCredentialManager. and rename PathWithCredentialType to PathBasedContext

* @param path The path string.
* @param credentialType The type of the credential.
*/
public PathBasedContext(String path, String credentialType) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PathBasedContext is quite similar to PathBasedCredentialContext, can we have a better name, like PathContext?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated


private static Set<String> getCredentialProvidersFromProperties(Map<String, String> properties) {
if (properties == null) {
return ImmutableSet.of();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use Collections.emptySet() for here and other places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@jerryshao
Copy link
Contributor

Generally LGTM, just two small issues.

* @param nameIdentifier, The identifier for fileset, table, etc.
* @return A list of {@link PathContext}
*/
List<PathContext> getPathBasedContext(NameIdentifier nameIdentifier);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please change to getPathContext?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@jerryshao jerryshao merged commit 082bbdc into apache:main Dec 26, 2024
26 checks passed
Abyss-lord pushed a commit to Abyss-lord/gravitino that referenced this pull request Dec 29, 2024
…atalog (apache#5682)

### What changes were proposed in this pull request?

Support credential vending for fileset catalog

1. add `credential-providers` properties for the fileset catalog,
schema, and fileset.
2. try to get `credential-providers` from the order of fileset, schema,
and catalog.
3. The user could set multi-credential providers


### Why are the changes needed?

Fix: apache#5620 

### Does this PR introduce _any_ user-facing change?

will add document after this PR is merged

### How was this patch tested?
Add IT and test with local setup Gravitino server
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Subtask] Support credential vending for fileset catalog
2 participants