-
Notifications
You must be signed in to change notification settings - Fork 382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#5620] feat(fileset): Support credential vending for fileset catalog #5682
Conversation
bacb98c
to
cf38676
Compare
92a3167
to
ba917f4
Compare
### What changes were proposed in this pull request? This is the first PR of credential vending for Gravitino server to add credential API , please refer to #5682 for the overall workflow. Examples to use the API: ```java Catalog catalog = metalake.loadCatalog(catalogName); Credential[] credentials = catalog.supportsCredentials().getCredentials(); Credential credential = catalog.supportsCredentials().getCredential(“s3-token”); ``` ```java Fileset fileset = catalog.asFilesetCatalog().loadFileset(filesetIdent); Credential[] credentials = fileset.supportsCredentials().getCredentials(); Credential credential = fileset.supportsCredentials().getCredential(“s3-token”); ``` ### Why are the changes needed? Fix: #5619 ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? using draft PR to do E2E test
### What changes were proposed in this pull request? This is the first PR of credential vending for Gravitino server to add credential API , please refer to #5682 for the overall workflow. Examples to use the API: ```java Catalog catalog = metalake.loadCatalog(catalogName); Credential[] credentials = catalog.supportsCredentials().getCredentials(); Credential credential = catalog.supportsCredentials().getCredential(“s3-token”); ``` ```java Fileset fileset = catalog.asFilesetCatalog().loadFileset(filesetIdent); Credential[] credentials = fileset.supportsCredentials().getCredentials(); Credential credential = fileset.supportsCredentials().getCredential(“s3-token”); ``` ### Why are the changes needed? Fix: #5619 ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? using draft PR to do E2E test
6d361c5
to
995c130
Compare
4594934
to
c046392
Compare
@jerryshao @yuqi1129 this PR is ready now, please help to review, thanks |
|
||
public static final Logger LOG = LoggerFactory.getLogger(SecureHadoopCatalogOperations.class); | ||
|
||
private final HadoopCatalogOperations hadoopCatalogOperations; | ||
private CatalogCredentialManager catalogCredentialManager; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we achieve this credential operations in the core module, like in operation dispatcher? It's a bit tricky to mix this with the catalog, WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean like tag operation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, my feeling is that the credential may be orthogonal to the catalog implementation, we don't have to embed the code to each catalog implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me think more, what's different for Credential operations is the classloader, it should load the credential provider in catalog classloader.
return getCatalogCredentialContexts(); | ||
} | ||
return getFilesetCredentialContexts(nameIdentifier, privilege); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we support credential in the schema level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The java client only supports Catalog
and Fileset
, but there is no extra check for the metadata object type in server side, I'll add it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean why don't we support credential in the schema level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The catalog credentials is mainly used for static credentials like AKSK, and the fileset credential is used for dymatic credentials like S3 tokens, is there any scenarios about schema credentials?
private Map<String, CredentialContext> getFilesetCredentialContexts( | ||
NameIdentifier filesetIdentifier, CredentialPrivilege privilege) { | ||
Fileset fileset = loadFileset(filesetIdentifier); | ||
String location = fileset.storageLocation(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to check if the location and the provider is mapped? For example, what if the storage location is s3 and we don't have s3 credential provider?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hard to do the mapping especially for custom credential provider, how about let it throw a runtime exception?
@SuppressWarnings("UnusedVariable") | ||
private CredentialPrivilege getCredentialPrivilege(String user, NameIdentifier identifier) | ||
throws NotAuthorizedException { | ||
return CredentialPrivilege.WRITE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we return write privilege here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides, do we need a CredentialManager
, or is the class name proper? Basically, my feeling is that it manages nothing, just a dispatcher.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As Gravitino doesn't provide the feature to check which privileges are granted to a user to a specific fileset, we provide a default WRITE privilege to make the access to fileset works well.
/** | ||
* Manages credentials in one catalog, including different credential providers, credential caching. | ||
*/ | ||
public class CatalogCredentialManager { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is confusing we have a CredentialManager
and CatalogCredentialManager
, I would suggest we have a better name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'd better figure out a good name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
return getCatalogCredentialContexts(); | ||
} | ||
return getFilesetCredentialContexts(nameIdentifier, privilege); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean why don't we support credential in the schema level?
.filter(Objects::nonNull) | ||
.collect(Collectors.toList()); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this interface should be leveraged by catalog implementor, we should put this into the connector
package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
/** | ||
* Manages credentials in one catalog, including different credential providers, credential caching. | ||
*/ | ||
public class CatalogCredentialManager { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'd better figure out a good name.
* | ||
* @return An instance of {@link CatalogCredentialManager}. | ||
*/ | ||
CatalogCredentialManager getCatalogCredentialManager(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This interface design is not so good, why do we need to expose this class CatalogCredentialManager
to the catalog level? I assume an API to get credential or credential context is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed CatalogCredentialManager
and add getCredentialProviders
interface
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactored
6c2d66f
to
94fd8c0
Compare
@jerryshao , comments are addressed, please review again |
* | ||
* @return A map with of the credential provider type {@link CredentialProvider}. | ||
*/ | ||
Map<String, CredentialProvider> getCatalogCredentialProviders(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the class CredentialContext
and CredentialProvider
should also move to connector
package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved CredentialContext
to connector
package, not movingCredentialProvider
for it is not exported to catalog developer in the latest implementation.
.filter(Objects::nonNull) | ||
.collect(Collectors.toList()); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This version looks better. I need to take a deep look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the limitation of the interface, some common logic(like loading credential providers) is implemented in CredentialUtils
, another way is to introduce a new class BaseCatalogOperations
to enclosure the common logic, but I'm not sure whether it's a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we achieve this in the BaseCatalog
, to mix this interface into BaseCatalog
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, seems a better solution, we could use catalog.supportsCredentials().getCredential(MetadataObject)
to get the credential and the specific catalog only provide the getMetadataObjectProperties()
getLocations()
interface, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using PathBasedCredentialContextInfo
to include both location and properites info to avoid load fileset multi times.
bfb55b4
to
48ca294
Compare
39504b2
to
4cc944e
Compare
@@ -78,6 +97,8 @@ public abstract class BaseCatalog<T extends BaseCatalog> | |||
|
|||
private volatile Map<String, String> properties; | |||
|
|||
private volatile CatalogCredentialManager catalogCredentialManager; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think we should use a different name, the name of CatalogCredentialManager
is too close to CredentialManager
, and easily gets misled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about CatalogCredentialProvider
or CatalogCredentialDispatcher
?
public enum CredentialPrivilege { | ||
READ, | ||
WRITE, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add javadoc for class, and also add @DeveloperApi
tag to all the classes in this package.
* @param schemaIdentifier The schema identifier. | ||
* @return The schema properties. | ||
*/ | ||
Map<String, String> schemaProperties(NameIdentifier schemaIdentifier); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need schema properties?
import org.slf4j.LoggerFactory; | ||
|
||
/** Manages credentials in one catalog. */ | ||
public class CatalogCredentialManager implements Closeable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not proper to expose this class to the catalog implementors, catalog implementors don't need know the details of this class. If you want to expose something to let users to leverage, it is better to expose interface, not implementation.
* This interface defines how to get credentials, the catalog operation must implement this | ||
* interface to support credential vending. | ||
*/ | ||
public interface SupportsCredentialOperations { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need two SupportsXXX
interfaces, one for BaseCatalog
, one for CatalogOperations
? It's hard to understand, also confuses the users.
98af70c
to
84a23a8
Compare
public List<Credential> getCredentials( | ||
NameIdentifier nameIdentifier, CredentialPrivilege privilege) { | ||
Map<String, CredentialContext> contexts = getCredentialContexts(nameIdentifier, privilege); | ||
return contexts.entrySet().stream() | ||
.map(entry -> getCatalogCredentialManager().getCredential(entry.getKey(), entry.getValue())) | ||
.filter(Objects::nonNull) | ||
.collect(Collectors.toList()); | ||
} | ||
|
||
private CatalogCredentialProvider getCatalogCredentialManager() { | ||
if (catalogCredentialProvider == null) { | ||
synchronized (this) { | ||
if (catalogCredentialProvider == null) { | ||
this.catalogCredentialProvider = new CatalogCredentialProvider(name(), properties()); | ||
} | ||
} | ||
} | ||
return catalogCredentialProvider; | ||
} | ||
|
||
private Map<String, CredentialContext> getCredentialContexts( | ||
NameIdentifier nameIdentifier, CredentialPrivilege privilege) { | ||
if (nameIdentifier.equals(NameIdentifierUtil.getCatalogIdentifier(nameIdentifier))) { | ||
return getCatalogCredentialContexts(); | ||
} | ||
|
||
if (ops() instanceof SupportsPathBasedCredentials) { | ||
List<PathWithCredentialType> pathWithCredentialTypes = | ||
((SupportsPathBasedCredentials) ops()).getPathWithCredentialTypes(nameIdentifier); | ||
return CredentialUtils.getPathBasedCredentialContexts(privilege, pathWithCredentialTypes); | ||
} | ||
throw new NotSupportedException( | ||
String.format("Catalog %s doesn't support generate credentials", name())); | ||
} | ||
|
||
private Map<String, CredentialContext> getCatalogCredentialContexts() { | ||
CatalogCredentialContext context = | ||
new CatalogCredentialContext(PrincipalUtils.getCurrentUserName()); | ||
Set<String> providers = CredentialUtils.getCredentialProviders(() -> properties()); | ||
return providers.stream().collect(Collectors.toMap(provider -> provider, provider -> context)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move this logic to CredentialManager
? So from my understanding, each catalog can provide the list of PathWithCredentialType
to the CredentialManager
, credential manager can get credentials base on the provided list. We don't have to 1) introduce so manage concepts in the catalog level; 2) The credential management mechanism can all be achieved in one place. In the meantime, we don't have to introduce a concept of CatalogCredentialProvider
to manage the credentials per catalog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CredentialOperationDispatcher
is used to use catalog specific resource to generate credentials.
CatalogCredentialOperationDispatcher
manages the life of catalog credential provider and dispatch request to corresponding credential provider.
WDYT of the naming?
* The {@code PathWithCredentialType} class represents a combination of a path and its associated | ||
* credential type. | ||
*/ | ||
@Evolving |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change to @DeveloperApi
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated, What's the difference of DeveloperApi
and Evolving
, the other classes in connector
package use Evolving
* <p>If there are multiple properties suppliers, will try to get the credential providers in the | ||
* input order. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does user guarantee the input order if he doesn't know the background, can we have a better solution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about rename to getCredentialProvidersByOrder
or add order information to the arguments? like getCredentialProviders(firstPropertiesSupplier, secondPropertiesSupplier, thirdPropertiesSupplier)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using getCredentialProvidersByOrder
674f0d4
to
785664c
Compare
@jerryshao comments addressed, please help to review again |
rename |
* @param path The path string. | ||
* @param credentialType The type of the credential. | ||
*/ | ||
public PathBasedContext(String path, String credentialType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PathBasedContext
is quite similar to PathBasedCredentialContext
, can we have a better name, like PathContext
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
|
||
private static Set<String> getCredentialProvidersFromProperties(Map<String, String> properties) { | ||
if (properties == null) { | ||
return ImmutableSet.of(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use Collections.emptySet()
for here and other places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
Generally LGTM, just two small issues. |
* @param nameIdentifier, The identifier for fileset, table, etc. | ||
* @return A list of {@link PathContext} | ||
*/ | ||
List<PathContext> getPathBasedContext(NameIdentifier nameIdentifier); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please change to getPathContext
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
…atalog (apache#5682) ### What changes were proposed in this pull request? Support credential vending for fileset catalog 1. add `credential-providers` properties for the fileset catalog, schema, and fileset. 2. try to get `credential-providers` from the order of fileset, schema, and catalog. 3. The user could set multi-credential providers ### Why are the changes needed? Fix: apache#5620 ### Does this PR introduce _any_ user-facing change? will add document after this PR is merged ### How was this patch tested? Add IT and test with local setup Gravitino server
What changes were proposed in this pull request?
Support credential vending for fileset catalog
credential-providers
properties for the fileset catalog, schema, and fileset.credential-providers
from the order of fileset, schema, and catalog.Why are the changes needed?
Fix: #5620
Does this PR introduce any user-facing change?
will add document after this PR is merged
How was this patch tested?
Add IT and test with local setup Gravitino server