-
Notifications
You must be signed in to change notification settings - Fork 382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#5973] feat(hadoop-catalog): Support credential when using fileset catalog with cloud storage #5974
base: main
Are you sure you want to change the base?
Conversation
...ts/filesystem-hadoop3/src/test/java/org/apache/gravitino/filesystem/hadoop/TestGvfsBase.java
Outdated
Show resolved
Hide resolved
Do you plan to support the static credential like |
bundles/aliyun/src/main/java/org/apache/gravitino/oss/fs/OSSSessionCredentialProvider.java
Outdated
Show resolved
Hide resolved
...org/apache/gravitino/filesystem/hadoop/integration/test/GravitinoVirtualFileSystemGCSIT.java
Outdated
Show resolved
Hide resolved
The current PR also support static credential as s3 endpoint(can replace s3-region) is a required parameter |
bundles/aliyun/build.gradle.kts
Outdated
@@ -39,6 +39,10 @@ dependencies { | |||
implementation(project(":catalogs:hadoop-common")) { | |||
exclude("*") | |||
} | |||
implementation(project(":clients:client-java-runtime", configuration = "shadow")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, why this should be implementation
, AFAIK, the bundles are typically used with gvfs, we already included this client runtime in gvfs, what's the reason to do this here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me think a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed as suggested.
|
||
public OSSCredentialProvider(URI uri, Configuration conf) { | ||
this.filesetIdentifier = | ||
conf.get(GravitinoVirtualFileSystemConfiguration.GVFS_FILESET_IDENTIFIER); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you make sure the behavior is correct if you have multiple filesets with different FSs, have you verified?
If you have multiple filesets with the same FS, will they share the same CredentialProvider
or use the different ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to verify the behavior there're multiple fileset for same CredentialProvider
, because the CredentialProvider
seems are shared by all filesets.
FilesetCatalog filesetCatalog = client.loadCatalog(catalog).asFilesetCatalog(); | ||
|
||
Fileset fileset = filesetCatalog.loadFileset(NameIdentifier.of(idents[2], idents[3])); | ||
Credential[] credentials = fileset.supportsCredentials().getCredentials(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It actually returns credentials
, why do you insist on XXXCredentialProvider
, not XXXCredentialsProvider
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will eventually only use one of them. I have no preference on XXXCredentialProvider
or XXXCredentialsProvider
, If it matters, I can change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will soon support one fileset mapping to multiple storage locations in the next version, which will return multiple credentials, please think of this scenarios carefully, and don't do the assumptions based on your own, and refactor them again in the next version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OKay, I took it.
new DefaultCredentials( | ||
configuration.get(Constants.ACCESS_KEY_ID), | ||
configuration.get(Constants.ACCESS_KEY_SECRET)); | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if user didn't configure the AKSK? Should they still configure the AKSK when credential vending is enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can't get any crendential from Gravitino server(dynamic or static), we will try using AKSK. If ASKS has not been set, errors will occur.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should they still configure the AKSK when credential vending is enabled?
I prefer to keep the way that only support AKSK in version 0.7.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need a flag to enable credential vending on the client side like Iceberg client?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you added an option in the Icebert client to use credentials or not? I want to make credentials transparent to client, so I'm hesitant to add this option in the client side. @jerryshao do you have any comments on this point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The iceberg community added an option in the Icebert client to control whether enable credential vending.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The iceberg community added an option in the Icebert client to control whether enable credential vending.
GVFS supports multiple storage at the same time and aims to shield the difference. I'm afraid the case is quite different from that in Iceberg rest server.
.findFirst(); | ||
if (dynamicCredential.isPresent()) { | ||
return dynamicCredential; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you use Optional
, please use monad style (functional programming), not using if...
, which is not elegant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, let me make some adjustments.
@@ -60,7 +61,15 @@ public FileSystem getFileSystem(Path path, Map<String, String> config) throws IO | |||
hadoopConfMap.put(OSS_FILESYSTEM_IMPL, AliyunOSSFileSystem.class.getCanonicalName()); | |||
} | |||
|
|||
if (!hadoopConfMap.containsKey(Constants.CREDENTIALS_PROVIDER_KEY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use a function, it's hard to understand
|
||
@Override | ||
public Configuration getConf() { | ||
return this.configuration; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove this.
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
bundles/aliyun/build.gradle.kts
Outdated
@@ -29,6 +29,7 @@ dependencies { | |||
compileOnly(project(":catalogs:catalog-common")) | |||
compileOnly(project(":catalogs:catalog-hadoop")) | |||
compileOnly(project(":core")) | |||
compileOnly(project(":clients:client-java-runtime", configuration = "shadow")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you use the runtime jar for compileOnly
dependency?
if (accessTokenProvider.getAccessToken() != null) { | ||
configuration.set(GCS_TOKEN_PROVIDER_IMPL, GCSCredentialsProvider.class.getName()); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should decide whether we should let users to set AKSK or credential file if credential vending is not enabled. IMO, I think we should not enable XXXCredentialsProvider
is credential vending is not enabled in the server side; if it is enabled, then we should fully rely on credential vending, no need to configure the user-side credentials again.
WDYT? @FANNG1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, this is more clear, if credential vending is enabled in the server side, use server credentials; if not, use credentials in client side.
@@ -23,6 +23,7 @@ plugins { | |||
|
|||
// try to avoid adding extra dependencies because it is used by catalogs and connectors. | |||
dependencies { | |||
implementation(project(":api")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why hadoop-common
depends on api
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gvfs credential provider inerface located in this module and it needs interface credential which in module api
What changes were proposed in this pull request?
Support dynamic credential in obtaining cloud storage fileset.
Why are the changes needed?
Static key are not very safe, we need to optimize it.
Fix: #5973
Does this PR introduce any user-facing change?
N/A
How was this patch tested?
N/A