[#5973] feat(hadoop-catalog): Support credential when using fileset catalog with cloud storage #5974

Open
wants to merge 60 commits into base: main

yuqi1129
Contributor

What changes were proposed in this pull request?

Support dynamic credentials when accessing filesets backed by cloud storage.

Why are the changes needed?

Static keys are not very secure, so we need to improve on them.

Fix: #5973

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

N/A

yuqi1129 marked this pull request as draft December 24, 2024 13:13

yuqi1129
Contributor Author

This PR depends on #5620, #5806 and #5971

yuqi1129 marked this pull request as ready for review December 27, 2024 11:51

yuqi1129
Contributor Author

@FANNG1,
Please help solve the token permission problem for OSS and S3. Also, the GCS token for the Java client seems to have some problems; please see #6028.

yuqi1129 self-assigned this Dec 27, 2024
yuqi1129 requested a review from FANNG1 December 28, 2024 01:19

FANNG1
Contributor

FANNG1 commented Dec 29, 2024

Do you plan to support static credentials like S3SecretKey, and storage properties not included in the credential, like s3-region, in the new PR?

yuqi1129
Contributor Author

> @FANNG1, Please help solve the token permission problem for OSS and S3. Also, the GCS token for the Java client seems to have some problems; please see #6028.

Solved.

yuqi1129
Contributor Author

yuqi1129 commented Dec 30, 2024

> Do you plan to support static credentials like S3SecretKey, and storage properties not included in the credential, like s3-region, in the new PR?

The current PR also supports static credentials, and the S3 endpoint (which can replace s3-region) is a required parameter.

@@ -39,6 +39,10 @@ dependencies {
implementation(project(":catalogs:hadoop-common")) {
exclude("*")
}
implementation(project(":clients:client-java-runtime", configuration = "shadow"))
Contributor

Again, why should this be implementation? AFAIK, the bundles are typically used with GVFS, and we already include this client runtime in GVFS. What's the reason to do this here?

Contributor Author

Let me think a bit.

Contributor Author

Changed as suggested.


public OSSCredentialProvider(URI uri, Configuration conf) {
this.filesetIdentifier =
conf.get(GravitinoVirtualFileSystemConfiguration.GVFS_FILESET_IDENTIFIER);
Contributor

How do you make sure the behavior is correct if you have multiple filesets with different FSs? Have you verified it?

If you have multiple filesets with the same FS, will they share the same CredentialProvider or use the different ones?

Contributor Author

In this PR, each fileset will have a FileSystem instance.

Contributor

It would be better to verify the behavior when there are multiple filesets for the same CredentialProvider, because the CredentialProvider seems to be shared by all filesets.
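
One way to reason about the sharing concern above is to key provider instances by fileset identifier and then check that two filesets never receive the same instance. The following is only a minimal sketch under stated assumptions: `PerFilesetProviders` and the four-part identifier strings are hypothetical and are not part of this PR, Gravitino, or Hadoop. It relies solely on `ConcurrentHashMap.computeIfAbsent` from the JDK.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical registry for illustration only; not a Gravitino or Hadoop class. */
final class PerFilesetProviders<P> {
  private final Map<String, P> providers = new ConcurrentHashMap<>();
  private final Function<String, P> factory;

  PerFilesetProviders(Function<String, P> factory) {
    this.factory = factory;
  }

  /** Returns the provider bound to this fileset identifier, creating it on first use. */
  P forFileset(String filesetIdentifier) {
    // computeIfAbsent is atomic per key, so concurrent callers asking for the
    // same fileset observe the same provider instance.
    return providers.computeIfAbsent(filesetIdentifier, factory);
  }
}
```

With such a registry, a test can request providers for two different identifiers (for example the hypothetical `metalake.catalog.schema.fileset_a` and `metalake.catalog.schema.fileset_b`) and assert that the instances differ, while repeated requests for the same identifier return the same instance.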

FilesetCatalog filesetCatalog = client.loadCatalog(catalog).asFilesetCatalog();

Fileset fileset = filesetCatalog.loadFileset(NameIdentifier.of(idents[2], idents[3]));
Credential[] credentials = fileset.supportsCredentials().getCredentials();
Contributor

It actually returns credentials. Why do you insist on XXXCredentialProvider rather than XXXCredentialsProvider?

Contributor Author

We will eventually use only one of them. I have no preference between XXXCredentialProvider and XXXCredentialsProvider. If it matters, I can change it.

Contributor

In the next version we will support mapping one fileset to multiple storage locations, which will return multiple credentials. Please think through this scenario carefully. Don't make assumptions on your own and then have to refactor again in the next version.

Contributor Author

Okay, got it.

new DefaultCredentials(
configuration.get(Constants.ACCESS_KEY_ID),
configuration.get(Constants.ACCESS_KEY_SECRET));
return;
Contributor

What if the user didn't configure the AKSK? Should they still configure the AKSK when credential vending is enabled?

Contributor Author

If we can't get any credential (dynamic or static) from the Gravitino server, we will try using the AKSK. If the AKSK has not been set, errors will occur.

Contributor Author

> Should they still configure the AKSK when credential vending is enabled?

I prefer to keep the approach from version 0.7.0, which supports only the AKSK.

Contributor

Do you need a flag to enable credential vending on the client side, like the Iceberg client?

Contributor Author

Have you added an option in the Iceberg client to use credentials or not? I want to make credentials transparent to the client, so I'm hesitant to add this option on the client side. @jerryshao, do you have any comments on this point?

Contributor

The Iceberg community added an option in the Iceberg client to control whether to enable credential vending.

Contributor Author

> The Iceberg community added an option in the Iceberg client to control whether to enable credential vending.

GVFS supports multiple storage systems at the same time and aims to hide the differences between them. I'm afraid this case is quite different from the Iceberg REST server.

.findFirst();
if (dynamicCredential.isPresent()) {
return dynamicCredential;
}
Contributor

jerryshao commented Jan 6, 2025

If you use Optional, please use it in the monadic (functional) style rather than if..., which is not elegant.

Contributor Author

OK, let me make some adjustments.
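
As a rough illustration of the monadic style requested in this thread, the "return the dynamic credential if present, otherwise look further" logic can be written as a single `Optional` chain. This is a minimal sketch and not the PR's actual code: `CredentialSelection`, `SimpleCredential`, and the type strings passed in are hypothetical stand-ins, and the fallback to a second credential type is an assumption about what follows the `if` in the excerpt. It uses `map(Optional::of).orElseGet(...)`, which works on Java 8; on Java 9 and later, `Optional.or(...)` expresses the same thing more directly.

```java
import java.util.Arrays;
import java.util.Optional;

final class CredentialSelection {
  /** Hypothetical stand-in for a credential; only its type string matters here. */
  static final class SimpleCredential {
    private final String type;

    SimpleCredential(String type) {
      this.type = type;
    }

    String type() {
      return type;
    }
  }

  /** Finds the first credential whose type equals the given type. */
  static Optional<SimpleCredential> firstOfType(SimpleCredential[] credentials, String type) {
    return Arrays.stream(credentials).filter(c -> type.equals(c.type())).findFirst();
  }

  /** Prefers a credential of preferredType and falls back to fallbackType, with no isPresent() check. */
  static Optional<SimpleCredential> select(
      SimpleCredential[] credentials, String preferredType, String fallbackType) {
    return firstOfType(credentials, preferredType)
        .map(Optional::of) // keep the preferred credential when it exists
        .orElseGet(() -> firstOfType(credentials, fallbackType)); // search again only when it is missing
  }
}
```

The fallback lookup is passed as a supplier, so it runs only when no credential of the preferred type exists.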

@@ -60,7 +61,15 @@ public FileSystem getFileSystem(Path path, Map<String, String> config) throws IO
hadoopConfMap.put(OSS_FILESYSTEM_IMPL, AliyunOSSFileSystem.class.getCanonicalName());
}

if (!hadoopConfMap.containsKey(Constants.CREDENTIALS_PROVIDER_KEY)
Contributor

Please extract this into a function; it's hard to understand.


@Override
public Configuration getConf() {
return this.configuration;
Contributor

Please remove the this. qualifier.

Contributor Author

ok

@@ -29,6 +29,7 @@ dependencies {
compileOnly(project(":catalogs:catalog-common"))
compileOnly(project(":catalogs:catalog-hadoop"))
compileOnly(project(":core"))
compileOnly(project(":clients:client-java-runtime", configuration = "shadow"))
Contributor

Why do you use the runtime jar for compileOnly dependency?

if (accessTokenProvider.getAccessToken() != null) {
configuration.set(GCS_TOKEN_PROVIDER_IMPL, GCSCredentialsProvider.class.getName());
}
}
Contributor

I think we should decide whether to let users set the AKSK or a credential file if credential vending is not enabled. IMO, we should not enable XXXCredentialsProvider if credential vending is not enabled on the server side; if it is enabled, we should rely fully on credential vending, with no need to configure user-side credentials again.

WDYT? @FANNG1

Contributor

+1, this is clearer: if credential vending is enabled on the server side, use the server credentials; if not, use the credentials on the client side.
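
The rule agreed on in this thread can be sketched as a small decision function. This is only an illustration under stated assumptions: `CredentialSourceResolver`, its `Source` enum, and the property names passed in by the caller are hypothetical, and the presence of a vended credential is used here as a stand-in for "credential vending is enabled on the server side", which the real client may detect differently.

```java
import java.util.Map;
import java.util.Optional;

final class CredentialSourceResolver {
  /** Where the client should take its credentials from. */
  enum Source {
    SERVER_VENDED,
    CLIENT_CONFIGURED
  }

  /**
   * Uses the server-vended credential when one exists; otherwise requires that both
   * client-side key properties are configured, and fails if they are not.
   */
  static Source resolve(
      Optional<?> vendedCredential,
      Map<String, String> clientConf,
      String accessKeyProp,
      String secretKeyProp) {
    return vendedCredential
        .map(c -> Source.SERVER_VENDED)
        .orElseGet(
            () -> {
              if (clientConf.containsKey(accessKeyProp)
                  && clientConf.containsKey(secretKeyProp)) {
                return Source.CLIENT_CONFIGURED;
              }
              throw new IllegalStateException(
                  "No server-vended credential and no client-side access key/secret configured");
            });
  }
}
```

Failing fast when neither source is available matches the earlier remark in this conversation that errors will occur if the AKSK has not been set.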

@@ -23,6 +23,7 @@ plugins {

// try to avoid adding extra dependencies because it is used by catalogs and connectors.
dependencies {
implementation(project(":api"))
Contributor

Why does hadoop-common depend on api?

Contributor Author

The GVFS credential provider interface is located in this module, and it needs the Credential interface, which is in the api module.

Successfully merging this pull request may close these issues.

[FEATURE] Support using credential when using fileset with cloud storage in Java GVFS
3 participants