[#5624] feat(bundles): support ADLS credential provider #5737
Hi @FANNG1, could you please take a look when you have a moment?
It seems there are class conflicts between the Netty in Gravitino and the Netty in Azure. My suggestion is to shadow the related packages in
Besides the credential information defined in xxTokenCredential, the other credential-related information like
overall lgtm.
a few nits for consideration.
api/src/main/java/org/apache/gravitino/credential/ADLSTokenCredential.java
api/src/main/java/org/apache/gravitino/credential/ADLSTokenCredential.java
Hi @FANNG1 , |
...st-server/src/test/java/org/apache/gravitino/iceberg/integration/test/IcebergRESTADLSIT.java
@orenccl I don't have enough time to test and review this week, so it may be delayed until next week. Is that OK for you?
Hi, @FANNG1
bundles/azure-bundle/src/main/java/org/apache/gravitino/abs/credential/ADLSTokenProvider.java
7e3e294
to
956df74
Compare
Originally, I was able to pass the unit tests. However, when I built the Iceberg REST server Docker image and tested it with Spark, I ran into an error.
INSERT INTO testDatabase.test_azure VALUES (1), (2)
24/12/06 22:49:49 ERROR TorrentBroadcast: Store broadcast broadcast_0 fail, remove all pieces of the broadcast
24/12/06 22:49:49 ERROR SparkSQLDriver: Failed in [INSERT INTO testDatabase.test_azure VALUES (1), (2)]
java.io.NotSerializableException: com.azure.storage.common.StorageSharedKeyCredential
Serialization stack:
- object not serializable (class: com.azure.storage.common.StorageSharedKeyCredential, value: com.azure.storage.common.StorageSharedKeyCredential@63f38fb1)
I found that this is a known issue. Please refer to:
This issue has been addressed in Iceberg version 1.6.0. How should we handle this problem?
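For readers unfamiliar with this failure mode, here is a minimal sketch of why holding a non-serializable SDK object in a serializable class fails, and one common way around it (keep only plain strings and rebuild the object lazily). All class names below are hypothetical stand-ins; `SharedKey` only imitates a credential type that does not implement `Serializable`, and this is not necessarily how Iceberg 1.6.0 addressed the problem.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerializationSketch {

  // Hypothetical stand-in for an SDK credential that is NOT Serializable.
  static final class SharedKey {
    final String account;
    final String key;

    SharedKey(String account, String key) {
      this.account = account;
      this.key = key;
    }
  }

  // Stores the SDK object as a regular field, so Java serialization fails.
  static final class EagerHolder implements Serializable {
    private static final long serialVersionUID = 1L;
    final SharedKey credential;

    EagerHolder(String account, String key) {
      this.credential = new SharedKey(account, key);
    }
  }

  // Stores only strings; the SDK object is transient and rebuilt on demand.
  static final class LazyHolder implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String account;
    private final String key;
    private transient SharedKey credential;

    LazyHolder(String account, String key) {
      this.account = account;
      this.key = key;
    }

    SharedKey credential() {
      if (credential == null) {
        credential = new SharedKey(account, key);
      }
      return credential;
    }
  }

  static byte[] serialize(Object value) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value);
    }
    return bytes.toByteArray();
  }

  // Returns true when Java serialization rejects the object.
  static boolean failsToSerialize(Object value) {
    try {
      serialize(value);
      return false;
    } catch (NotSerializableException e) {
      return true;
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }

  // Serializes and deserializes a LazyHolder, as a task shipped to an executor would be.
  static LazyHolder roundTrip(LazyHolder holder) {
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(serialize(holder)))) {
      return (LazyHolder) in.readObject();
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

`failsToSerialize(new EagerHolder(...))` reproduces the same exception class as the Spark log above, while a `LazyHolder` survives the round trip and recreates its credential on first use.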
0c1a497
to
1c79019
Compare
Using the account name and account key to obtain a StorageSharedKeyCredential doesn't work in the Iceberg 1.5.2 + Spark environment; only SAS tokens or DefaultAzureCredentialBuilder can be used. However, the Gravitino server currently cannot use tokens to write metadata, and DefaultAzureCredentialBuilder requires users to set environment variables, which cannot be configured through the Gravitino conf.
Additional Testing Observations
After encountering this issue, I retested on another PC that does not have the AWS CLI installed. The unit test failed during the
Previously, the tests might have passed due to some interference from the AWS CLI. However, under normal conditions, the tests should have failed. For reference, please see: AWS Java SDK Developer Guide - Credentials.
In simple terms: it seems the AWS SDK automatically reads the SAS token temporarily stored in the AWS CLI.
common/src/main/java/org/apache/gravitino/credential/CredentialPropertyUtils.java
...berg-common/src/main/java/org/apache/gravitino/iceberg/common/ops/IcebergCatalogWrapper.java
The main problem with the current implementation is that it passes the Azure secret key, not the token, to the client side (and the token name is not correct). The
After correcting the token name for Iceberg, I still could not pass the IT because of an auth failure. It seems the token is not generated correctly (I also failed to perform Azure operations with the generated token in the Python client). Could you dig into the reason?
Updated to a version that corrects the token key and passes the IT.
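To illustrate what "the token key" refers to, here is a minimal sketch of turning a storage account name and SAS token into a client-side Iceberg property. It assumes that Iceberg's Azure module reads per-account SAS tokens from keys of the form `adls.sas-token.<account>.dfs.core.windows.net`; please verify that prefix against the Iceberg version in use. The class and method names are hypothetical and are not the code from this PR.

```java
import java.util.HashMap;
import java.util.Map;

final class AdlsTokenPropertySketch {

  // Assumed prefix for per-account SAS tokens in Iceberg's Azure properties.
  static final String SAS_TOKEN_PREFIX = "adls.sas-token.";
  // Host suffix of an ADLS Gen2 (dfs) endpoint.
  static final String DFS_HOST_SUFFIX = ".dfs.core.windows.net";

  // Builds the properties handed to the client: only the SAS token, never the account key.
  static Map<String, String> toIcebergProperties(String storageAccountName, String sasToken) {
    if (storageAccountName == null || storageAccountName.isEmpty()) {
      throw new IllegalArgumentException("storage account name is required");
    }
    if (sasToken == null || sasToken.isEmpty()) {
      throw new IllegalArgumentException("SAS token is required");
    }
    Map<String, String> properties = new HashMap<>();
    properties.put(SAS_TOKEN_PREFIX + storageAccountName + DFS_HOST_SUFFIX, sasToken);
    return properties;
  }
}
```

The point of the sketch is that the key embeds the account host and the map carries the short-lived token only, so the long-lived secret key never leaves the server.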
catalogs/catalog-common/src/main/java/org/apache/gravitino/storage/ABSProperties.java
c646d3f
to
3c2a088
Compare
core/src/main/java/org/apache/gravitino/credential/config/ADLSCredentialConfig.java
LGTM, just one comment. @yuqi1129, do you have time to review? Some changes are related to fileset Azure properties.
f1449de
to
98f9319
Compare
@orenccl, merged to main. It's an important feature for Gravitino, thanks for your work! There are some related follow-up issues, like support
@FANNG1
Thanks, you could create the issue when you are free.
I think when we implement
For supporting ADLS credentials in the Python client, should we refer to PR #5209?
Updating Iceberg to 1.6.0 does not seem to be required, because the serialization problem exists on the Iceberg REST client side (Spark or Flink), not on the Iceberg REST server side. Either way, updating Iceberg or not is OK with me. For ADLS credentials in the Python client, please refer to #5890.
…nnector (#5952)
### What changes were proposed in this pull request?
1. Most of the code work is implemented in #5938 and #5737, including catalog property conversion and adding the Iceberg Azure bundle jar; this PR is mainly about tests and documentation.
2. Remove the hidden properties of the cloud secret key from the Iceberg catalog, as Gravitino doesn't have unified security management yet and the Iceberg REST server needs to fetch catalog cloud properties to initiate `IcebergWrapper` dynamically. Another benefit is that the Spark connector does not need to specify the secret key explicitly.
Supports ADLS for Iceberg catalog and Spark connector
### Why are the changes needed?
Fix: #5954
### Does this PR introduce _any_ user-facing change?
Yes, the user no longer needs to specify the cloud secret key in the Spark connector.
### How was this patch tested?
Tested in a local environment.
What changes were proposed in this pull request?
Add ADLS credential provider
Why are the changes needed?
Fix: #5624
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added a unit test and verified successful access to the ADLS container.
Supplementary Information
Chose to use the following versions of Azure libraries:
azure-identity = "1.13.1"
azure-storage-file-datalake = "12.20.0"
azure-core-http-okhttp = "1.12.0"
These were chosen instead of the latest versions because, although the official documentation states support for Java 8 and later, the latest versions appear to have been compiled with Java 21. This caused compilation issues in the Gravitino environment. I therefore downgraded and tested to find the latest usable versions.