Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#6054] feat(core): add more GCS permission to support fileset operations #6141

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

FANNG1
Copy link
Contributor

@FANNG1 FANNG1 commented Jan 8, 2025

What changes were proposed in this pull request?

  1. for resource path like a/b, add "a", "a/", "a/b" read permission for GCS connector
  2. replace storage.legacyBucketReader with storage.insightsCollectorService, because storage.legacyBucketReader provides extra list permission for the bucket.

Why are the changes needed?

Fix: #6054

Does this PR introduce any user-facing change?

no

How was this patch tested?

Iceberg GCS IT and fileset GCS credential IT

@FANNG1 FANNG1 marked this pull request as draft January 8, 2025 01:28
@FANNG1 FANNG1 changed the title [SIP] fix gcs error [#6054] fix gcs error Jan 8, 2025
@FANNG1 FANNG1 changed the title [#6054] fix gcs error [#6054] feat(core): add more permission to support fileset operations Jan 8, 2025
@FANNG1 FANNG1 marked this pull request as ready for review January 8, 2025 06:49
@FANNG1
Copy link
Contributor Author

FANNG1 commented Jan 8, 2025

@jerryshao @yuqi1129 , PTAL

String.format(
"resource.name.startsWith('projects/_/buckets/%s/objects/%s')",
bucketName, resourcePath));
getAllResources(resourcePath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we only need to handle readExpressions, what about writeExpressions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the behaviour of Hadoop GCS connector to check subpath info to check path exists. so no need to add write permission.


@VisibleForTesting
// "a/b/c" will get ["a", "a/", "a/b", "a/b/", "a/b/c"]
static List<String> getAllResources(String resourcePath) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the real cause of this issue? I recall that you used the prefix match mode to control permission. Why can't it work?

When I worked on GCS IAM authorization, I tested if using the prefix condition on a role, I worked, so I wonder if there is any difference in credentials.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take a/b/c for example, without this, only grant the permission for the path start with a/b/c, but HADOOP GCS connector need the storage read perssion for a/ a/b/ too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave the user the list object privileges for the bucket and read permission for the exact path, then the user will got the privileges.

image

The following is the result:

Can't list objects in the bucket root
image

Can't list objects in the xiaoyu123/gcs_catalog_with_organization
image

Can't list objects in the xiaoyu123/gcs_catalog_with_organization/schema
image

Can list objects in the xiaoyu123/gcs_catalog_with_organization/schema/example
image

Can list objects in the xiaoyu123/gcs_catalog_with_organization/schema/example/people
image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I couldn't get your point, what's the problem here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I 'm trying to say: Is this a proper way to solve this problem? If the path is very long, say have the item may be very large.

In the example above, I can use the prefix condition of the path to control the permission in IAM, so I wonder If we can do similar operation in for credentails.

@FANNG1 FANNG1 changed the title [#6054] feat(core): add more permission to support fileset operations [#6054] feat(core): add more GCS permission to support fileset operations Jan 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Improvement] permission is not enough to write GCS when using credential in fileset
2 participants