diff --git a/docs/howto/features/cloud-access.md b/docs/howto/features/cloud-access.md index 5dd9ef9d74..7d3c07114d 100644 --- a/docs/howto/features/cloud-access.md +++ b/docs/howto/features/cloud-access.md @@ -36,18 +36,40 @@ This AWS IAM Role is managed via terraform. ## Enabling specific cloud access permissions 1. In the `.tfvars` file for the project in which this hub is based off - create (or modify) the `hub_cloud_permissions` variable. The config is - like: + create (or modify) the `hub_cloud_permissions` variable. + ```{warning} + `allow_access_to_external_requester_pays_buckets` is not yet supported on AWS! ``` + + The config should look like this: + + `````{tab-set} + ````{tab-item} GCP + :sync: gcp-key + ```terraform + hub_cloud_permissions = { + "": { + allow_access_to_external_requester_pays_buckets : true, + bucket_admin_access : ["bucket-1", "bucket-2"] + hub_namespace : "" + } + } + ``` + ```` + + ````{tab-item} AWS + :sync: aws-key + ```terraform hub_cloud_permissions = { "": { - requestor_pays : true, bucket_admin_access : ["bucket-1", "bucket-2"] hub_namespace : "" } } ``` + ```` + ````` where: @@ -55,9 +77,9 @@ This AWS IAM Role is managed via terraform. and the cluster name together can't be more than 29 characters. `terraform` will complain if you go over this limit, so in general just use the name of the hub and shorten it only if `terraform` complains. - 2. (GCP only) `requestor_pays` enables permissions for user pods and dask worker - pods to identify as the project while making requests to Google Cloud Storage - buckets marked as 'requestor pays'. More details [here](topic:features:cloud:gcp:requestor-pays). + 2. (GCP only) `allow_access_to_external_requester_pays_buckets` enables permissions for user pods and dask worker + pods to identify as this project while making requests to Google Cloud Storage + buckets in other projects that have 'Requester Pays' enabled. More details [here](topic:features:cloud:gcp:requester-pays). 3.
`bucket_admin_access` lists bucket names (as specified in `user_buckets` terraform variable) all users on this hub should have full read/write access to. Used along with the [user_buckets](howto:features:storage-buckets) diff --git a/docs/topic/features.md b/docs/topic/features.md index 517f7fee16..5206241489 100644 --- a/docs/topic/features.md +++ b/docs/topic/features.md @@ -23,8 +23,8 @@ improving the security posture of our hubs. ### GCP -(topic:features:cloud:gcp:requestor-pays)= -#### 'Requestor Pays' access to Google Cloud Storage buckets +(topic:features:cloud:gcp:requester-pays)= +#### 'Requester Pays' access By default, the organization *hosting* data on Google Cloud pays for both storage and bandwidth costs of the data. However, Google Cloud also offers @@ -33,9 +33,29 @@ option, where the bandwidth costs are paid for by the organization *requesting* the data. This is very commonly used by organizations that provide big datasets on Google Cloud storage, to sustainably share costs of maintaining the data. +**Requester Pays** is a setting that is enabled on individual buckets. + +#### Allow access to external `Requester Pays` buckets + +If buckets outside the project have the `Requester Pays` flag enabled, then to give hub users access to them we need to: +- set `hub_cloud_permissions.allow_access_to_external_requester_pays_buckets` + in the terraform config of the cluster (see the guide at [](howto:features:cloud-access:access-perms)) +- keep in mind that access to such outside buckets is then charged to the + hub's project, not to the project that owns the bucket + +```{warning} When this feature is enabled, users on a hub accessing cloud buckets from -other organizations marked as 'requestor pays' will increase our cloud bill. +other organizations marked as `Requester Pays` will increase our cloud bill. Hence, this is an opt-in feature.
+``` + +#### Enable `Requester Pays` flag on community buckets + +The buckets we set up for communities inside their own projects can also have this flag enabled, in which case users from outside the project are charged for their own usage of those buckets. + +```{warning} +This is not yet supported by our terraform. Follow https://github.com/2i2c-org/infrastructure/issues/3746 to track when support is added. +``` (topic:features:cloud:scratch-buckets)= ## 'Scratch' buckets on object storage diff --git a/terraform/aws/projects/2i2c-aws-us.tfvars b/terraform/aws/projects/2i2c-aws-us.tfvars index cf56c5e671..a9a3a0cf3d 100644 --- a/terraform/aws/projects/2i2c-aws-us.tfvars +++ b/terraform/aws/projects/2i2c-aws-us.tfvars @@ -31,17 +31,14 @@ user_buckets = { hub_cloud_permissions = { "staging" : { - requestor_pays : true, bucket_admin_access : ["scratch-staging"], extra_iam_policy : "" }, "dask-staging" : { - requestor_pays : true, bucket_admin_access : ["scratch-dask-staging"], extra_iam_policy : "" }, "showcase" : { - requestor_pays : true, bucket_admin_access : [ "scratch-researchdelight", "persistent-showcase" @@ -49,17 +46,14 @@ hub_cloud_permissions = { extra_iam_policy : "" }, "ncar-cisl" : { - requestor_pays : true, bucket_admin_access : ["scratch-ncar-cisl"], extra_iam_policy : "" }, "go-bgc" : { - requestor_pays : true, bucket_admin_access : ["scratch-go-bgc"], extra_iam_policy : "" }, "itcoocean" : { - requestor_pays : true, bucket_admin_access : ["scratch-itcoocean"], extra_iam_policy : "" }, diff --git a/terraform/aws/projects/catalystproject-africa.tfvars b/terraform/aws/projects/catalystproject-africa.tfvars index 70efd99f76..728f18a381 100644 --- a/terraform/aws/projects/catalystproject-africa.tfvars +++ b/terraform/aws/projects/catalystproject-africa.tfvars @@ -16,12 +16,10 @@ user_buckets = { hub_cloud_permissions = { "staging" : { - requestor_pays : true, bucket_admin_access : ["scratch-staging"], extra_iam_policy : "" }, "prod" : { - requestor_pays : true,
bucket_admin_access : ["scratch"], extra_iam_policy : "" }, diff --git a/terraform/aws/projects/earthscope.tfvars b/terraform/aws/projects/earthscope.tfvars index 57aeb6fbf9..688977269b 100644 --- a/terraform/aws/projects/earthscope.tfvars +++ b/terraform/aws/projects/earthscope.tfvars @@ -16,12 +16,10 @@ user_buckets = { hub_cloud_permissions = { "staging" : { - requestor_pays : true, bucket_admin_access : ["scratch-staging"], extra_iam_policy : "" }, "prod" : { - requestor_pays : true, bucket_admin_access : ["scratch"], extra_iam_policy : "" }, diff --git a/terraform/aws/projects/gridsst.tfvars b/terraform/aws/projects/gridsst.tfvars index e13b2f1a05..74680c5fcd 100644 --- a/terraform/aws/projects/gridsst.tfvars +++ b/terraform/aws/projects/gridsst.tfvars @@ -16,12 +16,10 @@ user_buckets = { hub_cloud_permissions = { "staging" : { - requestor_pays : true, bucket_admin_access : ["scratch-staging"], extra_iam_policy : "" }, "prod" : { - requestor_pays : true, bucket_admin_access : ["scratch"], extra_iam_policy : "" }, diff --git a/terraform/aws/projects/jupyter-meets-the-earth.tfvars b/terraform/aws/projects/jupyter-meets-the-earth.tfvars index 90615a14a9..73a5a38797 100644 --- a/terraform/aws/projects/jupyter-meets-the-earth.tfvars +++ b/terraform/aws/projects/jupyter-meets-the-earth.tfvars @@ -16,7 +16,6 @@ user_buckets = { hub_cloud_permissions = { "staging" : { - requestor_pays : true, bucket_admin_access : ["scratch-staging"], # FIXME: Previously, users were granted full S3 permissions. # Keep it the same for now @@ -34,7 +33,6 @@ hub_cloud_permissions = { EOT }, "prod" : { - requestor_pays : true, bucket_admin_access : ["scratch"], # FIXME: Previously, users were granted full S3 permissions. 
# Keep it the same for now diff --git a/terraform/aws/projects/nasa-cryo.tfvars b/terraform/aws/projects/nasa-cryo.tfvars index 1f45519983..72197c009d 100644 --- a/terraform/aws/projects/nasa-cryo.tfvars +++ b/terraform/aws/projects/nasa-cryo.tfvars @@ -22,7 +22,6 @@ user_buckets = { hub_cloud_permissions = { "staging" : { - requestor_pays : true, bucket_admin_access : ["scratch-staging", "persistent-staging"], # Provides readonly requestor-pays access to usgs-landsat bucket # FIXME: We should find a way to allow access to *all* requester pays @@ -57,7 +56,6 @@ hub_cloud_permissions = { EOT }, "prod" : { - requestor_pays : true, bucket_admin_access : ["scratch", "persistent"], # Provides readonly requestor-pays access to usgs-landsat bucket # FIXME: We should find a way to allow access to *all* requester pays diff --git a/terraform/aws/projects/nasa-esdis.tfvars b/terraform/aws/projects/nasa-esdis.tfvars index 186632f934..d97271f449 100644 --- a/terraform/aws/projects/nasa-esdis.tfvars +++ b/terraform/aws/projects/nasa-esdis.tfvars @@ -16,12 +16,10 @@ user_buckets = { hub_cloud_permissions = { "staging" : { - requestor_pays : true, bucket_admin_access : ["scratch-staging"], extra_iam_policy : "" }, "prod" : { - requestor_pays : true, bucket_admin_access : ["scratch"], extra_iam_policy : "" }, diff --git a/terraform/aws/projects/nasa-ghg.tfvars b/terraform/aws/projects/nasa-ghg.tfvars index bc09d26c26..831205b98e 100644 --- a/terraform/aws/projects/nasa-ghg.tfvars +++ b/terraform/aws/projects/nasa-ghg.tfvars @@ -16,7 +16,6 @@ user_buckets = { hub_cloud_permissions = { "staging" : { - requestor_pays : true, bucket_admin_access : ["scratch-staging"], extra_iam_policy : <<-EOT { @@ -70,7 +69,6 @@ hub_cloud_permissions = { EOT }, "prod" : { - requestor_pays : true, bucket_admin_access : ["scratch"], extra_iam_policy : <<-EOT { diff --git a/terraform/aws/projects/nasa-veda.tfvars b/terraform/aws/projects/nasa-veda.tfvars index e834ce1829..e74f77cfbd 100644 --- 
a/terraform/aws/projects/nasa-veda.tfvars +++ b/terraform/aws/projects/nasa-veda.tfvars @@ -16,7 +16,6 @@ user_buckets = { hub_cloud_permissions = { "staging" : { - requestor_pays : true, bucket_admin_access : ["scratch-staging"], extra_iam_policy : <<-EOT { @@ -75,7 +74,6 @@ hub_cloud_permissions = { EOT }, "prod" : { - requestor_pays : true, bucket_admin_access : ["scratch"], extra_iam_policy : <<-EOT { diff --git a/terraform/aws/projects/openscapes.tfvars b/terraform/aws/projects/openscapes.tfvars index 80a1e287b2..77d86e6ee1 100644 --- a/terraform/aws/projects/openscapes.tfvars +++ b/terraform/aws/projects/openscapes.tfvars @@ -19,12 +19,10 @@ user_buckets = { hub_cloud_permissions = { "staging" : { - requestor_pays : true, bucket_admin_access : ["scratch-staging"], extra_iam_policy : "" }, "prod" : { - requestor_pays : true, bucket_admin_access : ["scratch"], extra_iam_policy : "" }, diff --git a/terraform/aws/projects/smithsonian.tfvars b/terraform/aws/projects/smithsonian.tfvars index 65acdb6510..1ec655e8e7 100644 --- a/terraform/aws/projects/smithsonian.tfvars +++ b/terraform/aws/projects/smithsonian.tfvars @@ -13,12 +13,10 @@ user_buckets = { hub_cloud_permissions = { "staging" : { - requestor_pays : true, bucket_admin_access : ["scratch-staging"], extra_iam_policy : "" }, "prod" : { - requestor_pays : true, bucket_admin_access : ["scratch"], extra_iam_policy : "" }, diff --git a/terraform/aws/projects/template.tfvars b/terraform/aws/projects/template.tfvars index bb0ff4344f..20f703b97e 100644 --- a/terraform/aws/projects/template.tfvars +++ b/terraform/aws/projects/template.tfvars @@ -16,12 +16,10 @@ user_buckets = { hub_cloud_permissions = { "staging" : { - requestor_pays : true, bucket_admin_access : ["scratch-staging"], extra_iam_policy : "" }, "prod" : { - requestor_pays : true, bucket_admin_access : ["scratch"], extra_iam_policy : "" }, diff --git a/terraform/aws/projects/ubc-eoas.tfvars b/terraform/aws/projects/ubc-eoas.tfvars index 
c3cba2162d..f38abdf057 100644 --- a/terraform/aws/projects/ubc-eoas.tfvars +++ b/terraform/aws/projects/ubc-eoas.tfvars @@ -16,12 +16,10 @@ user_buckets = { hub_cloud_permissions = { "staging" : { - requestor_pays : true, bucket_admin_access : ["scratch-staging"], extra_iam_policy : "" }, "prod" : { - requestor_pays : true, bucket_admin_access : ["scratch"], extra_iam_policy : "" }, diff --git a/terraform/aws/projects/victor.tfvars b/terraform/aws/projects/victor.tfvars index ec4b6dcffd..f6237fe892 100644 --- a/terraform/aws/projects/victor.tfvars +++ b/terraform/aws/projects/victor.tfvars @@ -16,12 +16,10 @@ user_buckets = { hub_cloud_permissions = { "staging" : { - requestor_pays : true, bucket_admin_access : ["scratch-staging"], extra_iam_policy : "" }, "prod" : { - requestor_pays : true, bucket_admin_access : ["scratch"], extra_iam_policy : "" }, diff --git a/terraform/aws/variables.tf b/terraform/aws/variables.tf index 281374d1b5..332a78ab5a 100644 --- a/terraform/aws/variables.tf +++ b/terraform/aws/variables.tf @@ -44,7 +44,10 @@ variable "user_buckets" { } variable "hub_cloud_permissions" { - type = map(object({ requestor_pays : bool, bucket_admin_access : set(string), extra_iam_policy : string })) + type = map(object({ + bucket_admin_access : set(string), + extra_iam_policy : string + })) default = {} description = <<-EOT Map of cloud permissions given to a particular hub @@ -52,12 +55,9 @@ variable "hub_cloud_permissions" { Key is name of the hub namespace in the cluster, and values are particular permissions users running on those hubs should have. Currently supported are: - 1. requestor_pays: Identify as coming from the google cloud project when accessing - storage buckets marked as https://cloud.google.com/storage/docs/requester-pays. - This *potentially* incurs cost for us, the originating project, so opt-in. - 2. bucket_admin_access: List of S3 storage buckets that users on this hub should have read + 1. 
bucket_admin_access: List of S3 storage buckets that users on this hub should have read and write permissions for. - 3. extra_iam_policy: An AWS IAM Policy document that grants additional rights to the users + 2. extra_iam_policy: An AWS IAM Policy document that grants additional rights to the users on this hub when talking to AWS services. EOT } diff --git a/terraform/gcp/projects/awi-ciroh.tfvars b/terraform/gcp/projects/awi-ciroh.tfvars index 5a6a6ebe94..1a4cd55562 100644 --- a/terraform/gcp/projects/awi-ciroh.tfvars +++ b/terraform/gcp/projects/awi-ciroh.tfvars @@ -63,12 +63,12 @@ dask_nodes = { hub_cloud_permissions = { "staging" : { - requestor_pays : false, + allow_access_to_external_requester_pays_buckets : false, bucket_admin_access : ["scratch-staging", "persistent-staging"], hub_namespace : "staging" }, "prod" : { - requestor_pays : false, + allow_access_to_external_requester_pays_buckets : false, bucket_admin_access : ["scratch", "persistent"], hub_namespace : "prod" } diff --git a/terraform/gcp/projects/daskhub-template.tfvars b/terraform/gcp/projects/daskhub-template.tfvars index 6aa89162db..54075ecc69 100644 --- a/terraform/gcp/projects/daskhub-template.tfvars +++ b/terraform/gcp/projects/daskhub-template.tfvars @@ -48,7 +48,7 @@ user_buckets = { hub_cloud_permissions = { "{{ hub_name }}" : { - requestor_pays : true, + allow_access_to_external_requester_pays_buckets : true, bucket_admin_access : ["scratch-{{ hub_name }}"], hub_namespace : "{{ hub_name }}" }, diff --git a/terraform/gcp/projects/leap.tfvars b/terraform/gcp/projects/leap.tfvars index f2862cdb2e..4fca26bb32 100644 --- a/terraform/gcp/projects/leap.tfvars +++ b/terraform/gcp/projects/leap.tfvars @@ -60,13 +60,13 @@ user_buckets = { hub_cloud_permissions = { "staging" : { - requestor_pays : true, + allow_access_to_external_requester_pays_buckets : true, bucket_admin_access : ["scratch-staging", "persistent-staging"], bucket_readonly_access : ["persistent-ro-staging"], hub_namespace : 
"staging" }, "prod" : { - requestor_pays : true, + allow_access_to_external_requester_pays_buckets : true, bucket_admin_access : ["scratch", "persistent"], bucket_readonly_access : ["persistent-ro"], hub_namespace : "prod" diff --git a/terraform/gcp/projects/linked-earth.tfvars b/terraform/gcp/projects/linked-earth.tfvars index 4234fb37a8..86678d6f2f 100644 --- a/terraform/gcp/projects/linked-earth.tfvars +++ b/terraform/gcp/projects/linked-earth.tfvars @@ -61,12 +61,12 @@ dask_nodes = { hub_cloud_permissions = { "staging" : { - requestor_pays : false, + allow_access_to_external_requester_pays_buckets : false, bucket_admin_access : ["scratch-staging"], hub_namespace : "staging" }, "prod" : { - requestor_pays : false, + allow_access_to_external_requester_pays_buckets : false, bucket_admin_access : ["scratch"], hub_namespace : "prod" } diff --git a/terraform/gcp/projects/meom-ige.tfvars b/terraform/gcp/projects/meom-ige.tfvars index 3c25ebda9a..f76778880f 100644 --- a/terraform/gcp/projects/meom-ige.tfvars +++ b/terraform/gcp/projects/meom-ige.tfvars @@ -81,12 +81,12 @@ user_buckets = { hub_cloud_permissions = { "staging" : { - requestor_pays : true, + allow_access_to_external_requester_pays_buckets : true, bucket_admin_access : ["scratch", "data"], hub_namespace : "staging" }, "prod" : { - requestor_pays : true, + allow_access_to_external_requester_pays_buckets : true, bucket_admin_access : ["scratch", "data"], hub_namespace : "prod" } diff --git a/terraform/gcp/projects/pangeo-hubs.tfvars b/terraform/gcp/projects/pangeo-hubs.tfvars index 9277761bbd..ddcd8bd49b 100644 --- a/terraform/gcp/projects/pangeo-hubs.tfvars +++ b/terraform/gcp/projects/pangeo-hubs.tfvars @@ -109,17 +109,17 @@ dask_nodes = { hub_cloud_permissions = { "staging" : { - requestor_pays : true, + allow_access_to_external_requester_pays_buckets : true, bucket_admin_access : ["scratch-staging"], hub_namespace : "staging" }, "prod" : { - requestor_pays : true, + 
allow_access_to_external_requester_pays_buckets : true, bucket_admin_access : ["scratch"], hub_namespace : "prod" }, "coessing" : { - requestor_pays : true, + allow_access_to_external_requester_pays_buckets : true, bucket_admin_access : ["coessing-scratch"], hub_namespace : "coessing" }, diff --git a/terraform/gcp/projects/pilot-hubs.tfvars b/terraform/gcp/projects/pilot-hubs.tfvars index 620d8119a0..02d3769aac 100644 --- a/terraform/gcp/projects/pilot-hubs.tfvars +++ b/terraform/gcp/projects/pilot-hubs.tfvars @@ -58,23 +58,23 @@ user_buckets = { hub_cloud_permissions = { "dask-staging" : { - requestor_pays : true, + allow_access_to_external_requester_pays_buckets : true, bucket_admin_access : [], hub_namespace : "dask-staging" }, "ohw" : { - requestor_pays : true, + allow_access_to_external_requester_pays_buckets : true, bucket_admin_access : [], hub_namespace : "ohw" }, # Can't use full name here as it violates line length restriction of service account id "catalyst-coop" : { - requestor_pays : true, + allow_access_to_external_requester_pays_buckets : true, bucket_admin_access : [], hub_namespace : "catalyst-cooperative" }, "jackeddy" : { - requestor_pays : true, + allow_access_to_external_requester_pays_buckets : true, bucket_admin_access : ["jackeddy-scratch"], hub_namespace : "jackeddy" }, diff --git a/terraform/gcp/projects/qcl.tfvars b/terraform/gcp/projects/qcl.tfvars index 1433331606..a39144325e 100644 --- a/terraform/gcp/projects/qcl.tfvars +++ b/terraform/gcp/projects/qcl.tfvars @@ -66,12 +66,12 @@ notebook_nodes = { hub_cloud_permissions = { "staging" : { - requestor_pays : false, + allow_access_to_external_requester_pays_buckets : false, bucket_admin_access : ["scratch-staging"], hub_namespace : "staging" }, "prod" : { - requestor_pays : false, + allow_access_to_external_requester_pays_buckets : false, bucket_admin_access : ["scratch"], hub_namespace : "prod" } diff --git a/terraform/gcp/variables.tf b/terraform/gcp/variables.tf index 
b131d50965..530665756a 100644 --- a/terraform/gcp/variables.tf +++ b/terraform/gcp/variables.tf @@ -402,7 +402,7 @@ variable "max_cpu" { variable "hub_cloud_permissions" { type = map( object({ - requestor_pays : bool, + allow_access_to_external_requester_pays_buckets : optional(bool, false), bucket_admin_access : set(string), bucket_readonly_access : optional(set(string), []), hub_namespace : string @@ -415,9 +415,11 @@ variable "hub_cloud_permissions" { Key is name of the hub namespace in the cluster, and values are particular permissions users running on those hubs should have. Currently supported are: - 1. requestor_pays: Identify as coming from the google cloud project when accessing - storage buckets marked as https://cloud.google.com/storage/docs/requester-pays. - This *potentially* incurs cost for us, the originating project, so opt-in. + 1. allow_access_to_external_requester_pays_buckets: Allow code running in user servers from this + hub to identify as coming from this particular GCP project when accessing GCS buckets in other projects + marked as 'Requester Pays'. In this case, the egress costs will + be borne by the project *containing the hub*, rather than the project *containing the bucket*. + Egress costs can get quite expensive, so this is 'opt-in'. 2. bucket_admin_access: List of GCS storage buckets that users on this hub should have read and write permissions for. 
EOT diff --git a/terraform/gcp/workload-identity.tf b/terraform/gcp/workload-identity.tf index 99e907c74e..72aca9a19d 100644 --- a/terraform/gcp/workload-identity.tf +++ b/terraform/gcp/workload-identity.tf @@ -47,7 +47,7 @@ resource "google_project_iam_custom_role" "requestor_pays" { } resource "google_project_iam_member" "requestor_pays_binding" { - for_each = toset([for hub_name, permissions in var.hub_cloud_permissions : hub_name if permissions.requestor_pays]) + for_each = toset([for hub_name, permissions in var.hub_cloud_permissions : hub_name if permissions.allow_access_to_external_requester_pays_buckets]) project = var.project_id role = google_project_iam_custom_role.requestor_pays.name member = "serviceAccount:${google_service_account.workload_sa[each.value].email}"
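To double-check which GCP hubs the renamed flag actually affects, a reviewer could temporarily add an output like the one below to `terraform/gcp`. This is a sketch, not part of this change: the output name `hubs_with_external_requester_pays_access` is hypothetical, and the expression simply reuses the `for_each` filter from `google_project_iam_member.requestor_pays_binding` in `workload-identity.tf`. Because the attribute is now declared as `optional(bool, false)` (optional attributes with defaults need Terraform 1.3 or later), a hub entry that omits it evaluates to `false` and is left out of the list.

```terraform
# Hypothetical, review-only output (not part of this change).
# Lists the hubs that would receive the requester-pays IAM binding, using the
# same filter as the for_each in google_project_iam_member.requestor_pays_binding.
output "hubs_with_external_requester_pays_access" {
  value = sort([
    for hub_name, permissions in var.hub_cloud_permissions :
    hub_name if permissions.allow_access_to_external_requester_pays_buckets
  ])
}
```

With this in place, the list should show up in the `terraform plan` output for a given project's `.tfvars`. For `qcl.tfvars`, where both hubs set the flag to `false`, it should come out empty.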