Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create new role for GH Action access to S3 #4

Closed
joverlee521 opened this issue May 8, 2024 · 6 comments · Fixed by #8
Closed

Create new role for GH Action access to S3 #4

joverlee521 opened this issue May 8, 2024 · 6 comments · Fixed by #8
Assignees
Labels
enhancement New feature or request

Comments

@joverlee521
Copy link
Contributor

Context

See https://github.com/nextstrain/private/issues/96#issuecomment-2078009430

Description

  1. import existing IAM policies defining access to our AWS S3 buckets
    • AllowEditingOfNextstrainDataBucket
    • AllowEditingOfNextstrainDataPrivateBucket
    • AllowEditingOfNextstrainNcovPrivateBucket
    • AllowEditingOfNextstrainStagingBucket
  2. Create new role for GH Action access to S3 (GitHubActionsRoleNextstrainS3Access?)
    • with managed_policy_arns pointing to imported policies.
    • with "token.actions.githubusercontent.com:sub": "repo:nextstrain/* to allow all Nextstrain repos to assume the role.
@joverlee521 joverlee521 added the enhancement New feature or request label May 8, 2024
@joverlee521
Copy link
Contributor Author

Just found prior art for using AWS IAM role + inline session policy for per-repo accession privileges: https://github.com/Skyscanner/gha-aws-oidc-sample

Important note on requiring reusable workflows, along with the note in the GH Action OIDC docs:

Note: When the organization template is applied, it will not affect any workflows in existing repositories that already use OIDC. For existing repositories, as well as any new repositories that are created after the template has been applied, the repository owner will need to use the REST API to opt in to receive this configuration. Alternatively, the repository owner could use the REST API to apply a different configuration specific to the repository. For more information, see "REST API endpoints for GitHub Actions OIDC."

@joverlee521
Copy link
Contributor Author

joverlee521 commented May 8, 2024

So we would limit the the role to usage of the pathogen-repo-build workflow:

"token.actions.githubusercontent.com:sub": "repo:nextstrain/*:job_workflow_ref:nextstrain/.github/.github/workflows/pathogen-repo-build.yaml@*"

Set the customization template for an OIDC subject claim for an organization

curl -L \
  -X PUT \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer <YOUR-TOKEN>" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  https://api.github.com/orgs/nextstrain/actions/oidc/customization/sub \
  -d '{"include_claim_keys":["repo", "job_workflow_ref"]}'

Then whenever a new pathogen repo wants to use the pathogen-repo-build workflow, opt in to the org's custom claims

curl -L \
  -X PUT \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer <YOUR-TOKEN>" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  https://api.github.com/repos/nextstrain/<pathogen-repo>/actions/oidc/customization/sub \
  -d '{"use_default":false}'

@joverlee521
Copy link
Contributor Author

I don't see why individual repo's shouldn't just automatically opt-in to the org's custom claims.
We could set up a GH Action to just opt-in all nextstrain org repos.

@tsibley
Copy link
Member

tsibley commented May 9, 2024

@joverlee521 Ah, yes, I'd also come to the conclusion to modify the subject claim after we last talked… but didn't communicate it. Sorry! orz but I kinda thought we'd said I was on the hook for the IAM bits and you'd do the GH workflow bits.

  1. import existing IAM policies defining access to our AWS S3 buckets

    • AllowEditingOfNextstrainDataBucket
    • AllowEditingOfNextstrainDataPrivateBucket
    • AllowEditingOfNextstrainNcovPrivateBucket
    • AllowEditingOfNextstrainStagingBucket

I'm not sure, will need to check on them, but we'll want to make sure these are appropriately limited for automation (e.g. no ability to delete object versions, etc.). They were created with individuals in mind, I think.

@joverlee521
Copy link
Contributor Author

I kinda thought we'd said I was on the hook for the IAM bits and you'd do the GH workflow bits.

Yup! This was just my brain dump before I started setting up the role/session interaction within the GH workflow.

joverlee521 added a commit to nextstrain/.github that referenced this issue May 10, 2024
Assumes the role `GitHubActionsRoleNextstrainS3Access`
(just a placeholder name, final name TDB in nextstrain/infra#4)
and scopes repo specific permissions with an inline session policy.

The runtime credentials are then saved in an envdir that is passed
to the build command via `NEXTSTRAIN_RUNTIME_ENVDIRS`.
joverlee521 added a commit to nextstrain/.github that referenced this issue May 10, 2024
Assumes the role `GitHubActionsRoleNextstrainS3Access`
(just a placeholder name, final name TDB in nextstrain/infra#4)
and scopes repo specific permissions with an inline session policy.

The runtime credentials are then saved in an envdir that is passed
to the build command via `NEXTSTRAIN_RUNTIME_ENVDIRS`.
@joverlee521
Copy link
Contributor Author

I have a draft PR for supporting the runtime AWS role/session within pathogen-repo-build. We can coordinate testing once the new role is created.

@tsibley tsibley self-assigned this May 14, 2024
tsibley added a commit that referenced this issue May 15, 2024
tsibley added a commit that referenced this issue May 16, 2024
joverlee521 added a commit to nextstrain/.github that referenced this issue May 16, 2024
Assumes the role `GitHubActionsRoleNextstrainS3Access`
(just a placeholder name, final name TDB in nextstrain/infra#4)
and scopes repo specific permissions with an inline session policy.

The runtime credentials are then saved in an envdir that is passed
to the build command via `NEXTSTRAIN_RUNTIME_ENVDIRS`.

Namespaced `NEXTSTRAIN_RUNTIME_ENVDIR` with `./git/nextstrain` as
suggested by @tsibley in review¹

¹ <#81 (comment)>

pathogen-repo-build:
tsibley added a commit that referenced this issue May 17, 2024
Static role permissions policies which are expected to be
narrowed/scoped-down by an inline session policy set by the
pathogen-repo-build workflow.

Problems with this approach noted in review commentary:

  - <nextstrain/.github#81 (comment)>
  - <nextstrain/.github#81 (comment)>

Resolves: <#4>
Related-to: <nextstrain/.github#81>
tsibley added a commit that referenced this issue May 17, 2024
Let's us map GitHub token claims to principal tags, which is great.
No need to customize the "sub" claim any more, and granular matching.

But ultimately not workable because GitHub OIDC doesn't include the
repository name _sans_ owner in its token claims (e.g. "zika" for
"nextstrain/zika"), so we don't have any policy variable to interpolate
into policy conditions.

I thought maybe we could (ab)use role session names for this and pass
the unqualified repo name via it ourselves, but that hit a wall because
while sts:RoleSessionName is available in the AssumeRoleWithWebIdentity
request, it's not available on subsequent requests which we need to be
able to use in policy conditions.

Then I thought maybe we could use sts:SourceIdentity instead in the role
assumption request which is available as aws:SourceIdentity in
subsequent requests, except that GitHub's token would have to include an
"https://aws.amazonaws.com/source_identity" claim, and it ofc does not
provide any way to do that.  If it did, it would likely also let us
provide a custom "https://aws.amazonaws.com/tags" claim at which point
all of this nonsense would be moot anyway.

Finally, I thought maybe we could use a custom audience ("aud" claim)
which GitHub _does_ let you specify.  This is available in AWS requests
as the "token.actions.githubusercontent.com:aud" key and we can use it
as a policy variable.  \o/  But to use it we'd need to cut out Cognito
Identity (e.g. go back to where we started without it) and lose the
token claims → principal tags mapping.  So we'd need to customize the
GitHub token's "sub" claim and match against it.  We'd also need to
explicitly list out all possible valid "aud" on the AWS OIDC IdP
configuration (as "client_id_list"), and that would be frustrating.

This all feels more complex than it's worth at this point.

Resolves: <#4>
Related-to: <nextstrain/.github#81>
tsibley added a commit to nextstrain/.github that referenced this issue May 17, 2024
Assumes repo-specific roles, `GitHubActionsRoleNextstrainBuild@zika` for
example, which are managed by the Terraform configuration in the
nextstrain/infra repo.

The runtime credentials are then saved in an envdir that is passed
to the build command via `NEXTSTRAIN_RUNTIME_ENVDIRS`.

Namespaced `NEXTSTRAIN_RUNTIME_ENVDIR` with `./git/nextstrain` as
suggested by @tsibley in review¹

¹ <#81 (comment)>

Related-to: <nextstrain/infra#4>
Co-authored-by: Thomas Sibley <[email protected]>
tsibley added a commit that referenced this issue May 20, 2024
tsibley added a commit that referenced this issue May 20, 2024
tsibley added a commit that referenced this issue May 20, 2024
tsibley added a commit to nextstrain/.github that referenced this issue May 20, 2024
Assumes repo-specific roles, `GitHubActionsRoleNextstrainBuild@zika` for
example, which are managed by the Terraform configuration in the
nextstrain/infra repo.

The runtime credentials are then saved in an envdir that is passed
to the build command via `NEXTSTRAIN_RUNTIME_ENVDIRS`.

Namespaced `NEXTSTRAIN_RUNTIME_ENVDIR` with `./git/nextstrain` as
suggested by @tsibley in review¹

¹ <#81 (comment)>

Related-to: <nextstrain/infra#4>
Co-authored-by: Thomas Sibley <[email protected]>
tsibley added a commit that referenced this issue May 20, 2024
tsibley added a commit that referenced this issue May 20, 2024
tsibley added a commit to nextstrain/.github that referenced this issue May 20, 2024
Assumes repo-specific roles, `GitHubActionsRoleNextstrainRepo@zika` for
example, which are managed by the Terraform configuration in the
nextstrain/infra repo.  The repo here is always the _calling_
repository, regardless of if the "repo" input was provided to use
workflows from another place.

The runtime credentials are then saved in an envdir that is passed
to the build command via `NEXTSTRAIN_RUNTIME_ENVDIRS`.

Namespaced `NEXTSTRAIN_RUNTIME_ENVDIR` with `./git/nextstrain` as
suggested by @tsibley in review¹

¹ <#81 (comment)>

Related-to: <nextstrain/infra#4>
Co-authored-by: Thomas Sibley <[email protected]>
tsibley added a commit to nextstrain/.github that referenced this issue May 20, 2024
Assumes repo-specific roles, `GitHubActionsRoleNextstrainRepo@zika` for
example, which are managed by the Terraform configuration in the
nextstrain/infra repo.  The repo here is always the _calling_
repository, regardless of if the "repo" input was provided to use
workflows from another place.

The runtime credentials are then saved in an envdir that is passed
to the build command via `NEXTSTRAIN_RUNTIME_ENVDIRS`.

Namespaced `NEXTSTRAIN_RUNTIME_ENVDIR` with `./git/nextstrain` as
suggested by @tsibley in review¹

¹ <#81 (comment)>

Related-to: <nextstrain/infra#4>
Co-authored-by: Thomas Sibley <[email protected]>
tsibley added a commit to nextstrain/.github that referenced this issue May 20, 2024
Assumes repo-specific roles, `GitHubActionsRoleNextstrainRepo@zika` for
example, which are managed by the Terraform configuration in the
nextstrain/infra repo.  The repo here is always the _calling_
repository, regardless of if the "repo" input was provided to use
workflows from another place.

The runtime credentials are then saved in an envdir that is passed
to the build command via `NEXTSTRAIN_RUNTIME_ENVDIRS`.

Namespaced `NEXTSTRAIN_RUNTIME_ENVDIR` with `./git/nextstrain` as
suggested by @tsibley in review¹

¹ <#81 (comment)>

Related-to: <nextstrain/infra#4>
Co-authored-by: Thomas Sibley <[email protected]>
tsibley added a commit that referenced this issue May 21, 2024
A collection of templated repo-specific roles for inside a Nextstrain
runtime and one role for the GitHub Actions job itself, i.e. outside the
runtime.

The inside-the-runtime roles are given pathogen-specific permissions
necessary for the ingest and phylogenetic workflows of a pathogen repo.

The outside-the-runtime role is only necessary/used at the moment for
access to AWS Batch.

The roles can only be assumed by specific repos when performed by our
centralized pathogen-repo-build workflow.  While this doesn't completely
prevent off-label use by other GitHub Actions workflows launched from a
pathogen repo, it does make it more involved to do so, hopefully to the
point of discouragement.  The associated GitHub repository configuration
is managed by Terraform now as well since the customization of the "sub"
claim in GitHub Action's OIDC token is tightly coupled to our AWS role
trust policies.

Resolves: <#4>
Related-to: <nextstrain/private#96>,
            <nextstrain/.github#81>
Supersedes: <#6>,
            <#7>
tsibley added a commit to nextstrain/.github that referenced this issue May 21, 2024
Assumes repo-specific roles, `GitHubActionsRoleNextstrainRepo@zika` for
example, which are managed by the Terraform configuration in the
nextstrain/infra repo.  The repo here is always the _calling_
repository, regardless of if the "repo" input was provided to use
workflows from another place.

The runtime credentials are then saved in an envdir that is passed
to the build command via `NEXTSTRAIN_RUNTIME_ENVDIRS`.

Namespaced `NEXTSTRAIN_RUNTIME_ENVDIR` with `./git/nextstrain` as
suggested by @tsibley in review¹

¹ <#81 (comment)>

Related-to: <nextstrain/infra#4>
Co-authored-by: Thomas Sibley <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment