v3.5.5: EKS Pod Identity S3 Artifact is not working #12949

Closed
3 of 4 tasks
rafilkmp3 opened this issue Apr 17, 2024 · 4 comments
Labels
  • area/artifacts: S3/GCP/OSS/Git/HDFS etc
  • solution/outdated: This is not up-to-date with the current version
  • type/support: User support issue - likely not a bug

Comments

@rafilkmp3

rafilkmp3 commented Apr 17, 2024

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what did you expect to happen?

I have the following configmap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: workflows-artifact-repository
  namespace: workflows
data:
  v2-s3-artifact-repository: |
    s3:
      bucket: redacted-prod-artifacts
      endpoint: s3.amazonaws.com
      region: us-east-2
      useSDKCreds: true

The service account is already set up correctly, and the minio client is assuming the correct role, which has more than enough permissions on S3. The role also already has the following trust relationship:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "pods.eks.amazonaws.com"
            },
            "Action": [
                "sts:TagSession",
                "sts:AssumeRole"
            ]
        }
    ]
}
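
For readers checking their own setup, here is a minimal sketch of how a trust policy like the one above could be validated before debugging further. It assumes the policy is available as a JSON string; the helper names (`allows_pod_identity`, `_as_list`) are hypothetical and not part of Argo or any AWS SDK. The sketch ignores `Condition` blocks, wildcard actions, and IAM's case-insensitive action matching.

```python
import json

# Hypothetical helper, not part of Argo Workflows or any AWS SDK.
# EKS Pod Identity needs the role to trust this service principal
# for both of these actions.
POD_IDENTITY_PRINCIPAL = "pods.eks.amazonaws.com"
REQUIRED_ACTIONS = {"sts:AssumeRole", "sts:TagSession"}


def _as_list(value):
    """IAM accepts either a single string or a list; normalise to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def allows_pod_identity(policy_json: str) -> bool:
    """Return True if the Allow statements for the Pod Identity principal
    together grant every action in REQUIRED_ACTIONS."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement is also valid IAM
        statements = [statements]

    granted = set()
    for stmt in statements:
        principal = stmt.get("Principal")
        if stmt.get("Effect") != "Allow" or not isinstance(principal, dict):
            continue
        if POD_IDENTITY_PRINCIPAL in _as_list(principal.get("Service")):
            granted.update(_as_list(stmt.get("Action")))
    return REQUIRED_ACTIONS <= granted
```

Passing the trust relationship shown above to `allows_pod_identity` should return `True`, which is consistent with the role itself not being the problem here.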

Version

v3.5.5

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

spec:
  workflowSpec:
    templates:
      - name: main
        inputs: {}
        outputs: {}
        nodeSelector:
          kubernetes.io/arch: amd64
        metadata: {}
        steps:
          - - name: generate-data
              template: generate-data
              arguments:
                parameters:
                  - name: lookback_hours
                    value: '1'

and this is the template used:

      - name: generate-data
        inputs:
          parameters:
            - name: lookback_hours
              value: '1'
        outputs:
          artifacts:
            - name: interactions-csv
              path: /app/interactions.csv
            - name: items-csv
              path: /app/items.csv
            - name: users-csv
              path: /app/users.csv
        nodeSelector:
          kubernetes.io/arch: amd64
        metadata: {}
        script:
          name: ''
          image: >-
            redacted.dkr.ecr.us-east-2.amazonaws.com/personalize-updater:latest
          command:
            - bash
          resources:
            limits:
              memory: 12Gi
            requests:
              cpu: '2'
              memory: 12Gi
          source: >
            set -exu

            poetry run python -m src.main --lookback_hours
            {{inputs.parameters.lookback_hours}}
        serviceAccountName: personalize-updater-serviceaccount
        podSpecPatch: >-
          {"containers":[{"name":"wait","resources":{"limits":{"cpu":"{{workflow.parameters.cpu-limit}}","memory":"{{workflow.parameters.mem-limit}}"}}}]}

Logs from the workflow controller

I believe these are irrelevant in this case; the controller logs show nothing related to the issue.

Logs from your workflow's wait container

time="2024-04-17T19:35:44.681Z" level=info msg="Starting Workflow Executor" version=v3.5.5
time="2024-04-17T19:35:44.683Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2024-04-17T19:35:44.683Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=workflows podName=personalize-hourly-updater-2gpmv-generate-data-3777018436 templateName=generate-data version="&Version{Version:v3.5.5,BuildDate:2024-02-29T20:59:20Z,GitCommit:c80b2e91ebd7e7f604e88442f45ec630380effa0,GitTag:v3.5.5,GitTreeState:clean,GoVersion:go1.21.7,Compiler:gc,Platform:linux/amd64,}"
time="2024-04-17T19:35:44.696Z" level=info msg="Starting deadline monitor"
time="2024-04-17T19:36:08.706Z" level=info msg="Main container completed" error="<nil>"
time="2024-04-17T19:36:08.706Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2024-04-17T19:36:08.706Z" level=info msg="No output parameters"
time="2024-04-17T19:36:08.706Z" level=info msg="Saving output artifacts"
time="2024-04-17T19:36:08.706Z" level=info msg="Staging artifact: interactions-csv"
time="2024-04-17T19:36:08.706Z" level=info msg="Copying /app/interactions.csv from container base image layer to /tmp/argo/outputs/artifacts/interactions-csv.tgz"
time="2024-04-17T19:36:08.706Z" level=info msg="/var/run/argo/outputs/artifacts/app/interactions.csv.tgz -> /tmp/argo/outputs/artifacts/interactions-csv.tgz"
time="2024-04-17T19:36:08.706Z" level=info msg="S3 Save path: /tmp/argo/outputs/artifacts/interactions-csv.tgz, key: personalize-hourly-updater-2gpmv/personalize-hourly-updater-2gpmv-generate-data-3777018436/interactions-csv.tgz"
time="2024-04-17T19:36:08.714Z" level=info msg="Creating minio client using assumed-role credentials" roleArn="arn:aws:iam::redacted:role/prod-cluster-v11-personalize-20240411174459608400000001"
2024/04/17 19:36:08 Ignoring, HTTP credential provider invalid endpoint host, "169.254.170.23", only loopback hosts are allowed. <nil>
time="2024-04-17T19:36:08.778Z" level=warning msg="Non-transient error: NoCredentialProviders: no valid providers in chain. Deprecated.\n\tFor verbose messaging see aws.Config.CredentialsChainVerboseErrors"
time="2024-04-17T19:36:08.778Z" level=info msg="Save artifact" artifactName=interactions-csv duration=72.185513ms error="failed to create new S3 client: NoCredentialProviders: no valid providers in chain. Deprecated.\n\tFor verbose messaging see aws.Config.CredentialsChainVerboseErrors" key=personalize-hourly-updater-2gpmv/personalize-hourly-updater-2gpmv-generate-data-3777018436/interactions-csv.tgz
time="2024-04-17T19:36:08.778Z" level=error msg="executor error: failed to create new S3 client: NoCredentialProviders: no valid providers in chain. Deprecated.\n\tFor verbose messaging see aws.Config.CredentialsChainVerboseErrors"
time="2024-04-17T19:36:08.802Z" level=info msg="Alloc=9319 TotalAlloc=16464 Sys=23653 NumGC=4 Goroutines=8"
time="2024-04-17T19:36:08.810Z" level=fatal msg="failed to create new S3 client: NoCredentialProviders: no valid providers in chain. Deprecated.\n\tFor verbose messaging see aws.Config.CredentialsChainVerboseErrors"

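
The key line in the log above is "Ignoring, HTTP credential provider invalid endpoint host, "169.254.170.23", only loopback hosts are allowed". EKS Pod Identity serves credentials from an agent on the link-local address 169.254.170.23 (or fd00:ec2::23 over IPv6), which the pod learns through the `AWS_CONTAINER_CREDENTIALS_FULL_URI` environment variable. The message suggests the SDK bundled in the v3.5.5 executor accepts only loopback hosts for that URI, so the provider is skipped and the chain ends up empty. The sketch below illustrates that kind of host check; it is an assumption-laden illustration, not the AWS SDK's actual code, and `host_is_accepted` and `allow_eks_hosts` are hypothetical names.

```python
import ipaddress
from urllib.parse import urlparse

# Link-local addresses on which the EKS Pod Identity Agent serves credentials.
EKS_POD_IDENTITY_HOSTS = {"169.254.170.23", "fd00:ec2::23"}


def host_is_accepted(full_uri: str, allow_eks_hosts: bool) -> bool:
    """Illustrative host check for a container-credentials URI.

    allow_eks_hosts=False models a loopback-only check (the behaviour the
    v3.5.5 log message implies); allow_eks_hosts=True models a check that
    also accepts the Pod Identity Agent addresses. This is a sketch, not
    the SDK's real implementation.
    """
    host = urlparse(full_uri).hostname
    if host is None:
        return False
    if allow_eks_hosts and host in EKS_POD_IDENTITY_HOSTS:
        return True
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal; this sketch does not resolve hostnames.
        return False
```

With `allow_eks_hosts=False`, the Pod Identity URI `http://169.254.170.23/v1/credentials` is rejected, which matches the failure in the log; with `allow_eks_hosts=True` it is accepted.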
@agilgur5 agilgur5 added the area/artifacts S3/GCP/OSS/Git/HDFS etc label Apr 17, 2024
@agilgur5
Contributor

agilgur5 commented Apr 17, 2024

> v3.5.5

#12651 (the fix for #12650 that you commented on) was not backported to 3.5.x.
You could test it with the :latest image.

> • I can confirm the issue exists when I tested with :latest

v3.5.5 is not :latest; it is the latest stable release.

@agilgur5 agilgur5 added type/support User support issue - likely not a bug and removed type/bug labels Apr 17, 2024
@agilgur5 agilgur5 changed the title EKS Pod Identity S3 Artifact is not working EKS Pod Identity S3 Artifact is not working on 3.5.5 Apr 17, 2024
@agilgur5 agilgur5 added the problem/more information needed Not enough information has been provide to diagnose this issue. label Apr 17, 2024
@rafilkmp3
Author

> > v3.5.5
>
> #12651 (the fix for #12650 that you commented on) was not backported to 3.5.x. You could test it with the :latest image
>
> > • I can confirm the issue exists when I tested with :latest
>
> v3.5.5 is not :latest, it is the latest stable.

Thank you, it works on :latest. I was using 3.5.5, which was the latest stable release at the time. Do you know whether this fix will reach a stable release any time soon? I see that 3.5.6 was released recently but doesn't include the change, and I'd like to pin a defined tag instead of :latest.

@agilgur5
Contributor

We generally follow this doc: https://argo-workflows.readthedocs.io/en/latest/releases/

#12651 wasn't a security patch, so it didn't get backported. Unless it gets backported, it won't be released until 3.6.
We are just about at the next minor release cycle, so I asked at the last Contributor Meeting about starting it with alphas, since 3.5 is still buggy/unstable (primarily #12025 and related issues, due to #11121). It's currently deferred due to the 3.5 bugginess. As such, I imagine a 3.6 RC and then a stable release won't be out for a few months at least.

You can build your own custom images with that cherry-picked into v3.5.6, for instance. Or you can use the commit hash instead of :latest if you're fine with a dev build.

@agilgur5
Contributor

> Thank you, it works on :latest

Closing since it does work as intended.

Also, since you got it working, would you be interested in documenting it yourself? It's still missing docs, and I haven't used it, so I can't write them myself.

@agilgur5 agilgur5 added solution/invalid This is incorrect. Also can be used for spam solution/outdated This is not up-to-date with the current version and removed problem/more information needed Not enough information has been provide to diagnose this issue. solution/invalid This is incorrect. Also can be used for spam labels Apr 24, 2024
@argoproj argoproj locked as resolved and limited conversation to collaborators Sep 20, 2024
@agilgur5 agilgur5 changed the title EKS Pod Identity S3 Artifact is not working on 3.5.5 v3.5.5: EKS Pod Identity S3 Artifact is not working Oct 8, 2024
@agilgur5 agilgur5 added this to the v3.6.0 milestone Oct 8, 2024