[BUG] Invalidate GH action dependency cache when internal nightly dependencies are updated #11748

Open
gerashegalov opened this issue Nov 21, 2024 · 2 comments
Labels
? - Needs Triage (Need team to review and classify), bug (Something isn't working)

Comments

@gerashegalov
Collaborator

gerashegalov commented Nov 21, 2024

Describe the bug

In the GH mvn actions, we validate that the JVM bytecode builds against the supported, published Apache Spark versions.

The cache key for the dependencies (see https://github.com/NVIDIA/spark-rapids/actions/caches) only uses the current date, at day granularity. In the worst case, the cached spark-rapids-jni and spark-rapids-private dependencies can therefore be ~24h behind the most recent nightly that the current spark-rapids PR is trying to pick up for a new API.

We can use the Sonatype REST API to determine the latest available snapshot timestamp for each dependency and make it part of the cache key:

$ curl -s -H "Accept: application/json" 'https://oss.sonatype.org/service/local/artifact/maven/resolve?r=snapshots&g=com.nvidia&a=spark-rapids-jni&v=24.12.0-SNAPSHOT&c=&e=jar&wt=json' | jq .
{
  "data": {
    "presentLocally": true,
    "groupId": "com.nvidia",
    "artifactId": "spark-rapids-jni",
    "version": "24.12.0-20241122.074512-45",
    "baseVersion": "24.12.0-SNAPSHOT",
    "extension": "jar",
    "snapshot": true,
    "snapshotBuildNumber": 45,
    "snapshotTimeStamp": 1732261512000,
    "sha1": "6812e81a38a2ccf256bd57f4c528fc5e695f59ce",
    "repositoryPath": "/com/nvidia/spark-rapids-jni/24.12.0-SNAPSHOT/spark-rapids-jni-24.12.0-20241122.074512-45.jar"
  }
}

$ curl -s -H "Accept: application/json" 'https://oss.sonatype.org/service/local/artifact/maven/resolve?r=snapshots&g=com.nvidia&a=rapids-4-spark-private_2.12&v=24.12.0-SNAPSHOT&c=&e=jar&wt=json' | jq .
{
  "data": {
    "presentLocally": true,
    "groupId": "com.nvidia",
    "artifactId": "rapids-4-spark-private_2.12",
    "version": "24.12.0-20241122.070608-52",
    "baseVersion": "24.12.0-SNAPSHOT",
    "extension": "jar",
    "snapshot": true,
    "snapshotBuildNumber": 52,
    "snapshotTimeStamp": 1732259168000,
    "sha1": "9e5787a1166eb5537ba3cb7c9971d760d80a8e0d",
    "repositoryPath": "/com/nvidia/rapids-4-spark-private_2.12/24.12.0-SNAPSHOT/rapids-4-spark-private_2.12-24.12.0-20241122.070608-52.jar"
  }
}
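A minimal sketch of the lookup the curl commands above perform, using only the Python standard library. The endpoint and query parameters are copied from those commands; the function names are hypothetical. It reads the `version` field because that value already combines the snapshot timestamp and the build number.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint copied from the curl examples (Sonatype OSS "resolve" service).
RESOLVE_URL = "https://oss.sonatype.org/service/local/artifact/maven/resolve"


def resolve_url(artifact_id: str, base_version: str, group_id: str = "com.nvidia") -> str:
    """Build the same query as the curl examples: snapshots repo, jar, no classifier."""
    params = {
        "r": "snapshots",
        "g": group_id,
        "a": artifact_id,
        "v": base_version,
        "c": "",
        "e": "jar",
        "wt": "json",
    }
    return f"{RESOLVE_URL}?{urlencode(params)}"


def snapshot_version(response_body: str) -> str:
    """Extract the timestamped version (e.g. 24.12.0-20241122.074512-45) from a resolve response."""
    data = json.loads(response_body)["data"]
    if not data.get("snapshot"):
        raise ValueError(f"{data.get('artifactId')} did not resolve to a SNAPSHOT build")
    return data["version"]


def fetch_snapshot_version(artifact_id: str, base_version: str) -> str:
    """Network call equivalent to `curl -H "Accept: application/json" ... | jq .data.version`."""
    request = Request(resolve_url(artifact_id, base_version), headers={"Accept": "application/json"})
    with urlopen(request, timeout=30) as response:
        return snapshot_version(response.read().decode("utf-8"))
```

For example, `fetch_snapshot_version("spark-rapids-jni", "24.12.0-SNAPSHOT")` should return the same `version` string as the first curl command at the time it is run.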

This guarantees that the active cache holds the latest internal spark-rapids* artifacts when the user re-runs the GH action after the nightly artifact is published.
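One way to fold the resolved versions into the key, sketched in Python under the assumption that the versions were obtained with resolve queries like the ones above: keep the existing date component and append a short digest of the resolved versions. The `maven-deps` prefix and the function name are hypothetical.

```python
import hashlib
from datetime import date
from typing import Mapping


def dependency_cache_key(prefix: str, day: date, versions: Mapping[str, str]) -> str:
    """Combine the existing day-granularity date with the resolved snapshot versions.

    The date keeps the daily refresh for other SNAPSHOT dependencies (e.g. Spark);
    the digest changes as soon as any tracked nightly is republished.
    """
    digest = hashlib.sha256()
    # Sort by artifactId so the key does not depend on mapping order.
    for artifact_id in sorted(versions):
        digest.update(f"{artifact_id}={versions[artifact_id]}\n".encode("utf-8"))
    # A truncated hex digest keeps the key short regardless of how many artifacts are tracked.
    return f"{prefix}-{day.isoformat()}-{digest.hexdigest()[:16]}"
```

A workflow step could write this value to `$GITHUB_OUTPUT` and pass it to the `key` input of `actions/cache`; that wiring is an assumption about how it might be integrated, not a description of the current workflow.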

gerashegalov added the "? - Needs Triage" and "bug" labels on Nov 21, 2024
@revans2
Collaborator

revans2 commented Nov 22, 2024

Do we want to do this for other internal dependencies too? What about the private jar? I don't think we depend on any other SNAPSHOT release besides Spark, and only when we are working on a new shim. Even then, waiting 24 hours is probably not a big deal.

gerashegalov changed the title from "[BUG] Invalidate GH action dependency cache when spark-rapids-jni nightly is updated" to "[BUG] Invalidate GH action dependency cache when internal nightly dependencies are updated" on Nov 22, 2024
@gerashegalov
Collaborator Author

@revans2 Good point: I reworded the issue to generalize it so that it also covers spark-rapids-private. I think 24h staleness is acceptable for Spark SNAPSHOTs, but we should implement this in a way that makes it easy to modify the list of must-be-up-to-date dependencies.
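A sketch of keeping the must-be-up-to-date list in one place, assuming it is supplied as a comma-separated `artifactId:baseVersion` string (for example through a hypothetical workflow environment variable). The names below are hypothetical, and the resolver is injected so the parsing can be exercised without network access; in practice it would be whatever performs the resolve query shown earlier in the issue.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical default list; editing this one string changes which nightlies
# invalidate the cache.
DEFAULT_FRESH_DEPENDENCIES = (
    "spark-rapids-jni:24.12.0-SNAPSHOT,"
    "rapids-4-spark-private_2.12:24.12.0-SNAPSHOT"
)


def parse_fresh_dependencies(spec: str) -> List[Tuple[str, str]]:
    """Parse comma-separated 'artifactId:baseVersion' entries."""
    deps = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing or doubled commas
        artifact_id, sep, base_version = item.partition(":")
        if not sep or not artifact_id or not base_version:
            raise ValueError(f"expected artifactId:baseVersion, got {item!r}")
        deps.append((artifact_id, base_version))
    return deps


def resolve_all(
    deps: List[Tuple[str, str]], resolve: Callable[[str, str], str]
) -> Dict[str, str]:
    """Map each artifactId to its latest timestamped version using `resolve`."""
    return {artifact_id: resolve(artifact_id, base) for artifact_id, base in deps}
```

With this shape, adding a Spark SNAPSHOT later would be one more entry in the string, with no change to the code that builds the cache key.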


2 participants