Action fails regularly due to ETIMEDOUT and ECONNRESET #40
Comments
Is there anything else needed for a release that I could maybe help with? This makes most of our workflows unusable.
@ZacSweers I believe that you can try out this action from a commit hash. You may want to give that a shot as a stopgap?
Using ef08c68 appears to resolve things for us. I'd recommend a new 1.x release tag to de-flake things for folks; we were definitely considering dropping this otherwise, and I'm not sure how willing folks are to point at a direct SHA.
Should be published now.
Thanks!
We're still seeing this unfortunately, albeit less often.
Here's an example run: https://github.com/ZacSweers/MoshiX/pull/128/checks?check_run_id=2921425588
This happens pretty consistently across the projects I work on; unfortunately, I think we're going to have to remove this action as a result, as it's a reliability issue.
Unfortunately, we don't have enough information at this time to understand what's causing this issue. Are you using self-hosted runners, or runners hosted by GH?
I see this often on GH-hosted runners, often in the square/anvil repo.
@eskatos is there any way to add additional log output on failure so that we can work on understanding the root cause?
GitHub-hosted actions here. When the action fails with this error, it fails across all active runs around the same time. About 30 minutes ago 3 runs failed simultaneously. Retried each about 20 minutes ago and they all passed.
That seems like something that absolutely indicates a Cloudflare issue.
Okay, all this finally sent me down the right path; I think I may have finally figured out what's going on here. It looks like our Cloudflare WAF is being triggered randomly every once in a while and is causing a bunch of users' connections to fail when it does. I need to talk to @eskatos about how we want to mitigate this issue. Thanks everyone for helping us figure out what was going wrong here.
The fix has been implemented. Please let us know if any of you continue to experience these problems. I hope this will fix the issue, but we have some additional things we can fiddle with if this continues to be a problem. FOR INTERNAL TRACKING (not public): https://github.com/gradle/gradle-private/issues/3435
Facing a similar issue. A two-line change to a class causes failures with these actions in the following runs:
Seeing the same issue here. https://github.com/MinimallyCorrect/Mixin/runs/4041503110?check_suite_focus=true Can the team publish a single file with all the hashes instead of having the action fetch hundreds of files, each containing a single hash? This is only going to increase in frequency, as the number of requests needed goes up with every release.
It's not possible for us to know what version you have locally, so we have to fetch all of them. I'll take a look at our Cloudflare logs and see if this is being caused by our infrastructure/firewall. Thanks for the ping 🙂
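For context, a minimal TypeScript sketch of why the comparison needs every published checksum: the action only sees the local gradle-wrapper.jar, so it hashes it and looks for a match in the complete set of known wrapper checksums. The file path and function names below are illustrative assumptions, not the action's actual code.

```typescript
// Illustrative sketch only - not the action's real implementation.
import { createHash } from "crypto";
import { readFileSync } from "fs";

// The jar itself does not say which Gradle version produced it, so its digest has to be
// checked against every published wrapper checksum.
function sha256Of(path: string): string {
  return createHash("sha256").update(readFileSync(path)).digest("hex");
}

function isKnownWrapper(jarPath: string, knownChecksums: Set<string>): boolean {
  return knownChecksums.has(sha256Of(jarPath));
}

// Example usage (path and checksums are placeholders):
// isKnownWrapper("gradle/wrapper/gradle-wrapper.jar", new Set(["<sha256>", "<sha256>"]));
```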
Also ran into this right now (and yesterday), re-triggered the job, then it worked:
GitHub-hosted action... Let me know if I can provide any more data that helps with this!
@JLLeitschuh I was thinking of adding the checksum inline to each version entry, e.g.:
{
  "version" : "7.3-20211027231204+0000",
  "buildTime" : "20211027231204+0000",
  "current" : false,
  "snapshot" : true,
  "nightly" : false,
  "releaseNightly" : true,
  "activeRc" : false,
  "rcFor" : "",
  "milestoneFor" : "",
  "broken" : false,
  "downloadUrl" : "https://services.gradle.org/distributions-snapshots/gradle-7.3-20211027231204+0000-bin.zip",
  "checksumUrl" : "https://services.gradle.org/distributions-snapshots/gradle-7.3-20211027231204+0000-bin.zip.sha256",
  "wrapperChecksumUrl" : "https://services.gradle.org/distributions-snapshots/gradle-7.3-20211027231204+0000-wrapper.jar.sha256",
  "wrapperChecksum": "33ad4583fd7ee156f533778736fa1b4940bd83b433934d1cc4e9f608e99a6a89"
  // (The checksum would actually be shorter than the URL for where to go fetch it. ;))
},
Since the only field that gets used at the moment is the wrapper checksum, it might even be worth making a more specialized endpoint which is just a list of all wrapper checksums. I have no idea where the code that generates/serves these is.
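To make the proposal above concrete, here is a hedged TypeScript sketch of how a client could use such an inline field, falling back to today's per-version fetch when it is absent. The `wrapperChecksum` field and the `VersionEntry` shape are hypothetical extensions of the existing versions metadata, and a Node 18+ runtime with global `fetch` is assumed.

```typescript
// Hypothetical consumer of an inline "wrapperChecksum" field (the field does not exist today).
interface VersionEntry {
  version: string;
  wrapperChecksumUrl?: string;
  wrapperChecksum?: string; // proposed inline checksum
}

async function collectWrapperChecksums(entries: VersionEntry[]): Promise<Set<string>> {
  const checksums = new Set<string>();
  for (const entry of entries) {
    if (entry.wrapperChecksum) {
      // With the inline field, one request for the whole version list would be enough.
      checksums.add(entry.wrapperChecksum);
    } else if (entry.wrapperChecksumUrl) {
      // Today's behaviour: one extra round trip per Gradle version.
      const response = await fetch(entry.wrapperChecksumUrl);
      checksums.add((await response.text()).trim());
    }
  }
  return checksums;
}
```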
I've been running into this issue occasionally ever since I integrated this action, but today it's been happening like 60% of the time on macOS on CI (CI also runs on Windows and Linux, but both seem fine). I recently upgraded to Gradle 7, in case that's relevant.
So, I've checked, and it's not our WAF causing these issues. I'm not certain what would be causing them otherwise.
Running into the same issue today. Any updates on this?
Also seeing this issue a few times every day; retrying tends to work straight away.
Seeing this a lot on Paparazzi builds, mainly with Windows workers.
Is there any further update on this? It keeps failing sporadically on both windows-2022 and ubuntu-20.04 action runs.
I also keep getting CI failures due to this issue:
We do have retry logic enabled.
That being said, I have no evidence that it's actually working. A PR from the community to improve debug logging would be welcomed openly, especially if it were implemented such that the additional logging was only printed when the build was going to fail anyway. I'd prefer not to make the action more chatty than it needs to be when it's not going to fail. I think the biggest problem we currently have is a severe lack of visibility, which makes it really difficult to figure out a root cause for these issues.
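As a sketch of the kind of contribution described above, the snippet below retries a fetch and buffers its debug output, printing it only once every attempt has failed. The function name, attempt count, and backoff are assumptions for illustration, not the action's existing retry code; a Node 18+ runtime with global `fetch` is assumed.

```typescript
// Illustrative only: retry with debug output that is emitted only when the build is
// going to fail anyway.
async function fetchWithRetry(url: string, attempts = 3, baseDelayMs = 1000): Promise<string> {
  const bufferedDebug: string[] = [];
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const response = await fetch(url);
      if (!response.ok) throw new Error(`HTTP ${response.status}`);
      return await response.text();
    } catch (error) {
      bufferedDebug.push(`attempt ${attempt} of ${attempts} for ${url} failed: ${String(error)}`);
      if (attempt < attempts) {
        // Simple linear backoff between attempts.
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * attempt));
      }
    }
  }
  // All attempts failed: now the extra detail is worth the noise.
  for (const line of bufferedDebug) console.error(line);
  throw new Error(`All ${attempts} attempts to fetch ${url} failed`);
}
```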
Still happens to this day; I've been unable to get a valid build after multiple retries :(
Are you running this on self-hosted runners, or is this running on GitHub's infrastructure?
It was on GH infra, but it may have been an older fixed version of the action that suddenly started failing; I've since upgraded. Out of curiosity, why is the action making any request at all? Wouldn't it make sense to store the valid hash-version pairs inside the action itself and only hit the network if a local entry doesn't exist?
I have a feeling the action can/should be modified to vastly reduce the number of calls to services.gradle.org. The results could be cached in the GitHub Actions cache, and details of known-good versions could possibly be bundled directly in the action itself (so remote calls would only be required for Gradle versions released after the last wrapper release). This issue isn't being actively worked on at this time.
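A hedged sketch of that bundling idea: ship a snapshot of known wrapper checksums with the action and only call services.gradle.org when the local jar doesn't match any of them, i.e. for Gradle versions released after the snapshot was taken. The snapshot constant and function names are assumptions, not the action's actual layout, and a Node 18+ runtime with global `fetch` is assumed.

```typescript
// Checksums known when this version of the action was built (placeholder contents).
const BUNDLED_CHECKSUMS: ReadonlySet<string> = new Set<string>([
  // "sha256-of-a-known-gradle-wrapper.jar", ...
]);

interface VersionEntry {
  wrapperChecksumUrl?: string;
}

async function resolveChecksums(localSha256: string): Promise<Set<string>> {
  const known = new Set(BUNDLED_CHECKSUMS);
  if (known.has(localSha256)) {
    return known; // bundled snapshot is enough; no network call at all
  }
  // Unknown jar: fall back to the live endpoint to pick up recently released versions.
  const response = await fetch("https://services.gradle.org/versions/all");
  const entries = (await response.json()) as VersionEntry[];
  for (const entry of entries) {
    if (entry.wrapperChecksumUrl) {
      const checksum = await (await fetch(entry.wrapperChecksumUrl)).text();
      known.add(checksum.trim());
    }
  }
  return known;
}
```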
@bigdaz it makes a lot of sense.
The only thing I would be slightly concerned about is ensuring that it is impossible for an attacker to tamper with the cache prior to this action running.
Is there a reason for Gradle not to add the checksums inline in https://services.gradle.org/versions/all instead of requiring a separate file to be fetched per checksum?
This should be fixed by #167.
Example runs:
https://github.com/square/anvil/pull/266/checks?check_run_id=2589215352
https://github.com/square/anvil/pull/266/checks?check_run_id=2589215611
I've seen this flaky behavior happen somewhat often in the past few weeks; not sure what else is going on, so filing this as an FYI.