-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[release-v0.50.x] Regression: fix results with out of order tasks #7173
[release-v0.50.x] Regression: fix results with out of order tasks #7173
Conversation
The pipeline run reconciler builds a pipeline run state on every run, which resolves task references, expands result and processes matrix fan outs. The current process is incremental in a single loop, where each new PipelineTask resolution depends on the state of PipelineTasks resolved before. This is problematic because tasks are not necessarily defined in the pipeline in order of execution (which is undefined, given that pipelines are DAGs). Since this PR is a fix to a regression, it aims to be as minimal as possible. The smallest solution available is to implement some sorting in the list of tasks, so that the incremental state can work correctly. This PR splits the process into two runs, one for tasks that have been already started (and possibly completed), and a second one that includes all remaining tasks. The first group of task does not need matrix fan outs (they have already been processed) or result resolution, so its state can be safely build incrementally. The second group is executed starting from the state of the second group. Any task that is a candidate for execution in this this reconcile cycle must have its results resolved through the state of the first group. Testing with the current code arrangement is a bit challenging, as we ignore result resolution errors in the code, which is ok only in some cases: - result resolution due to task not found or result not defined is permanent and should not be ignored - result resolution due to a result not being available yet is ephemeral (possibly) and should not cause a failure Currently one function checks for all these conditions and returns one error, so it's not possible to safely distinguish them. This will require some refactoring to be fixed in a follow up patch. For now, a reconcile unit test can test the fix. Fixes: tektoncd#7103 Signed-off-by: Andrea Frittoli <[email protected]>
/kind bug |
The following is the coverage report on the affected files.
|
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: vdemeester The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
The following is the coverage report on the affected files.
|
/lgtm |
This is an automated cherry-pick of #7169
/assign afrittoli