Consider previous deployments as successful #6224
Replies: 9 comments 6 replies
-
@Startrekzky Need your input. Thanks
-
Hi @VictorChacon-Ada , thank you for the feedback.
What do you mean by
If the previous deployment is d-1, the later deployment is d-2. As far as I know,
Hence, I'm confused about why your change lead time has been affected. cc @klesh
-
Hello guys, thanks for the help!
The metric I was talking about was the DORA Change Failure Rate. I had concluded that these deployments were counted as failures by querying the data in SQL, but my Grafana dashboards show that is not the case (I was probably missing a step when querying manually), so we can ignore that first point.
For the second point, it is true that the d-2 deployment does not affect the d-1 lead time. But for exactly that reason, one of the deployments ends up with a short lead time and the other with a very long one. If I get enough of these "superseded" deployments in a project in a month, the median lead time becomes 721 hours, a little over the 30 days after which the deployment times out.
I have an example of a timed-out deployment that affects the lead time. Running the query below on the SQL backend shows it:
SELECT
pr.id as pr_id,
pr.status as pr_status,
pr.merge_commit_sha as pr_merge_commit_sha,
pr.base_commit_sha as pr_base_commit_sha,
pr.head_commit_sha as pr_head_commit_sha,
ppm.first_commit_sha as ppm_first_commit_sha,
prc.commit_sha as prc_commit_sha,
ppm.pr_coding_time as coding_time,
ppm.deployment_commit_id as ppm_deployment_commit_id,
ppm.pr_deploy_time as deploy_time,
ppm.pr_cycle_time as cycle_time,
cdc.`result` as cdc_result,
cdc.status as cdc_status,
pr.merged_date as pr_merged_date,
cdc.finished_date as cdc_finished_date,
cdc.duration_sec as cdc_duration_sec
FROM
pull_requests pr
INNER JOIN project_pr_metrics ppm ON ppm.id = pr.id
INNER JOIN pull_request_commits prc ON prc.pull_request_id = pr.id
INNER JOIN cicd_deployment_commits cdc ON cdc.commit_sha = pr.merge_commit_sha
WHERE pr.id = "github:GithubPullRequest:1:1475009734"
As you can see, the long cdc duration ends up in pr_cycle_time. From the DORA metric plugin code, I can see that the deployment is used to calculate the lead time. Maybe inconsistencies in my data are preventing cases like this from being filtered out, but as it stands, the median lead time for changes is being polluted by the 721-hour deployments and, in the worst cases, completely overtaken by them, making the median 721 hours. I appreciate any help I can get in understanding this problem.
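To make the effect on the median concrete, here is a minimal sketch in Python. The lead times in `normal` are made-up numbers, and `TIMEOUT_HOURS` simply reuses the 721 hours reported above; none of this is DevLake code, it only illustrates the arithmetic.

```python
from statistics import median

# Assumed value: the 721 hours observed for timed-out deployments in this thread.
TIMEOUT_HOURS = 721


def median_lead_time(normal_hours, superseded_count):
    """Median lead time when `superseded_count` PRs inherit the timeout duration."""
    return median(list(normal_hours) + [TIMEOUT_HOURS] * superseded_count)


# Hypothetical lead times (in hours) for PRs whose deployment was approved normally.
normal = [4, 6, 9, 12, 20]

print(median_lead_time(normal, 0))  # 9
print(median_lead_time(normal, 2))  # 12: the median only shifts slightly
print(median_lead_time(normal, 6))  # 721: superseded runs are now the majority
```

The median is robust while the superseded runs are a minority, but as soon as they make up more than half of the sample, the median is the timeout value itself.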
-
So, deployment commit with id
-
Yes, it is considered a failed deployment, but in practice it was just superseded by a later deployment that shipped its changes as well. That is the main problem. In the best possible scenario, I'd like to be able to consider the deployment for
-
It might be caused by the status not being processed properly, similar to #6233
-
Hi @VictorChacon-Ada, I found another possible cause, and it should be fixed by #6270
-
@VictorChacon-Ada We have released two versions (https://github.com/apache/incubator-devlake/releases/tag/v0.18.1-beta2 and https://github.com/apache/incubator-devlake/releases/tag/v0.19.0-beta5). You can upgrade to either of them to see if the problem is solved.
-
That seems to have fixed it. Thank you all so much for the help!
-
Hi there. We currently have a workflow running on GitHub Actions that deploys to a staging environment (it actually only pushes an image, and Argo CD does the rest) and then awaits manual approval to deploy to production.
The thing is, if multiple people are making changes to the same repo and testing on our staging environment, sometimes a later deployment is approved and takes the previous changes with it. Then we never actually approve the previous deployment, and it ends up failing after the 30-day timeout.
This ends up causing 2 things:
Below, I have 2 screenshots: one of a pipeline that is currently waiting for deployment but has been superseded, and another where the same thing happened but the pipeline timed out after 30 days.
Is there any way I can handle these specific cases in DevLake? I tried reading the code for the lead time calculation in the GitHub connector, but couldn't think of any way to reconcile the data. If not, any suggestions on how I can calculate these DORA metrics accurately?
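One way to picture the reconciliation I have in mind is the rough sketch below. It is not how DevLake works today: the `Deployment` class, the `reconcile_superseded` helper, and the "SUCCESS"/"FAILURE" result strings are all hypothetical names made up for illustration. The heuristic assumed here is that a failed run counts as superseded only if a later-started run succeeded and finished before the failed run itself ended (that is, while it was still waiting for approval).

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Deployment:
    id: str
    started: datetime
    finished: datetime
    result: str  # assumed values: "SUCCESS" or "FAILURE"


def reconcile_superseded(deployments: List[Deployment]) -> List[Deployment]:
    """Mark failed runs as successful when a later run shipped their changes."""
    ordered = sorted(deployments, key=lambda d: d.started)
    reconciled: List[Deployment] = []
    next_success: Optional[Deployment] = None
    # Walk backwards so each run sees the nearest successful run started after it.
    for dep in reversed(ordered):
        if dep.result == "SUCCESS":
            next_success = dep
            reconciled.append(dep)
        elif next_success is not None and next_success.finished < dep.finished:
            # Superseded: inherit the finish time of the run that shipped the changes.
            reconciled.append(
                replace(dep, result="SUCCESS", finished=next_success.finished)
            )
        else:
            reconciled.append(dep)
    reconciled.reverse()
    return reconciled


# d-1 waits for approval and times out after 30 days; d-2 is approved the same day.
d1 = Deployment("d-1", datetime(2023, 10, 1, 10), datetime(2023, 10, 31, 10), "FAILURE")
d2 = Deployment("d-2", datetime(2023, 10, 1, 15), datetime(2023, 10, 1, 16), "SUCCESS")

fixed = reconcile_superseded([d1, d2])
print(fixed[0].result, fixed[0].finished)
```

With this rule, d-1 would be reported as a success that finished when d-2 did, so its lead time would reflect when the change actually reached production. A run that fails with no later successful run is left untouched.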