Scheduled trigger misses the next execution date during flow import #5463

Closed
yuri1969 opened this issue Oct 15, 2024 · 7 comments
Labels: area/backend (Needs backend code changes), bug (Something isn't working), kind/customer-request (Requested by one or more customers)

Comments

@yuri1969
Contributor

Describe the issue

We have encountered a rather interesting scheduler behavior during regular operation.

TL;DR

The scheduler does not trigger an execution when the scheduled flow's definition is updated via the flow namespace update CLI command issued right around the time of the scheduled execution.


Description

It seems the trigger scheduler gets interrupted by a deployment of new flows. In our environment, multiple flow definition deployments happen daily via CI/CD using the flow namespace update CLI command.

It is rather unacceptable to miss triggers scheduled to run daily or even weekly in production.

Example

Output of import CLI command issued at 09:53:59,111:

❯ ./kestra-0.20.0-SNAPSHOT flow namespace update 'company.team' '/home/myuser/flow_imports'
2024-10-15 09:53:59,111 INFO  update       i.kestra.core.plugins.PluginScanner Registered 79 core plugins (scan done in 97ms)
101 flow(s) for namespace 'company.team' successfully updated !
- company.team.myflow-37
- company.team.myflow-46
- company.team.myflow-42
...

Execution history of one of the imported flows is missing the execution which would be triggered at 09:54:

screenshot-2
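
A gap like the one in the screenshot can also be spotted programmatically. The following is a minimal sketch, assuming the execution start times of a flow triggered every minute have already been exported as epoch seconds, one per line, in ascending order. The function name `find_missing_minutes` is hypothetical and not part of Kestra; how the timestamps are exported is left open.

```shell
#!/bin/bash
# Sketch only: find_missing_minutes is a hypothetical helper, not a Kestra tool.
# Reads epoch-second start times (one per line, ascending) from stdin and
# prints the epoch second of every whole minute that has no execution
# between the first and the last timestamp.
find_missing_minutes() {
    local prev="" cur expected
    while read -r cur; do
        # Round down to the start of the minute.
        cur=$(( cur - cur % 60 ))
        if [[ -n "$prev" ]]; then
            expected=$(( prev + 60 ))
            # Report every minute skipped between two consecutive executions.
            while (( expected < cur )); do
                echo "$expected"
                expected=$(( expected + 60 ))
            done
        fi
        prev=$cur
    done
}
```

For example, `printf '%s\n' 0 60 180 | find_missing_minutes` prints `120`, the one minute without an execution.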


Reproduction steps

First of all, the reproduction rate is not 100% since it relies on timing.

  1. Create a test dir flow_imports
  2. Create initial test flow definition file flow_imports/myflow.yaml
  3. Paste the following flow and scheduler definition into the flow_imports/myflow.yaml file:
id: myflow
namespace: company.team

tasks:
  - id: hello
    type: io.kestra.plugin.core.log.Log
    message: Hello World! 🚀

  - id: hello-2
    type: io.kestra.plugin.core.log.Log
    message: Hello World! 🚀

  - id: hello-3
    type: io.kestra.plugin.core.log.Log
    message: Hello World! 🚀

  - id: hello-4
    type: io.kestra.plugin.core.log.Log
    message: Hello World! 🚀

  - id: hello-5
    type: io.kestra.plugin.core.log.Log
    message: Hello World! 🚀

  - id: hello-6
    type: io.kestra.plugin.core.log.Log
    message: Hello World! 🚀

  - id: hello-7
    type: io.kestra.plugin.core.log.Log
    message: Hello World! 🚀

  - id: hello-8
    type: io.kestra.plugin.core.log.Log
    message: Hello World! 🚀

  - id: hello-9
    type: io.kestra.plugin.core.log.Log
    message: Hello World! 🚀

  - id: hello-10
    type: io.kestra.plugin.core.log.Log
    message: Hello World! 🚀

triggers:
  - id: each-min
    type: io.kestra.plugin.core.trigger.Schedule
    cron: "* * * * *"
  4. Generate 100 additional copies of the flow (to make the import duration realistic) using the following generator script:
#!/bin/bash

INPUT='myflow.yaml'

if [[ ! -f "$INPUT" ]]; then
    echo "The initial flow file '${INPUT}' not found" >&2
    exit 1
fi

for I in {1..100}; do
    COPY="myflow-${I}.yaml"

    # shellcheck disable=SC2016
    cp "$INPUT" "$COPY" && yq -e -y -i --arg date " $(date '--iso-8601=seconds')" --arg 'id' "myflow-${I}" '.id = $id | .tasks[0].message += $date' "$COPY"
done
  5. Start the local Kestra server via ./kestra server local
  6. Import all the flow definitions via the ./kestra flow namespace update 'company.team' '<path_to>/flow_imports' CLI command
  7. Pick one of the generated flows, such as company.team.myflow-14, and watch its Executions using the UI
  8. Update the flow definitions by using the generator script again
  9. Since the trigger runs each minute at "00 seconds", wait until about the 57th second of the current minute (e.g. 12:13:57 or 10:34:57)
  10. Run the import CLI command again
  11. In the UI, observe whether the picked Execution gets triggered or not

Environment

  • Kestra Version: v0.20.0-SNAPSHOT - a95d317
  • Operating System (OS/Docker/Kubernetes): Debian
  • Java Version (if you don't run kestra in Docker): 21 Temurin
@yuri1969 yuri1969 added the bug Something isn't working label Oct 15, 2024
@github-project-automation github-project-automation bot moved this to Backlog in Issues Oct 15, 2024
@MilosPaunovic MilosPaunovic added the area/backend Needs backend code changes label Oct 15, 2024
@17297781Karthik
Contributor

Hello, can I try to work on this issue?

@johnkm516

Can confirm I can reproduce this on 0.19.2.

@anna-geller anna-geller added the kind/customer-request Requested by one or more customers label Nov 18, 2024
@anna-geller
Member

2 customers asking for this, bumped to P0 — @Ben8t can you validate the solution with Ludo and assign some resources to make sure this will be part of 0.20?

@Skraye
Member

Skraye commented Nov 20, 2024

Hey @yuri1969, I couldn't reproduce your issue, but I may have a fix for it.

Our trigger comparison was wrong: when a trigger is considered edited, we re-create it and set its next execution date to null.
If the trigger was reset before second 0 but only evaluated after it, the current minute was skipped.

Once it is merged, would you be able to try your reproducer on the develop image?
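
The race described in this comment can be illustrated with a toy model. This is only a sketch of the explanation above, not Kestra's actual scheduler code: the function names are hypothetical, times are plain epoch seconds, and an empty string stands in for the null next execution date of a re-created trigger.

```shell
#!/bin/bash
# Toy model of the race described above; NOT Kestra's actual scheduler code.
# All times are epoch seconds; function names are hypothetical.

# Print the first minute boundary strictly after time $1.
next_minute_after() {
    echo $(( ($1 / 60 + 1) * 60 ))
}

# Decide what happens at evaluation time $2 when the stored next execution
# date is $1 (empty string = null, i.e. the trigger was just re-created).
# Prints "fire <date>" or "wait <date>".
evaluate_trigger() {
    local next_date=$1 now=$2
    if [[ -z "$next_date" ]]; then
        # Null date: recomputed from "now", so a boundary already passed is lost.
        echo "wait $(next_minute_after "$now")"
    elif (( now >= next_date )); then
        echo "fire $next_date"
    else
        echo "wait $next_date"
    fi
}
```

With a boundary at 600 and an evaluation at 601, `evaluate_trigger 600 601` prints `fire 600`. If an import reset the date to null just before the boundary, `evaluate_trigger "" 601` prints `wait 660`, so the execution for 600 never happens.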

@yuri1969
Contributor Author

@Skraye I've attempted to replicate the issue on develop. I couldn't replicate it in over 15 attempts.

Maybe @johnkm516 can confirm my findings.

@johnkm516

I think it's been fixed sometime between 0.19.2 and 0.19.10.

I don't know if this is relevant, or if #6018 fixes this issue, but triggers seem to take a long time to react to flow changes when a lot of changes are made at once. For example, if I delete 10 flows with triggers, they still show up on the Administration / Triggers page, but with the enable / disable switch replaced by a "this flow no longer exists" icon. It takes upwards of 20-30 minutes for the triggers to update.

For deleted flows this is fine, but the problem is when importing 10+ flows. After importing flows, if you go to the Administration / Triggers page, the triggers either don't exist, or they exist but the enable / disable switch is missing and the trigger doesn't fire. Again, this takes up to 20-30 minutes to update for 10 flows or so.

@anna-geller
Member

Totally understandable, @johnkm516. We have an open issue to resolve that problem of zombie triggers: #5998

It seems that the issue can be closed then. Thanks so much for the report and all the follow-up discussions, Jiri and John.

@github-project-automation github-project-automation bot moved this from In review to Done in Issues Nov 22, 2024