[DISCUSS] Handling duplicate calls to provision API #445

dbwiddis · 2024-01-24T18:20:55Z

Is your feature request related to a problem?

If a user calls the provision API twice using the same workflow, its provisioned resources are duplicated. This can be confusing because the combination of workflow step name and workflow step ID is no longer unique, and choosing the correct resource ID requires guesswork. Also, the state changes to COMPLETED when the first workflow finishes, even though more resources are still being added.

Additionally, the extra copy (or copies) consume cluster resources.

What solution would you like?

Prevent a workflow from being provisioned if its status is PROVISIONING. This status may need to be checked multiple times (as in double-checked locking) to prevent the race condition where two provision calls both see NOT_STARTED before one of them begins provisioning.
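To make the double-checked idea concrete, here is a minimal in-memory sketch. Everything in it is an assumption for illustration: `WorkflowStateStore`, `try_begin_provisioning`, and `run_concurrent_calls` are hypothetical names, not part of the plugin, and a process-local lock only helps when all provision calls land on the same node.

```python
import threading
from enum import Enum


class State(Enum):
    NOT_STARTED = "NOT_STARTED"
    PROVISIONING = "PROVISIONING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


class WorkflowStateStore:
    """Hypothetical in-memory stand-in for wherever workflow state is kept."""

    def __init__(self):
        self._states = {}
        self._lock = threading.Lock()

    def get(self, workflow_id):
        # Workflows with no recorded state are treated as NOT_STARTED.
        return self._states.get(workflow_id, State.NOT_STARTED)

    def try_begin_provisioning(self, workflow_id):
        # First check, without the lock: cheaply reject obvious duplicates.
        if self.get(workflow_id) is not State.NOT_STARTED:
            return False
        with self._lock:
            # Second check, under the lock: another caller may have changed
            # the state between our first check and acquiring the lock.
            if self.get(workflow_id) is not State.NOT_STARTED:
                return False
            self._states[workflow_id] = State.PROVISIONING
            return True


def run_concurrent_calls(n_calls=8):
    """Fire n_calls provision attempts at one workflow; return who won."""
    store = WorkflowStateStore()
    results = []
    results_lock = threading.Lock()

    def call():
        won = store.try_begin_provisioning("workflow-1")
        with results_lock:
            results.append(won)

    threads = [threading.Thread(target=call) for _ in range(n_calls)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, store.get("workflow-1")
```

Calling `run_concurrent_calls()` should report exactly one successful attempt no matter how the threads interleave, because the second check and the state change happen under the same lock.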

While addressing this, we should also discuss how to handle FAILED. We could prevent provisioning and require users to use the deprovision API first (basically only allow provisioning from the NOT_STARTED state), or we could try to "pick up from where we left off", adding logic to skip provisioning steps if the resource already exists (complex and brittle).

What alternatives have you considered?

Leaving the API as is, and/or encouraging use of the create API with the provisioning parameter, which does not have this shortcoming.

Do you have any additional context?

We should probably handle this similarly to trying to create the same Anomaly Detector twice.

joshpalis · 2024-01-24T18:27:46Z

Guard rails are a good idea. I think we should block another provision call if the status is PROVISIONING, COMPLETED, or FAILED, and only allow provisioning from a NOT_STARTED state. This is somewhat similar to the behavior implemented for the update API in #416.
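The rule proposed here (provision only from NOT_STARTED) could be sketched as a small guard. This is illustrative only: `ProvisionRejected`, `check_can_provision`, and the message wording are hypothetical and not taken from the plugin or from #416.

```python
class ProvisionRejected(Exception):
    """Hypothetical error raised when a provision call is blocked."""


# Hypothetical reasons, keyed by the blocking status named in this thread.
_BLOCK_REASONS = {
    "PROVISIONING": "is already being provisioned",
    "COMPLETED": "has already been provisioned; deprovision it first",
    "FAILED": "failed a previous provision; deprovision it first",
}


def check_can_provision(workflow_id, status):
    """Return None if provisioning may start, otherwise raise."""
    if status == "NOT_STARTED":
        return None
    # Any status we do not recognize is also blocked, to stay on the safe side.
    reason = _BLOCK_REASONS.get(status, "is in an unexpected state " + status)
    raise ProvisionRejected("Workflow " + workflow_id + " " + reason)
```

Note that a guard like this only decides what to allow. On its own it does not close the window between reading the status and updating it, which is the race described in the original post.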

owaiskazi19 · 2024-01-24T19:11:08Z

Checking the status of the previous provision looks like a good solution.

> This status may need to be checked multiple times

Trying to understand more here. Do we have to build functionality like what we currently have for retry to keep checking the status, or a cron job of our own?

> While addressing this, we also should discuss how to handle FAILED

If the status of the previous provision is FAILED, we can take either of these options:

1. Let the user know that the previous provision failed for the workflowId and that they should run the deprovision API manually. This would eventually delete the workflowID if deprovisioning is successful. There can be a case where deprovisioning itself fails, which might result in a deadlock condition.
2. Perform the deprovisioning on our end, but again we might face the above issue. In such cases the user has to deprovision manually, based on our log message "Failed to deprovision some resources."

I am inclined towards option 1.

dbwiddis · 2024-01-25T00:09:38Z

> Trying to understand more here. Do we have to build functionality like what we currently have for retry to keep checking the status, or a cron job of our own?

No, it's not about retries; it's merely to catch a race condition.

Thread A checks status as not started, moves on to next step.
Thread B checks status as not started, moves on to next step.
Thread A changes status to provisioning.
Thread B continues and results in a duplicate.

But relying on the status won't work as once you change it to provisioning... so we really need some other way to "lock" a thread for provisioning.
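One possible way to get that kind of "lock" is a versioned compare-and-set on the state document, similar in spirit to the optimistic concurrency control OpenSearch exposes through `if_seq_no` and `if_primary_term`. The sketch below replays the four-step interleaving above in memory. `StateIndex`, `VersionConflict`, and both `replay_*` functions are hypothetical, and this thread does not say whether #466 takes this approach.

```python
class VersionConflict(Exception):
    """Hypothetical error: the document changed since it was read."""


class StateIndex:
    """Hypothetical single-document stand-in for a workflow state index."""

    def __init__(self):
        self._doc = {"status": "NOT_STARTED", "version": 0}

    def read(self):
        # Return a snapshot so callers hold a possibly stale copy.
        return dict(self._doc)

    def unconditional_update(self, status):
        self._doc["status"] = status
        self._doc["version"] += 1

    def conditional_update(self, status, expected_version):
        # A real store must make this compare-and-write atomic; here the
        # replay is sequential, so a plain comparison is enough.
        if self._doc["version"] != expected_version:
            raise VersionConflict("state changed since it was read")
        self.unconditional_update(status)


def replay_naive():
    """Both threads read first, then each acts on its stale read."""
    index = StateIndex()
    seen_by_a = index.read()
    seen_by_b = index.read()
    started = []
    for thread, seen in (("A", seen_by_a), ("B", seen_by_b)):
        if seen["status"] == "NOT_STARTED":
            index.unconditional_update("PROVISIONING")
            started.append(thread)
    return started


def replay_versioned():
    """Same interleaving, but the write only succeeds on the version read."""
    index = StateIndex()
    seen_by_a = index.read()
    seen_by_b = index.read()
    started = []
    for thread, seen in (("A", seen_by_a), ("B", seen_by_b)):
        if seen["status"] != "NOT_STARTED":
            continue
        try:
            index.conditional_update("PROVISIONING", seen["version"])
        except VersionConflict:
            continue  # Lost the race: do not provision.
        started.append(thread)
    return started
```

With the naive update, both A and B start provisioning. With the conditional update, B's write is rejected because the version it read is stale, so only A proceeds.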

dbwiddis added the enhancement, untriaged, and discuss labels Jan 24, 2024

minalsha removed the untriaged label Jan 24, 2024

dbwiddis self-assigned this Jan 26, 2024

dbwiddis mentioned this issue Jan 29, 2024

Prevent provisioning an already-provisioned workflow #466

Merged

dbwiddis closed this as completed in #466 Jan 29, 2024
