fix(operator.py): Increased timeout for get_mgr_backup_task method #8475

Merged

Conversation

@grzywin (Contributor) commented Aug 28, 2024

Some of the longevity tests for the Operator (e.g., longevity-scylla-operator-3h-eks-backup-test) are failing because they exceed the 300-second timeout set in the get_mgr_backup_task method. After investigation, I noticed that this operation can take up to approximately 8 minutes, so I increased the timeout to 10 minutes. Link to a successful run after increasing the timeout here.
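For context, here is a minimal sketch (not the actual SCT implementation) of the kind of poll-with-timeout logic the change affects, with the timeout raised from 300 to 600 seconds. The names `backup_task_list`, `BACKUP_TASK_TIMEOUT`, and `POLL_INTERVAL` are illustrative assumptions, not identifiers from the SCT codebase:

```python
import time

# Illustrative sketch only -- the real get_mgr_backup_task lives in SCT's
# operator.py and uses SCT's own wait helpers. The point is the timeout bump:
# backup task creation was observed to take up to ~8 minutes, so 300 s was
# not enough and 600 s gives headroom.
BACKUP_TASK_TIMEOUT = 600   # previously 300
POLL_INTERVAL = 2           # task status is re-checked every 2 seconds

def get_mgr_backup_task(mgr_cluster, timeout=BACKUP_TASK_TIMEOUT):
    """Poll the manager cluster until the backup task appears or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        tasks = mgr_cluster.backup_task_list()  # hypothetical accessor
        if tasks:
            return tasks[0]
        time.sleep(POLL_INTERVAL)
    raise TimeoutError(f"Backup task did not appear within {timeout} seconds")
```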

@grzywin grzywin changed the title fix(operator.py) Increased timeout for get_mgr_backup_task method fix(operator.py): Increased timeout for get_mgr_backup_task method Aug 28, 2024
@grzywin grzywin force-pushed the operator-increase-get-mgr-backup-task-timeout branch from da9f3f6 to f8e5b14 on August 28, 2024 08:16
@soyacz (Contributor) commented Aug 28, 2024

Isn't this something to raise on the operator side - why does creating a backup task take 8 minutes?

@grzywin (Contributor, Author) commented Aug 28, 2024

This is something they are already aware of, as they were running the same tests for the 1.13 release and ran into this timeout.
@zimnx put a comment about it here ("Discussion" tab) and marked the tests as passed.

Isn't it the disrupt_mgmt_backup nemesis that is causing the backup to take more time?

@grzywin grzywin marked this pull request as ready for review August 28, 2024 08:54
@grzywin grzywin removed their assignment Aug 28, 2024
@mikliapko mikliapko requested review from a team and mikliapko August 28, 2024 08:56
@mikliapko (Contributor) left a comment

LGTM

@soyacz (Contributor) commented Aug 28, 2024

> This is something they are already aware of, as they were running the same tests for the 1.13 release and ran into this timeout. @zimnx put a comment about it here ("Discussion" tab) and marked the tests as passed.
>
> Isn't it the disrupt_mgmt_backup nemesis that is causing the backup to take more time?

@zimnx might not be aware of the timeout value.
I believe the status should show up earlier than 5 minutes (it is checked every 2 seconds), so it is worth investigating what the bottleneck is here.

@zimnx commented Aug 28, 2024

It's a known issue that creating tasks takes a long time. We have multiple issues tracking improvements (scylladb/scylla-operator#1939). The recent Manager 3.3.1 release brought an important feature that will help us resolve a subset of them.
I'm fine with bumping the timeout until we fix things on our end.

@vponomaryov (Contributor) left a comment

LGTM

@soyacz soyacz merged commit dc71456 into scylladb:master Aug 28, 2024
7 checks passed
@soyacz (Contributor) commented Aug 28, 2024

Ugh, I didn't see the missing backport labels.
@grzywinski where do we need this to be fixed?

@vponomaryov vponomaryov added the backport/none Backport is not required label Aug 28, 2024
@vponomaryov (Contributor) commented:

> Ugh, I didn't see the missing backport labels. @grzywinski where do we need this to be fixed?

We don't have dedicated operator branches. It is run with the SCT master branch only.

Labels: backport/none (Backport is not required), promoted-to-master