Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DR-Failback errors when gathering/detecting HA VM as well as storage domains #649

Closed

Conversation

shubhaOracle
Copy link
Contributor

DR-Failback errors when gathering/detecting HA VM as well as storage domains
Issue:
Storage domain detection was not finding any storage domains due to incorrect conditional checks.
Failback operation with a Highly Available VM but encountered an Unexpected templating type error occurred on that must be str, not list
Fix:
Created separate tasks for detecting domains in various states and fixed conditional checks.
Removed redundant variable quotes.

Signed-off-by: Shubha Kulkarni [email protected]

@mwperina mwperina force-pushed the sd-detection-during-failback branch from a742dad to 88fab7b Compare December 9, 2022 18:43
@shubhaOracle
Copy link
Contributor Author

May I request a review here?

1 similar comment
@shubhaOracle
Copy link
Contributor Author

May I request a review here?

ovirt_storage_domain_info:
pattern: >
name={{ storage['dr_' + dr_source_map + '_name'] }} and
datacenter={{ storage['dr_' + dr_source_map + '_dc_name'] }} and {{ dr_active_domain_search }}
Copy link
Member

@barpavel barpavel Jan 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix 2 trailing spaces that cause ansible-lint 6 to fail, which breaks the CI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@@ -2,23 +2,37 @@
- name: Remove valid storage domain main block
block:
- name: Fetch active/maintenance/detached storage domain for remove
Copy link
Member

@barpavel barpavel Jan 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is relevant only if the next comment (with not-splitting the original code into 3 tasks) doesn't work...
Fix name from "Fetch active/maintenance/detached" into "Fetch active".
Since the "maintenance" and "detached" are now separate tasks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- name: Fetch active/maintenance/detached storage domain for remove
- name: Fetch active domain for remove

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@@ -2,23 +2,37 @@
- name: Remove valid storage domain main block
block:
- name: Fetch active/maintenance/detached storage domain for remove
ovirt_storage_domain_info:
pattern: >
Copy link
Member

@barpavel barpavel Jan 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must admit I don't have a vast Ansible experience, but could some "playing" with () be a more simple option to fix the original code instead of splitting the code into 3 tasks with some amount of duplicate code?
Not sure whether the following syntax will work:

        pattern: >
          name={{ storage['dr_' + dr_source_map + '_name'] }} and
          (
            (
              datacenter={{ storage['dr_' + dr_source_map + '_dc_name'] }} and
              (
                {{ dr_active_domain_search }} or {{ dr_maintenance_domain_search }}
              )
            )
            or
            {{ dr_unattached_domain_search }}
          )

If not then let's leave your solution with 3 separate tasks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must admit I don't have a vast Ansible experience, but could some "playing" with () be a more simple option to fix the original code instead of splitting the code into 3 tasks with some amount of duplicate code? Not sure whether the following syntax will work:

        pattern: >
          name={{ storage['dr_' + dr_source_map + '_name'] }} and
          (
            (
              datacenter={{ storage['dr_' + dr_source_map + '_dc_name'] }} and
              (
                {{ dr_active_domain_search }} or {{ dr_maintenance_domain_search }}
              )
            )
            or
            {{ dr_unattached_domain_search }}
          )

If not then let's leave your solution with 3 separate tasks.

Thanks for taking a look at the changes. I would have preferred to have one common condition as well. However, for readability and ease of maintenance sake, I feel my new approach is much simpler. I hope that works for you as well.

@barpavel
Copy link
Member

barpavel commented Jan 10, 2023

May I request a review here?

Sorry for keeping you waiting.
Now I will give this a priority, so this PR will be merged ASAP.
Please see my comments.

@shubhaOracle
Copy link
Contributor Author

May I request a review here?

Sorry for keeping you waiting. Now I will give this a priority, so this PR will be merged ASAP. Please see my comments.

I made the changes based on the feedback. Can you please review/merge?

@barpavel
Copy link
Member

/ost

- name: Fetch active storage domain for remove
ovirt_storage_domain_info:
pattern: >
name={{ storage['dr_' + dr_source_map + '_name'] }} and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 small indentation fix: move 1 character to the left.

auth: "{{ ovirt_auth }}"
register: storage_domain_info_active

- name: Fetch maintenance storage domain for remove
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section (lines 28-34) seems duplicate (same as lines 12-18)?
AFAIU lines 28-34 should be removed.

auth: "{{ ovirt_auth }}"
register: storage_domain_info_maintenance

- name: Fetch detached storage domain for remove
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section (lines 20-26) seems duplicate (same as lines 4-10) + wrong name?
AFAIU lines 20-26 should be removed.

name={{ storage['dr_' + dr_source_map + '_name'] }} and
datacenter={{ storage['dr_' + dr_source_map + '_dc_name'] }} and {{ dr_active_domain_search }}
auth: "{{ ovirt_auth }}"
register: storage_domain_info_active
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

storage_domain_info_active
-->
storage_domain_info_detached

ovirt_storage_domain_info:
pattern: >
name={{ storage['dr_' + dr_source_map + '_name'] }} and
datacenter={{ storage['dr_' + dr_source_map + '_dc_name'] }} and {{ dr_active_domain_search }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datacenter={{ storage['dr_' + dr_source_map + '_dc_name'] }} and {{ dr_active_domain_search }}
-->
{{ dr_unattached_domain_search }}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datacenter={{ storage['dr_' + dr_source_map + '_dc_name'] }} and {{ dr_active_domain_search }} --> {{ dr_unattached_domain_search }}

Thanks for the feedback and review!!

Copy link
Member

@barpavel barpavel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last ones are the minuses of duplicate code :)

@barpavel
Copy link
Member

@mnecas @mwperina Looks like CI is stuck with Waiting for status to be reported, do you have any idea why?

shubhaOracle and others added 2 commits January 12, 2023 10:35
…er_domains.yml


move 1 character to the left

Co-authored-by: Pavel Bar <[email protected]>
…er_domains.yml


Remove redundant parts and trailing space

Co-authored-by: Pavel Bar <[email protected]>
shubhaOracle and others added 5 commits January 12, 2023 10:47
…er_domains.yml


move 1 character to the left

Co-authored-by: Pavel Bar <[email protected]>
…er_domains.yml


Remove redundant parts and trailing space

Co-authored-by: Pavel Bar <[email protected]>
@mwperina
Copy link
Member

@mnecas @mwperina Looks like CI is stuck with Waiting for status to be reported, do you have any idea why?

You need to click on Approve and run button each time when non-oVirt organization member updates his PR, just as I did now

@barpavel barpavel self-requested a review January 12, 2023 17:42
Copy link
Member

@barpavel barpavel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shubhaOracle did you perform any validations on your environment?
If not, can you please do and then acknowledge that it's working?

@barpavel
Copy link
Member

barpavel commented Jan 12, 2023

@mnecas @mwperina Looks like CI is stuck with Waiting for status to be reported, do you have any idea why?

You need to click on Approve and run button each time when non-oVirt organization member updates his PR, just as I did now

I probably don't have relevant permissions, since I don't see this magic button :)

@barpavel barpavel self-requested a review January 12, 2023 18:16
@barpavel
Copy link
Member

barpavel commented Jan 12, 2023

@shubhaOracle

  1. Now looking at it again, I feel that I'm either confused now or I was confused when performing previous code review cycles.
    What makes me very suspicious - how did it work before? This flow doesn't seem to be a corner case... And after I checked the history, I see that this code existed from 2018 with only lint or new Ansible versions modifications after that.
    Regarding the conditions: except (maybe) readability - what was changed in the conditions by splitting it into 3 tasks? Isn't the old and the new code logically the same? What scenario didn't work previously and should work now?
    What tests/flows did you perform before (which failed) and after (succeeded)?
  2. I see 14 commits :(
    It should be 1 commit. Maybe 2, if for example in 1 commit you fix the issue with wrong conditions and in another you fix a totally different issue with wrong variable type. That could make sense.
    And this single commit / 2 commits should each have a clear commit message that will allow later see the history of the file(s) and understand the changes.
    A good PR's description is very nice to have, but in git history what's important are the commits' messages.
    And (again) 14 commits on a single file like remove_valid_filtered_master_domains.yml with no commit messages will make future developers' work harder.
    Even if eventually your conditions' change is needed, it should be 1 commit and a good description of the change (what was the problem and what/how is the fix).

@shubhaOracle
Copy link
Contributor Author

LGTM. @shubhaOracle did you perform any validations on your environment? If not, can you please do and then acknowledge that it's working?
Yes, I had performed validations in our environment.

@shubhaOracle
Copy link
Contributor Author

@shubhaOracle

1. Now looking at it again, I feel that I'm either confused now or I was confused when performing previous code review cycles.
   What makes me very suspicious - how did it work before? This flow doesn't seem to be a corner case... And after I checked the history, I see that this code existed from 2018 with only `lint` or new Ansible versions modifications after that.
   Regarding the conditions: except (maybe) readability - what was changed in the conditions by splitting it into 3 tasks? Isn't the old and the new code logically the same? What scenario didn't work previously and should work now?
   What tests/flows did you perform before (which failed) and after (succeeded)?

2. I see 14 commits :(
   It should be 1 commit. Maybe 2, if for example in 1 commit you fix the issue with wrong conditions and in another you fix a totally different issue with wrong variable type. That could make sense.
   And this single commit / 2 commits should each have a clear commit message that will allow later see the history of the file(s) and understand the changes.
   A good PR's description is very nice to have, but in `git` history what's important are the commits' messages.
   And (again) 14 commits on a single file like `remove_valid_filtered_master_domains.yml` with no commit messages will make future developers' work harder.
   Even if eventually your conditions' change is needed, it should be 1 commit and a good description of the change (what was the problem and what/how is the fix).

--->

There are two issues that are fixed with this PR

  1. Failback operation with a Highly Available VM but encountered an Unexpected templating type error occurred on that must be str, not list
  2. Storage domain detection was not finding any storage domains during failback. During the fail-back process, the storage domain didn't get detached.
    The issue is that the query sent to oVirt secondary site to fetch the storage domains doesn't return the storage domains. The query uses multiple and/or conditions to account for various states the storage domain could be in and uses () parenthesis. However, instead of returning an error, it simply returns nothing. I have now broken down this into smaller queries and it is now working.

Regarding commit messages -
The PR was posted long back. So I had to rebase a few times as per errors shown in the PR. Also, I was trying to understand the failures. I agree in principle that appropriate commit messages should accompany the commits and I will keep that in mind for future commits. I am also observing that when I commit using UI based on suggestions, the sign-off and other text is getting overwritten.

@barpavel
Copy link
Member

  1. an Unexpected templating type error occurred on that must be str, not list
  1. an Unexpected templating type error occurred on that must be str, not list
    Actually the opposite, no? It was string and now you changed it to list?
  2. The query ... uses () parenthesis. However, instead of returning an error....
    Still I can't figure out what' wrong with parenthesis that combine multiple sub-conditions. Maybe you experienced a 1-time failure for some unrelated reason? Unless there is some newline issues that I miss, I don't understand what's wrong with the old solution vs. new implementation! Especially that the old solution didn't change from Feb 2020 and seems like it worked, since this is the engine cleanup flow.
    Are you saying that constantly, time after time the old (parenthesis) implementation returns nothing and on the same environment the new solution returns a non-empty result?
    That's very weird :(
  3. The PR was posted long back. So I had to rebase a few times...
    You could have rebased any number of times and address comments iteration after iteration, and just update the same commit with --amend option. Or you can squash multiple commits into 1 now. But 14 commits on the same file without commit messages, that's a very undesired option.
    Anyway, 1st of all need to understand what's wrong (if something is wrong) with the old set of conditions, and then squash the resulting solution into 1 commit (or maybe 2, one for condition, 1 for wrong type fix).

…ncorrect conditional checks.

Failback operation with a Highly Available VM but encountered an Unexpected templating type error.Change attribute to list.

Signed-off-by: Shubha Kulkarni <[email protected]>
…ncorrect conditional checks for detached storage domain.

Failback operation with a Highly Available VM but encountered an Unexpected templating type error.Change attribute to list.

Signed-off-by: Shubha Kulkarni <[email protected]>
…xed conditional checks for detached storage domains. Removed redundant variable quotes to detect Highly Available VM properly

Signed-off-by: Shubha Kulkarni <[email protected]>
@mwperina
Copy link
Member

@shubhaOracle Could you please rebase on top of master branch?

@barpavel
Copy link
Member

@shubhaOracle Hi, can it be closed if not relevant?

@barpavel barpavel closed this Oct 7, 2023
@barpavel
Copy link
Member

barpavel commented Oct 7, 2023

Can be reopened if still relevant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants