Mdr upgrade scenario #8699

Open
wants to merge 32 commits into
base: master
32 commits
0c760fe
MDR upgrade scenario implementation
shylesh Oct 14, 2023
537e29d
MDR upgrade scenario
shylesh Oct 17, 2023
1990c21
address review comments from Petr
shylesh Oct 24, 2023
342bfc0
Update roles markers
shylesh Oct 26, 2023
8ac4eb3
Add functions to choose the right parameter tuples based on roles
shylesh Oct 27, 2023
c95f49f
Add MCO and DR HUB operator upgrades class
shylesh Nov 7, 2023
91e42df
Add ACM upgrade class
shylesh Nov 12, 2023
2022a22
Address review comments
shylesh Nov 13, 2023
080e440
Add validations for MCO, DR and ACM upgrades
shylesh Nov 30, 2023
fb77a93
Add facility for handling acm upgrade versions
shylesh Dec 4, 2023
c2d2357
Add skipif_z_stream markers to DR hub and DR cluster operator upgrade…
shylesh Dec 15, 2023
b5071de
Fix test ordering issues
shylesh Jan 20, 2024
4d3d5a6
Handle CephCluster object in case of ACM cluster context during upgrade
shylesh Feb 1, 2024
f04e7c5
Add dummy cephcluster classes for healthmonitor
shylesh Feb 7, 2024
0553332
Fix minor zstream upgrade issues
shylesh Feb 19, 2024
b5d0a30
Make adjustments for the new order marker
shylesh Feb 22, 2024
3971abd
Minor fixes for test ordering
shylesh Mar 12, 2024
41f2eca
Fix few issues in upgrades
shylesh Apr 30, 2024
aa4df5a
Add validation for dr operators as part of ocs upgrade test in the ca…
shylesh Jun 4, 2024
4f37772
Add --acm-version CLI option
shylesh Jun 25, 2024
3e59c3b
Handle reload of the configs during upgrades
shylesh Jul 31, 2024
bd364f3
Save Preupgrade conf values in PREUPGRADE_CONFIG attr of the Config c…
shylesh Sep 11, 2024
24e558b
Fix subscription naming issue
shylesh Oct 1, 2024
25d4dbb
Rely on CSV list to find pre-upgrade csvs rather than PackageManifest
shylesh Oct 11, 2024
dc866bf
Fix acm issues:
shylesh Oct 18, 2024
222e30b
Sleep for 10 seconds after updating the subscription
shylesh Nov 4, 2024
52d8c73
Collect upgrade version in the init
shylesh Nov 19, 2024
a03d448
Add pre and post upgrade CSV version key check
shylesh Nov 26, 2024
360a18d
implement check_if_upgrade_completed() function for DR class
shylesh Dec 3, 2024
ba2330c
Fix tox issues for black
shylesh Dec 9, 2024
02962bb
Address review comments from Petr
shylesh Dec 13, 2024
d7e5d85
Handle default values for DR params in test function for non-DR cases
shylesh Dec 18, 2024
4 changes: 4 additions & 0 deletions conf/README.md
Expand Up @@ -212,6 +212,7 @@ higher priority).
* `skip_ocp_deployment` - Skip the OCP deployment step or not (Default: false)
* `skip_ocs_deployment` - Skip the OCS deployment step or not (Default: false)
* `ocs_version` - Version of OCS that is being deployed
* `acm_version` - Version of ACM to be used for this run (applicable mostly to DR scenarios)
* `vm_template` - VMWare template to use for RHCOS images
* `fio_storageutilization_min_mbps` - Minimal write speed of FIO used in workload_fio_storageutilization
* `TF_LOG_LEVEL` - Terraform log level
Expand Down Expand Up @@ -362,6 +363,9 @@ Upgrade related configuration data.
* `ocp_arch` - Architecture type of the OCP image
* `upgrade_logging_channel` - OCP logging channel to upgrade with
* `upgrade_ui` - Perform upgrade via UI (Not all the versions are supported, please look at the code)
* `upgrade_acm_version` - ACM version to upgrade to
* `upgrade_acm_registry_image` - ACM image tag from brew to use for the upgrade,
for example: <brew_registry_url>/rh-osbs/iib:565330

#### AUTH

4 changes: 4 additions & 0 deletions ocs_ci/framework/__init__.py
Expand Up @@ -37,6 +37,10 @@ class Config:
COMPONENTS: dict = field(default_factory=dict)
# Used for multicluster only
MULTICLUSTER: dict = field(default_factory=dict)
# Use this variable to store any arbitrary key/values related
# to the upgrade context. Applicable only in the multicluster upgrade
# scenario
PREUPGRADE_CONFIG: dict = field(default_factory=dict)

def __post_init__(self):
self.reset()
4 changes: 4 additions & 0 deletions ocs_ci/framework/conf/default_config.yaml
Expand Up @@ -362,3 +362,7 @@ MULTICLUSTER:
acm_cluster: False
primary_cluster: False
active_acm_cluster: False

PREUPGRADE_CONFIG:
AUTH: null
MULTICLUSTER: null
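
The `PREUPGRADE_CONFIG` defaults above reserve `AUTH` and `MULTICLUSTER` slots that start out as `null`. A minimal sketch of how such slots could be filled before the upgrade parameters are reloaded, assuming a simplified stand-in `Config` dataclass and a hypothetical helper `save_preupgrade_sections` (neither is the framework's actual code):

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Config:
    # Simplified stand-in for the framework Config; only the sections used here.
    AUTH: dict = field(default_factory=dict)
    MULTICLUSTER: dict = field(default_factory=dict)
    PREUPGRADE_CONFIG: dict = field(
        default_factory=lambda: {"AUTH": None, "MULTICLUSTER": None}
    )


def save_preupgrade_sections(cfg, sections=("AUTH", "MULTICLUSTER")):
    """Hypothetical helper: deep-copy the named sections into PREUPGRADE_CONFIG."""
    for name in sections:
        cfg.PREUPGRADE_CONFIG[name] = copy.deepcopy(getattr(cfg, name))


cfg = Config(MULTICLUSTER={"acm_cluster": True, "primary_cluster": False})
save_preupgrade_sections(cfg)
# A later change to the live section does not touch the saved copy.
cfg.MULTICLUSTER["acm_cluster"] = False
```

The deep copy is the design point here: a shallow reference would change along with the live section when the config is reloaded.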
32 changes: 32 additions & 0 deletions ocs_ci/framework/pytest_customization/marks.py
Expand Up @@ -14,6 +14,9 @@
ORDER_BEFORE_OCP_UPGRADE,
ORDER_BEFORE_UPGRADE,
ORDER_OCP_UPGRADE,
ORDER_MCO_UPGRADE,
ORDER_DR_HUB_UPGRADE,
ORDER_ACM_UPGRADE,
ORDER_OCS_UPGRADE,
ORDER_AFTER_OCP_UPGRADE,
ORDER_AFTER_OCS_UPGRADE,
Expand Down Expand Up @@ -117,12 +120,28 @@
order_pre_ocp_upgrade = pytest.mark.order(ORDER_BEFORE_OCP_UPGRADE)
order_pre_ocs_upgrade = pytest.mark.order(ORDER_BEFORE_OCS_UPGRADE)
order_ocp_upgrade = pytest.mark.order(ORDER_OCP_UPGRADE)
order_mco_upgrade = pytest.mark.order(ORDER_MCO_UPGRADE)
order_dr_hub_upgrade = pytest.mark.order(ORDER_DR_HUB_UPGRADE)
# dr cluster operator order is same as hub operator order except that
# it's applicable only on the managed clusters
order_dr_cluster_operator_upgrade = pytest.mark.order(ORDER_DR_HUB_UPGRADE)
order_acm_upgrade = pytest.mark.order(ORDER_ACM_UPGRADE)
order_ocs_upgrade = pytest.mark.order(ORDER_OCS_UPGRADE)
order_post_upgrade = pytest.mark.order(ORDER_AFTER_UPGRADE)
order_post_ocp_upgrade = pytest.mark.order(ORDER_AFTER_OCP_UPGRADE)
order_post_ocs_upgrade = pytest.mark.order(ORDER_AFTER_OCS_UPGRADE)
ocp_upgrade = compose(order_ocp_upgrade, pytest.mark.ocp_upgrade)
# multicluster orchestrator
mco_upgrade = compose(order_mco_upgrade, pytest.mark.mco_upgrade)
# dr hub operator
dr_hub_upgrade = compose(order_dr_hub_upgrade, pytest.mark.dr_hub_upgrade)
dr_cluster_operator_upgrade = compose(
order_dr_cluster_operator_upgrade, pytest.mark.dr_cluster_operator_upgrade
)
# acm operator
acm_upgrade = compose(order_acm_upgrade, pytest.mark.acm_upgrade)
ocs_upgrade = compose(order_ocs_upgrade, pytest.mark.ocs_upgrade)
# pre_*_upgrade markers
pre_upgrade = compose(order_pre_upgrade, pytest.mark.pre_upgrade)
pre_ocp_upgrade = compose(
order_pre_ocp_upgrade,
Expand All @@ -132,12 +151,16 @@
order_pre_ocs_upgrade,
pytest.mark.pre_ocs_upgrade,
)
# post_*_upgrade markers
post_upgrade = compose(order_post_upgrade, pytest.mark.post_upgrade)
post_ocp_upgrade = compose(order_post_ocp_upgrade, pytest.mark.post_ocp_upgrade)
post_ocs_upgrade = compose(order_post_ocs_upgrade, pytest.mark.post_ocs_upgrade)

upgrade_marks = [
ocp_upgrade,
mco_upgrade,
dr_hub_upgrade,
acm_upgrade,
ocs_upgrade,
pre_upgrade,
pre_ocp_upgrade,
Expand Down Expand Up @@ -685,3 +708,12 @@ def get_current_test_marks():
config.DEPLOYMENT.get("kms_deployment") is True,
reason="This test is not supported for KMS deployment.",
)

# Mark the test with marker below to allow re-tries in ceph health fixture
# for known issues when waiting in re-balance and flip flop from health OK
# to 1-2 PGs waiting to be Clean
ceph_health_retry = pytest.mark.ceph_health_retry

# Mark for Multicluster upgrade scenarios
config_index = pytest.mark.config_index
multicluster_roles = pytest.mark.multicluster_roles
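
Each upgrade marker above pairs an ordering marker with a named marker through `compose`. A rough sketch of that pairing, assuming `compose` behaves like ordinary right-to-left function composition; the `tag` decorator is a hypothetical stand-in for a pytest mark, and the order value is a placeholder rather than the framework constant:

```python
from functools import reduce


def compose(*funcs):
    """Right-to-left composition: compose(f, g)(x) == f(g(x))."""
    return reduce(lambda f, g: lambda x: f(g(x)), funcs)


def tag(name, value=True):
    """Hypothetical stand-in for a pytest mark: records a tag on the function."""

    def decorator(func):
        tags = dict(getattr(func, "tags", {}))
        tags[name] = value
        func.tags = tags
        return func

    return decorator


ORDER_ACM_UPGRADE = 50  # placeholder value, not the framework constant

order_acm_upgrade = tag("order", ORDER_ACM_UPGRADE)
acm_upgrade = compose(order_acm_upgrade, tag("acm_upgrade"))


@acm_upgrade
def test_acm_upgrade():
    pass
```

Applying the single composed decorator leaves both tags on the test, which is why one marker name is enough at the call site.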
21 changes: 21 additions & 0 deletions ocs_ci/framework/pytest_customization/ocscilib.py
Expand Up @@ -263,6 +263,16 @@ def pytest_addoption(parser):
"(e.g. quay.io/rhceph-dev/ocs-olm-operator:latest-4.3)"
),
)
parser.addoption(
"--acm-version",
dest="acm_version",
help="ACM version (e.g. 2.8) to be used for the current run",
)
parser.addoption(
"--upgrade-acm-version",
dest="upgrade_acm_version",
help="ACM version to upgrade to (e.g. 2.8); use only with DR upgrade scenarios",
)
parser.addoption(
"--flexy-env-file", dest="flexy_env_file", help="Path to flexy environment file"
)
Expand Down Expand Up @@ -567,6 +577,11 @@ def process_cluster_cli_params(config):
upgrade_ocs_version = get_cli_param(config, "upgrade_ocs_version")
if upgrade_ocs_version:
ocsci_config.UPGRADE["upgrade_ocs_version"] = upgrade_ocs_version
# Storing previous version explicitly
# Useful in DR upgrade scenarios
ocsci_config.UPGRADE["pre_upgrade_ocs_version"] = ocsci_config.ENV_DATA[
"ocs_version"
]
ocs_registry_image = get_cli_param(config, f"ocs_registry_image{suffix}")
if ocs_registry_image:
ocsci_config.DEPLOYMENT["ocs_registry_image"] = ocs_registry_image
Expand Down Expand Up @@ -661,6 +676,12 @@ def process_cluster_cli_params(config):
if custom_kubeconfig_location:
os.environ["KUBECONFIG"] = custom_kubeconfig_location
ocsci_config.RUN["kubeconfig"] = custom_kubeconfig_location
acm_version = get_cli_param(config, "--acm-version")
if acm_version:
ocsci_config.ENV_DATA["acm_version"] = acm_version
upgrade_acm_version = get_cli_param(config, "--upgrade-acm-version")
if upgrade_acm_version:
ocsci_config.UPGRADE["upgrade_acm_version"] = upgrade_acm_version
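
The two ACM options follow the same pattern as the other CLI parameters: a value is copied into the config only when it was actually passed. A small sketch of that pattern, using `argparse` as a stand-in for pytest's option parser; `build_parser` and `apply_acm_params` are hypothetical names used only for illustration:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--acm-version", dest="acm_version")
    parser.add_argument("--upgrade-acm-version", dest="upgrade_acm_version")
    return parser


def apply_acm_params(args, env_data, upgrade):
    """Copy only the options that were actually passed on the command line."""
    if args.acm_version:
        env_data["acm_version"] = args.acm_version
    if args.upgrade_acm_version:
        upgrade["upgrade_acm_version"] = args.upgrade_acm_version


# The config-file value for acm_version survives because the flag is absent.
env_data, upgrade = {"acm_version": "2.8"}, {}
args = build_parser().parse_args(["--upgrade-acm-version", "2.9"])
apply_acm_params(args, env_data, upgrade)
```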


def pytest_collection_modifyitems(session, config, items):
152 changes: 152 additions & 0 deletions ocs_ci/ocs/acm_upgrade.py
@@ -0,0 +1,152 @@
"""
ACM operator upgrade classes and utilities

"""

import logging
import tempfile
from pkg_resources import parse_version

import requests

from ocs_ci.ocs import constants
from ocs_ci.framework import config
from ocs_ci.ocs.ocp import OCP
from ocs_ci.utility import templating
from ocs_ci.utility.utils import get_ocp_version, get_running_acm_version, run_cmd


logger = logging.getLogger(__name__)


class ACMUpgrade(object):
def __init__(self):
self.namespace = constants.ACM_HUB_NAMESPACE
self.operator_name = constants.ACM_HUB_OPERATOR_NAME
# The ACM upgrade runs after the OCP upgrade in the sequence, so by now
# the config holds the upgrade parameters rather than the pre-upgrade ones.
# Hence we can't rely on ENV_DATA['acm_version'] for the pre-upgrade version;
# we need to look it up dynamically
self.version_before_upgrade = self.get_acm_version_before_upgrade()
self.upgrade_version = config.UPGRADE["upgrade_acm_version"]
# Set when a registry image is used for the upgrade
self.acm_registry_image = config.UPGRADE.get("upgrade_acm_registry_image", "")
self.zstream_upgrade = False

def get_acm_version_before_upgrade(self):
Review comment (Contributor):

Just out of curiosity, what is the reason for not calling get_running_acm_version directly instead of using this wrapper?

I'm not sure whether there might be a situation where we would like to call this method at other stages of the workflow, but if it is called after the upgrade, it will return the upgraded version, not the version before the upgrade as the name suggests. Or did I miss anything there?

running_acm_version = get_running_acm_version()
return running_acm_version

def get_parsed_versions(self):
parsed_version_before_upgrade = parse_version(self.version_before_upgrade)
parsed_upgrade_version = parse_version(self.upgrade_version)

return parsed_version_before_upgrade, parsed_upgrade_version

def run_upgrade(self):
self.version_change = (
self.get_parsed_versions()[1] > self.get_parsed_versions()[0]
)
if not self.version_change:
self.zstream_upgrade = True
# either this would be GA to Unreleased upgrade of same version OR
# GA to unreleased upgrade to higher version
if self.acm_registry_image and self.version_change:
self.upgrade_with_registry()
self.annotate_mch()
run_cmd(f"oc create -f {constants.ACM_BREW_ICSP_YAML}")
self.patch_channel()
else:
# GA to GA
self.upgrade_without_registry()
self.validate_upgrade()
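
The decision at the top of `run_upgrade` comes down to one comparison: the upgrade counts as a version change only when the target version is greater than the running one, and is otherwise treated as a z-stream upgrade. A minimal sketch of that rule, assuming purely numeric dotted versions and using a hypothetical tuple-based `parse` helper in place of `parse_version`:

```python
def parse(version, width=3):
    """Turn '2.8' or '2.8.1' into a zero-padded tuple of integers."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def classify_upgrade(running_version, upgrade_version):
    """Mirror the comparison in run_upgrade for numeric versions only."""
    version_change = parse(upgrade_version) > parse(running_version)
    return {"version_change": version_change, "zstream_upgrade": not version_change}
```

With a running version of `2.8.1`, a target of `2.9` is a version change, while a target of `2.8` is not and is therefore handled as z-stream.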

def upgrade_without_registry(self):
self.patch_channel()

def patch_channel(self):
"""
GA to GA acm upgrade

"""
patch = f'\'{{"spec": {{"channel": "release-{self.upgrade_version}"}}}}\''
self.acm_patch_subscription(patch)
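
The hand-escaped f-string in `patch_channel` produces a single-quoted JSON merge patch. As an illustration only, the same string can be built with `json.dumps` and `shlex.quote`, which avoids the doubled braces; `channel_patch` is a hypothetical helper name:

```python
import json
import shlex


def channel_patch(upgrade_version):
    """Build the shell-quoted merge patch that switches the subscription channel."""
    patch = {"spec": {"channel": f"release-{upgrade_version}"}}
    return shlex.quote(json.dumps(patch))
```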

def upgrade_with_registry(self):
"""
There are 2 scenarios with registry
1. GA to unreleased same version (ex: 2.8.1 GA to 2.8.2 Unreleased)
2. GA to unreleased higher version (ex: 2.8.9 GA to 2.9.1 Unreleased)

"""
if self.acm_registry_image and (not self.version_change):
# This is GA to unreleased: same version
self.create_catalog_source()
else:
# This is GA to unreleased version: upgrade to next version
self.create_catalog_source()
patch = f'\'{{"spec":{{"source": "{constants.ACM_CATSRC_NAME}"}}}}\''
self.acm_patch_subscription(patch)

def annotate_mch(self):
annotation = f'\'{{"source": "{constants.ACM_CATSRC_NAME}"}}\''
annotate_cmd = (
f"oc -n {constants.ACM_HUB_NAMESPACE} annotate mch multiclusterhub "
f"installer.open-cluster-management.io/mce-subscription-spec={annotation}"
)
run_cmd(annotate_cmd)

def acm_patch_subscription(self, patch):
patch_cmd = (
f"oc -n {constants.ACM_HUB_NAMESPACE} patch sub advanced-cluster-management "
f"-p {patch} --type merge"
)
run_cmd(patch_cmd)

def create_catalog_source(self):
logger.info("Creating ACM catalog source")
acm_catsrc = templating.load_yaml(constants.ACM_CATSRC)
if self.acm_registry_image:
acm_catsrc["spec"]["image"] = self.acm_registry_image
else:
# Update catalog source
resp = requests.get(constants.ACM_BREW_BUILD_URL, verify=False)
raw_msg = resp.json()["raw_messages"]
# TODO: Find way to get ocp version before upgrade
version_tag = raw_msg[0]["msg"]["pipeline"]["index_image"][
f"v{get_ocp_version()}"
].split(":")[1]
acm_catsrc["spec"]["image"] = ":".join(
[constants.ACM_BREW_REPO, version_tag]
)
acm_catsrc["metadata"]["name"] = constants.ACM_CATSRC_NAME
acm_catsrc["spec"]["publisher"] = "grpc"
acm_data_yaml = tempfile.NamedTemporaryFile(
mode="w+", prefix="acm_catsrc", delete=False
)
templating.dump_data_to_temp_yaml(acm_catsrc, acm_data_yaml.name)
run_cmd(f"oc create -f {acm_data_yaml.name}", timeout=300)
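
When no registry image is given, `create_catalog_source` takes the tag from the index image listed for the current OCP version and joins it with the brew repository. A small sketch of that lookup as a pure function; `catalog_image` is a hypothetical name, the payload values are placeholders, and `rsplit` is used here (instead of the `split(":")[1]` in the diff) on the assumption that an index image reference may contain a registry port:

```python
def catalog_image(raw_messages, ocp_version, repo):
    """Return '<repo>:<tag>' using the tag of the index image for ocp_version."""
    index_image = raw_messages[0]["msg"]["pipeline"]["index_image"][f"v{ocp_version}"]
    # Split on the last colon only, so 'host:port/path:tag' still yields the tag.
    tag = index_image.rsplit(":", 1)[1]
    return ":".join([repo, tag])


# Placeholder payload shaped like the structure the diff reads from.
sample = [
    {
        "msg": {
            "pipeline": {
                "index_image": {"v4.14": "registry.example:8443/rh-osbs/iib:565330"}
            }
        }
    }
]
image = catalog_image(sample, "4.14", "registry.example:8443/rh-osbs/iib")
```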

def validate_upgrade(self):
acm_sub = OCP(
namespace=self.namespace,
resource_name=self.operator_name,
kind="Subscription.operators.coreos.com",
)
if not self.zstream_upgrade:
acm_prev_channel = f"release-{self.upgrade_version}"
else:
acm_prev_channel = config.ENV_DATA["acm_hub_channel"]
assert acm_sub.get().get("spec").get("channel") == acm_prev_channel
logger.info("Checking ACM status")
acm_mch = OCP(
kind=constants.ACM_MULTICLUSTER_HUB,
namespace=constants.ACM_HUB_NAMESPACE,
)
acm_mch.wait_for_resource(
condition=constants.STATUS_RUNNING,
resource_name=constants.ACM_MULTICLUSTER_RESOURCE,
column="STATUS",
timeout=720,
sleep=5,
)
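
`validate_upgrade` ends by polling the MultiClusterHub status with a timeout of 720 seconds and a 5 second sleep. The general shape of such a wait can be sketched as below; this is a generic polling loop with a hypothetical `wait_for` helper, not the implementation of `OCP.wait_for_resource`:

```python
import time


def wait_for(check, timeout, sleep, clock=time.monotonic, pause=time.sleep):
    """Call check() until it returns True or the timeout (in seconds) expires."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        pause(sleep)


# Simulated status sequence standing in for repeated status lookups.
statuses = iter(["Installing", "Running"])
reached_running = wait_for(lambda: next(statuses) == "Running", timeout=1, sleep=0)
```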
35 changes: 34 additions & 1 deletion ocs_ci/ocs/cluster.py
Expand Up @@ -69,6 +69,19 @@
logger = logging.getLogger(__name__)


class CephClusterMultiCluster(object):
"""
TODO: Implement this class later.
This class is used in the multicluster scenario when the current
cluster is the ACM hub, so it should point to the ODF cluster
that is not in the current context.

"""

def __init__(self, cluster_conf=None):
pass


class CephCluster(object):
"""
Handles all cluster related operations from ceph perspective
Expand All @@ -84,11 +97,17 @@ class CephCluster(object):
namespace (str): openshift Namespace where this cluster lives
"""

def __init__(self):
def __init__(self, cluster_config=None):
"""
Cluster object initializer, this object needs to be initialized
after cluster deployment. However, it's harmless to do anywhere.
"""
if cluster_config:
logger.info(
"CephClusterMulticluster will be used to handle multicluster case"
)
return CephClusterMultiCluster()

if config.ENV_DATA["mcg_only_deployment"] or (
config.ENV_DATA.get("platform") == constants.FUSIONAAS_PLATFORM
and config.ENV_DATA["cluster_type"].lower() == "consumer"
Expand Down Expand Up @@ -1035,6 +1054,18 @@ def delete_blockpool(self, pool_name):
self.RBD.exec_oc_cmd(f"patch {patch}")


class MulticlusterCephHealthMonitor(object):
# TODO: This will be a placeholder for now
def __init__(self, ceph_cluster=None):
pass

def __enter__(self):
pass

def __exit__(self, exception_type, value, traceback):
pass


class CephHealthMonitor(threading.Thread):
"""
Context manager class for monitoring ceph health status of CephCluster.
Expand All @@ -1052,6 +1083,8 @@ def __init__(self, ceph_cluster, sleep=5):
sleep (int): Number of seconds to sleep between health checks.

"""
if isinstance(ceph_cluster, CephClusterMultiCluster):
return MulticlusterCephHealthMonitor()
self.ceph_cluster = ceph_cluster
self.sleep = sleep
self.health_error_status = None
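
Python requires `__init__` to return `None`; returning any other object raises `TypeError` when the class is instantiated. One way to choose between a real class and a placeholder at construction time is a small factory function. A minimal sketch, using simplified stand-in classes and a hypothetical `get_ceph_cluster` helper that is not part of this PR:

```python
class Broken:
    def __init__(self):
        # Instantiating this class raises TypeError: __init__ must return None.
        return object()


class CephCluster:
    """Stand-in for the real class; the actual initializer does much more."""

    def __init__(self):
        self.kind = "ceph"


class CephClusterMultiCluster:
    """Placeholder used when the current context is the ACM hub."""

    def __init__(self, cluster_conf=None):
        self.kind = "multicluster"
        self.cluster_conf = cluster_conf


def get_ceph_cluster(cluster_config=None):
    """Hypothetical factory: choose the class before any __init__ runs."""
    if cluster_config:
        return CephClusterMultiCluster(cluster_config)
    return CephCluster()
```

The same approach would apply to selecting between `CephHealthMonitor` and `MulticlusterCephHealthMonitor`.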