Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kafka Connect for z/OS support #1323

Open
wants to merge 13 commits into
base: 7.3.x
Choose a base branch
from
Open

Kafka Connect for z/OS support #1323

wants to merge 13 commits into from

Conversation

stan-is-hate
Copy link
Member

@stan-is-hate stan-is-hate commented Feb 2, 2023

Description

This PR adds support for running kafka connect on z/OS. This is in BETA, as in not everything has been tested and some things don't work.
I'm not really familiar with your branching strategy and where to point this PR, would appreciate any pointers.

I've tested it using IBM zD&T and cp-env.
A sister PR for cp-env is here: https://github.com/confluentinc/cp-test-env-manager/pull/136

Implementation details:

  • z/OS requires a set of environment variables to be set. cp-ansible provided a variable called proxy_env. Even though it was supposed to be only used for specifying proxy details, in fact all playbooks set it as environment for all tasks automatically. I've renamed it env_vars to be more explicit about what the variable does. This is the only breaking change, all the rest of the changes are hidden behind the ansible_os_family check.
  • z/OS doesn't have SystemD or package managers. It also doesn't have gunzip. So what we do is download confluent tgz locally on your laptop, gunzip it there (but not untar) and then pass the tar to z/os, where we untar it.
  • z/OS doesn't support confluent hub, so we can only install local and remote connectors.
  • This is aimed primarily at testing IBM MQ connectors, and these connectors require IBM jars and .so files to be present in their classpath. So I've added a special hook when installing connectors which will recognize ibm mq connectors and copy jars and .so files onto their classpath.
  • Because we're running in virtualized z/OS, I've added a health_check_hostname var, to make sure guest OS hits its guest IP rather than host IP when doing health checks. I'll do more tests without it as well to make sure the issues I was having were not a fluke, but in any case this change is non breaking.

Limitations and unknowns:

  • SSL doesn't work. Even though I have confirmed that the playbook creates keystores with correct certificates, connect still refuses to work correctly. It's better if connect team handles this. We can document it as not supported for now.
  • Confluent CLI doesn't work on z/OS (that's a product limitation), so any functionality involving it won't work either, e.g. secrets protection.
  • I have not tested deploy_connectors.yml because it is not exposed in cp-env yet, but looking at the code it's all done via REST, so should work fine
  • I have tested RBAC with kerberos, and it does work. However, when rbac is enabled, connect does not use kerberos at all, so there's no guarantee kerberos would work. Because we're running in virtualized instances, there are multiple hostnames involved, one of the host OS, another of the guest OS Kerberos relies on hostnames a lot, so there might be issues with that. I'll test it some more.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

  • [x ] New feature (non-breaking change which adds functionality)

  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

  • This change requires a documentation update

  • Any variable changes have been validated to be backwards compatible

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • Any dependent changes have been merged and published in downstream modules

@stan-is-hate stan-is-hate requested a review from a team as a code owner February 2, 2023 23:08
imcdo
imcdo previously approved these changes Feb 6, 2023
@@ -2,7 +2,7 @@
- name: Host Prerequisites
hosts: zookeeper:kafka_broker:schema_registry:kafka_connect:ksql:control_center:kafka_rest:kafka_connect_replicator
gather_facts: false
environment: "{{ proxy_env }}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good but breaking change. Can we make an alias for backward compatibility and deprecate proxy_env for couple of releases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed


- name: Copy Files (z/OS)
when: ansible_os_family == "OS/390"
ibm.ibm_zos_core.zos_copy:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this part of ansible core? If not, we need to add this dependency in our documentation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -91,111 +96,114 @@
"Not enough free disk space for package installation. Minimum space required is {{ required_disk_space_mb }}MB, please free up more disk space
to continue installation. To skip host validations, set validate_hosts to false."
when: (item.size_available < required_disk_space_mb|float)
with_items: "{{ ansible_mounts }}"
with_items: "{{ ansible_mounts if ansible_mounts is defined else {} }}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have used default keys for such conditions in many places.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

- name: Check Internet Access
# for z/OS we don't download the archive from the internet
# but rather download it locally on your laptop, and then push to z/os node
when: ansible_os_family != "OS/390"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will there be no check for Z/OS ?

Copy link
Member Author

@stan-is-hate stan-is-hate Apr 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to check internet access, because z/OS won't download anything from the internet.
I think it was done due to z/os python not being able to validate any ssl certificates, even when I setup correct ssl flags, but I might be wrong, it's been a while :)

- validate
- validate_memory_usage
- validate_memory_usage_zookeeper
- name: Validate memory usage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!

@@ -62,6 +62,11 @@
tags:
- validate

- name: Symlink bash on z/OS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please use ansible file or some other core module for the this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to, originally, but built in module was failing, complaining /bin/bash didn't exist.

- installation_method == "archive"
- ansible_os_family != "OS/390"

- name: Expand remote Confluent Platform archive (z/OS)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good improvement but its diverging from existing structure. I would like to take this up in separate PR and probably implement for all OS

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Existing structure just wouldn't work for z/OS, because it is way too special :D

@@ -4,6 +4,13 @@
when: not common_role_completed|bool
tags: common

- name: Check z/OS Preconditions
fail:
msg: 'z/OS must be installed from archive'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I read it as "On z/OS, Connect service can be installed via archive"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'll change it to "On z/OS Confluent Platform Services MUST be installed via archive"

@stan-is-hate
Copy link
Member Author

Thanks @nsharma-git , I don't have much bandwidth these days, but I'll try to address your comments in the oncoming weeks. For anything where z/os diverges from the way we do things on other platforms it's done so out of necessity, e.g. z/os cannot unpack tar.gz files, it can only work with uncompressed tar or some weird compression algorithm, but not gz or bz2. That's why we download archive locally first, unzip but not untar it, and then move to the remote machine. We don't untar it locally because un-taring it on z/os updates encoding tags on the files correctly, if memory serves

@vr029 vr029 requested a review from a team as a code owner August 21, 2023 16:34
@cla-assistant
Copy link

cla-assistant bot commented Aug 21, 2023

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
0 out of 3 committers have signed the CLA.

❌ Veerabhadra
❌ stan-confluent
❌ pbadani


Veerabhadra seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants