Add sos_report role
Generate SOS reports from a list of OCP nodes. Supports connected and
disconnected OCP clusters.
An SOS report helps debug issues on a node. It contains information
about the underlying OS, packages, modules, and logs, and is sometimes
requested for certain types of bugs.
tonyskapunk committed Jan 2, 2024
1 parent 217e05d commit c2b8ffc
Showing 5 changed files with 111 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -74,6 +74,7 @@ Name | Description
[redhatci.ocp.setup_minio](https://github.com/redhatci/ansible-collection-redhatci-ocp/blob/main/roles/setup_minio/README.md) | Deployment of [Minio](https://min.io/).
[redhatci.ocp.sno_installer](https://github.com/redhatci/ansible-collection-redhatci-ocp/blob/main/roles/sno_installer/README.md) | Deploy OCP SNO in a very opinionated fashion.
[redhatci.ocp.sno_node_prep](https://github.com/redhatci/ansible-collection-redhatci-ocp/blob/main/roles/sno_node_prep/README.md) | Preparation to deploy OCP SNO
[redhatci.ocp.sos_report](https://github.com/redhatci/ansible-collection-redhatci-ocp/blob/main/roles/sos_report/README.md) | Generate SOS report from a list of OCP nodes.
[redhatci.ocp.storage_tester](https://github.com/redhatci/ansible-collection-redhatci-ocp/blob/main/roles/storage_tester/README.md) | Storage Service tests during cluster upgrade
[redhatci.ocp.upi_installer](https://github.com/redhatci/ansible-collection-redhatci-ocp/blob/main/roles/upi_installer/README.md) | UPI Installer
[redhatci.ocp.vbmc](https://github.com/redhatci/ansible-collection-redhatci-ocp/blob/main/roles/vbmc/README.md) | Set up [Virtual BMC](https://docs.openstack.org/virtualbmc/latest/user/index.html)
56 changes: 56 additions & 0 deletions roles/sos_report/README.md
@@ -0,0 +1,56 @@
# SOS Report

Generate SOS report from a list of OCP nodes

## Requirements

In disconnected (air-gapped) environments, the image to use *must* already exist in a reachable registry before this role is used.

## Variables

| Variable | Default | Required | Description |
| -------------------- | ------------------------------------------------------------------ | --------- | ---------------------------------------------------------------------------------- |
| sos_report_nodes | \<undefined\> | Yes | A list of OCP node names for which to generate SOS reports. |
| sos_report_dir | /tmp | No | Directory to place the sos reports. |
| sos_report_image | registry.redhat.io/rhel9/support-tools | No | Fully Qualified Artifact Reference of the image to use containing the sos command. |
| sos_report_options | -k crio.all=on -k crio.logs=on -k podman.all=on -k podman.logs=on | No | The sos report options. |

## Example Playbook

- SOS report in a single node

```YAML
- name: SOS Report in a worker node
  ansible.builtin.include_role:
    name: redhatci.ocp.sos_report
  vars:
    sos_report_nodes:
      - worker-0
```
- SOS report in multiple nodes
```YAML
- name: SOS Report in multiple worker nodes
  ansible.builtin.include_role:
    name: redhatci.ocp.sos_report
  vars:
    sos_report_nodes:
      - worker-0
      - worker-1
      - worker-2
```
- SOS report in a disconnected environment with a custom directory
```YAML
- name: SOS Report in a disconnected environment
  ansible.builtin.include_role:
    name: redhatci.ocp.sos_report
  vars:
    sos_report_nodes:
      - master-0
      - worker-0
    sos_report_image: my-registry.example.local/tooling/custom-support-tools
    sos_report_dir: "{{ my_log_directory }}"
```
4 changes: 4 additions & 0 deletions roles/sos_report/defaults/main.yml
@@ -0,0 +1,4 @@
---
sos_report_dir: "/tmp"
sos_report_image: "registry.redhat.io/rhel9/support-tools"
sos_report_options: "-k crio.all=on -k crio.logs=on -k podman.all=on -k podman.logs=on"
9 changes: 9 additions & 0 deletions roles/sos_report/tasks/main.yml
@@ -0,0 +1,9 @@
---
- name: Validation for sos report
  ansible.builtin.assert:
    that:
      - sos_report_nodes is defined
      - sos_report_nodes | length > 0

- name: Generate SOS reports
  ansible.builtin.include_tasks: sos-reports.yml
41 changes: 41 additions & 0 deletions roles/sos_report/tasks/sos-reports.yml
@@ -0,0 +1,41 @@
---
- name: Generate SOS report for node {{ node_name }}
  vars:
    sos_report_registry: "{{ sos_report_image | regex_replace('([^/]+)/.*', '\\1') }}"
    sos_report_image_name: "{{ sos_report_image | regex_replace('[^/]+/(.*)', '\\1') }}"
  shell: >
    oc debug --image={{ sos_report_image }} node/{{ node_name }} --
    bash -c 'echo -e REGISTRY='{{ sos_report_registry }}'\\nIMAGE='{{ sos_report_image_name }}'> /host/root/.toolboxrc';
    oc debug --image={{ sos_report_image }} node/{{ node_name }} --
    chroot /host
    toolbox
    sos report --batch {{ sos_report_options }}
  async: 600
  poll: 0
  register: report
  loop: "{{ sos_report_nodes }}"
  loop_control:
    loop_var: node_name
  changed_when: true

- name: Check SOS report status
  async_status:
    jid: "{{ result.ansible_job_id }}"
  loop: "{{ report.results }}"
  loop_control:
    loop_var: "result"
  register: sos_async_results
  until: sos_async_results.finished
  retries: 20
  delay: 30

- name: Extract SOS report for node {{ async.result.node_name }}
  vars:
    tarball: "{{ async.stdout | regex_findall('/host/var/tmp/sosreport-.*.tar.xz') | first }}"
  shell: >
    oc debug --image={{ sos_report_image }} node/{{ async.result.node_name }} --
    bash -c 'cat {{ tarball }}' > {{ sos_report_dir }}/{{ tarball | basename }}
  loop: "{{ sos_async_results.results }}"
  loop_control:
    loop_var: "async"
  changed_when: true

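The two `regex_replace` expressions and the `regex_findall` lookup in `sos-reports.yml` can be tried outside Ansible. The following is a minimal Python sketch, assuming these filters behave like Python's `re.sub` and `re.findall`. The helper names `split_image` and `find_tarball` and the sample `sos` output are hypothetical and not part of the role.

```python
import os.path
import re


def split_image(image):
    """Derive the REGISTRY and IMAGE values written to .toolboxrc.

    Mirrors the role's two regex_replace calls (assumed equivalent to re.sub).
    """
    registry = re.sub(r"([^/]+)/.*", r"\1", image)  # text before the first "/"
    name = re.sub(r"[^/]+/(.*)", r"\1", image)      # text after the first "/"
    return registry, name


def find_tarball(stdout):
    """Return the first sosreport tarball path found in the command output.

    Mirrors regex_findall(...) | first, but returns None when nothing matches
    (the Ansible expression would fail instead).
    """
    matches = re.findall(r"/host/var/tmp/sosreport-.*.tar.xz", stdout)
    return matches[0] if matches else None


registry, name = split_image("registry.redhat.io/rhel9/support-tools")
print(registry)  # registry.redhat.io
print(name)      # rhel9/support-tools

# Hypothetical output; real sos output may be worded differently.
sample_stdout = "saved in:\n\t/host/var/tmp/sosreport-worker-0.tar.xz\n"
tarball = find_tarball(sample_stdout)
print(os.path.basename(tarball))  # sosreport-worker-0.tar.xz
```

An image reference without a `/` matches neither pattern, so both values would come back as the full input string. The role's default and the README examples all include a registry host, so this case does not arise there.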