[sled-agent] WIP pargs/pstack of oxide processes #7117

Closed
wants to merge 1 commit into from


papertigers
Contributor

No description provided.

@leftwo
Contributor

leftwo commented Nov 21, 2024

I know this is still a WIP, and I appreciate all the work you are doing here, but I want to suggest we move this work to a separate standalone service instead of making it part of sled-agent.

For details, see https://rfd.shared.oxide.computer/rfd/0495, but here is my summary:

All the checks and work here require that sled-agent is running and is not itself the thing that is broken. For support bundles themselves this makes sense: without a running Nexus, none of that framework can operate. My concern is that we also need to support situations where:

  1. In the future, we no longer have SSH access to sleds, or that access becomes more difficult.
  2. Sled-agent itself is what is broken.

If we take the good work here and, instead of putting it inside sled-agent, put it in a standalone health check service, we get the following:

  • sled-agent becomes a client that makes requests and still gets all the benefits of the service.
  • Another client tool, such as omdb, can be run from the switch zone to gather debugging data without requiring sled-agent itself to be running.
  • In a pre-rack-setup situation, we could still have a health check service available for triage without requiring sled-agent to be online.

My concern is that there is a lot of code here that we may end up wanting to move elsewhere, and if we wait too long, it becomes more entrenched and harder to dislodge.

If we decide to keep this in sled-agent, then we need to update RFD 495 to record that we are not going to build a standalone service and why we made that choice.

@papertigers
Contributor Author

Superseded by #7194

Taking @leftwo's advice above, we decided to put this work in a standalone sled-diagnostics crate.
