Replies: 8 comments 1 reply
-
I think it could be useful to get the number of log messages matching the following from the salt master log:
Some customers ignore such messages, but they are signs that something is wrong with certain minions or that the master is overloaded.
-
Check for "Out of Memory" errors, oom-killer invocations and Java heap out-of-memory errors. Also check the general minimum system requirements:
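A minimal sketch of such an out-of-memory check, assuming the log lines have already been read from the supportconfig tarball. The signatures below are the typical kernel and JVM messages ("Out of memory", "oom-killer", `java.lang.OutOfMemoryError`); the function name and the `kind` labels are hypothetical and not part of any existing tool.

```python
import re

# Assumed signatures: typical kernel OOM-killer and JVM heap exhaustion messages.
OOM_SIGNATURES = {
    "kernel_oom": re.compile(r"Out of memory|oom-killer", re.IGNORECASE),
    "java_heap": re.compile(r"java\.lang\.OutOfMemoryError"),
}


def find_oom_events(lines):
    """Return (line_number, kind, text) for every line that looks like an OOM event."""
    events = []
    for number, line in enumerate(lines, start=1):
        for kind, rx in OOM_SIGNATURES.items():
            if rx.search(line):
                events.append((number, kind, line.rstrip("\n")))
                break  # report each line once, under the first matching kind
    return events
```

The number of events per kind could then be exposed as a metric, while the matching lines themselves are better suited to Loki.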
-
We could also implement checks for these rules, especially for large-scale deployments: https://documentation.suse.com/suma/4.3/en/suse-manager/specialized-guides/large-deployments/tuning.html
-
You should take the input from basic-health-check.txt in the supportconfig. It gives statistics about the top 10 memory users, CPU usage and so on. It is always the second file I look at in a supportconfig, and a good starting point when hunting for issues ;-)
-
Look also in the
-
The output of "/usr/lib/susemanager/bin/susemanager-connection-check" may give further hints.
-
salt-key -L | sed '/Denied Keys/,/Unaccepted Keys/!d'
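The sed filter above keeps only the lines from the "Denied Keys" header through the "Unaccepted Keys" header. For an exporter, the same information could be extracted in Python. This is only a sketch: it assumes the plain-text layout of `salt-key -L` (section headers ending in "Keys:", one minion ID per line), and `parse_salt_key_list` is a hypothetical helper name.

```python
def parse_salt_key_list(text):
    """Group minion IDs by the section header they appear under."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith("Keys:"):
            current = line[:-1]  # drop the trailing colon
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections
```

With this, `len(parse_salt_key_list(output).get("Denied Keys", []))` would give the number of denied keys as a single metric value.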
-
How about parsing spacewalk-debug/rhn-logs/rhn/reposync.logs for errors?
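A rough sketch of what such a scan might look like, grouping suspicious lines by log file. The markers ("ERROR", "Traceback") and the `*.log` file pattern are assumptions about the reposync log format, not something confirmed in this thread, and `scan_logs_for_errors` is a hypothetical name.

```python
from pathlib import Path

# Assumed markers; adjust them to whatever the real reposync logs contain.
ERROR_MARKERS = ("ERROR", "Traceback")


def scan_logs_for_errors(log_dir):
    """Map each log file name under log_dir to the lines containing an error marker."""
    hits = {}
    for path in sorted(Path(log_dir).rglob("*.log")):
        with path.open(errors="replace") as handle:
            matched = [
                line.rstrip("\n")
                for line in handle
                if any(marker in line for marker in ERROR_MARKERS)
            ]
        if matched:
            hits[path.name] = matched
    return hits
```

Grouping by file keeps it visible which channel sync produced the errors, assuming there is one log file per channel.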
-
In the context of the RFC for the Health Check Tool for Uyuni, we're currently developing the disconnected solution (see Draft PR).

One component of this tool is the Uyuni-Health-Exporter. This component reads a system troubleshooting tarball created by the supportconfig utility and generates metrics that are considered useful when investigating an issue. These metrics will be presented in Grafana to the person researching the issue, in a way that helps them find the solution faster.

Another component of the Health Check Tool for Uyuni is Loki, a log aggregation system optimised for log storage and querying. Loki uses Promtail as an agent to scrape, collect, and forward logs from various sources to Loki for processing. In this case, the only source would be the tarball generated by supportconfig.

I would like to gather opinions on which metrics or other information would be worth collecting from the supportconfig tarball. For example, at this moment, we're collecting these metrics for the Uyuni-Health-Exporter:

What other information (either metrics or log entries) generated by supportconfig would be useful to collect and present to anybody investigating an issue?

About supportconfig:

The supportconfig tool makes it easier for administrators and support teams to resolve issues by providing a comprehensive system report. It is provided by the supportutils package and can integrate plug-ins that extend the base tool, enabling it to collect specific data.