Integrating Ganglia with Nagios

Daniel Pocock edited this page Mar 4, 2014 · 9 revisions

Overview

There are several ways to integrate Ganglia with alerting systems such as Nagios and Icinga. Here is a list of the projects that you can choose from:

| Project | Language | Packaged |
| --- | --- | --- |
| Ganglia Web Nagios script | bash/PHP | Part of ganglia-web package (Debian, Ubuntu, Fedora, etc.) |
| check_ganglia_metric | Python | |
| ganglia-nagios-bridge | Python | Debian, Ubuntu |
| Ganglios | Python | Debian build scripts available |

Detailed Descriptions

Ganglia Web Nagios Script

The Ganglia Web UI ships with a shell script and a PHP file that can respond to checks from Nagios.

More detail can be found in the Web UI's Nagios Integration wiki page.

check_ganglia_metric

check_ganglia_metric is a simple Nagios plugin that collects Ganglia data from gmetad and transparently stores it in a local cache file. This is probably the easiest way to integrate Ganglia data into Nagios for small installations: you give it a few command-line arguments and everything just works.

This script doesn't scale as well as some others; as the number of hosts Ganglia knows about grows, the size of the cache file and the time it takes to refresh it can become unmanageable and cause high load on the Nagios server.
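The cache-and-lookup pattern described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the function names, the cache path handling, and the 60-second freshness window are all assumptions, and the sketch relies only on gmetad serving its XML dump on a TCP port (8651 by default). Note that every lookup parses the whole cached dump, which is where the scaling cost comes from.

```python
import os
import socket
import time
import xml.etree.ElementTree as ET


def fetch_gmetad_xml(host="localhost", port=8651):
    """Read the full XML dump that gmetad serves on its XML port."""
    chunks = []
    with socket.create_connection((host, port), timeout=10) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmetad closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def cached_xml(cache_path, max_age=60, fetch=fetch_gmetad_xml):
    """Return the cached dump, refreshing the file if missing or stale.

    max_age (seconds) and the injectable fetch callable are
    hypothetical parameters chosen for this sketch.
    """
    try:
        fresh = time.time() - os.path.getmtime(cache_path) < max_age
    except OSError:  # cache file does not exist yet
        fresh = False
    if not fresh:
        data = fetch()
        tmp_path = cache_path + ".tmp"
        with open(tmp_path, "wb") as fh:
            fh.write(data)
        os.replace(tmp_path, cache_path)  # swap in the new cache atomically
    with open(cache_path, "rb") as fh:
        return fh.read()


def metric_value(xml_bytes, host_name, metric_name):
    """Return the VAL attribute of one metric on one host, or None."""
    root = ET.fromstring(xml_bytes)  # parses the entire dump on every call
    for host in root.iter("HOST"):
        if host.get("NAME") == host_name:
            for metric in host.iter("METRIC"):
                if metric.get("NAME") == metric_name:
                    return metric.get("VAL")
    return None
```

A check would call `cached_xml()` and then `metric_value()` and compare the result against its thresholds. With many hosts, both the refresh and the per-check parse grow with the size of the dump.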

ganglia-nagios-bridge

ganglia-nagios-bridge uses Nagios passive checks to push data from Ganglia into Nagios. You write a config file that lists the clusters, hosts, and metrics you want to watch (matched using regular expressions), along with thresholds for WARN and CRIT, and you run ganglia-nagios-bridge from cron. On each run it reads the config file, pulls data from gmetad, extracts the metrics named in the config file, and writes each check result into the appropriate directory for Nagios to consume as a passive check.

Because of the way passive checks work, this process is very fast and easy for Nagios to consume, and can therefore scale to a very large number of checks.
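The match-and-threshold flow can be illustrated with a short sketch. Several things here are assumptions rather than the bridge's real behaviour: the `RULES` structure is a hypothetical stand-in for its config file, the metric name is reused as the Nagios service description, and results are formatted as Nagios external-command lines (`PROCESS_SERVICE_CHECK_RESULT`) instead of being written into a check result directory.

```python
import re
import time
import xml.etree.ElementTree as ET

# Hypothetical rules; NOT the real ganglia-nagios-bridge config format.
RULES = [
    {"host": r"^web\d+$", "metric": r"^load_one$", "warn": 4.0, "crit": 8.0},
]

# Standard Nagios return codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify(value, warn, crit):
    """Map a numeric value to a Nagios state (higher is worse)."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def passive_results(xml_text, rules, now=None):
    """Yield one external-command line per metric that matches a rule."""
    now = int(time.time()) if now is None else now
    root = ET.fromstring(xml_text)
    for host in root.iter("HOST"):
        host_name = host.get("NAME", "")
        for metric in host.iter("METRIC"):
            name = metric.get("NAME", "")
            raw = metric.get("VAL")
            for rule in rules:
                if not (re.search(rule["host"], host_name)
                        and re.search(rule["metric"], name)):
                    continue
                try:
                    value = float(raw)
                except (TypeError, ValueError):
                    break  # non-numeric metric, nothing to compare
                state = classify(value, rule["warn"], rule["crit"])
                yield "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s=%s" % (
                    now, host_name, name, state, name, raw)
                break  # first matching rule wins
```

Because one pass over the gmetad XML produces results for every configured check, Nagios never has to run a plugin per metric; it only ingests the results.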

Nagios writes an error to its log file for every passive check result that refers to a host it doesn't know about. Keep your Nagios configuration up to date with a definition for every host that your ganglia-nagios-bridge configuration file maps to passive check results; otherwise the log files become harder to read.

Ganglios

Ganglios caches Ganglia data locally on the Nagios host in a way that makes it very efficient for Nagios active checks to query. A cron job runs once every minute, queries all your gmond hosts (using the same config as you would for gmetad), and writes the results out as XML files. It then splits those into one XML file per host, so that a Nagios check for an individual host metric never has to read more data than is relevant for that host. The local cache makes Nagios checks very fast, allowing you to scale Nagios to large environments.
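The per-host split is the key idea, and a rough sketch of it looks like this. The function names and the `<host>.xml` file layout are assumptions made for illustration; Ganglios' own on-disk format and library API may differ.

```python
import os
import xml.etree.ElementTree as ET


def _host_path(out_dir, host_name):
    """Build the cache file path for a host (hypothetical layout)."""
    safe_name = host_name.replace(os.sep, "_")  # keep the file inside out_dir
    return os.path.join(out_dir, safe_name + ".xml")


def split_by_host(xml_bytes, out_dir):
    """Write one small XML file per HOST element; return the paths written."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    root = ET.fromstring(xml_bytes)
    for host in root.iter("HOST"):
        name = host.get("NAME")
        if not name:
            continue
        path = _host_path(out_dir, name)
        # Each file holds only this host's subtree (the HOST and its METRICs).
        ET.ElementTree(host).write(path, encoding="utf-8",
                                   xml_declaration=True)
        written.append(path)
    return written


def read_metric(out_dir, host_name, metric_name):
    """Look up one metric by parsing only that host's file."""
    host = ET.parse(_host_path(out_dir, host_name)).getroot()
    for metric in host.iter("METRIC"):
        if metric.get("NAME") == metric_name:
            return metric.get("VAL")
    return None
```

The cron job would call something like `split_by_host()` once a minute, while each active check calls only `read_metric()`, so the cost of a check depends on the number of metrics for one host rather than on the size of the whole grid.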

Ganglios includes a Nagios plugin for checking individual metrics and a Python library you can use to write alerts that check a variety of metrics across large numbers of hosts.