UPDATE: Did you know there's an official Cacti guide? It's called Cacti 0.8 Beginner's Guide. For more about SNMP, don't hesitate to take a look at Essential SNMP, Second Edition.
Two free, open-source tools are running the show in network and server-activity monitoring. The older one, quite popular among network and system administrators, is Nagios. Nagios does more than monitoring: it also handles event traps, escalation and notification. The younger challenger is called Cacti. Unlike Nagios, it's written in a scripting language [PHP], so no compiling is necessary; it just runs out of the box1.
Cacti's problem is that, in its current version, it lacks many real-time features such as monitoring and notification. These features are scheduled for future versions of the product, but as with any open-source roadmap, nothing is guaranteed. Anyway, this article focuses on Cacti integration because that's what I am currently using.
Cacti is built upon RRDtool, the open-source graphing and storage tool from the author of MRTG, and on the SNMP communication protocol. SNMP is not exactly a developer's cup of tea; it's more of a network administrator's tool2.
However, a monitoring server comes in extremely handy for performance measurement and tuning, especially for complex performance behavior that can only be benchmarked over the long term, such as the impact of large caches on a web application or the performance of long-running operations.
But is the specific variable you need to monitor available through SNMP out of the box? Chances are it is. Because SNMP is an extensible protocol, many organizations have registered their own MIBs and the corresponding implementations. Basically, a MIB is a collection of objects, each identified by a unique identifier called an OID.
An OID is a sequence of numbers separated by dots, for instance '.1.3.6.1.4.1.2021.11'; each number has a specific meaning in a standard object tree. In this example, '.1.3.6.1.4.1.2021.11' stands for '.iso.org.dod.internet.private.enterprises.ucdavis.systemStats'. You can even get your own branch of the '.iso.org.dod.internet.private.enterprises' tree by applying for a Private Enterprise Number at IANA.
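To make the tree idea concrete, here is a small shell sketch that spells out an OID arc by arc. The function name oid_to_path and its lookup table are made up for illustration and only know the arcs mentioned in this article; on a machine with Net-SNMP installed, snmptranslate -Of .1.3.6.1.4.1.2021.11 should give the same answer using the real MIB files.

```shell
#!/bin/sh
# Hypothetical helper: translate a numeric OID into its symbolic path.
# The lookup table only covers the arcs discussed in this article;
# any other arc is kept as a plain number.
oid_to_path() {
  prefix=""
  path=""
  for arc in $(echo "$1" | tr '.' ' '); do
    prefix="$prefix.$arc"
    case "$prefix" in
      .1)                   name=iso ;;
      .1.3)                 name=org ;;
      .1.3.6)               name=dod ;;
      .1.3.6.1)             name=internet ;;
      .1.3.6.1.4)           name=private ;;
      .1.3.6.1.4.1)         name=enterprises ;;
      .1.3.6.1.4.1.2021)    name=ucdavis ;;
      .1.3.6.1.4.1.2021.11) name=systemStats ;;
      *)                    name=$arc ;;
    esac
    path="$path.$name"
  done
  echo "$path"
}

oid_to_path .1.3.6.1.4.1.2021.11
# prints .iso.org.dod.internet.private.enterprises.ucdavis.systemStats
```

An unknown arc such as the 111111 used later in this article simply stays numeric: oid_to_path .1.3.6.1.4.1.111111.1 prints .iso.org.dod.internet.private.enterprises.111111.1.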
Most probably you don't need to register your own MIB, no matter how 'exotic' your monitoring is, because:
a) it’s already there, in the huge list of existing MIBs and implementations
and
b) you are not bound to the existing official MIBs; in fact, you can create your own MIB as long as you replicate it in the SNMP configuration of every server you want to monitor.
Free tools for browsing existing MIBs are available on the net; IMHO the best one is MibBrowser. This multiplatform [Java] MIB browser has a free version which should be more than enough for our basic task. The screen capture shown here depicts a "Get Subtree" operation on the '.1.3.6.1.4.1.2021.11' MIB; the result is a list of single-value objects, for instance '.1.3.6.1.4.1.2021.11.11.0', which has the alias 'ssCpuIdle.0' and the value 97 [meaning that the CPU is 97% idle]. You can see the alias by loading the corresponding MIB file [select File/Load MIB, then choose 'UCD-SNMP-MIB.txt' from the list of predefined MIBs].
From the command line, you can display existing MIB values with snmpwalk (see note 3):
snmpwalk -Os -c [community_name] -v 1 [hostname] .1.3.6.1.4.1.2021.11
For the '.1.3.6.1.4.1.2021.11' OID (.iso.org.dod.internet.private.enterprises.ucdavis.systemStats), running snmpwalk -v 1 -c sncq localhost .1.3.6.1.4.1.2021.11 gives:
UCD-SNMP-MIB::ssIndex.0 = INTEGER: 1
UCD-SNMP-MIB::ssErrorName.0 = STRING: systemStats
UCD-SNMP-MIB::ssSwapIn.0 = INTEGER: 0
UCD-SNMP-MIB::ssSwapOut.0 = INTEGER: 0
UCD-SNMP-MIB::ssIOSent.0 = INTEGER: 4
UCD-SNMP-MIB::ssIOReceive.0 = INTEGER: 2
UCD-SNMP-MIB::ssSysInterrupts.0 = INTEGER: 4
UCD-SNMP-MIB::ssSysContext.0 = INTEGER: 1
UCD-SNMP-MIB::ssCpuUser.0 = INTEGER: 2
UCD-SNMP-MIB::ssCpuSystem.0 = INTEGER: 1
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 96
UCD-SNMP-MIB::ssCpuRawUser.0 = Counter32: 17096084
UCD-SNMP-MIB::ssCpuRawNice.0 = Counter32: 24079
UCD-SNMP-MIB::ssCpuRawSystem.0 = Counter32: 6778580
UCD-SNMP-MIB::ssCpuRawIdle.0 = Counter32: 599169454
UCD-SNMP-MIB::ssCpuRawKernel.0 = Counter32: 6778580
UCD-SNMP-MIB::ssIORawSent.0 = Counter32: 998257634
UCD-SNMP-MIB::ssIORawReceived.0 = Counter32: 799700984
UCD-SNMP-MIB::ssRawInterrupts.0 = Counter32: 711143737
UCD-SNMP-MIB::ssRawContexts.0 = Counter32: 1163331309
UCD-SNMP-MIB::ssRawSwapIn.0 = Counter32: 23015
UCD-SNMP-MIB::ssRawSwapOut.0 = Counter32: 13730
Each of these values has its own meaning; for instance, 'ssCpuIdle.0' reports that the CPU is 96% idle.
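If you want to reuse this output in your own scripts, a short awk filter can turn each "MIB::name = TYPE: value" line into a name/value pair. This is only a sketch: parse_snmpwalk is a hypothetical name, and it assumes the one-variable-per-line format shown above (values that wrap over several lines, such as long hex strings, are not handled).

```shell
#!/bin/sh
# Hypothetical helper: read snmpwalk/snmpget output on stdin and print
# "name value" pairs, e.g. "ssCpuIdle.0 96".
parse_snmpwalk() {
  awk '{
    name = $1
    sub(/^.*::/, "", name)             # drop the "UCD-SNMP-MIB::" prefix, if any
    value = $0
    sub(/^[^=]*= [^:]*: /, "", value)  # drop everything up to "= TYPE: "
    print name, value
  }'
}

# Example using two lines of the output above:
printf '%s\n' \
  'UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 96' \
  'UCD-SNMP-MIB::ssCpuRawUser.0 = Counter32: 17096084' | parse_snmpwalk
# prints:
# ssCpuIdle.0 96
# ssCpuRawUser.0 17096084
```

On a live system you would pipe snmpwalk straight into it, e.g. snmpwalk -Os -c [community_name] -v 1 [hostname] .1.3.6.1.4.1.2021.11 | parse_snmpwalk.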
To retrieve just a single value from the list, pass its alias as a parameter to the snmpget command, for instance:
snmpget -Os -c [community_name] -v 1 [hostname] UCD-SNMP-MIB::ssCpuIdle.0
Sometimes you want to monitor something that you can't seem to find in the list of MIBs. Say, for instance, the performance of a MySQL database that you're pounding pretty hard with your webapp4.
The easiest way of doing this is to go through a script: SNMP agents such as Net-SNMP can take the output of any script and expose it through the protocol, line by line.
Suppose you want to keep track of the values printed by the following script:
#!/bin/sh
# Print connected threads / 10, queries / 1000 and slow queries, one value per line
/usr/bin/mysqladmin -uroot status | /usr/bin/awk '{printf("%f\n%d\n%d\n", $4/10, $6/1000, $9)}'
The mysqladmin command and a bit of simple awk magic display the following three values, each on a separate line:
- number of open connections (threads) / 10
- number of queries / 1000
- number of slow queries
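Because awk field positions are easy to get wrong, it helps to test the awk part on its own. The sketch below wraps it in a hypothetical function and feeds it a made-up line in the format printed by mysqladmin status (Uptime, Threads, Questions, Slow queries, ...). With awk's default whitespace splitting, $4 is the thread count, $6 the number of questions (queries) and $9 the number of slow queries.

```shell
#!/bin/sh
# Same awk program as in the script above, wrapped so it can be tested
# without a MySQL server. Input: one line of `mysqladmin status` output.
format_mysql_status() {
  awk '{printf("%f\n%d\n%d\n", $4 / 10, $6 / 1000, $9)}'
}

# Made-up sample line (not real server data):
echo 'Uptime: 3600  Threads: 9  Questions: 18551000  Slow queries: 108  Opens: 50  Flush tables: 1  Open tables: 64  Queries per second avg: 5153.055' \
  | format_mysql_status
# prints:
# 0.900000
# 18551
# 108
```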
It is worth noting that, while the first value is an instantaneous, gauge-like value, the other two are incremental: they keep growing as new queries and new slow queries are recorded. We'll keep this in mind for later, when we chart these values.
But for now, let's see how these three values are exposed through SNMP. The first step is to tell the SNMP daemon that the script is associated with an OID. This is done in the daemon's configuration file, usually /etc/snmp/snmpd.conf. The following line attaches the execution of the script [for example /home/user/myscript.sh] to a certain OID:
exec .1.3.6.1.4.1.111111.1 MySQLParameters /home/user/myscript.sh
The '.1.3.6.1.4.1.111111.1' OID is a branch of '.1.3.6.1.4.1' [meaning '.iso.org.dod.internet.private.enterprises']. We tried to make it look 'legitimate', but you can use any sequence you like here.
After restarting the daemon, let's query the freshly created OID, either with MibBrowser (see the following image) or from the command line:
snmpwalk -Os -c [community_name] -v 1 [hostname] .1.3.6.1.4.1.111111.1
The result is:
enterprises.111111.1.1.1 = INTEGER: 1
enterprises.111111.1.2.1 = STRING: "MySQLParameters"
enterprises.111111.1.3.1 = STRING: "/etc/snmp/mysql_params.sh"
enterprises.111111.1.100.1 = INTEGER: 0
enterprises.111111.1.101.1 = STRING: "0.900000"
enterprises.111111.1.101.2 = STRING: "18551"
enterprises.111111.1.101.3 = STRING: "108"
enterprises.111111.1.102.1 = INTEGER: 0
enterprises.111111.1.103.1 = ""
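In this table, the .101.N rows hold the N-th line printed by the script, while .3.1 is the command that was run and .100.1 its exit status. The sketch below pulls out just the script's output lines from snmpwalk output like the one above; exec_output is a hypothetical name, and it assumes the -Os short format with each value reported as a quoted STRING.

```shell
#!/bin/sh
# Hypothetical helper: print only the script output rows (the .101.N
# entries) from `snmpwalk -Os ... .1.3.6.1.4.1.111111.1`, one per line.
exec_output() {
  awk '$1 ~ /\.101\.[0-9]+$/ {
    value = $0
    sub(/^[^=]*= STRING: /, "", value)  # drop "name = STRING: "
    gsub(/"/, "", value)                # drop the surrounding quotes
    print value
  }'
}

# Example using lines of the output above:
printf '%s\n' \
  'enterprises.111111.1.100.1 = INTEGER: 0' \
  'enterprises.111111.1.101.1 = STRING: "0.900000"' \
  'enterprises.111111.1.101.2 = STRING: "18551"' \
  'enterprises.111111.1.101.3 = STRING: "108"' | exec_output
# prints:
# 0.900000
# 18551
# 108
```

On a live system: snmpwalk -Os -c [community_name] -v 1 [hostname] .1.3.6.1.4.1.111111.1 | exec_output. Cacti itself doesn't need this step, since each row can be polled individually by its full OID.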
Great! Now we have proof that it really works: the specific values extracted by our custom script are visible through SNMP. Let's go back to Cacti and see how we can make some nice charts out of them5.
Cacti has this nice feature of defining ‘templates’ that
you can reuse afterwards. My strategy is to define a data template for
each one of the 3 parameters I want to chart, using the ‘Duplicate’
function applied to the ‘SNMP – Generic OID Template’.
On the duplicated data source template, you have to change the data source title, the name to display in charts, the data source type [use DERIVE for incremental counters and GAUGE for instantaneous values], the specific OID [here '.1.3.6.1.4.1.111111.1.101.1' to '.1.3.6.1.4.1.111111.1.101.3'] and the SNMP community. Do this for all three values.
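The DERIVE choice matters because RRDtool, the storage engine Cacti uses, then stores the rate of change between two polls, (current - previous) / seconds elapsed, which is a per-second rate, rather than the ever-growing raw number. The sketch below mimics that arithmetic; derive_rate is a hypothetical name, and reporting a negative difference as unknown assumes the data source has a minimum of 0 (a negative rate can show up when MySQL restarts and its counters start again from zero).

```shell
#!/bin/sh
# Hypothetical helper mimicking a DERIVE data source with a minimum of 0:
#   rate = (current - previous) / seconds_elapsed, in units per second.
# A negative difference (counter reset) is printed as "U", RRDtool's
# notation for an unknown value.
derive_rate() {
  awk -v prev="$1" -v cur="$2" -v secs="$3" 'BEGIN {
    if (cur + 0 < prev + 0) { print "U"; exit }
    printf("%.3f\n", (cur - prev) / secs)
  }'
}

# Two polls of the "thousands of queries" value, 300 seconds apart
# (the second number is made up):
rate=$(derive_rate 18551 18611 300)
echo "$rate"                                          # 0.200 (thousand queries per second)
echo "$rate" | awk '{ printf("%.0f\n", $1 * 300) }'   # 60 (thousand queries per 5 minutes)
```

The last line shows why a per-second rate has to be multiplied by the 300-second polling step before it can be read as an amount per 5 minutes.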
Using the three new datasource templates, create a chart
template for ‘MySQL Activity’. That’s a bit more complicated, but it
boils down to the following procedure, repeated for each of the 3 data
sources:
- add a data source and associate a graph [I always use AREA for the first graph as a background and LINE3 for the others, but it's just a matter of taste]
- associate labels with current or computed values: CURRENT, AVERAGE, MAX in this example
All the rest is really fine tuning: choosing better colors, deciding whether to use autoscale or a fixed scale, and so on. By now, your graph template should be ready to use.
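Note that for the incremental values ['DERIVE' type data sources] I've used titles such as 'Thousands queries/5 min'; the 5 minutes come from the Cacti poller, which is set to query for data every 5 minutes. Keep in mind that RRDtool stores DERIVE values as per-second rates, so such a label only matches the plotted numbers if the graph multiplies them by the 300-second step (for instance with a CDEF). The end result is something like this one: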
On this real production chart you'll see a few interesting patterns. For instance, at 3 o'clock in the morning there is a huge spike in all the charted parameters; indeed, a cron job was causing it. From time to time, a small burst of slow queries is recorded, which is still under investigation. What is interesting here is that these spikes were previously undetectable on the load average chart, which looks clean and innocuous:
To conclude, SNMP is a valuable resource for server performance monitoring. Investigating specific parameters and displaying them in tools such as Cacti can often bring interesting insights into the behavior of your servers.
Some SNMP implementations in different programming languages:
- Java: Westhawk's Java SNMP stack [free, with commercial support], AdventNet SNMP API [commercial, with a feature-restricted, non-expiring free version], iREASONING SNMP API [commercial implementation], SNMP4J [free and feature-rich implementation - thank you Mathias for the tip]
- PHP: client-only support via the php-snmp extension, part of the PHP distribution [free]
- Python: PySNMP is a Python SNMP framework, client+agents [free].
- Ruby: client-only implementation Ruby SNMP [free]
1 If you're running Debian, Cacti is available through apt, so it's a breeze to install and run [apt-get install cacti]
2 A bit out of the scope of this article: SNMP also allows writing values on remote servers, not only retrieving monitored values.
3 Replace [hostname] with the server hostname and [community_name] with the SNMP community, the default being 'public'. The SNMP community is a way of authenticating a client to an SNMP agent; although the system can be used for pretty sophisticated setups, most of the time servers expose a read-only, passwordless community, visible only on the internal network, for monitoring purposes.
4 In fact, a commercial implementation of SNMP for MySQL does exist.
5 The procedure described here applies to Cacti v0.8.6.c