daily system administration

Linux, Debian and the rest
any questions or comments: Tom@d7031.de

Monitoring Cisco Nexus 7000 switches with Icinga/Nagios

Cisco uses a different MIB to make information about its Nexus hardware available over SNMP. tcomm.es has provided some plugins to monitor such switches, especially their FRU hardware. The original website is not available at the moment, so I’ve made a copy on GitHub.

All configuration examples were tested on a Nexus 7000 C7010 running software versions 5.1(2) and 6.2(8). The following sections show some configurations:

checking the power supplies

I use the check_cisco_fru_ps plugin.

First, run the plugin against the switch in test mode to get the necessary power IDs:

./check_cisco_fru_ps.pl -H <IP or hostname> -C <community> -E <snmp-version>

The lines containing KW show the power supplies in use.

CISCO_FRU_PS OK - TEST MODE

CISCO POWERS DATA
Power id = 471                   Power status = 2 (On)           Power description = N7K-AC-6.0KW
...
Power id = 470                   Power status = 2 (On)           Power description = N7K-AC-6.0KW
...
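
Picking the IDs out of the test-mode output by hand is easy to get wrong on a fully populated chassis. The following is a minimal sketch of how it could be scripted; fru_ids is a hypothetical helper that is not part of the plugin package, and it assumes the output format shown above.

```shell
# Hypothetical helper, not part of the plugin package.
# Usage: fru_ids <Power|Fan|Module> [description-filter]
# Reads test-mode output on stdin and prints the matching IDs as a
# numerically sorted, comma-separated list suitable for the -e option.
fru_ids() {
    kind=$1
    filter=${2:-}          # an empty filter keeps every line
    grep -- "$filter" \
        | sed -n "s/^$kind id = \([0-9][0-9]*\).*/\1/p" \
        | sort -n \
        | paste -sd, -
}
```

Piping the test-mode run through `fru_ids Power KW` would print 470,471 for the output above.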

Now run the plugin in check mode:

./check_cisco_fru_ps.pl -H <IP or hostname> -C <community> -E <snmp-version> -e 470,471 -w 9,12 -c 5,8

The values for warning and critical are fixed and explained in the online help (./check_cisco_fru_ps.pl -h), but for critical checks the value 5 should also be applied to match the PSFAIL error. The last step is to define a command and a service check. This service check produces no perfdata output, so just set process_perf_data to 0.

define command {
        command_name    check_snmp_cisco_fru_ps
        command_line       $USER1$/check_cisco_fru_ps.pl -H '$HOSTADDRESS$' -C '$ARG1$' -E 2c -e '$ARG2$','$ARG3$' -w 9,12 -c 5,8
}

define service {
        use                           generic-service
        service_description  Cisco Nexus7010 Power supply status
        host_name               nexus7010
        process_perf_data   0
        check_command      check_snmp_cisco_fru_ps!<community>!470!471
}
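
To see how the check_command arguments end up on the plugin's command line, the macro substitution can be mimicked in a few lines of shell. This is only an illustration of the "!" splitting, not how Icinga/Nagios is implemented; expand_ps_check is a hypothetical helper, and the plugin directory and host address in the usage example are placeholder values.

```shell
# Illustration only: split a check_command on "!" and fill the pieces into
# the command line as $ARG1$ (community), $ARG2$ and $ARG3$ (power IDs).
# Usage: expand_ps_check <check_command> <host address> <plugin dir> <threshold options>
expand_ps_check() (
    check_command=$1
    host_address=$2        # stands in for $HOSTADDRESS$
    plugin_dir=$3          # stands in for $USER1$
    thresholds=$4          # the fixed -w/-c options of the command definition
    set -f                 # no globbing while splitting
    IFS='!'
    set -- $check_command  # $1 = command name, $2..$4 = ARG1..ARG3
    printf "%s/check_cisco_fru_ps.pl -H '%s' -C '%s' -E 2c -e '%s','%s' %s\n" \
        "$plugin_dir" "$host_address" "$2" "$3" "$4" "$thresholds"
)

expand_ps_check 'check_snmp_cisco_fru_ps!public!470!471' 192.0.2.10 /usr/lib/nagios/plugins '-w 9,12 -c 5,8'
```

The shell joins the adjacent quoted strings '470','471' into the single word 470,471, so the plugin receives the same -e value as in the manual run.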

checking the fans

The next plugin I use is check_cisco_fru_fan.

Once again, first run the plugin against the switch in test mode to get the necessary fan IDs:

./check_cisco_fru_fan.pl -H <IP or hostname> -C <community> -E <snmp-version> 

You’ll get lines with information about the fans and their IDs.

CISCO_FRU_FAN OK - TEST MODE
CISCO FANS DATA
Fan id = 537                     Fan status = 2 (Up)             Fan description = Fan Module-4
Fan id = 535                     Fan status = 2 (Up)             Fan description = Fan Module-2
Fan id = 534                     Fan status = 2 (Up)             Fan description = Fan Module-1
Fan id = 536                     Fan status = 2 (Up)             Fan description = Fan Module-3

Now run the plugin in check mode:

./check_cisco_fru_fan.pl -H <IP or hostname> -C <community> -E <snmp-version> -e 534,535,536,537 -w 4 -c 3

The values for warning and critical are fixed and also documented in the online help (./check_cisco_fru_fan.pl -h). The last step is to define a command and a service check. This service check produces no perfdata output, so just set process_perf_data to 0.

define command {
        command_name    check_snmp_cisco_fru_fan
        command_line       $USER1$/check_cisco_fru_fan.pl -H '$HOSTADDRESS$' -C '$ARG1$' -E 2c -e '$ARG2$' -w 4 -c 3
}

define service {
        use                           generic-service
        service_description  Cisco Nexus7010 Fan status
        host_name               nexus7010
        process_perf_data   0
        check_command      check_snmp_cisco_fru_fan!<community>!534,535,536,537
} 
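
The -w and -c options take lists of status values rather than numeric ranges. A minimal sketch of such list-based matching is shown below. It is an assumption about the general idea, not the plugin's actual code; status_to_exit is a hypothetical name, and treating every unlisted status as OK is a simplification the real plugin may not share.

```shell
# Sketch of list-based threshold matching (assumed logic, not the plugin's code).
# Usage: status_to_exit <status> <warning-list> <critical-list>
# Prints the Nagios exit code: 2 = CRITICAL, 1 = WARNING, 0 = OK.
status_to_exit() {
    # Wrap list and value in commas so that 1 does not match inside 12.
    case ",$3," in *",$1,"*) echo 2; return ;; esac
    case ",$2," in *",$1,"*) echo 1; return ;; esac
    echo 0
}

status_to_exit 2 4 3    # fan status 2 (Up) with -w 4 -c 3 prints 0
status_to_exit 3 4 3    # fan status 3 with -w 4 -c 3 prints 2
```

Critical is checked first, so a value listed in both options would be reported as critical.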

checking the modules

The last plugin I use for hardware monitoring is check_cisco_fru_module.

As before, first run the plugin against the switch in test mode to get the necessary module IDs:

./check_cisco_fru_module.pl -H <IP or hostname> -C <community> -E <snmp-version> 

You’ll get lines with information about the modules and their IDs.

CISCO_FRU_MODULE OK - TEST MODE

CISCO MODULES DATA
Module id = 33                   Module status = 2 (Ok)                  Module description = Fabric card module
Module id = 34                   Module status = 2 (Ok)                  Module description = Fabric card module
Module id = 22                   Module status = 2 (Ok)                  Module description = 10/100/1000 Mbps Ethernet Module
Module id = 32                   Module status = 2 (Ok)                  Module description = Fabric card module
Module id = 27                   Module status = 2 (Ok)                  Module description = Supervisor module-1X
Module id = 31                   Module status = 2 (Ok)                  Module description = 10 Gbps Ethernet Module
Module id = 26                   Module status = 2 (Ok)                  Module description = Supervisor module-1X

Now run the plugin in check mode:

./check_cisco_fru_module.pl -H <IP or hostname> -C <community> -E <snmp-version> -e 22,26,27,31,32,33,34 -w 4,5,13,14,19,23 -c 7,8,20

The values for warning and critical are fixed and also documented in the online help (./check_cisco_fru_module.pl -h). The last step is to define a command and a service check. This service check produces no perfdata output, so just set process_perf_data to 0.

define command {
        command_name    check_snmp_cisco_fru_module
        command_line       $USER1$/check_cisco_fru_module.pl -H '$HOSTADDRESS$' -C '$ARG1$' -E 2c -e '$ARG2$' -w 4,5,13,14,19,23 -c 7,8,20
}

define service {
        use                           generic-service
        service_description  Cisco Nexus7010 Module status
        host_name               nexus7010
        process_perf_data   0
        check_command      check_snmp_cisco_fru_module!<community>!22,26,27,31,32,33,34
}

More check commands will follow soon.

Tom