Nagios scripts - examples
Examples of nagios scripts.
Load average for all Unix systems
Many scripts monitor the load average on Unix, but all the ones I have found raise an alert on a fixed value, unrelated to the number of cpus. Yet the load average is only meaningful when compared to the number of cpus. I therefore wrote this script, which monitors the load relative to the number of cpus. The optional values passed to it are fractions of the number of cpus (for example 0.8 for 80%), one each for the 1, 5 and 15 minute load. The default values were chosen to fit most situations.
#!/usr/bin/env python
"""
Nagios plugin to check load average on GNU/Linux or Solaris servers.

This script checks the load average in comparison to the number of cpus.
Thresholds are fractions of the cpu number, given for load 1, 5 and 15 min.
There is a critical alarm if any load exceeds its critical threshold
(default: 120%, 100%, 80% of cpu number; adjustable with "-c" option).
There is a warning alarm if any load exceeds its warning threshold
(default: 100%, 80%, 70% of cpu number; adjustable with "-w" option).
If you want to set an alarm to 70%, for example, you have to write "0.7", not "70%".

Example:
%prog -w 0.8,0.8,0.8 -c 1.2,1.2,1.2  # Warning = 80% of cpu number and critical = 120%
%prog -w 2,1.5,1  # Warning = 200% of cpu number for load 1 min, 150% for load 5 min and 100% for load 15 min.

Result example (4 cpus, with -w 1.5,1.2,1 -c 2.5,2,1.5):
OK - nb cpus: 4; load average: 0.02, 0.03, 0.0; warning: 6.0, 4.8, 4.0; critical: 10.0, 8.0, 6.0|nb_cpus=4 load_1=0.02 load_5=0.03 load_15=0.0

frederic.menard-prestataire@laposte.fr
"""

import os
import sys
from optparse import OptionParser

VERSION = "2.0 (2015-01-13)"

### Parse commandline options
Usage = ("Usage: %prog [options]\n"
         "Example: %prog -w 0.8,0.8,0.8 -c 1.2,1.2,1.2  # Warning = 80% of cpu number and critical = 120%")
parser = OptionParser(usage=Usage, version="%prog version " + VERSION)
parser.add_option("-w", "--warning", action="store", type="string", dest="warn_threshold",
                  help="Warning threshold in % of cpu number for load 1,5,15 min", default="1,0.8,0.7")
parser.add_option("-c", "--critical", action="store", type="string", dest="crit_threshold",
                  help="Critical threshold in % of cpu number for load 1,5,15 min", default="1.2,1,0.8")
(options, args) = parser.parse_args()

### Extracting options into variables
try:
    warn_1, warn_5, warn_15 = [float(v) for v in options.warn_threshold.split(',')]
    crit_1, crit_5, crit_15 = [float(v) for v in options.crit_threshold.split(',')]
except ValueError:
    # Wrong number of values or non-numeric value: report UNKNOWN to Nagios
    print("UNKNOWN - thresholds must be three comma-separated numbers, e.g. 1,0.8,0.7")
    sys.exit(3)

### Variables
# Number of online cpus: try sysconf first, then fall back to multiprocessing
nb_cpus = None
try:
    nb_cpus = int(os.sysconf('SC_NPROCESSORS_ONLN'))
except (AttributeError, ValueError, OSError):
    pass
if nb_cpus is None or nb_cpus < 1:
    try:
        import multiprocessing
        nb_cpus = multiprocessing.cpu_count()
    except (ImportError, NotImplementedError):
        pass
if nb_cpus is None or nb_cpus < 1:
    print("UNKNOWN - unable to determine the number of cpus")
    sys.exit(3)

# Convert the per-cpu fractions into absolute load thresholds
warn_1 = warn_1 * nb_cpus
warn_5 = warn_5 * nb_cpus
warn_15 = warn_15 * nb_cpus
crit_1 = crit_1 * nb_cpus
crit_5 = crit_5 * nb_cpus
crit_15 = crit_15 * nb_cpus

load_1, load_5, load_15 = os.getloadavg()

### Tests
if load_1 > crit_1 or load_5 > crit_5 or load_15 > crit_15:
    status = ("CRITICAL", 2)
elif load_1 > warn_1 or load_5 > warn_5 or load_15 > warn_15:
    status = ("WARNING", 1)
elif load_1 >= 0 and load_5 >= 0 and load_15 >= 0:
    status = ("OK", 0)
else:
    status = ("UNKNOWN", 3)

### Display results
nb_cpus = str(nb_cpus)
w1 = str(warn_1)
w5 = str(warn_5)
w15 = str(warn_15)
c1 = str(crit_1)
c5 = str(crit_5)
c15 = str(crit_15)
l1 = str(load_1)
l5 = str(load_5)
l15 = str(load_15)

STATUS = status[0] + " - nb cpus: " + nb_cpus + "; load average: " + l1 + ", " + l5 + ", " + l15 + "; warning: " + w1 + ", " + w5 + ", " + w15 + "; critical: " + c1 + ", " + c5 + ", " + c15
# Perfdata items are separated by spaces, as the Nagios plugin guidelines require
PERFDATA = "nb_cpus=" + nb_cpus + " load_1=" + l1 + " load_5=" + l5 + " load_15=" + l15

# print() with a single argument works with both Python 2 and Python 3
print(STATUS + "|" + PERFDATA)
sys.exit(status[1])
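To make the threshold arithmetic concrete, here is a minimal sketch, separate from the plugin itself, of how the per-cpu fractions become absolute load thresholds and how a load triple is then classified. The function names `scale_thresholds` and `classify` and the 4-cpu host are assumptions chosen for illustration; the fractions are the script's defaults.

```python
def scale_thresholds(fractions, nb_cpus):
    """Turn per-cpu fractions such as "1,0.8,0.7" into absolute load values."""
    values = [float(f) * nb_cpus for f in fractions.split(",")]
    if len(values) != 3:
        raise ValueError("expected three values: load 1, 5 and 15 min")
    return values


def classify(load_avg, warn, crit):
    """Return the Nagios (label, exit code) pair for a load average triple."""
    if any(load > limit for load, limit in zip(load_avg, crit)):
        return ("CRITICAL", 2)
    if any(load > limit for load, limit in zip(load_avg, warn)):
        return ("WARNING", 1)
    return ("OK", 0)


# Hypothetical 4-cpu host with the script's default fractions.
warn = scale_thresholds("1,0.8,0.7", 4)   # roughly 4.0, 3.2, 2.8
crit = scale_thresholds("1.2,1,0.8", 4)   # roughly 4.8, 4.0, 3.2

print(classify((0.02, 0.03, 0.0), warn, crit))  # ('OK', 0)
print(classify((4.5, 2.0, 1.0), warn, crit))    # ('WARNING', 1)
print(classify((5.0, 2.0, 1.0), warn, crit))    # ('CRITICAL', 2)
```

The same load of 4.5 that triggers a warning on this 4-cpu host would be well within limits on a 16-cpu host, which is the point of scaling the thresholds.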
Cpu and memory consumption and number of processes - Solaris
The script below takes as its argument a character string that must match one or more running processes. It returns the number of matching processes found, along with their total cpu and memory consumption. Works on Solaris.
#!/bin/ksh
#
# Monitoring script for nagios
# Displays the memory and cpu consumption of the processes given as parameter
# as well as the number of processes found

###################### Variables #################################
TEMP_FILE=/tmp/$$-1.tmp
MEM=0
CPU=0
PROCESS_NB=0
STATUS=0

###################### Functions #################################
usage()
{
    echo "Usage: $0 process(es)-to-analyse"
}

##################################################################
#                        Main program                            #
##################################################################
if [ "$#" -ne 1 ]
then
    echo "Wrong parameters number."
    usage
    exit 3    # UNKNOWN for Nagios
fi

PROCESS=$1
ps -edf > "$TEMP_FILE"
PROCESS_LIST=$(grep "$PROCESS" "$TEMP_FILE" | grep -v "grep $PROCESS" | grep -v "$0 $1" | awk '{print $2}')

if [ -z "$PROCESS_LIST" ]
then
    # Single output line, so that the perfdata stays on the first line
    TEXT="Unable to find such running process"
    STATUS=2
else
    for p in $PROCESS_LIST
    do
        # Keep only the prstat line whose PID column matches: mem=$3 (SIZE), cpu=$9 (CPU)
        INFO=$(prstat -p "$p" 1 1 | awk '$1 == '"$p"' {print $3 " " $9}')
        # Skip processes that ended between ps and prstat
        [ -z "$INFO" ] && continue
        MEM_INFO=$(echo "$INFO" | awk '{print $1}')
        CPU_INFO=$(echo "$INFO" | awk '{print $2}')
        # Build arithmetic expressions for bc, e.g. 0+4096*1024+12*1024*1024
        MEM="$MEM""+"$(echo "$MEM_INFO" | sed "s/K/\*1024/g" | sed "s/M/\*1024\*1024/g" | sed "s/G/\*1024\*1024\*1024/g")
        CPU="$CPU""+"$(echo "$CPU_INFO" | sed "s/%//g")
        PROCESS_NB=$(($PROCESS_NB+1))
    done
    MEM=$(echo "$MEM" | bc)            # Memory used, in bytes
    CPU=$(echo "scale=1;$CPU" | bc)    # Cpu used, in %
    TEXT="Status Ok"
fi

# Clean up
rm -f "$TEMP_FILE"

echo "${TEXT}|nb=${PROCESS_NB} mem=${MEM} cpu=${CPU}"
exit $STATUS
Example:
# ./check_process_ressources ftpd
Status Ok|nb=12 mem=50987008 cpu=0
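The unit conversion and summation that the ksh script delegates to sed and bc can be sketched in Python as follows. This is only an illustration of the arithmetic, not a replacement for the script: the names `size_to_bytes` and `summarize` are hypothetical, and the sample rows are made-up values rather than real prstat output.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def size_to_bytes(size):
    """Convert a prstat-style size such as "48M" into bytes."""
    suffix = size[-1].upper()
    if suffix in UNITS:
        return int(float(size[:-1]) * UNITS[suffix])
    return int(size)  # no unit suffix: already a byte count


def summarize(rows):
    """Sum memory (bytes) and cpu (%) over (size, cpu) pairs, one per process."""
    mem = sum(size_to_bytes(size) for size, _ in rows)
    cpu = sum(float(pct.rstrip("%")) for _, pct in rows)
    return len(rows), mem, round(cpu, 1)


# Made-up sample values for three processes (assumption, not real output).
sample = [("4096K", "0.1%"), ("12M", "0.2%"), ("1G", "1.5%")]
nb, mem, cpu = summarize(sample)
print("nb=%d mem=%d cpu=%s" % (nb, mem, cpu))  # nb=3 mem=1090519040 cpu=1.8
```

The printed line mirrors the perfdata part of the script's output, with memory expressed in bytes and cpu in percent.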