Solaris - Consommation de ressources
Surveillance disques
Surveiller utilisation des I/O disques
# iostat -xn 2 2 extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 0.1 3.1 9.1 23.5 0.0 0.0 0.0 3.0 0 0 c0d0 34.1 0.0 17.1 0.0 0.0 0.6 0.0 16.9 0 3 c0d1 34.4 0.0 17.2 0.0 0.0 0.6 0.0 16.6 0 3 c0d2 [...] 111.7 56.6 1599.6 990.2 0.0 4.7 0.0 27.8 0 97 vdc34 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 vdc35 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 vdc36 [...]
Les colonnes intéressantes sont "%w" et "%b". "%b" indique le pourcentage d'utilisation du disque (ou de la LUN sur SAN). Ici on voit que le disque vdc34 est utilisé à 97%. Sur ce système, où je n'ai affiché ici que quelques LUNs, de nombreuses LUNs sont utilisées à 100% ou presque et nous ressentons de gros ralentissements. L'activité est trop forte et la baie n'est pas assez performante.
La colonne "%w" indique le pourcentage d'attente pour écrire sur le disque, donc concrètement, si "%w" est supérieur à "%b", c'est que la latence n'est pas sur le disque, mais sur le bus système.
Classer les processus par consommation d'I/O disque
Je n'ai pas trouvé de commande claire pour retourner cette info, mais j'ai trouvé un script perl "rusage" sur le net ici http://www.brendangregg.com/Solaris/prusage. Et un pdf de 45 pages traite en détail de ce sujet ici : http://www.brendangregg.com/Solaris/paper_diskubyp1.pdf.
Voici un exemple d'utilisation et juste après le code source, au cas où le lien ne serait plus accessible. Ici on classe les processus par consommation disque en écriture.
# ./prusage -i 2 2 -s pid PID MINF MAJF INBLK OUBLK CHAR-kb COMM 18085 0 1034624 11 2066703 783785 oracle 18072 0 1004673 28 2006007 2922472 oracle 14414 0 1517 316 926171 525708 tictimed 11107 0 6626 13 777992 1193227 lp 15910 0 1527 137 710801 633187 tictimed 13808 0 964 93 707699 633186 tictimed 14415 0 1188 68 695021 633219 tictimed 14484 0 2700 69 694253 633180 tictimed 13647 0 2211 143 687310 633193 tictimed 15098 0 1186 92 674124 633187 tictimed 13812 0 1268 172 673283 633219 tictimed 13898 0 1042 49 673332 633200 tictimed 12612 0 945 79 671873 633160 tictimed
Et le code source :
#!/usr/bin/perl # # prusage - Process usage stats, Solaris. I/O, sys/usr times, context switches. # A supplement to "ps", can be run as any user. # # 01-Jul-2005, ver 1.00 (check for newer vers, http://www.brendangregg.com) # # # USAGE: prusage [-bchinuwxCT] [-p PID] [-s sort] [-t top] [interval] [count] # # prusage # Default. (-ic 1), fit to screen, 1 secs. # prusage -b # Child times report (must be root or owner) # prusage -i # I/O stats (default) # prusage -u # USR/SYS times # prusage -x # Context Switchs # prusage -w # Wide output # prusage -c # Clear the screen (default) # prusage -C # Don't clear the screen # prusage -T # Don't fit to screen (print all lines) # prusage -p pid # Print this PID only # prusage -s sort # Sort on pid,blks,cpu,utime,inblk,vctx,... # prusage -t lines # Print top lines only # eg, # prusage 2 # 2 second samples (first is historical) # prusage 2 5 # 5 x 2 second samples # prusage -xi 2 # I/O and Context switch reports, 2 secs # prusage -biux 10 # multi output, all reports every 10 secs # prusage -C 10 # 10 second samples, no clear screen # prusage -CT 10 # 10 second samples, all lines # prusage -Ct8 10 5 # 5 x 10 second samples, top 8 lines only # prusage -p 11321 # PID 11321 only # prusage -s pid # sort on PID # # FIELDS: # PID Process ID # MINF Minor Page Faults (satisfied from RAM) # MAJF Major Page Faults (satisfied by disk I/O) # INBLK In Blocks (disk I/O reads) # OUBLK Out Blocks (disk I/O writes) # CHAR-kb Character I/O Kbytes # COMM Command name # USR User Time # SYS System Time # CUSR Child User Time # CSYS Child System Time # WAIT Wait for CPU Time # LOCK User waiting on lock time # TRAP System trap time # VCTX Voluntary Context Switches (I/O bound) # ICTX Involuntary Context Switches (CPU bound) # SYSC System calls # # NOTE: Minor faults always report zero on most versions of Solaris. # # REFERENCE: /usr/include/sys/procfs.h # # SEE ALSO: psio # process I/O # prstat -m # USR/SYS times, ... # /usr/ucb/rusage # historical # # COPYRIGHT: Copyright (c) 2004, 2005 Brendan Gregg. # # This program is free software; you can redistribute it and/or # modify it under the terms of the GNU General Public License # as published by the Free Software Foundation; either version 2 # of the License, or (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software Foundation, # Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. # # (http://www.gnu.org/copyleft/gpl.html) # # Author: Brendan Gregg [Sydney, Australia] # # 31-Aug-2004 Brendan Gregg Created this. # 12-Mar-2005 " " Processed /proc/*/psinfo as well. # 09-May-2005 " " Processed /proc/*/usage as well. use Getopt::Std; # # --- Default Variables --- # $INTERVAL = 1; # seconds to sample $MAX = 2**32; # max count of samples $NEW = 0; # skip summary output (new data only) $WIDE = 0; # print wide output (don't truncate) $SCHED = 0; # print PID 0 $TOP = 0; # print top many only $FIT = 1; # fit to screen $CLEAR = 1; # clear screen before outputs $STYLE_IO = 1; # default output style, I/O $STYLE_CTX = 0; # output style, Context Switches $STYLE_TIME = 0; # output style, Times $STYLE_CHILD = 0; # output style, Child times $MULTI = 0; # multi reports, multiple styles $TARGET_PID = -1; # target PID, -1 means all $count = 1; # current iteration # # --- Command Line Arguments --- # ### Check usage &Usage() if $ARGV[0] eq "--help"; getopts('bchinuwxp:s:t:CT') || &Usage(); &Usage() if $opt_h; ### Process options $NEW = 1 if $opt_n; $WIDE = 1 if $opt_w; $FIT = 0 if $opt_T; $CLEAR = 0 if $opt_C; $STYLE_IO = 0 if $opt_x || $opt_u || $opt_b; $STYLE_CTX = 1 if $opt_x; $STYLE_TIME = 1 if $opt_u; $STYLE_CHILD = 1 if $opt_b; $STYLE_IO = 1 if $opt_i; $TOP = $opt_t if defined $opt_t; $SORT = $opt_s if defined $opt_s; $TARGET_PID = $opt_p if defined $opt_p; $INTERVAL = shift(@ARGV) || $INTERVAL; $MAX = shift(@ARGV) || $MAX; ### Determine style count $STYLES = $STYLE_IO + $STYLE_CTX + $STYLE_TIME + $STYLE_CHILD; $MULTI = 1 if $STYLES > 1; ### Determine clear seq $CLEARSTR = `clear` if $CLEAR; ### Fit to screen if ($FIT && ! $opt_t) { my ($row,$col) = &getwinsz(); $TOP = int(($row - $STYLES * 2) / $STYLES); } # # --- Main --- # for (;$count <= $MAX; $count++) { ### Get data &GetProcStat(); # fetch and save /proc stats in %PID{$pid} next if $NEW && $count == 1; ### Print data print $CLEARSTR if $CLEAR; &PrintIO($SORT) if $STYLE_IO; &PrintCtx($SORT) if $STYLE_CTX; &PrintTime($SORT) if $STYLE_TIME; &PrintChild($SORT) if $STYLE_CHILD; ### Pause sleep($INTERVAL) unless $count == $MAX; ### Cleanup memory undef %PID; undef %Comm; } # # --- Subroutines --- # # GetProcStat - Gets /proc usage statistics and saves them in %PID. # This can be run multiple times, the first time %PID will be # populated with the summary since boot values. # This reads /proc/*/usage and /proc/*/psinfo. # sub GetProcStat { my $pid; chdir "/proc"; foreach $pid (sort {$a<=>$b} <*>) { next if $pid == $$; next if $pid == 0 && $SCHED == 0; next if $TARGET_PID > -1 && $pid != $TARGET_PID; # # struct prusage # ### Read usage stats open(USAGE,"/proc/$pid/usage") || next; read(USAGE,$usage,256); close USAGE; ### Unpack usage values ($pr_lwpid, $pr_count, $pr_tstamp, $pr_create, $pr_term, $pr_rtime, $pr_utime, $pr_stime, $pr_ttime, $pr_tftime, $pr_dftime, $pr_kftime, $pr_ltime, $pr_slptime, $pr_wtime, $pr_stoptime, $filltime, $pr_minf, $pr_majf, $pr_nswap, $pr_inblk, $pr_oublk, $pr_msnd, $pr_mrcv, $pr_sigs, $pr_vctx, $pr_ictx, $pr_sysc, $pr_ioch, $filler) = unpack("iia8a8a8a8a8a8a8a8a8a8a8a8a8a8a48LLLLLLLLLLLLa40",$usage); ### Process usage values $New{$pid}{utime} = timestruct2int($pr_utime); $New{$pid}{stime} = timestruct2int($pr_stime); $New{$pid}{ttime} = timestruct2int($pr_ttime); $New{$pid}{ltime} = timestruct2int($pr_ltime); $New{$pid}{wtime} = timestruct2int($pr_wtime); $New{$pid}{slptime} = timestruct2int($pr_slptime); $New{$pid}{minf} = $pr_minf; $New{$pid}{majf} = $pr_majf; $New{$pid}{nswap} = $pr_nswap; $New{$pid}{inblk} = $pr_inblk; $New{$pid}{oublk} = $pr_oublk; $New{$pid}{vctx} = $pr_vctx; $New{$pid}{ictx} = $pr_ictx; $New{$pid}{sysc} = $pr_sysc; $New{$pid}{ioch} = $pr_ioch; # and a couple of my own, $New{$pid}{blks} = $pr_inblk + $pr_oublk; $New{$pid}{ctxs} = $pr_vctx + $pr_ictx; $New{$pid}{cpu} = $New{$pid}{utime} + $New{$pid}{stime}; # # struct psinfo # ### Read psinfo stats open(PSINFO,"/proc/$pid/psinfo") || next; read(PSINFO,$psinfo,256); close PSINFO; ### Unpack psinfo values ($pr_flag, $pr_nlwp, $pr_pid, $pr_ppid, $pr_pgid, $pr_sid, $pr_uid, $pr_euid, $pr_gid, $pr_egid, $pr_addr, $pr_size, $pr_rssize, $pr_pad1, $pr_ttydev, $pr_pctcpu, $pr_pctmem, $pr_start, $pr_time, $pr_ctime, $pr_fname, $pr_psargs, $pr_wstat, $pr_argc, $pr_argv, $pr_envp, $pr_dmodel, $pr_taskid, $pr_projid, $pr_nzomb, $filler) = unpack("iiiiiiiiiiIiiiiSSa8a8a8Z16Z80iiIIaa3iiia",$psinfo); ### Save command name $Comm{$pid} = $pr_fname; next unless $STYLE_CHILD; # only child needs the following, # # struct pstatus # ### Read pstatus stats open(PSTATUS,"/proc/$pid/status") || next; read(PSTATUS,$pstatus,128); close PSTATUS; ### Unpack pstatus values ($pr_flags, $pr_nlwp, $pr_pid, $pr_ppid, $pr_pgid, $pr_sid, $pr_aslwpid, $pr_agentid, $pr_sigpend, $pr_brkbase, $pr_brksize, $pr_stkbase, $pr_stksize, $pr_utime, $pr_stime, $pr_cutime, $pr_cstime, $filler) = unpack("iiiiiiiia16iiiia8a8a8a8a",$pstatus); ### Process pstatus values $New{$pid}{cutime} = timestruct2int($pr_cutime); $New{$pid}{cstime} = timestruct2int($pr_cstime); $New{$pid}{ccpu} = $New{$pid}{cutime} + $New{$pid}{cstime}; } ### Cleanup memory foreach $pid (keys %New) { # save PID values, foreach $key (keys %{$New{$pid}}) { $PID{$pid}{$key} = $New{$pid}{$key} - $Old{$pid}{$key}; } } undef %Old; foreach $pid (keys %New) { # save old values, foreach $key (keys %{$New{$pid}}) { $Old{$pid}{$key} = $New{$pid}{$key}; } } } # PrintIO - print a report on I/O statistics: minf, majf, inblk, oublk, ioch. # sub PrintIO { my $sort = shift || "blks"; my $top = $TOP; my $pid; ### Print header printf("%6s %5s %5s %8s %8s %9s %s\n","PID", "MINF","MAJF","INBLK","OUBLK","CHAR-kb","COMM"); ### Print report foreach $pid (&SortPID("$sort")) { printf("%6s %5s %5s %8s %8s %9.0f %s\n",$pid, $PID{$pid}{minf},$PID{$pid}{majf},$PID{$pid}{inblk}, $PID{$pid}{oublk},$PID{$pid}{ioch}/1024, trunc($Comm{$pid},33)); last if --$top == 0; } print "\n" if $MULTI; } # PrintTime - print a report on Times: utime, stime, wtime, ltime, ttime. # sub PrintTime { my $sort = shift || "cpu"; my $top = $TOP; my $pid; ### Print header printf("%6s %8s %8s %8s %6s %6s %s\n","PID", "USR","SYS","WAIT","LOCK","TRAP","COMM"); ### Print report foreach $pid (&SortPID("$sort")) { printf("%6s %8.2f %8.2f %8.2f %6.2f %6.2f %s\n",$pid, $PID{$pid}{utime},$PID{$pid}{stime},$PID{$pid}{wtime}, $PID{$pid}{ltime},$PID{$pid}{ttime},trunc($Comm{$pid},32)); last if --$top == 0; } print "\n" if $MULTI; } # PrintCtx - print a report on Context Swithes: utime, stime, vctx, ictx, sysc. # sub PrintCtx { my $sort = shift || "ctxs"; my $top = $TOP; my $pid; ### Print header printf("%6s %7s %7s %9s %8s %10s %s\n","PID", "USR","SYS","VCTX","ICTX","SYSC","COMM"); ### Print report foreach $pid (&SortPID("$sort")) { printf("%6s %7.2f %7.2f %9s %8s %10s %s\n",$pid, $PID{$pid}{utime},$PID{$pid}{stime},$PID{$pid}{vctx}, $PID{$pid}{ictx},$PID{$pid}{sysc},trunc($Comm{$pid},27)); last if --$top == 0; } print "\n" if $MULTI; } # PrintChild - print a report on Times: utime, stime, wtime, ltime, ttime. # sub PrintChild { my $sort = shift || "ccpu"; my $top = $TOP; my $pid; ### Print header printf("%6s %8s %8s %8s %8s %s\n","PID", "USR","SYS","CUSR","CSYS","COMM"); ### Print report foreach $pid (&SortPID("$sort")) { printf("%6s %8.2f %8.2f %8.2f %8.2f %s\n",$pid, $PID{$pid}{utime},$PID{$pid}{stime},$PID{$pid}{cutime}, $PID{$pid}{cstime},trunc($Comm{$pid},32)); last if --$top == 0; } print "\n" if $MULTI; } # SortPID - sorts the PID hash by the key given as arg1, returning a sorted # array of PIDs. # sub SortPID { my $sort = shift; ### Sort numerically if ($sort eq "pid") { return sort {$a <=> $b} (keys %PID); } else { return sort {$PID{$b}{$sort} <=> $PID{$a}{$sort}} (keys %PID); } } # getwinsz - gets the terminal window size and returns it as x, y. # The default size returned is 24x80 if an error is encountered. # sub getwinsz { my $row = 24; my $col = 80; my ($xpix,$ypix,$winsize); my $TIOCGWINSZ = 21608; # check /usr/include/sys/termios.h open(TTY, "+</dev/tty") || return($row,$col); ioctl(TTY, $TIOCGWINSZ, $winsize=) || return($row,$col); ($row, $col, $xpix, $ypix) = unpack('S4', $winsize); return($row,$col); } # timestruct2int - Convert a timestruct value (64 bits) into an integer # of seconds. # sub timestruct2int { my $timestruct = shift; my ($secs,$nsecs,$time); $secs = $nsecs = $time = 0; ($secs,$nsecs) = unpack("LL",$timestruct); $time = $secs + $nsecs * 10**-9; return $time; } # trunc - Returns a truncated string if required. # sub trunc { my $string = shift; my $length = shift; if ($WIDE) { return $string; } else { return substr($string,0,$length); } } # Usage - print usage message and exit. # sub Usage { print STDERR <<END; prusage ver 0.97 USAGE: prusage [-chinuwx] [-p PID] [-s sort] [-t top] [interval] [count] prusage # Default. (-ic 1), fit to screen, 1 secs. prusage -b # Child times report (must be root or owner) prusage -i # I/O stats (default) prusage -u # USR/SYS times prusage -x # Context Switchs prusage -w # Wide output prusage -c # Clear the screen (default) prusage -C # Don't clear the screen prusage -T # Don't fit to screen (print all lines) prusage -p pid # Print this PID only prusage -s sort # Sort on pid,blks,cpu,utime,inblk,vctx,... prusage -t lines # Print top lines only eg, prusage 2 # 2 second samples (first is historical) prusage 2 5 # 5 x 2 second samples prusage -xi 2 # I/O and Context switch reports, 2 secs prusage -biux 10 # multi output, all reports every 10 secs prusage -C 10 # 10 second samples, no clear screen prusage -CT 10 # 10 second samples, all lines prusage -Ct8 10 5 # 5 x 10 second samples, top 8 lines only prusage -p 11321 # PID 11321 only prusage -s pid # sort on PID END exit; }