John Alvord, IBM Corporation
jalvord@us.ibm.com
Inspiration
A customer created a situation to detect a dangerously full paging space condition on an AIX system. The formula used the KPX Paging Space attribute, and the test was (Used Pct > 90).
A situation action command sent an email to the operations staff, who monitored such events by reading email on a smart phone. The email did not contain enough information to decide how to handle the issue, because the attributes were system-wide rather than specific. The smart phone did not have a remote terminal program to log on to the system.
The customer needed to send environmental information in the body of the email.
Solution
Here is an action command that gathers additional information in this case. It applies only to AIX, since the svmon command is specific to that Unix version, but the general scheme can be used in many more circumstances. The action command uses Linux/Unix/Windows meta characters like ( … ) to create sub-shells. Some of the information comes from the Agent attributes and some from the environment. The command is shown first as one long string. Don’t get scared: a careful explanation follows.
(echo Paging Space on &{KPX_PAGING_SPACE.Node} is at &{KPX_PAGING_SPACE.Used_Pct}%"\n\n"; (echo " Pid Command Inuse Pin Pgsp Virtual 64-bit Mthrd 16MB"; ps -ef | tail -n +3 | awk '{print $2;}' > /tmp/procs.txt; svmon -Pt20 | grep -f /tmp/procs.txt | sort -r -n -k5)) | mailx -s "Paging Space WARNING Alert" [additional parameters to specify target etc]
To avoid eyestrain I have split the command line out into logical sections:
(
==> Begin sub-shell level 1
echo Paging Space on &{KPX_PAGING_SPACE.Node} is at &{KPX_PAGING_SPACE.Used_Pct}%"\n\n";
==> First line of email body with an extra blank line
(
==> Begin sub-shell level 2
echo " Pid Command Inuse Pin Pgsp Virtual 64-bit Mthrd 16MB";
==> Title line for svmon extract results
ps -ef | tail -n +3 | awk '{print $2;}' > /tmp/procs.txt;
==> Extract all the process IDs, skipping the title line and process ID 1
svmon -Pt20 | grep -f /tmp/procs.txt | sort -r -n -k5
==> Get top 20 by virtual storage, select lines containing one of the process IDs, sort by Pgsp
)
==> End sub-shell level 2
) |
==> End sub-shell level 1 and pipe all standard output to the next program
mailx -s "Paging Space WARNING Alert" [parameters to specify email target etc]
==> Example batch email command. The standard input is the body of the email. The -s is the subject line.
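One thing to watch in the grep -f step above: grep treats each process ID as a pattern that can match anywhere in a line, so a short process ID could also match digits in the Inuse, Pin, Pgsp or Virtual columns. The following is a minimal sketch of a stricter filter, not part of the original command. The function name filter_by_pid is hypothetical, and it assumes the process ID is the first field of each svmon line, as in the example output.

```sh
#!/bin/sh
# Hypothetical helper (an assumption, not part of the original command).
#   $1    : file holding one process ID per line
#   stdin : svmon-style report lines whose first field is the process ID
# Prints only the lines whose first field exactly equals a listed process ID,
# sorted on the fifth column (Pgsp), largest first.
filter_by_pid() {
    # First pass (the PID file) fills the lookup table;
    # second pass (stdin, named "-") prints the lines whose PID is in it.
    awk 'NR == FNR { pids[$1] = 1; next } ($1 in pids)' "$1" - |
        sort -r -n -k5
}
```

If the function is defined in a script that the action command calls, the step could then read svmon -Pt20 | filter_by_pid /tmp/procs.txt in place of the grep and sort pair.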
Example of added data
    Pid Command        Inuse    Pin  Pgsp  Virtual 64-bit Mthrd 16MB
 274600 STAFProc       52330  13153  4664    57060      Y     Y    N
 163934 shlap64        56297  13141  3993    60459      Y     N    N
 147590 aixmibd64      50288  13141  3777    54453      Y     N    N
 192632 snmpmibd64     50246  13141  3744    54399      Y     N    N
 172146 hostmibd64     50626  13141  3731    54730      Y     N    N
1937562 kux_vmstat     50619  13141  3601    54480      Y     N    N
1810656 kdsmain        87582  13171  3601    75734      Y     Y    N
1736806 stat_daemon    52404  13141  3601    56310      Y     N    N
1609822 nfs_stat       50490  13141  3601    54435      Y     N    N
1245352 kuxagent       56406  13192  3601    60021      Y     Y    N
1237192 mount_stat     50485  13141  3601    54430      Y     N    N
1126448 ifstat         50517  13141  3601    54467      Y     N    N
 938208 java          124375  13217  3601   107358      Y     Y    N
 909508 KfwServices   178686  13226  3601   173988      Y     Y    N
 905322 kcawd          51341  13148  3601    55292      Y     Y    N
 847954 java           70769  13167  3601    69446      Y     Y    N
 630926 java           59948  13156  3601    59414      Y     Y    N
 425998 cms            50755  13144  3601    54619      Y     Y    N
1777814 httpd          27199  13150  2014    28799      N     Y    N
Summary
The operations staff could now evaluate which team was needed to handle the problem condition without having to log on to the system reporting trouble.
This general scheme could be extended to running a shell script which could access local files and databases or almost anything you want inside a sub-shell.
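As a rough illustration of that extension, here is a minimal sketch of the same report moved into a shell script. The function name paging_report, the idea of passing the node and percentage as arguments, and the scratch file name are all assumptions made for this example. The svmon pipeline itself is the one from the action command above.

```sh
#!/bin/sh
# Hypothetical script (name and location are assumptions).
# Usage: script NODE USED_PCT
# Writes the email body to standard output.

paging_report() {
    node=$1
    used_pct=$2
    pidfile=/tmp/procs.$$.txt   # per-run scratch file instead of a shared name

    # First line of the email body, followed by a blank line.
    printf 'Paging Space on %s is at %s%%\n\n' "$node" "$used_pct"

    # svmon exists only on AIX, so report its absence rather than fail.
    if command -v svmon >/dev/null 2>&1; then
        echo " Pid Command Inuse Pin Pgsp Virtual 64-bit Mthrd 16MB"
        # Process IDs, skipping the first two lines of ps output.
        ps -ef | tail -n +3 | awk '{print $2;}' > "$pidfile"
        svmon -Pt20 | grep -f "$pidfile" | sort -r -n -k5
        rm -f "$pidfile"
    else
        echo "svmon not found; no per-process detail available"
    fi
}

# Produce the report only when both arguments are supplied.
if [ "$#" -eq 2 ]; then
    paging_report "$1" "$2"
fi
```

The action command would then pass &{KPX_PAGING_SPACE.Node} and &{KPX_PAGING_SPACE.Used_Pct} as the two arguments and pipe the script's output to mailx as before. Anything else the script can reach, such as local files, can be added to the body in the same way.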