Sitworld: ITM Situation Formula with Calculations

Lavender2009

by John Alvord, IBM Corporation

jalvord@us.ibm.com

Inspiration

To identify servers with too much workload, you can use the System Load factor. I searched for references to

“Understanding Linux CPU Load – when should you be worried?”

One reference suggested that

(System Load) / (Number of Processors) > .70

is a worrisome condition that should be investigated. Another authority said that a value > 1.00 was a problem. Whichever threshold you choose, it is an interesting case to get an alert on.
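As a quick worked example of that ratio, here is a minimal sketch that divides a load average by a processor count using awk. The function name load_per_cpu and the sample values are assumptions made up for illustration; nothing here is part of ITM.

```shell
# load_per_cpu LOAD NCPU
# Prints LOAD / NCPU with two decimals (hypothetical helper for illustration).
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN {
        if (ncpu <= 0) { print "invalid processor count"; exit 1 }
        printf "%.2f\n", load / ncpu
    }'
}

# A 15 minute load average of 3.00 on a 4 processor server:
load_per_cpu 3.00 4
```

That prints 0.75, which is over the .70 threshold but under the 1.00 threshold, so the choice of threshold decides whether this server would raise an alert.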

However, ITM does not provide a direct way to perform calculations like the one above or to compute with multiple attributes. Situations provide a here-and-now comparison based on existing attributes.

An always-true situation, a simple action command using awk, and a helper UNTIL situation can be used to achieve the goal.

Linux/Unix Situations with calculations

We start with a situation formula that does basic filtering and is always true.

sitcalc1

( Number of Processors Online >= 1)

or advanced formula display

( #'LNXMACHIN.ONLNCPU' >= 1)

You might think about including a system load average in the formula as well. However, such a formula would use two attribute groups. At the base level, agents work with only one attribute group in a situation. To allow flexibility, the TEMS creates mostly invisible sub-situations during situation start. These sub-situations send results to the TEMS every cycle, and that is expensive compared to doing most of the work at the agent.

This condition will always be true. Later on, the presentation of the event will be suppressed by an UNTIL situation whenever the high-load condition is not present.

The action tab looks like this

sitcalc2

There will only ever be one item, since these attributes are single row. The action could instead be executed at the TEMS if that is a better place to handle the helper situation. The action command option must be set to run at each interval: the formula is always true, so the logic has to be performed every cycle.

Here is the action command:

(uptime && echo &{Linux_Machine_Information.Number_of_Processors_Online}) | awk '{up=$(NF);getline;np=$1} END{if (up/np > .70) system("echo highload>$CANDLEHOME/tmp/IBM_highload.marker"); else system("rm -f $CANDLEHOME/tmp/IBM_highload.marker")}'

That is 246 characters long. The limit for a situation action command is 506 characters before substitution, so it does not surpass system limits.
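If you adapt the command, it is worth re-checking its length. The following is a small sketch of such a check; the function name action_cmd_fits is hypothetical, and treating the limit as "at most 506 characters" is an assumption based on the figure quoted above.

```shell
# action_cmd_fits COMMAND [LIMIT]
# Prints the length of COMMAND and succeeds when it is within LIMIT.
# LIMIT defaults to 506, the figure quoted in the text (assumed inclusive).
action_cmd_fits() {
    limit=${2:-506}
    len=${#1}
    echo "$len characters (limit $limit)"
    [ "$len" -le "$limit" ]
}
```

Pass the command as one single-quoted string, before any &{...} substitution has taken place, since the limit applies to the unsubstituted text.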

This command works like this

1) (

==> Open a subshell environment.

2) uptime &&

==> Run the uptime command, which happens to have the 15 minute load average as the last word of its output. The && runs the next command if uptime succeeds.

3) echo &{Linux_Machine_Information.Number_of_Processors_Online}

==> Put the number of processors into the standard output. The value is substituted during Agent side logic.

4) ) |

==> The close parenthesis ends the subshell environment. The vertical bar (|) pipes the resulting two lines into the next stage.

5) awk '{up=$(NF);getline;np=$1} END{if (up/np > .70) system("echo highload>$CANDLEHOME/tmp/IBM_highload.marker"); else system("rm -f $CANDLEHOME/tmp/IBM_highload.marker")}'

Step 5 uses the awk programming language (http://en.wikipedia.org/wiki/AWK) to perform the needed calculation. The two lines are read in and the needed data is extracted into variables: up holds the load average and np holds the number of processors. If the test is true, a marker file is created; if the test is false, the marker file is deleted. If the command were being performed on a TEMS, or if multiple agents were running on a single server, the marker file name would have to include the agent name to avoid conflicting usage.
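To try the awk logic without touching any marker file, you can feed it a fixed uptime line and a processor count. This is only a sketch: the function name classify_load and the sample uptime line are made up for illustration, and the awk program prints a verdict instead of calling system().

```shell
# classify_load: same awk logic as the action command, but it prints the
# verdict instead of creating or deleting the marker file.
classify_load() {
    awk '{up=$(NF); getline; np=$1}
         END {if (up/np > .70) print "highload"; else print "normal"}'
}

# Sample uptime line (invented for illustration); the last word is the
# 15 minute load average.
sample=' 10:15:01 up 12 days,  3:42,  2 users,  load average: 3.10, 2.95, 2.90'

(echo "$sample" && echo 4) | classify_load   # 2.90 / 4 = 0.725
(echo "$sample" && echo 8) | classify_load   # 2.90 / 8 = 0.3625
```

The first call prints highload and the second prints normal, which matches when the real command would create or delete the marker file.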

While this is running, the marker file is present when the problem [high system load] exists and absent when it does not.

Note: On Solaris the equivalent command “nawk” [new awk] must be used.

Linux/Unix Helper Situation

The base situation is always true, so its event must be suppressed when the workload is light. That is done using a helper situation as an UNTIL clause. An UNTIL situation essentially suppresses the base situation while it is true.

Click on the Navigation tree File Information workspace, then right-click Situations and create a new situation named IBM_highload_helper in the following form.

sitcalc3

The formula is

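( Path == '/opt/IBM/ITM/tmp' AND MISSING(File) == ('IBM_highload.marker'))

When the marker file is missing, this situation evaluates as true. We will use it to suppress the first situation. The path shown assumes that $CANDLEHOME is /opt/IBM/ITM; it must match the directory the action command writes to.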

The helper situation will not normally start, but you should start it and test by creating and deleting the marker file manually.
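For the manual test, something like the following sketch can be run on the agent host. The helper names marker_on, marker_off and marker_state are hypothetical, and MARKER_DIR defaults to the path used in the helper formula; adjust it if your install directory differs.

```shell
# Hypothetical helpers for testing IBM_highload_helper by hand.
# MARKER_DIR defaults to the directory named in the helper formula.
MARKER_DIR=${MARKER_DIR:-/opt/IBM/ITM/tmp}
MARKER="$MARKER_DIR/IBM_highload.marker"

# Create the marker: the MISSING test should then evaluate as false.
marker_on()  { echo highload > "$MARKER"; }

# Delete the marker: the MISSING test should then evaluate as true.
marker_off() { rm -f "$MARKER"; }

# Report the current state of the marker file.
marker_state() {
    if [ -f "$MARKER" ]; then echo present; else echo missing; fi
}
```

After each change, wait at least one sampling interval before checking whether the helper situation has changed state.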

How the Base Situation uses the helper situation

Save the helper situation IBM_highload_helper and edit the base situation IBM_highload. Click on the Until tab, click on Another Situation is TRUE, and then select the helper situation. It will look like this:

sitcalc4

Also set this base situation to Persist=2, which avoids some timing problems with UNTIL situation result reporting.

After this you will want to stop and start both the base situation and the helper situation. After testing, change the distributions to the agents you want to run this on and set both situations to Run at Startup.

Alternative Concept

Instead of an UNTIL situation, let IBM_highload_helper be a free-running situation, independent of the base situation, with this formula

( Path == '/opt/IBM/ITM/tmp' AND File == 'IBM_highload.marker')

When a high load condition exists, a marker file will be created and the helper situation will create a situation event. If the high load condition abates, the base situation will delete the file and the helper situation will close. In this case you would not want to associate the base situation with a Navigator item, so that its always-true event is not displayed.

Windows Platform Issues

I haven't implemented this in a Windows environment. One possibility would be to use the set /a operation, which can compute integer arithmetic expressions. Another would be to use PowerShell, which has many options.

Summary

This technote shows how to create a situation where an alert is based on a mathematical calculation. You could do more complex calculations, perhaps using a database, with the same sort of helper situation controlling whether an alert is displayed. In that case, the awk command would be in a separate file. You could also use Perl or REXX with the same goal of creating or deleting a marker file.
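Here is a minimal sketch of what moving the calculation into a separate file might look like. The function name update_marker, its argument order and the optional threshold argument are all assumptions for illustration; this is not something shipped with ITM.

```shell
#!/bin/sh
# update_marker LOAD NCPU MARKER [THRESHOLD]
# Creates MARKER when LOAD / NCPU exceeds THRESHOLD (default .70),
# otherwise deletes it. Hypothetical helper, not part of ITM.
update_marker() {
    load=$1 ncpu=$2 marker=$3 threshold=${4:-.70}
    # awk exits 0 only when the ratio is above the threshold;
    # the n > 0 guard avoids dividing by zero.
    if awk -v l="$load" -v n="$ncpu" -v t="$threshold" \
        'BEGIN { exit !(n > 0 && l / n > t) }'
    then
        echo highload > "$marker"
    else
        rm -f "$marker"
    fi
}
```

An action command could then source that file and call update_marker with the last word of the uptime output, the substituted processor count and the marker path. Keeping the threshold as an argument lets one script serve both the .70 and the 1.00 rule.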

Sitworld: Table of Contents

Notes: Blooming Lavender – Big Sur Summer 2009

 
