Sitworld: Policing the Hatfields and the McCoys

Moonset2016

John Alvord, IBM Corporation

jalvord@us.ibm.com

Draft #1 – 2 May 2016 – Level 0.5000


Inspiration

One more time I had to explain to a customer that a situation formula cannot include more than a single multi-row attribute group. They had a worthy goal: they wanted to test for a missing process – but only if that process was installed on the system being monitored. Process attribute groups are multi-row and file information attribute groups are multi-row, so a formula that combines them is illegal. The Portal Client Situation editor would have foiled them, since after the first multi-row attribute group is selected, only single-row attribute groups are offered when adding the next attribute test. However, like many customers, they used a tacmd editsys to update the formula to what they wanted. I have seen this a couple of times a year “forever”.

I was involved because that “monster” ITM situation flooded a remote TEMS with results and did not even achieve the desired effect. The Missing Process situation fired even though the software was not installed. In any event, the remote TEMS overload was so severe that the remote TEMS failed after a few hours. The Situation Audit static analysis tool pointed to the issue and the TEMS Audit tool reported on the massive workload caused by the errant situation. The remote TEMS overload would have been an amazing 100 times more severe except that 100+ such situations had a syntax error which prevented them from running. That is all too common when manually creating situation formulas. [One review showed 30% of ITM environments having at least one situation with a syntax error.]

On the other hand, the need was real and the capability had been available in a previous monitoring solution. Two multi-row attribute groups are like two feuding clans – like the legendary Hatfields and McCoys. They just don’t get on at all and there is a lot of collateral damage.

Background

ITM situations are represented by SQL. To make this more concrete, here is a simple situation formula for an Agent Builder Agent:

*IF *VALUE K08_FILESYSTEMMONITOR.Comments *EQ 'NO PARAM'

Here is the SQL that represents the situation:

SELECT ATTRIBUT10, ATTRIBUT13, ATTRIBUTE0, ATTRIBUTE1, ATTRIBUTE2, ATTRIBUTE3, ATTRIBUTE4, ATTRIBUTE5, ATTRIBUTE6, ATTRIBUTE7, ATTRIBUTE8, ATTRIBUTE9, HIGHTHRESH, IFREE, INODES, IUSED, IUSEDPCT, LOWTHRESHO, MBUSED, MEDTHRESHO, MINORTHRE0, MINORTHRES, MONITORING, ORIGINNODE, PATTERN, TAG, TIMESTAMP
FROM K08.K08K08FIL0
WHERE SYSTEM.PARMA("SITNAME", "test_to_check_group_linux", 25) AND
SYSTEM.PARMA("NUM_VERSION", "0", 1) AND
SYSTEM.PARMA("LSTDATE", "1160315090525000", 16) AND
SYSTEM.PARMA("SITINFO", "TFWD=N;OV=N;", 12)
AND K08K08FIL0.ATTRIBUT13 = N'NO PARAM' ;

It is a fact of ITM life that the SQL for a situation will only have a single table [equivalent to an attribute group at this level]. The TEMA or Agent Support library only handles a single table.

If multiple attribute groups were available, logic would have to be prepared to define a key to connect the two attribute groups, something like this:

WHERE … K08K08FIL0.ATTRIBUT13 = K09K09MEM0.ATTRIBUT9 ..

However ITM has no place to make that definition and no logic to process it correctly if it was present. This is a clear product limitation no matter which way you look at it.

TEMS does handle the case of a single multi-row attribute group and a single row attribute group. It creates one or more invisible sub-situations and knits the results together. It does not have the logic at the TEMS to manage the two multi-row attribute group case.

There is a Light Over Here!

Given the extreme customer need, I searched for alternatives and found a way forward in the world of Mathematical Logic and Set Theory. A long time ago I was a math wonk in graduate school and still retain some of the training.

The goal is to calculate a useful result

A and B

for two multi-row attributes even though ITM does not support that.

ITM does have this construction

A *UNTIL/*SIT B

which you specify using the UNTIL tab in the Situation Editor. The logic is that if B is true [on the same managed system or Agent as A]  then any situation event for A is closed and any future Situation Result for A is ignored. In set theoretic terms that is

A and (~)B

or A and not B. You can easily validate that by running through some examples on paper.
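If you would rather not use paper, here is a small Python sketch of the same check. It assumes a deliberately simplified model that is not part of ITM: each situation is just the set of managed systems where it is currently true. The function name `until_result` and the system names are made up for illustration.

```python
def until_result(base_true, until_true):
    """Systems where the base situation still has an open event.

    Simplified model (an assumption, not ITM code): an event for the
    base situation survives only where the until situation is not true.
    """
    return base_true - until_true


# Hypothetical managed system names, for illustration only.
a_systems = {"sys1:LZ", "sys2:LZ", "sys3:LZ"}  # systems where A is true
b_systems = {"sys2:LZ", "sys4:LZ"}             # systems where B is true

open_events = until_result(a_systems, b_systems)
print(sorted(open_events))  # systems where A is true and B is not
```

The set difference is exactly "A and not B": sys2 drops out because B is true there, and sys4 never appears because A was never true there.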

The first breakthrough idea is that A and B can use different attribute groups in Base and Until situations. Situation B cannot usually have DisplayItem set, but A can use DisplayItem and there is considerable value in that mixture.

A second set theory logic rule can now be employed

B is the same as  (~)(~)B

Most people have heard it explained that a double negative is the same as a positive. That is one example.

Suppose we were looking at integers from 1 to 20. And then suppose that B had the formula that value > 10.

After B the integers in the result set would be 11,12,13,14,15,16,17,18,19,20.

In this case (~)B would be the test that value <= 10, and the results would be 1,2,3,4,5,6,7,8,9,10.

Now the reverse again: (~)(~)B would again be the test that value > 10 and the results would be 11,12,13,14,15,16,17,18,19,20.

So B and (~)(~)B have exactly the same results.
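The same integer example can be replayed in a few lines of Python. This is only a sketch of the arithmetic above, using plain sets; the variable names are mine.

```python
universe = set(range(1, 21))          # the integers 1 to 20

b = {v for v in universe if v > 10}   # B: value > 10
not_b = universe - b                  # (~)B: value <= 10
not_not_b = universe - not_b          # (~)(~)B: back to value > 10

print(sorted(b))
print(sorted(not_b))
print(not_not_b == b)                 # the double negative matches B
```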

The original goal was to evaluate

A and B

As seen above this is identical to

A and (~)(~)B

and also from above that is now equivalent to

A  *UNTIL/*SIT (~)B

Finally, (~)B will have the same result sets as a variant of B where the formula is reversed – say B_rev. So the following

A *UNTIL/*SIT B_rev  is identical in function to A and B

You may want to work through some examples before continuing – in order to convince yourself.
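One way to convince yourself is to enumerate every combination of A and B. The sketch below does that in Python under the same simplified model of *UNTIL/*SIT described earlier (a base event survives only when the until situation is false). The function `until` is a stand-in for that model, not an ITM interface.

```python
from itertools import product


def until(base, until_sit):
    """Simplified model of 'base *UNTIL/*SIT until_sit' for one system."""
    return base and not until_sit


rows = []
for a, b in product([False, True], repeat=2):
    b_rev = not b                     # B_rev is true exactly when B is false
    rows.append((a, b, until(a, b_rev), a and b))

for a, b, via_until, direct in rows:
    print(f"A={a!s:5} B={b!s:5} until={via_until!s:5} and={direct!s:5}")

all_match = all(via_until == direct for _, _, via_until, direct in rows)
print(all_match)
```

With only four rows to check, the two columns agree in every case, which is all the equivalence claims.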

Practical example

I titled this blog post thinking of two feuding clans – in reference to how hard it is to get two different multi-row attribute groups working together. However, by building a wall between them [BASE/UNTIL] and just referencing each other’s presence we can achieve some valuable results.

There is a zip file attached, HMC_examples, with model Linux OS Agent situations which demonstrate this working. Following is a presentation of the model situations.

For this example, we may have a shell file installed in a directory /tmp/lpp and the shell file is run with this command “sh /tmp/lpp/testsl.sh”. The goal is to have a situation event that fires if the command is installed but is not running.

Until Situation

First is the Until clause. The formula is against the Linux File Information attribute group and the test is whether the /tmp/lpp path is missing. When it is missing, the situation will be true and that will allow the base situation to be suppressed.

HMC1

Base Situation

Next is the base situation which tests if the expected process is running. It uses the Linux Process attribute group. The test is whether the process “sh /tmp/lpp/testsl.sh” is missing.

HMC2

Under the Advanced button we see Persistence is set to 2

HMC3

And that DisplayItem is specified. Proc_CMD_Line happens to be the internal attribute name for Command Line. This is not strictly needed here, but is vital if more than one process is defined in the *MISSING clause.

HMC4

Finally the Until tab

HMC5

This is the linkage between the base situation and the until situation.
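To see how the two situations combine, here is a short Python sketch of the four possible states of one system. It models only the logic, not ITM itself: it ignores sampling intervals and the Persistence setting, and the function name `event_fires` is invented for this illustration.

```python
def event_fires(path_exists, process_running):
    """Logical outcome of the Base/Until pair on one Linux system.

    Assumed model: the Until situation is true when /tmp/lpp is missing,
    the Base situation is true when the process is missing, and a true
    Until situation suppresses the Base event.
    """
    until_true = not path_exists       # Until: /tmp/lpp is missing
    base_true = not process_running    # Base: "sh /tmp/lpp/testsl.sh" is missing
    return base_true and not until_true


for path_exists in (True, False):
    for process_running in (True, False):
        fired = event_fires(path_exists, process_running)
        print(f"installed={path_exists!s:5} running={process_running!s:5} event={fired}")
```

Only the installed-but-not-running case produces an event, which is the goal stated at the start of the example.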

Limitations

DisplayItem cannot usually be set in the *UNTIL situation. APAR IV74758 – delivered in ITM 630 FP6 – can allow Base/Until DisplayItems in limited cases. This requires a manual TEMS configuration and precise knowledge that the two DisplayItems are in the same internal format.

Persist=2 must be set on the Base situation to avoid race conditions between base results and until results.

If the Base situation could return multiple results, DisplayItem must be defined so that multiple events can be created.

Summary

How to get two multi-row attribute groups to influence each other to gain useful information.

Sitworld: Table of Contents

History and Earlier versions

If the current example situations do not work, you can try previously published binary object zip files. At the same time please contact me to resolve the issues. If you discover an issue, try intermediate levels to isolate where the problem was introduced.

HMC_examples

Initial release

Photo Note: Moon-set over the Pacific Ocean 20 April 2016

 
