Sitworld: Put Your Situations on a Diet Using Indexed Attributes


By John Alvord, IBM Corporation

jalvord@us.ibm.com


Inspiration

During the course of a long sev1 at a large customer site, a situation was identified that was delivering 1000+ rows of data every evaluation cycle. This was happening on just a few Linux OS Agents, and it was a contributor to the problems observed at the remote TEMS. The situation was identified from a TEMS Audit basic workload capture.

1000+ rows of data was unusual. I have seen very large AIX systems that ran up to 30,000 processes, but most Linux systems I have seen run 100-200 processes at most. That needed investigation and remediation.

Overview

The situation formula was not complex:

*IF *SCAN KLZ_Process.Proc_CMD_Line *EQ

*/hosting/configs/apache22/fastep-proda-1/web1* *AND

*COUNT KLZ_Process.Process_Count *NE 5

The goal was to check for certain processes and ensure there were exactly 5 of them. I am a tough critic of situation formulas, so here are three comments.

Note 1: The above does not cover the case where all 5 processes are missing. You would need a second situation like this:

*IF *SCAN KLZ_Process.Proc_CMD_Line *EQ

*/hosting/configs/apache22/fastep-proda-1/web1* *AND

*MISSING KLZ_Process.Proc_CMD_Line *EQ

(‘*/hosting/configs/apache22/fastep-proda-1/web1*’)

People are always surprised that you need to treat the 0 or *MISSING case separately. Now you can be the wise one that is not surprised.

Note 2: For the above test the leading and trailing asterisks are redundant and may cause extra overhead. A trailing asterisk produces an SQL clause that looks like this:

            WHERE  …. Table.column LIKE

‘*/hosting/configs/apache22/fastep-proda-1/web1’

LIKE prevents Agent side filtering. That makes no difference in this case but could be important in other cases.

This is better:

*IF *SCAN KLZ_Process.Proc_CMD_Line *EQ

‘/hosting/configs/apache22/fastep-proda-1/web1’

Note 3: Assuming there is no trailing asterisk, the following cell functions are implemented as Agent side filters: *VALUE, *STR and *SCAN. *TIME prevents Agent side filtering in every case, regardless of the rest of the formula. *TIME is a frequent cause of severe TEMS problems.

Gathering data

I asked the client to get data via the Portal Client.

  1. Drill down to a problem agent.
  2. In the Process Details workspace view, right-click and select Properties, select the option to return all rows, and click OK.
  3. Right-click in the table, select Export, and choose a .txt format.
  4. That will create a file in the Windows Documents folder.

The data was quite interesting. Here is an extract from the data:

Process ID | Process Parent ID | Process Command Name | Command Line
3          | 2                 | migration/0          | [migration/0]
4          | 2                 | ksoftirqd/0          | [ksoftirqd/0]
5          | 2                 | migration/0          | [migration/0]
6          | 2                 | watchdog/0           | [watchdog/0]
7          | 2                 | migration/1          | [migration/1]
8          | 2                 | migration/1          | [migration/1]
9          | 2                 | ksoftirqd/1          | [ksoftirqd/1]
10         | 2                 | watchdog/1           | [watchdog/1]
11         | 2                 | migration/2          | [migration/2]
12         | 2                 | migration/2          | [migration/2]
22         | 2                 | watchdog/4           | [watchdog/4]
3821       | 1                 | rscd                 | bin/rscw
3824       | 3821              | rscd                 | bin/rscd
3825       | 3821              | rscd                 | bin/rscd
3890       | 1                 | hpasmlited           | hpasmlited -f /dev/hpilo

There were roughly 900 migration/watchdog/ksoftirqd processes and also 118 normal processes.
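A tally like the one above can be produced from the exported file with a few lines of Python. This is only a minimal sketch: it assumes the export is pipe-delimited with the column headings shown in the extract (the real export layout may differ), and the function names `parse_rows`, `tally_by_parent` and `tally_by_name` are made up for this illustration.

```python
from collections import Counter

# A few rows copied from the extract above. A real run would read the
# exported .txt file instead; the pipe-delimited layout is an assumption.
SAMPLE = """
Process ID | Process Parent ID | Process Command Name | Command Line
3          | 2                 | migration/0          | [migration/0]
4          | 2                 | ksoftirqd/0          | [ksoftirqd/0]
6          | 2                 | watchdog/0           | [watchdog/0]
3821       | 1                 | rscd                 | bin/rscw
3824       | 3821              | rscd                 | bin/rscd
3825       | 3821              | rscd                 | bin/rscd
3890       | 1                 | hpasmlited           | hpasmlited -f /dev/hpilo
"""


def parse_rows(text):
    """Turn pipe-delimited lines into dicts keyed by the header names."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Split on the first three pipes only, so a pipe inside the
    # command line (the last column) does not break the row.
    header = [h.strip() for h in lines[0].split("|", 3)]
    rows = []
    for ln in lines[1:]:
        cells = [c.strip() for c in ln.split("|", 3)]
        rows.append(dict(zip(header, cells)))
    return rows


def tally_by_parent(rows):
    """Count processes per parent process ID."""
    return Counter(r["Process Parent ID"] for r in rows)


def tally_by_name(rows):
    """Count processes per command name, ignoring the per-CPU suffix
    (for example migration/0 and migration/1 both count as migration)."""
    return Counter(r["Process Command Name"].split("/")[0] for r in rows)


rows = parse_rows(SAMPLE)
by_parent = tally_by_parent(rows)
by_name = tally_by_name(rows)
print(by_parent.most_common())
print(by_name.most_common())
```

On the full export, a parent process ID that dominates the tally is a good candidate for a simple filter.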

The first conclusion was that the Linux system was in a terrible state and should be recycled. I have seen the issue before with a Linux system that had 6600 total processes. It could barely process any work at all.

I suggested running this situation once a day on all Linux systems:

*IF *SCAN KLZ_Process.Proc_CMD_Line *EQ  watchdog *AND

*COUNT KLZ_Process.Process_Count *GE 10

The count of 10 should be adjusted up or down based on advice from the customer's Linux system administrators.

Indexed Attributes Secret Sauce

There is undocumented logic in Agent side filtering involving indexed attributes. If an attribute is “indexed” and the test on it is simple, the agent will filter on it before sending rows. Here is what you do to use indexed attributes.

From the above data capture, the problem processes all have a Parent Process ID of 2. We need to see if that attribute is indexed. The file to review is the ODI [Object Definition Interface] file that the TEPS uses. For the Linux OS Agent this is docklz, which is in

Windows: <installdir>\cnps

Linux/Unix: <installdir>/<arch>/cq/data

Here is the entry for the Parent Process ID in the  KLZ_Process attribute group.

*ATTR:      Parent_Process_ID

*CAPTION:   Process\Parent\ID

*COLUMN:    PPID

*TYPE:      I,8

….

The item to note is the *COLUMN value, PPID in this case.

If you scroll backwards in the file to the *OBJECT line, you will see this:

*OBJECT:    KLZ_Process

*CAPTION:   Linux Process

*TABLE:     KLZPROC

*FILE:      UIRA.KLZPROC

*INDEX:     IRAKEY PID PPID CMD CMDLINE STATE

From the *INDEX line you can see that PPID is indeed indexed.
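The manual lookup above can be scripted. The following is a minimal sketch and not an official tool: it assumes the ODI file is plain text laid out like the excerpts shown here, with an *OBJECT block carrying an *INDEX line followed by *ATTR blocks carrying a *COLUMN line. The function names `parse_odi` and `is_indexed` are made up for this illustration, and the second attribute in the sample is hypothetical, added only to show a negative result.

```python
# Sample text modeled on the docklz excerpts above. The last attribute
# (Hypothetical_Attr / HYPOCOL) is NOT from the real file; it is a made-up
# entry so the example can show an attribute that is not indexed.
SAMPLE_ODI = r"""
*OBJECT:    KLZ_Process
*CAPTION:   Linux Process
*TABLE:     KLZPROC
*FILE:      UIRA.KLZPROC
*INDEX:     IRAKEY PID PPID CMD CMDLINE STATE
*ATTR:      Parent_Process_ID
*CAPTION:   Process\Parent\ID
*COLUMN:    PPID
*TYPE:      I,8
*ATTR:      Hypothetical_Attr
*COLUMN:    HYPOCOL
"""


def parse_odi(text):
    """Return {object: {"index": set of columns, "attrs": {attr: column}}}."""
    objects = {}
    current = None   # entry for the *OBJECT block being read
    attr = None      # name of the *ATTR block being read
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("*") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if key == "*OBJECT":
            current = objects.setdefault(value, {"index": set(), "attrs": {}})
            attr = None
        elif current is None:
            continue  # ignore anything before the first *OBJECT line
        elif key == "*INDEX":
            current["index"].update(value.split())
        elif key == "*ATTR":
            attr = value
        elif key == "*COLUMN" and attr is not None:
            current["attrs"][attr] = value
    return objects


def is_indexed(objects, group, attribute):
    """True if the attribute's *COLUMN appears on its group's *INDEX line."""
    entry = objects[group]
    return entry["attrs"][attribute] in entry["index"]


objects = parse_odi(SAMPLE_ODI)
print(is_indexed(objects, "KLZ_Process", "Parent_Process_ID"))
print(is_indexed(objects, "KLZ_Process", "Hypothetical_Attr"))
```

To try it against a real file, read docklz from the directory listed above and pass the text to `parse_odi`. The real file may contain constructs this sketch does not handle.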

The test must be simple. That means no wildcard tests. In this case the situation formula was changed to

*IF *VALUE KLZ_Process.Parent_Process_ID *NE 2 *AND

*SCAN KLZ_Process.Proc_CMD_Line *EQ

*/hosting/configs/apache22/fastep-proda-1/web1* *AND

*COUNT KLZ_Process.Process_Count *NE 5

With this formula in place, the workload trace showed the result rows dropped from 1000+ to 118 on those affected Linux systems. This reduced the impact on the remote TEMS.
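The row reduction can be pictured with a toy model. This sketch is not how the agent is implemented. It only assumes, as described above, that a simple test on an indexed attribute is applied at the agent before rows are sent, while the wildcard test forces every row to be sent. The rows are synthetic, sized to match the counts reported in this post (roughly 900 kernel threads with a parent ID of 2, plus 118 normal processes), and every name in the code is made up for the illustration.

```python
def make_synthetic_rows(kernel_threads=900, normal=118):
    """Build synthetic process rows; counts follow the figures in the post."""
    rows = []
    pid = 3
    for i in range(kernel_threads):
        # Kernel threads: parent process ID 2, as in the data extract.
        rows.append({"pid": pid, "ppid": 2, "cmdline": f"[watchdog/{i}]"})
        pid += 1
    for i in range(normal):
        # Normal processes: any parent other than 2 (1 is used here).
        rows.append({"pid": pid, "ppid": 1, "cmdline": f"/usr/bin/example-{i}"})
        pid += 1
    return rows


def agent_side_filter(rows):
    """Model of the simple indexed test: Parent_Process_ID *NE 2."""
    return [r for r in rows if r["ppid"] != 2]


rows = make_synthetic_rows()

# Original formula: the wildcard test prevents agent side filtering,
# so every process row is sent to the TEMS on each evaluation.
sent_before = len(rows)

# Revised formula: the indexed test removes rows at the agent first.
sent_after = len(agent_side_filter(rows))

print(sent_before, sent_after)  # 1018 118
```

The remaining tests in the formula are still evaluated as before, just against far fewer rows.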

There were other problems going on at the remote TEMS, but the customer was very pleased to have identified such a serious issue.

Summary

A problem Linux environment stressed the TEMS. That was relieved and the customer was pleased.

Sitworld: Table of Contents

Note: Big Sur Flowers on a Hot Summer Day

 
