Sitworld: Event History #4 Conflict Between DisplayItem and Attributes


John Alvord, IBM Corporation

jalvord@us.ibm.com

Draft #1 – 13 April 2018 – Level 1.00000


Inspiration

The Event History Audit project is complex to grasp as a whole. The following series of posts will track individual diagnostic efforts and how the new project aided in the process.

A Situation Event Conflict Between DisplayItem and Attributes

This was seen in the Event Audit History Advisory section:

90,EVENTAUDIT1014E,TEMS,Situations [1] had DisplayItem configured which was not in results – See report EVENTREPORT024

This arose during testing and was a surprise.

And in that report section:

EVENTREPORT024: Situations using unknown DisplayItems

Situation,DisplayItem,

ccp_fss_ulzf_suse,KLZDISK.MOUNTPT,

There is a situation ccp_fss_ulzf_suse. It has a DisplayItem KLZDISK.MOUNTPT that is unknown, in the sense that the table/column is not found among the attributes in the results. As a result, the Atomize value is always null in the results. Because of this condition, events can be hidden.

Deep dive Into the report details

Scan or search ahead for Report 999. It is sorted first by node, then by situation, then by time at the TEMS. I will first describe what you see and the guidance from the column description line.

EVENTREPORT999: Full report sorted by Node/Situation/Time

Situation,Node,Thrunode,Agent_Time,TEMS_Time,Deltastat,Reeval,Results,Atomize,DisplayItem,LineNumber,PDT

Situation – Situation Name, which can be different from the Full Name that you see in the situation editor, for example when the Full Name is too long, among other cases.

Node – Managed System Name or Agent Name

Thrunode – The managed system that knows how to communicate with the agent, the remote TEMS in simple cases

Agent_Time – The time as recorded at the Agent during TEMA processing. You will see cases where the same Agent time is seen in multiple TEMS seconds because the Agent can at times produce data faster than the TEMS can process it. Simple cases have last three digits of 999. Other cases will have tie breakers of 000,001,…,998 when a lot of data is being generated. This is the UTC [earlier GMT] time at the agent.

TEMS_Time – The time as recorded at the TEMS during processing. This is the UTC [earlier GMT] time.

Deltastat – Event status. You generally see Y for open and N for close. There are more status values that are not described here.

Reeval – Sampling interval [re-evaluation] in seconds; 0 means a pure event.

Results – How many results were seen. The simplest cases are 1 and you would see that if you used -allresults control. In this report you only get a warning when there are multiple results.

Atomize – The table/column specification of the value used for Atomize. It can be null meaning not used.

DisplayItem – The value of the Atomize attribute in this instance. It is just the first [up to] 128 bytes of another string attribute.

LineNumber – A debugging helper that tells what line of the TSITSTSH data dump supplied this information

PDT  – The Predicate or Situation Formula as it is stored.
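The Agent_Time and TEMS_Time columns can be hard to read by eye. Here is a minimal Python sketch for decoding them. It assumes the 16 digits are laid out as CYYMMDDHHMMSS plus a three digit trailer, with a leading 1 meaning the 2000s; that layout is my reading of the sample values and is not stated in the report itself. The function name parse_itm_timestamp is made up for this example.

```python
from datetime import datetime, timezone


def parse_itm_timestamp(stamp):
    """Split a 16-digit stamp into a UTC datetime and the 3-digit trailer.

    Assumed layout (not taken from the report): C YY MM DD HH MM SS ttt,
    where C=1 stands for years 2000-2099.
    """
    if len(stamp) != 16 or not stamp.isdigit():
        raise ValueError(f"expected 16 digits, got {stamp!r}")
    year = 1900 + 100 * int(stamp[0]) + int(stamp[1:3])
    when = datetime(
        year,
        int(stamp[3:5]),    # month
        int(stamp[5:7]),    # day
        int(stamp[7:9]),    # hour
        int(stamp[9:11]),   # minute
        int(stamp[11:13]),  # second
        tzinfo=timezone.utc,
    )
    # For Agent_Time the trailer is 999 or a tie breaker, so it is
    # returned separately instead of being treated as milliseconds.
    return when, int(stamp[13:16])


agent_when, agent_trailer = parse_itm_timestamp("1180410002908999")
tems_when, tems_trailer = parse_itm_timestamp("1180410002908008")
print(agent_when.isoformat(), agent_trailer)
print(tems_when.isoformat(), tems_trailer)
```

Keeping the trailer apart from the datetime makes it easy to spot the 999 simple case versus a tie breaker value.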

The Descriptor line – before we see the results.

ccp_fss_ulzf_suse,zec_uspokpchd01:LZ,REMOTE_us22rtm031ccpr1,1180410002908999,1180410002908008,Y,300,2,KLZDISK.MOUNTPT,,6576,*IF ( ( *VALUE Linux_Disk.Space_Used_Percent *GE 95 *AND *VALUE Linux_Disk.Mount_Point_U *IN ( '/','/usr','/var','/tmp','/home' ) ) *OR ( *VALUE Linux_Disk.Space_Used_Percent *GE 95 *AND *SCAN Linux_Disk.Mount_Point_U *EQ '/ABAPCS/MON') *OR ( *VALUE Linux_Disk.Space_Used_Percent *GE 95 *AND *SCAN Linux_Disk.Mount_Point_U *EQ '/JAVACS/MON' ) *OR ( *VALUE Linux_Disk.Space_Used_Percent *GE 95 *AND *SCAN Linux_Disk.Mount_Point_U *EQ '/RDBMS/MON' ) *OR ( *VALUE Linux_Disk.Space_Used_Percent *GE 95 *AND *SCAN Linux_Disk.Mount_Point_U *EQ '/SAPAS/MON' ) *OR ( *VALUE Linux_Disk.Space_Used_Percent *GE 95 *AND *SCAN Linux_Disk.Mount_Point_U *EQ '/opt' ) ),

,

,
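If you want to pick a descriptor line apart in a script, note that the PDT column itself contains commas, so a plain split on every comma will not work. The sketch below splits only the first eleven commas and treats the rest as the formula. It assumes none of the first eleven columns contains a comma, which may not hold for every DisplayItem value. The names COLUMNS and parse_descriptor are invented for this example, and the sample line is shortened from the one above.

```python
COLUMNS = [
    "Situation", "Node", "Thrunode", "Agent_Time", "TEMS_Time", "Deltastat",
    "Reeval", "Results", "Atomize", "DisplayItem", "LineNumber", "PDT",
]


def parse_descriptor(line):
    """Map one descriptor line of the full report onto the column names."""
    # Split on the first 11 commas only; whatever is left is the PDT,
    # which legitimately contains commas inside *IN ( ... ) lists.
    parts = line.rstrip("\n").split(",", len(COLUMNS) - 1)
    if len(parts) != len(COLUMNS):
        raise ValueError("not a descriptor line")
    row = dict(zip(COLUMNS, parts))
    # The report ends each line with a trailing comma; drop it from the PDT.
    row["PDT"] = row["PDT"].rstrip().rstrip(",")
    return row


sample = (
    "ccp_fss_ulzf_suse,zec_uspokpchd01:LZ,REMOTE_us22rtm031ccpr1,"
    "1180410002908999,1180410002908008,Y,300,2,KLZDISK.MOUNTPT,,6576,"
    "*IF ( *VALUE Linux_Disk.Mount_Point_U *IN ( '/','/usr' ) ),"
)
row = parse_descriptor(sample)
print(row["Atomize"], repr(row["DisplayItem"]), row["Results"])
```

In this line the Atomize column is filled in while the DisplayItem column is empty, which is exactly the condition the advisory is reporting.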

Following the descriptor line are one or more P [Predicate/formula] lines, which show the formula as used in the Agent logic, followed by the results contributing to the TEMS logic.

,,,,,,,P,*PREDICATE=( ( LNXDISK.PCTSPCUSED >= 95 AND ( LNXDISK.MOUNTPTU = N'/' OR LNXDISK.MOUNTPTU = N'/usr' OR LNXDISK.MOUNTPTU = N'/var' OR LNXDISK.MOUNTPTU = N'/tmp' OR LNXDISK.MOUNTPTU = N'/home' ) ) OR ( LNXDISK.PCTSPCUSED >= 95 AND STRSCAN(LNXDISK.MOUNTPTU, N'/ABAPCS/MON') = 1 ) OR ( LNXDISK.PCTSPCUSED >= 95 AND STRSCAN(LNXDISK.MOUNTPTU, N'/JAVACS/MON') = 1 ) OR ( LNXDISK.PCTSPCUSED >= 95 AND STRSCAN(LNXDISK.MOUNTPTU, N'/RDBMS/MON') = 1 ) OR ( LNXDISK.PCTSPCUSED >= 95 AND STRSCAN(LNXDISK.MOUNTPTU, N'/SAPAS/MON') = 1 ) OR ( LNXDISK.PCTSPCUSED >= 95 AND STRSCAN(LNXDISK.MOUNTPTU, N'/opt') = 1 ) ),

Following the predicate are one or more result lines. Each is a list of Attribute=value pairs, with the attribute in Table.Column form and the value as raw data. Each line starts with the index of that result. In this case there is one P line and two result lines, matching the Results count of 2 in the descriptor line. Ignore the funny emoticons that some browsers convert an equal sign [=] followed by a semicolon [;] into. If needed, you can copy/paste the line into a line mode editor for study.

,,,,,,,0,LNXDISK.DSKNAME=/dev/mapper/vgsystem-lv_root;LNXDISK.DSKSIZE=101600;LNXDISK.FSTYPE=ext3;LNXDISK.INODEFREE=6363019;LNXDISK.INODESIZE=6610944;LNXDISK.INODEUSED=247925;LNXDISK.MOUNTPT=/;LNXDISK.MOUNTPTU=/;LNXDISK.ORIGINNODE=zec_uspokpchd01:LZ;LNXDISK.PCTINDAVAL=96;LNXDISK.PCTINDUSED=4;LNXDISK.PCTSPCUSED=97;LNXDISK.SPCAVAIL=3158;LNXDISK.SPCUSED=93282;LNXDISK.TIMESTAMP=1180410001324000,

,,,,,,,1,LNXDISK.DSKNAME=/dev/mapper/vgsystem-lv_root;LNXDISK.DSKSIZE=101600;LNXDISK.FSTYPE=ext3;LNXDISK.INODEFREE=6363014;LNXDISK.INODESIZE=6610944;LNXDISK.INODEUSED=247930;LNXDISK.MOUNTPT=/;LNXDISK.MOUNTPTU=/;LNXDISK.ORIGINNODE=zec_uspokpchd01:LZ;LNXDISK.PCTINDAVAL=96;LNXDISK.PCTINDUSED=4;LNXDISK.PCTSPCUSED=97;LNXDISK.SPCAVAIL=3157;LNXDISK.SPCUSED=93282;LNXDISK.TIMESTAMP=1180410002830000,
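The P lines and result lines share one shape: seven empty columns, then a tag [P or the result index], then the body. A small Python sketch for reading them follows. It assumes attribute values never contain a semicolon, which is true of the lines shown here but is not guaranteed. The name parse_detail is invented for this example and the sample line is cut down from result 0 above.

```python
def parse_detail(line):
    """Return ("P", formula) for a P line or (index, attrs) for a result line."""
    parts = line.rstrip("\n").split(",", 8)
    # Seven leading empty columns, then the tag, then everything else.
    if len(parts) != 9 or any(parts[:7]):
        raise ValueError("not a P line or result line")
    tag = parts[7]
    body = parts[8].rstrip().rstrip(",")  # drop the trailing comma
    if tag == "P":
        return "P", body
    attrs = {}
    for pair in body.split(";"):
        if pair:
            # partition keeps any later "=" inside the value
            name, _, value = pair.partition("=")
            attrs[name] = value
    return int(tag), attrs


sample = ",,,,,,,0,LNXDISK.MOUNTPT=/;LNXDISK.MOUNTPTU=/;LNXDISK.PCTSPCUSED=97,"
index, attrs = parse_detail(sample)
print(index, sorted(attrs))
```

Having the attributes in a dictionary keyed by Table.Column makes it simple to compare them with the Atomize column from the descriptor line.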

What is the problem and How to fix it?

As can be seen, the agent used attribute group table name LNXDISK for all the attributes. However, the DisplayItem was KLZDISK.MOUNTPT, which does not match anything in the attributes, and thus the Atomize value is null.
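The mismatch can be expressed as a simple check: is the Atomize table/column one of the attribute names in a result? The Python sketch below does that and, when the name is missing, also lists attributes that carry the same column under a different table name. This is only an illustration of the idea and not the logic the audit report itself uses; the function name check_displayitem is invented.

```python
def check_displayitem(atomize, result_attrs):
    """Compare the Atomize table.column with the attribute names in one result."""
    if not atomize:
        return "no DisplayItem configured"
    if atomize in result_attrs:
        return "ok"
    # Look for the same column name under some other table prefix.
    column = atomize.split(".", 1)[-1]
    lookalikes = sorted(
        name for name in result_attrs if name.split(".", 1)[-1] == column
    )
    if lookalikes:
        return f"{atomize} not in results; same column seen as {', '.join(lookalikes)}"
    return f"{atomize} not in results"


# Attribute names taken from the result lines above, values shortened.
attrs = {"LNXDISK.MOUNTPT": "/", "LNXDISK.MOUNTPTU": "/", "LNXDISK.PCTSPCUSED": "97"}
print(check_displayitem("KLZDISK.MOUNTPT", attrs))
```

For the situation in this post the check reports that KLZDISK.MOUNTPT is absent while LNXDISK.MOUNTPT is present.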

Historically, LNX was the attribute group table name prefix. However [I think at ITM 6.2 in 2007] this was changed to KLZ to avoid conflicts with the Unix OS Agent. For compatibility the old names are still recognized and mapped onto each other. The current situation editor could never produce such a situation today. The only way this could have been generated would be with a situation dump [tacmd viewsit -s sitname -e sitname.xml] followed by a manual edit to the xml file and then a replace [tacmd createsit -i sitname.xml]. In that circumstance there is no validity checking performed.

In any case, the situation no longer works as expected. In this exact case, only a single event will be created when two would be expected. This is a monitoring degradation.

Summary

Tale #4 of using Event Audit History to understand and correct one type of incorrect DisplayItem condition and thus get more results.

Sitworld: Table of Contents

History and Earlier versions

There are no binary objects associated with this project.

1.000000

initial release

Photo Note: Future Ballroom – Cruise Ship Build 2018

 
