John Alvord, IBM Corporation
jalvord@us.ibm.com
Draft #1 – 13 April 2018 – Level 1.00000
Inspiration
The Event History Audit project is complex to grasp as a whole. The following series of posts will track individual diagnostic efforts and how the new project aided in the process.
A Situation Event Conflict Between DisplayItem and Attributes
This was seen in the Event History Audit Advisory section:
90,EVENTAUDIT1014E,TEMS,Situations [1] had DisplayItem configured which was not in results – See report EVENTREPORT024
This arose during testing and was a surprise.
And in that report section:
EVENTREPORT024: Situations using unknown DisplayItems
Situation,DisplayItem,
ccp_fss_ulzf_suse,KLZDISK.MOUNTPT,
There is a situation, ccp_fss_ulzf_suse, with a DisplayItem of KLZDISK.MOUNTPT that is unknown, in the sense that the table/column is not found in the result attributes. As a result, the Atomize value is always null in the results. Because of this condition, events can be hidden.
Deep dive Into the report details
Scan or search ahead for Report 999. It is sorted first by node, then by situation, then by time at the TEMS. I will first describe what you see and the guidance from the column description line.
EVENTREPORT999: Full report sorted by Node/Situation/Time
Situation,Node,Thrunode,Agent_Time,TEMS_Time,Deltastat,Reeval,Results,Atomize,DisplayItem,LineNumber,PDT
Situation – Situation Name, which can be different from the Full Name that you see in the situation editor, for example when the Full Name is too long, or in other cases.
Node – Managed System Name or Agent Name
Thrunode – The managed system that knows how to communicate with the agent, the remote TEMS in simple cases
Agent_Time – The time as recorded at the Agent during TEMA processing. You will see cases where the same Agent time is seen in multiple TEMS seconds, because at times the Agent can produce data faster than the TEMS can process it. Simple cases have 999 as the last three digits. Other cases will have tie breakers of 000,001,…,998 when a lot of data is being generated. This is the UTC [earlier GMT] time at the agent.
TEMS_Time – The time as recorded at the TEMS during processing. This is the UTC [earlier GMT] time.
Deltastat – Event status. You generally see Y for open and N for close. There are more status values that are not described here.
Reeval – Sampling interval [re-evaluation] in seconds; 0 means a pure event.
Results – How many results were seen. The simplest cases are 1 and you would see that if you used -allresults control. In this report you only get a warning when there are multiple results.
Atomize – The table/column specification of the value used for Atomize. It can be null meaning not used.
DisplayItem – The value of the atomize in this instance. Atomize is just the [up to] first 128 bytes of another string attribute.
LineNumber – A debugging helper that tells what line of the TSITSTSH data dump supplied this information
PDT – The Predicate or Situation Formula as it is stored.
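The column layout above can be turned into a small parsing sketch. This is only an illustration, not the audit tool's own code. The function names are hypothetical, the sample line is the descriptor line from this report with the formula shortened, and the timestamp decoding assumes the 16 digits are laid out as CYYMMDDHHMMSS followed by the three digit suffix, with C=1 meaning the years 20YY.

```python
from datetime import datetime

# Column names copied from the EVENTREPORT999 header line.
COLUMNS = [
    "Situation", "Node", "Thrunode", "Agent_Time", "TEMS_Time", "Deltastat",
    "Reeval", "Results", "Atomize", "DisplayItem", "LineNumber", "PDT",
]

# Descriptor line from the report, with the PDT formula shortened for space.
SAMPLE = (
    "ccp_fss_ulzf_suse,zec_uspokpchd01:LZ,REMOTE_us22rtm031ccpr1,"
    "1180410002908999,1180410002908008,Y,300,2,KLZDISK.MOUNTPT,,6576,"
    "*IF ( *VALUE Linux_Disk.Mount_Point_U *IN ( '/','/usr' ) ),"
)


def split_descriptor(line):
    """Split one descriptor line into a dict keyed by column name."""
    # PDT is the last column and can itself contain commas,
    # so split at most 11 times and keep the remainder whole.
    fields = line.rstrip().split(",", len(COLUMNS) - 1)
    if len(fields) != len(COLUMNS):
        raise ValueError("not a descriptor line: %r" % line)
    # Drop the trailing comma that ends the report line.
    fields[-1] = fields[-1].rstrip(",")
    return dict(zip(COLUMNS, fields))


def decode_itm_time(stamp):
    """Decode a 16-digit stamp, ASSUMED layout CYYMMDDHHMMSS + 3-digit suffix."""
    if len(stamp) != 16 or not stamp.isdigit():
        raise ValueError("expected 16 digits: %r" % stamp)
    year = 1900 + 100 * int(stamp[0]) + int(stamp[1:3])
    when = datetime(
        year, int(stamp[3:5]), int(stamp[5:7]),
        int(stamp[7:9]), int(stamp[9:11]), int(stamp[11:13]),
    )
    # The suffix is returned as text: for Agent_Time it is a tie breaker.
    return when, stamp[13:]


if __name__ == "__main__":
    row = split_descriptor(SAMPLE)
    print(row["Atomize"], repr(row["DisplayItem"]))
    print(decode_itm_time(row["Agent_Time"]))
```

Limiting the split to eleven commas matters because the formula in the PDT column contains commas of its own.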
The Descriptor line – before we see the results.
ccp_fss_ulzf_suse,zec_uspokpchd01:LZ,REMOTE_us22rtm031ccpr1,1180410002908999,1180410002908008,Y,300,2,KLZDISK.MOUNTPT,,6576,*IF ( ( *VALUE Linux_Disk.Space_Used_Percent *GE 95 *AND *VALUE Linux_Disk.Mount_Point_U *IN ( '/','/usr','/var','/tmp','/home' ) ) *OR ( *VALUE Linux_Disk.Space_Used_Percent *GE 95 *AND *SCAN Linux_Disk.Mount_Point_U *EQ '/ABAPCS/MON') *OR ( *VALUE Linux_Disk.Space_Used_Percent *GE 95 *AND *SCAN Linux_Disk.Mount_Point_U *EQ '/JAVACS/MON' ) *OR ( *VALUE Linux_Disk.Space_Used_Percent *GE 95 *AND *SCAN Linux_Disk.Mount_Point_U *EQ '/RDBMS/MON' ) *OR ( *VALUE Linux_Disk.Space_Used_Percent *GE 95 *AND *SCAN Linux_Disk.Mount_Point_U *EQ '/SAPAS/MON' ) *OR ( *VALUE Linux_Disk.Space_Used_Percent *GE 95 *AND *SCAN Linux_Disk.Mount_Point_U *EQ '/opt' ) ),
,
,
Following the descriptor line are one or more P [Predicate/formula] lines, which show the formula as used in the Agent logic, followed by the results contributing to the TEMS logic.
,,,,,,,P,*PREDICATE=( ( LNXDISK.PCTSPCUSED >= 95 AND ( LNXDISK.MOUNTPTU = N'/' OR LNXDISK.MOUNTPTU = N'/usr' OR LNXDISK.MOUNTPTU = N'/var' OR LNXDISK.MOUNTPTU = N'/tmp' OR LNXDISK.MOUNTPTU = N'/home' ) ) OR ( LNXDISK.PCTSPCUSED >= 95 AND STRSCAN(LNXDISK.MOUNTPTU, N'/ABAPCS/MON') = 1 ) OR ( LNXDISK.PCTSPCUSED >= 95 AND STRSCAN(LNXDISK.MOUNTPTU, N'/JAVACS/MON') = 1 ) OR ( LNXDISK.PCTSPCUSED >= 95 AND STRSCAN(LNXDISK.MOUNTPTU, N'/RDBMS/MON') = 1 ) OR ( LNXDISK.PCTSPCUSED >= 95 AND STRSCAN(LNXDISK.MOUNTPTU, N'/SAPAS/MON') = 1 ) OR ( LNXDISK.PCTSPCUSED >= 95 AND STRSCAN(LNXDISK.MOUNTPTU, N'/opt') = 1 ) ),
Following the predicate are one or more result lines. Each is a series of Attribute=value pairs in Table.Column=raw_data form, for example LNXDISK.MOUNTPT=/, separated by semicolons. There is a leading count giving the index of this result line. In this case there were many P lines and many result lines. More comments follow. Ignore the funny emoticons that some browsers substitute for an equal sign [=] followed by a semicolon [;]. If needed you can copy/paste the line into a line mode editor for study. Clearly the results were coming in very fast, but apparently they arrived in three separate bundles of 4 total results.
,,,,,,,0,LNXDISK.DSKNAME=/dev/mapper/vgsystem-lv_root;LNXDISK.DSKSIZE=101600;LNXDISK.FSTYPE=ext3;LNXDISK.INODEFREE=6363019;LNXDISK.INODESIZE=6610944;LNXDISK.INODEUSED=247925;LNXDISK.MOUNTPT=/;LNXDISK.MOUNTPTU=/;LNXDISK.ORIGINNODE=zec_uspokpchd01:LZ;LNXDISK.PCTINDAVAL=96;LNXDISK.PCTINDUSED=4;LNXDISK.PCTSPCUSED=97;LNXDISK.SPCAVAIL=3158;LNXDISK.SPCUSED=93282;LNXDISK.TIMESTAMP=1180410001324000,
,,,,,,,1,LNXDISK.DSKNAME=/dev/mapper/vgsystem-lv_root;LNXDISK.DSKSIZE=101600;LNXDISK.FSTYPE=ext3;LNXDISK.INODEFREE=6363014;LNXDISK.INODESIZE=6610944;LNXDISK.INODEUSED=247930;LNXDISK.MOUNTPT=/;LNXDISK.MOUNTPTU=/;LNXDISK.ORIGINNODE=zec_uspokpchd01:LZ;LNXDISK.PCTINDAVAL=96;LNXDISK.PCTINDUSED=4;LNXDISK.PCTSPCUSED=97;LNXDISK.SPCAVAIL=3157;LNXDISK.SPCUSED=93282;LNXDISK.TIMESTAMP=1180410002830000,
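For study outside a browser, a result line can be broken into its attribute pairs with a few lines of Python. This is a minimal sketch, not part of the audit tool. The function name is hypothetical, the sample is result line 0 above with most attributes removed for space, and it assumes that attribute values do not themselves contain semicolons.

```python
# Result line 0 from the report, shortened to four of its attributes.
SAMPLE_RESULT = (
    ",,,,,,,0,LNXDISK.MOUNTPT=/;LNXDISK.MOUNTPTU=/;"
    "LNXDISK.PCTSPCUSED=97;LNXDISK.TIMESTAMP=1180410001324000,"
)


def parse_result_line(line):
    """Return (index, {"TABLE.COLUMN": raw_value}) for one result line."""
    line = line.rstrip()
    # Remove the single trailing comma that ends the report line.
    if line.endswith(","):
        line = line[:-1]
    # Seven empty columns, then the index, then the attribute data.
    fields = line.split(",", 8)
    if len(fields) != 9 or any(fields[:7]) or not fields[7].isdigit():
        # P lines and descriptor lines end up here.
        raise ValueError("not a result line: %r" % line)
    attrs = {}
    # ASSUMPTION: values never contain ';'. Only the first '=' splits a pair,
    # so a value that contains '=' is kept whole.
    for pair in fields[8].split(";"):
        name, sep, value = pair.partition("=")
        if sep:
            attrs[name] = value
    return int(fields[7]), attrs


if __name__ == "__main__":
    index, attrs = parse_result_line(SAMPLE_RESULT)
    print(index, sorted(attrs))
```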
What is the problem and How to fix it?
As can be seen, the agent used the attribute group table name LNXDISK for all the attributes. However, the DisplayItem was KLZDISK.MOUNTPT, which does not match anything in the attributes, and thus the atomize value is null.
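The mismatch can be expressed as a simple membership test. The sketch below is an illustration of the idea only and makes no claim about how the audit report itself performs the check. The function name is hypothetical, and the attribute dictionary holds a few values copied from result line 0.

```python
# A few attributes copied from result line 0 of the report.
RESULT_ATTRS = {
    "LNXDISK.MOUNTPT": "/",
    "LNXDISK.MOUNTPTU": "/",
    "LNXDISK.PCTSPCUSED": "97",
}


def check_displayitem(displayitem, attrs):
    """Report whether the DisplayItem table/column appears in the results."""
    # Table names are whatever precedes the first '.' in each attribute name.
    tables_seen = sorted({name.split(".", 1)[0] for name in attrs})
    return {
        "found": displayitem in attrs,
        "displayitem_table": displayitem.split(".", 1)[0],
        "result_tables": tables_seen,
    }


if __name__ == "__main__":
    # The DisplayItem as configured in the situation: not found.
    print(check_displayitem("KLZDISK.MOUNTPT", RESULT_ATTRS))
    # The same column under the table name the agent actually sent: found.
    print(check_displayitem("LNXDISK.MOUNTPT", RESULT_ATTRS))
```

When the test fails, there is no value to take from the results, which is the null atomize seen in the DisplayItem column of the descriptor line.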
Historically, LNX was the attribute group table name prefix. However [I think at ITM 6.2 in 2007] this was changed to KLZ as a table name prefix to avoid conflicts with the Unix OS Agent. For compatibility, the old names are still recognized and the two sets are mapped onto each other. The current situation editor could never produce such a situation today. The only way this could have been generated would be with a situation dump [tacmd viewsit -s sitname -e sitname.xml] followed by a manual edit to the xml file and then a replace [tacmd createsit -i sitname.xml]. In that circumstance there is no validity checking performed.
In any case, the situation no longer works as expected. In this exact case only a single event will be created when two would be expected. This is a monitoring degradation.
Summary
Tale #4 of using Event History Audit to understand and correct a type of incorrect DisplayItem condition and thus get more results.
Sitworld: Table of Contents
History and Earlier versions
There are no binary objects associated with this project.
1.000000
initial release
Photo Note: Future Ballroom – Cruise Ship Build 2018