John Alvord, IBM Corporation
jalvord@us.ibm.com
Draft #1 – 10 April 2018 – Level 1.00000
Inspiration
The Event History Audit project is complex to grasp as a whole. The following series of posts will track individual diagnostic efforts and how the new project aided in the process.
A Situation Hid Events
This was seen in the Event Audit History Advisory section:
50,EVENTAUDIT1013W,TEMS,Situations [4] lost [merged] events Multiple Events with same DisplayItem at same TEMS second – see EVENTREPORT004
And in that report section:
EVENTREPORT004: Situations with Multiple results at TEMS with same DisplayItem at same second
Situation,Type,Agent_Second,Results,Agent,Atomize,Atom,
ccp_errlog_xul1_aix,Pure,1180331084920000,4,z9x_z9xssodb01:KUL,ULLOGENT.ENTRYDESC,DE3B8540: PATH HAS FAILED,
There is a situation ccp_errlog_xul1_aix. It is Pure [looking at a log]. At the TEMS second 1180331084920000 [2018-03-31 08:49:20] 4 results were seen. The agent name was z9x_z9xssodb01:KUL – a Unix Log Agent. The displayItem was ULLOGENT.ENTRYDESC [the description] and the value that all four results had was “DE3B8540: PATH HAS FAILED, “
Only a single result will be reported, unless you have the TEMS configured for “One Row One Result“. However the alert from three of the four detected conditions has been lost forever. Monitoring has been degraded since you are not getting all the information that is available. If the information is important, you should probably consider setting a different DisplayItem – if possible. Otherwise you can use the “One Row One Result” configuration.
It is also important to determine if the information is really useful. Probably it is because someone constructed the situation. However in the case it is not really useful, the situation should be stopped and deleted.
Deep dive Into the report details
Scan or search ahead for Report 999. It is sorted by first node, then situation, then by Time at the TEMS. I will first describe what you see and the guidance from the column description line.
EVENTREPORT999: Full report sorted by Node/Situation/Time
Situation,Node,Thrunode,Agent_Time,TEMS_Time,Deltastat,Reeval,Results,Atomize,DisplayItem,LineNumber,PDT
Situation – Situation Name, which can be different from the Full Name that you see in situation editor, like too long or other cases.
Node – Managed System Name or Agent Name
Thrunode – The managed system that knows how to communicate with the agent, the remote TEMS in simple cases
Agent_Time – The time as recorded at the Agent during TEMA processing. You will see cases where the same Agent time is seen in multiple TEMS seconds because the Agent can produce data faster than then TEMS can process it at times. Simple cases have a last three digits of 999. Other cases will have tie breakers of 000,001,…,998 when a lot of data is being generated. This the UTC [earlier GMT] time at the agent.
TEMS_Time – The time as recorded at the TEMS during processing. This the UTC [earlier GMT] time.
Deltastat – event status. You generally see Y for open and N for close. There are more not recorded here.
Reeval – Sampling interval [re-evaluation] in seconds and 0 means a pure event.
Results – How many results were seen. The simplest cases are 1 and you would see that if you used -allresults control. In this report you only get a warning when there are multiple results.
Atomize – The table/column specification of the value used for Atomize. It can be null meaning not used.
DisplayItem – The value of the atomize in this instance. Atomize is just the [up to] first 128 bytes of another string attribute.
LineNumber – A debugging helper that tells what line of the TSITSTSH data dump supplied this information
PDT – The Predicate or Situation Formula as it is stored.
The Descriptor line – before we see the results.
ccp_errlog_xul1_aix,z9x_z9xssodb01:KUL,REMOTE_usrdrtm041ccpr2,1180331084920999,1180331084920000,Y,0,4,ULLOGENT.ENTRYDESC,DE3B8540: PATH HAS FAILED,2602,*IF ( ( *VALUE Log_Entries.Log_Path *EQ ‘/var/adm/ras/’ *AND *VALUE Log_Entries.Log_Name *EQ errlog *AND *VALUE Log_Entries.Type *EQ P *AND *VALUE Log_Entries.Class *EQ Hardware *AND *SCAN Log_Entries.Description *NE 4865FA9B *AND *SCAN Log_Entries.Description *NE 476B351D ) *OR ( *VALUE Log_Entries.Log_Path *EQ ‘/var/adm/ras/’ *AND *VALUE Log_Entries.Log_Name *EQ errlog *AND *VALUE Log_Entries.Type *EQ U *AND *SCAN Log_Entries.Description *EQ 4B6BA416 ) ) *UNTIL ( *TTL 7:00:00:00 ),
Following the descriptor line is one or more P [Predicate/formula] lines as used as the Agent logic, followed by the results contributing to the TEMS logic.
,,,,,,,P,*PREDICATE=( ( ULLOGENT.LOGPATH = ‘/var/adm/ras/’ AND ULLOGENT.LOGNAME = ‘errlog’ AND ULLOGENT.ENTRYTYPE = ‘P’ AND ULLOGENT.ENTRYCLASS = ‘H’ AND STRSCAN(ULLOGENT.ENTRYDESC, ‘4865FA9B’) = 0 AND STRSCAN(ULLOGENT.ENTRYDESC, ‘476B351D’) = 0 ) OR ( ULLOGENT.LOGPATH = ‘/var/adm/ras/’ AND ULLOGENT.LOGNAME = ‘errlog’ AND ULLOGENT.ENTRYTYPE = ‘U’ AND STRSCAN(ULLOGENT.ENTRYDESC, ‘4B6BA416’) = 1 ) ),
Following the predicate is one or more result lines. These are all in the form of Attribute=value in the Table/Column=raw_data form. There is a leading count of the index of this result line. In this case there were 3 P lines and 4 index lines. More comments follow. Ignore the funny emoticons that some Browsers convert equal [=] followed by semicolon [;] into. If needed you can copy/paste the line into a line mode editor for study. Clearly the results were coming in very fast, but apparently they arrived in three separate bundles of 4 total results.
,,,,,,,0,ULLOGENT.ENTRYCLASS=H;ULLOGENT.ENTRYDESC=DE3B8540: PATH HAS FAILED;ULLOGENT.ENTRYSRC=hdisk7;ULLOGENT.ENTRYSYS=z9xssodb01;ULLOGENT.ENTRYTIME=1180331084900000;ULLOGENT.ENTRYTYPE=P;ULLOGENT.FREQTHRESH=0;ULLOGENT.LOGNAME=errlog;ULLOGENT.LOGPATH=/var/adm/ras/;ULLOGENT.ORIGINNODE=z9x_z9xssodb01:KUL;ULLOGENT.PERIODTHRS=0;ULLOGENT.TIMESTAMP=1180331084920000;ULLOGENT.UENTRYDESC=DE3B8540: PATH HAS FAILED;ULLOGENT.UENTRYSRC=hdisk7;ULLOGENT.ULOGNAME=errlog;ULLOGENT.ULOGPATH=/var/adm/ras/,
,,,,,,,P,*PREDICATE=( ( ULLOGENT.LOGPATH = ‘/var/adm/ras/’ AND ULLOGENT.LOGNAME = ‘errlog’ AND ULLOGENT.ENTRYTYPE = ‘P’ AND ULLOGENT.ENTRYCLASS = ‘H’ AND STRSCAN(ULLOGENT.ENTRYDESC, ‘4865FA9B’) = 0 AND STRSCAN(ULLOGENT.ENTRYDESC, ‘476B351D’) = 0 ) OR ( ULLOGENT.LOGPATH = ‘/var/adm/ras/’ AND ULLOGENT.LOGNAME = ‘errlog’ AND ULLOGENT.ENTRYTYPE = ‘U’ AND STRSCAN(ULLOGENT.ENTRYDESC, ‘4B6BA416’) = 1 ) ),
,,,,,,,1,ULLOGENT.ENTRYCLASS=H;ULLOGENT.ENTRYDESC=DE3B8540: PATH HAS FAILED;ULLOGENT.ENTRYSRC=hdisk9;ULLOGENT.ENTRYSYS=z9xssodb01;ULLOGENT.ENTRYTIME=1180331084900000;ULLOGENT.ENTRYTYPE=P;ULLOGENT.FREQTHRESH=0;ULLOGENT.LOGNAME=errlog;ULLOGENT.LOGPATH=/var/adm/ras/;ULLOGENT.ORIGINNODE=z9x_z9xssodb01:KUL;ULLOGENT.PERIODTHRS=0;ULLOGENT.TIMESTAMP=1180331084920000;ULLOGENT.UENTRYDESC=DE3B8540: PATH HAS FAILED;ULLOGENT.UENTRYSRC=hdisk9;ULLOGENT.ULOGNAME=errlog;ULLOGENT.ULOGPATH=/var/adm/ras/,
,,,,,,,P,*PREDICATE=( ( ULLOGENT.LOGPATH = ‘/var/adm/ras/’ AND ULLOGENT.LOGNAME = ‘errlog’ AND ULLOGENT.ENTRYTYPE = ‘P’ AND ULLOGENT.ENTRYCLASS = ‘H’ AND STRSCAN(ULLOGENT.ENTRYDESC, ‘4865FA9B’) = 0 AND STRSCAN(ULLOGENT.ENTRYDESC, ‘476B351D’) = 0 ) OR ( ULLOGENT.LOGPATH = ‘/var/adm/ras/’ AND ULLOGENT.LOGNAME = ‘errlog’ AND ULLOGENT.ENTRYTYPE = ‘U’ AND STRSCAN(ULLOGENT.ENTRYDESC, ‘4B6BA416’) = 1 ) ),
,,,,,,,2,ULLOGENT.ENTRYCLASS=H;ULLOGENT.ENTRYDESC=DE3B8540: PATH HAS FAILED;ULLOGENT.ENTRYSRC=hdisk10;ULLOGENT.ENTRYSYS=z9xssodb01;ULLOGENT.ENTRYTIME=1180331084900000;ULLOGENT.ENTRYTYPE=P;ULLOGENT.FREQTHRESH=0;ULLOGENT.LOGNAME=errlog;ULLOGENT.LOGPATH=/var/adm/ras/;ULLOGENT.ORIGINNODE=z9x_z9xssodb01:KUL;ULLOGENT.PERIODTHRS=0;ULLOGENT.TIMESTAMP=1180331084920000;ULLOGENT.UENTRYDESC=DE3B8540: PATH HAS FAILED;ULLOGENT.UENTRYSRC=hdisk10;ULLOGENT.ULOGNAME=errlog;ULLOGENT.ULOGPATH=/var/adm/ras/,
,,,,,,,3,ULLOGENT.ENTRYCLASS=H;ULLOGENT.ENTRYDESC=DE3B8540: PATH HAS FAILED;ULLOGENT.ENTRYSRC=hdisk11;ULLOGENT.ENTRYSYS=z9xssodb01;ULLOGENT.ENTRYTIME=1180331084900000;ULLOGENT.ENTRYTYPE=P;ULLOGENT.FREQTHRESH=0;ULLOGENT.LOGNAME=errlog;ULLOGENT.LOGPATH=/var/adm/ras/;ULLOGENT.ORIGINNODE=z9x_z9xssodb01:KUL;ULLOGENT.PERIODTHRS=0;ULLOGENT.TIMESTAMP=1180331084920000;ULLOGENT.UENTRYDESC=DE3B8540: PATH HAS FAILED;ULLOGENT.UENTRYSRC=hdisk11;ULLOGENT.ULOGNAME=errlog;ULLOGENT.ULOGPATH=/var/adm/ras/,
What is the problem and How to fix it?
The agent logic produced 4 results tagged with the same agent time and only a single event was created. This *might* not be any problem if they were all identical – although you might want to see if the agent has controls to suppress duplicate results. Or if the data is boring and no one needs to fix it, adjust the formula or even stop and delete the situation. The biggest performance improvement you can ever achieve is NOT doing unneeded work.
In this particular case, DisplayItem was set to the first 128 characters of ULLOGENT.ENTRYDESC and in fact all four results had “DE3B8540: PATH HAS FAILED” and so all were merged into one event and 3 events were lost. Looking through all the results row attributes I see
0: ULLOGENT.ENTRYSRC=hdisk7
1: ULLOGENT.ENTRYSRC=hdisk9
2: ULLOGENT.ENTRYSRC=hdisk10
3: ULLOGENT.ENTRYSRC=hdisk11
For this particular case, if you made DisplayItem be ULLOGENT.ENTRYSRC you would get the actual four event data that was actually present but three hidden. It passes a reasonableness test in that the number of cases looks limited. If a DisplayItem has something like a date or time or a callstack, it can cause a TEMS performance data because all cases of situation/node/displayitem status are kept in storage and the TSITSTSC table and it grows and grows forever [or at least until the TEMS is recycled]
This is just one case, so you should review others and talk it through with Subject Matter Experts on the contents of that log. Your work isn’t done until you understand all the possible ramifications.
Summary
Tale #2 of using Event Audit History to understand and correct one type of Incorrect DisplayItem conditions and thus get more results.
Sitworld: Table of Contents
History and Earlier versions
There are no binary object associated with this project.
1.000000
initial release
Photo Note: Holding Things Together on Cruise Ship Build 2018
One thought on “Sitworld: Event History #2 Duplicate DisplayItems At Same Second”