John Alvord, IBM Corporation
jalvord@us.ibm.com
Draft #1 – 10 May 2018 – Level 1.00000
Inspiration
The Event History Audit project is complex to grasp as a whole. The following series of posts will track individual diagnostic efforts and how the new project aided in the process.
This was about diagnosing a situation that produces a lot of results and what that means and what to do. Then it became more complicated and interesting.
This was seen in the Summary Report
Delay Estimate opens[110] over_minimum [6] over_average [3.33 seconds]
There were 110 Situation Event Open conditions. From these the minimum delay from that each agent was calculated. That means the time in seconds between the Agent Time and the TEMS time. In a quick local network that might always be 0 or 1 second. This technique is used to judge delays in the face of time zone differences and time setting differences. In this case 6 Situation Open Events took more than the minimum time and the average time over minimum was 3.33 seconds.
This was seen in the Report016: Delay Report by Node and Situation
Node,Situation,Atomize,Delay,Min_delay,GBLTMSTMP,Line,
RZ:multip-multip-vaathmr406:RDB,wlp_fretbsp_grzc_std,,4,0,1180429231916015,2757,
RZ:multip-multip-vaathmr406:RDB,wlp_fretbsp_grzw_std,,3,0,1180429225815145,8103,
RZ:multip-multip-vaathmr406:RDB,wlp_fretbsp_grzw_std,,5,0,1180429235517039,2645,
RZ:multip-multip-vaathmr406:RDB,wlp_tbspro_grzc_std,,6,0,1180429235520040,5155,
You can see the minimum delay was 0 seconds. In the report here the observed delay was 4/3/5/6 seconds. The TEMS timestamp is shown along with the line number of the data file which had the information.
Deep dive Into the report details
Scan or search ahead for Report 999. It is sorted by first node, then situation, then by Time at the TEMS. I will first describe what you see and the guidance from the column description line.
EVENTREPORT999: Full report sorted by Node/Situation/Time
Situation,Node,Thrunode,Agent_Time,TEMS_Time,Deltastat,Reeval,Results,Atomize,DisplayItem,LineNumber,PDT
Situation – Situation Name, which can be different from the Full Name that you see in situation editor, like too long or other cases.
Node – Managed System Name or Agent Name
Thrunode – The managed system that knows how to communicate with the agent, the remote TEMS in simple cases
Agent_Time – The time as recorded at the Agent during TEMA processing. You will see cases where the same Agent time is seen in multiple TEMS seconds because the Agent can produce data faster than then TEMS can process it at times. Simple cases have a last three digits of 999. Other cases will have tie breakers of 000,001,…,998 when a lot of data is being generated. This the UTC [earlier GMT] time at the agent.
TEMS_Time – The time as recorded at the TEMS during processing. This the UTC [earlier GMT] time.
Deltastat – event status. You generally see Y for open and N for close. There are more not recorded here.
Reeval – Sampling interval [re-evaluation] in seconds and 0 means a pure event.
Results – How many results were seen. The simplest cases are 1 and you would see that if you used -allresults control. In this report you only get a warning when there are multiple results.
Atomize – The table/column specification of the value used for Atomize. It can be null meaning not used.
DisplayItem – The value of the atomize in this instance. Atomize is just the [up to] first 128 bytes of another string attribute.
LineNumber – A debugging helper that tells what line of the TSITSTSH data dump supplied this information
PDT – The Predicate or Situation Formula as it is stored.
The Descriptor line – before we see the results.
wlp_fretbsp_grzc_std,RZ:multip-multip-vaathmr406:RDB,REMOTE_va10plvtem021,1180429231912000,1180429231916015,Y,180,5,,,2757,*IF *SCAN KRZ_RDB_TABLESPACENORMAL_USAGE.Tablespace_Name *NE ‘TEMP’ *AND *SCAN KRZ_RDB_TABLESPACENORMAL_USAGE.Tablespace_Name *NE ‘UNDO’ *AND *SCAN KRZ_RDB_TABLESPACENORMAL_USAGE.Tablespace_Name *NE ‘RBS’ *AND *SCAN KRZ_RDB_TABLESPACENORMAL_USAGE.Tablespace_Name *NE ‘ROLLBACK’ *AND *SCAN KRZ_RDB_TABLESPACENORMAL_USAGE.Tablespace_Name *NE ‘FNTMP’ *AND *SCAN KRZ_RDB_TABLESPACENORMAL_USAGE.Tablespace_Name *NE ‘2013’ *AND *SCAN KRZ_RDB_TABLESPACENORMAL_USAGE.Tablespace_Name *NE ‘UNDOTBS1’ *AND *SCAN KRZ_RDB_TABLESPACENORMAL_USAGE.Tablespace_Name *NE ‘UNDOTBS2’ *AND *VALUE KRZ_RDB_TABLESPACENORMAL_USAGE.Percentage_Free_To_Allocated *LE 5.00 *AND *SIT wlp_tbspro_grzc_std *EQ *TRUE,
It is an Oracle Agent target looking for free to allocate less than or equal to 5%. Note that it does not have any DisplayItem. The is measured here where the Agent_Time was 180429231912000 and the TEMS time was 4 seconds later 1180429231916015
Following the descriptor line is one or more P [Predicate/formula] lines as used as the Agent logic, followed by the results contributing to the TEMS logic.
,,,,,,,P,*PREDICATE=STRSCAN(KRZTSNLUE.TNAME, N’TEMP’) = 0 AND STRSCAN(KRZTSNLUE.TNAME, N’UNDO’) = 0 AND STRSCAN(KRZTSNLUE.TNAME, N’RBS’) = 0 AND STRSCAN(KRZTSNLUE.TNAME, N’ROLLBACK’) = 0 AND STRSCAN(KRZTSNLUE.TNAME, N’FNTMP’) = 0 AND STRSCAN(KRZTSNLUE.TNAME, N’2013′) = 0 AND STRSCAN(KRZTSNLUE.TNAME, N’UNDOTBS1′) = 0 AND STRSCAN(KRZTSNLUE.TNAME, N’UNDOTBS2′) = 0 AND KRZTSNLUE.PCIFREE <= 00500 AND KRZTSOVEW.STATUS <> N’READ’,
,
Following the predicate is one or more result lines. These are all in the form of Attribute=value in the Table/Column=raw_data form. There is a leading count of the index of this result line. In this case there was one P line and 5 result lines.
,,,,,,,0,KRZTSNLUE.DBHOSTNAME=vaathmr406;KRZTSNLUE.DBINSTNAME=multip;KRZTSNLUE.DBUNIQNAME=multip;KRZTSNLUE.KBFREE=512000;KRZTSNLUE.KBUSED=95744000;KRZTSNLUE.KBYTES=96256000;KRZTSNLUE.KLARGEST=512000;KRZTSNLUE.KMAXSIZE=13421766400;KRZTSNLUE.MBFREE=500;KRZTSNLUE.MBUSED=93500;KRZTSNLUE.MBYTES=94000;KRZTSNLUE.MLARGEST=500;KRZTSNLUE.MMAXSIZE=13107194;KRZTSNLUE.ORIGINNODE=RZ:multip-multip-vaathmr406:RDB;KRZTSNLUE.PCIFMAX=9929;KRZTSNLUE.PCIFREE=53;KRZTSNLUE.PCIMAX=72;KRZTSNLUE.PCIUSED=9947;KRZTSNLUE.TIMESTAMP=1180429231627000;KRZTSNLUE.TNAME=CASOL_CH201706_DAT;KRZTSNLUE.TSFGNUM=1,
,,,,,,,1,KRZTSNLUE.DBHOSTNAME=vaathmr406;KRZTSNLUE.DBINSTNAME=multip;KRZTSNLUE.DBUNIQNAME=multip;KRZTSNLUE.KBFREE=2150400;KRZTSNLUE.KBUSED=83865600;KRZTSNLUE.KBYTES=86016000;KRZTSNLUE.KLARGEST=2150400;KRZTSNLUE.KMAXSIZE=13421766400;KRZTSNLUE.MBFREE=2100;KRZTSNLUE.MBUSED=81900;KRZTSNLUE.MBYTES=84000;KRZTSNLUE.MLARGEST=2100;KRZTSNLUE.MMAXSIZE=13107194;KRZTSNLUE.ORIGINNODE=RZ:multip-multip-vaathmr406:RDB;KRZTSNLUE.PCIFMAX=9938;KRZTSNLUE.PCIFREE=250;KRZTSNLUE.PCIMAX=64;KRZTSNLUE.PCIUSED=9750;KRZTSNLUE.TIMESTAMP=1180429231627000;KRZTSNLUE.TNAME=CASOL_CH201706_IDX;KRZTSNLUE.TSFGNUM=1,
,,,,,,,2,KRZTSNLUE.DBHOSTNAME=vaathmr406;KRZTSNLUE.DBINSTNAME=multip;KRZTSNLUE.DBUNIQNAME=multip;KRZTSNLUE.KBFREE=4096000;KRZTSNLUE.KBUSED=86016000;KRZTSNLUE.KBYTES=90112000;KRZTSNLUE.KLARGEST=4096000;KRZTSNLUE.KMAXSIZE=13421766400;KRZTSNLUE.MBFREE=4000;KRZTSNLUE.MBUSED=84000;KRZTSNLUE.MBYTES=88000;KRZTSNLUE.MLARGEST=4000;KRZTSNLUE.MMAXSIZE=13107194;KRZTSNLUE.ORIGINNODE=RZ:multip-multip-vaathmr406:RDB;KRZTSNLUE.PCIFMAX=9936;KRZTSNLUE.PCIFREE=455;KRZTSNLUE.PCIMAX=67;KRZTSNLUE.PCIUSED=9545;KRZTSNLUE.TIMESTAMP=1180429231627000;KRZTSNLUE.TNAME=CASOL_CH201707_DAT;KRZTSNLUE.TSFGNUM=1,
,,,,,,,3,KRZTSNLUE.DBHOSTNAME=vaathmr406;KRZTSNLUE.DBINSTNAME=multip;KRZTSNLUE.DBUNIQNAME=multip;KRZTSNLUE.KBFREE=2764800;KRZTSNLUE.KBUSED=77107200;KRZTSNLUE.KBYTES=79872000;KRZTSNLUE.KLARGEST=2764800;KRZTSNLUE.KMAXSIZE=13421766400;KRZTSNLUE.MBFREE=2700;KRZTSNLUE.MBUSED=75300;KRZTSNLUE.MBYTES=78000;KRZTSNLUE.MLARGEST=2700;KRZTSNLUE.MMAXSIZE=13107194;KRZTSNLUE.ORIGINNODE=RZ:multip-multip-vaathmr406:RDB;KRZTSNLUE.PCIFMAX=9943;KRZTSNLUE.PCIFREE=346;KRZTSNLUE.PCIMAX=60;KRZTSNLUE.PCIUSED=9654;KRZTSNLUE.TIMESTAMP=1180429231627000;KRZTSNLUE.TNAME=CASOL_CH201707_IDX;KRZTSNLUE.TSFGNUM=1,
,,,,,,,4,KRZTSNLUE.DBHOSTNAME=vaathmr406;KRZTSNLUE.DBINSTNAME=multip;KRZTSNLUE.DBUNIQNAME=multip;KRZTSNLUE.KBFREE=512000;KRZTSNLUE.KBUSED=95744000;KRZTSNLUE.KBYTES=96256000;KRZTSNLUE.KLARGEST=512000;KRZTSNLUE.KMAXSIZE=13421766400;KRZTSNLUE.MBFREE=500;KRZTSNLUE.MBUSED=93500;KRZTSNLUE.MBYTES=94000;KRZTSNLUE.MLARGEST=500;KRZTSNLUE.MMAXSIZE=13107194;KRZTSNLUE.ORIGINNODE=RZ:multip-multip-vaathmr406:RDB;KRZTSNLUE.PCIFMAX=9929;KRZTSNLUE.PCIFREE=53;KRZTSNLUE.PCIMAX=72;KRZTSNLUE.PCIUSED=9947;KRZTSNLUE.TIMESTAMP=1180429231627000;KRZTSNLUE.TNAME=CASOL_CH201708_DAT;KRZTSNLUE.TSFGNUM=1,
The distinguishing attribute is KRZTSNLUE.TNAME. For example result zero has CASOL_CH201706_DAT and result 1 has CASOL_CH201706_IDX, and so on.
What are the problems and How to fix them?
The first problem is that there are five results and they are being merged – so four are lost. That could be resolved by configuring DisplayItem to KRZTSNLUE.TNAME.
The time delay of several seconds is a hint that the TEMS involved is overloaded, the network might be slow at times, or the agent may be impacted by another process. It isn’t a big issue like missing situation events but it should be considered and checked.
Summary
Tale #13 of using Event Audit History is about reviewing a case where a situation was observed with delays and that led also to a case of missing situation events because no DisplayItem was configured.
Sitworld: Table of Contents
History and Earlier versions
There are no binary objects associated with this project.
1.000000
initial release
Photo Note: Looking South from Nepenthe Restaurant – Big Sur 2003
One thought on “Sitworld: Event History #13 Delay Delay Delay”