Sitworld: Event History #5 Multiple Results Same DisplayItem Same Second

grandentrace2016

John Alvord, IBM Corporation

jalvord@us.ibm.com

Draft #1 – 16 April 2018 – Level 1.00000

Follow on twitter

Inspiration

The Event History Audit project is complex to grasp as a whole. The following series of posts will track individual diagnostic efforts and how the new project aided in the process.

A Situation Event Conflict Between DisplayItem and Attributes

This was seen in the Event Audit History Advisory section:

50,EVENTAUDIT1009W,TEMS,Situations [7] with multiple results at agent with same DisplayItem at same second – see EVENTREPORT005

This particular case showed up when the DisplayItem was not configured and so the Atomize value was null. However it could also occur when DisplayItem was specified and the result was null or just an actual duplicate atomize value.

And in that report section:

EVENTREPORT005: Situations with Multiple results at Agent with same DisplayItem at same second

Situation,Type,Agent_Second,Results,Agent,Atomize,Atom,

bnc_fs_tws_exists_ora_twsq,Sampled,1180410020224000,2,bnc_axedaora09:KUX,,,

There is a situation bnc_fs_tws_exists_ora_twsq. It is Sampled situation and at Agent second  1180410020224000 on agent bnc_axedaora09:KUX there were multiple results.  It has  no DisplayItem configured. As a result an event was hidden.

Deep dive Into the report details

Scan or search ahead for Report 999. It is sorted by first node, then situation, then by Time at the TEMS. I will first describe what you see and the guidance from the column description line.

EVENTREPORT999: Full report sorted by Node/Situation/Time

Situation,Node,Thrunode,Agent_Time,TEMS_Time,Deltastat,Reeval,Results,Atomize,DisplayItem,LineNumber,PDT

Situation – Situation Name, which can be different from the Full Name that you see in situation editor, like too long or other cases.

Node – Managed System Name or Agent Name

Thrunode – The managed system that knows how to communicate with the agent, the remote TEMS in simple cases

Agent_Time – The time as recorded at the Agent during TEMA processing. You will see cases where the same Agent time is seen in multiple TEMS seconds because the Agent can produce data faster than then TEMS can process it at times. Simple cases have a last three digits of 999. Other cases will have tie breakers of 000,001,…,998 when a lot of data is being generated. This the UTC [earlier GMT] time at the agent.

TEMS_Time – The time as recorded at the TEMS during processing. This the UTC [earlier GMT] time.

Deltastat – event status. You generally see Y for open and N for close. There are more not recorded here.

Reeval – Sampling interval [re-evaluation] in seconds and 0 means a pure event.

Results – How many results were seen. The simplest cases are 1 and you would see that if you used -allresults control. In this report you only get a warning when there are multiple results.

Atomize – The table/column specification of the value used for Atomize. It can be null meaning not used.

DisplayItem – The value of the atomize in this instance. Atomize is just the [up to] first 128 bytes of another string attribute.

LineNumber – A debugging helper that tells what line of the TSITSTSH data dump supplied this information

PDT  – The Predicate or Situation Formula as it is stored.

The Descriptor line – before we see the results.

bnc_fs_tws_exists_ora_twsq,bnc_axedaora09:KUX,REMOTE_camtram070nbfra,1180410020224000,1180410020226000,Y,300,2,,,2249,*IF *SCAN Disk.Mount_Point_U *EQ ‘/app/tws/twsq’,

Situation was bnc_fs_tws_exists_ora_twsq, agent was bnc_axedaora09:KUX. thrunode was REMOTE_camtram070nbfra. Agent_time was 1180410020224000 and TEMS_time was 118041002022600, 2 seconds later. It was an Open event [Y], there was no DisplayItem, sampling interval was 300 seconds was and the situation formula checked for the existence of a certain mount point.

Following the descriptor line is one or more P [Predicate/formula] lines as used as the Agent logic, followed by the results contributing to the TEMS logic.

,,,,,,,P,*PREDICATE=STRSCAN(UNIXDISK.UMOUNTPT, N’/app/tws/twsq’) = 1,

Following the predicate is one or more result lines. These are all in the form of Attribute=value in the Table/Column=raw_data form. There is a leading count of the index of this result line. In this case there were many P lines and many result lines. More comments follow. Ignore the funny emoticons that some Browsers convert  equal [=] followed by  semicolon [;] into. If needed you can copy/paste the line into a line mode editor for study. Clearly the results were coming in very fast, but apparently they arrived in three separate bundles of 4 total results.

,,,,,,,0,UNIXDISK.DSKNAME=/dev/lv_TWS_twsq;UNIXDISK.DSKSIZE=524288;UNIXDISK.DSKSIZE64=524288;UNIXDISK.DSKSIZEGB=1;UNIXDISK.DSKSIZEGB6=50;UNIXDISK.DSKSIZEMB=512;UNIXDISK.DSKSIZEMB6=51200;UNIXDISK.FILESYSTYP=jfs2;UNIXDISK.FSSTATUS=2;UNIXDISK.INODEFRE64=58591;UNIXDISK.INODEFREE=58591;UNIXDISK.INODESIZ64=61663;UNIXDISK.INODESIZE=61663;UNIXDISK.INODEUSE64=3072;UNIXDISK.INODEUSED=3072;UNIXDISK.MOUNTOPT=rw,log=/dev/hd8;UNIXDISK.MOUNTPT=/app/tws/twsq;UNIXDISK.ORIGINNODE=bnc_axedaora09:KUX;UNIXDISK.PCTINDAVAL=95;UNIXDISK.PCTINDUSED=5;UNIXDISK.PCTSPCAV=49;UNIXDISK.PCTSPCUSED=51;UNIXDISK.SPAVGB=1;UNIXDISK.SPAVGB64=25;UNIXDISK.SPAVMB=256;UNIXDISK.SPAVMB64=25552;UNIXDISK.SPCAVAIL=261644;UNIXDISK.SPCAVAIL64=261644;UNIXDISK.SPCUSED=262644;UNIXDISK.SPCUSED64=262644;UNIXDISK.SPUSEDGB=0;UNIXDISK.SPUSEDGB64=25;UNIXDISK.SPUSEDMB=256;UNIXDISK.SPUSEDMB64=25648;UNIXDISK.TIMESTAMP=1180410020224000;UNIXDISK.UDSKNAME=/dev/lv_TWS_twsq;UNIXDISK.UMOUNTPT=/app/tws/twsq;UNIXDISK.VGN=rootvg;UNIXDISK.ZFILLED=-1;UNIXDISK.ZFILLEDPCT=-1;UNIXDIS

K.ZQUOTA=-1;UNIXDISK.ZREFQUOTA=-1;UNIXDISK.ZREFRESERV=-1;UNIXDISK.ZRESERV=-1;UNIXDISK.ZUCHILDREN=-1;UNIXDISK.ZUDATASET=-1;UNIXDISK.ZUREFRES=-1;UNIXDISK.ZUSNAPS=-1,

,,,,,,,1,UNIXDISK.DSKNAME=/dev/lv_logtwsq;UNIXDISK.DSKSIZE=524288;UNIXDISK.DSKSIZE64=524288;UNIXDISK.DSKSIZEGB=1;UNIXDISK.DSKSIZEGB6=50;UNIXDISK.DSKSIZEMB=512;UNIXDISK.DSKSIZEMB6=51200;UNIXDISK.FILESYSTYP=jfs2;UNIXDISK.FSSTATUS=2;UNIXDISK.INODEFRE64=114500;UNIXDISK.INODEFREE=114500;UNIXDISK.INODESIZ64=116489;UNIXDISK.INODESIZE=116489;UNIXDISK.INODEUSE64=1989;UNIXDISK.INODEUSED=1989;UNIXDISK.MOUNTOPT=rw,log=/dev/hd8;UNIXDISK.MOUNTPT=/app/tws/twsq/TWS/stdlist;UNIXDISK.ORIGINNODE=bnc_axedaora09:KUX;UNIXDISK.PCTINDAVAL=98;UNIXDISK.PCTINDUSED=2;UNIXDISK.PCTSPCAV=98;UNIXDISK.PCTSPCUSED=2;UNIXDISK.SPAVGB=1;UNIXDISK.SPAVGB64=50;UNIXDISK.SPAVMB=503;UNIXDISK.SPAVMB64=50292;UNIXDISK.SPCAVAIL=514988;UNIXDISK.SPCAVAIL64=514988;UNIXDISK.SPCUSED=9300;UNIXDISK.SPCUSED64=9300;UNIXDISK.SPUSEDGB=0;UNIXDISK.SPUSEDGB64=0;UNIXDISK.SPUSEDMB=9;UNIXDISK.SPUSEDMB64=908;UNIXDISK.TIMESTAMP=1180410020224000;UNIXDISK.UDSKNAME=/dev/lv_logtwsq;UNIXDISK.UMOUNTPT=/app/tws/twsq/TWS/stdlist;UNIXDISK.VGN=rootvg;UNIXDISK.ZFILLED=-1;UNIXDISK.ZFILL

EDPCT=-1;UNIXDISK.ZQUOTA=-1;UNIXDISK.ZREFQUOTA=-1;UNIXDISK.ZREFRESERV=-1;UNIXDISK.ZRESERV=-1;UNIXDISK.ZUCHILDREN=-1;UNIXDISK.ZUDATASET=-1;UNIXDISK.ZUREFRES=-1;UNIXDISK.ZUSNAPS=-1,

What is the problem and How to fix it?

As can be seen the agent used attribute group UNIXDISK.UMOUNTPT was /app/tws/twsq in the first result in the second was /app/tws/twsq/TWS/stdlist. Since DisplayItem was not configured, only the first generated an event. As a result the second result and potential event was hidden.

One solution is to set the DisplayItem be .UNIXDISK.DSKNAME or “Disk.Name” as it would be seen in the Situation Editor. In that way you would get two events for the two conditions and thus better monitoring.

A Warning Note

This situations violates the general guidance that situations should be Rare, Exceptional and Fixable. If this situation is true, it is true pretty much forever. And likely no one is going to fix it. What is going to happen is the situation processing will generate confirmation results every sampling interval “forever”. On the other hand, maybe there are relatively few of them and someone is going around an uninstalling some software packages and thus the open events will gradually close. In any event, you should think carefully about always true situations and conditions that are never going to be fixed.

Summary

Tale #5 of using Event Audit History to understand and correct a  type of Incorrect DisplayItem conditions and thus get more results.

Sitworld: Table of Contents

History and Earlier versions

There are no binary objects associated with this project.

1.000000

initial release

Photo Note: Future Grand Entrance – Cruise Ship Build 2016

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: