John Alvord, IBM Corporation
Inspirations
A user was crafting a situation to alert on a Windows Event Log. The ID attribute was known. He wanted to exclude cases where Description had a certain form. The situation formula was unable to make that exclusion.
Another user had a Unix Log Agent situation that was crashing the remote TEMS it was connected to. In this case there was a *TIME test which forced all incoming rows to be evaluated at the TEMS. That was too much work for the remote TEMS.
Overview
The pattern is to create an initial situation filter and use an action command to write some results into a secondary or derivative log. Then a second situation completes the necessary work. In the Windows Event Log case the needed filtering became possible. In the Unix Log case the remote TEMS workload was reduced by a factor of roughly 250.
As a rule, situations have limited filtering capability. Simple formulae are easy and more complicated formulae are usually impossible. By combining action commands and several situations, the needed result is often within reach.
Case 1: Windows Event Log
The customer wanted to exclude from monitoring any events whose Description contained the string "Account Name: a22907sec". However, the characters following the colon [:] were sometimes a space, sometimes a tab and sometimes a space and a tab.
The customer's attempt looked roughly like this:
pdt: *IF *VALUE NT_Event_Log.Event_ID *EQ 99 *AND
*VALUE NT_Event_Log.Log_Name_U *EQ 'Application' *AND
*SCAN NT_Event_Log.Description_U *EQ 'Account Name: a22907sec'
There was no way to account for the space/tab/space+tab possibilities following the colon. Using two *SCANs was not acceptable because it did not recognize the fact that one text test must follow the other one.
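To make the ordering problem concrete, here is a minimal Python sketch. It assumes a *SCAN behaves like a plain substring test, which is a simplification for illustration only. The function names and sample descriptions are hypothetical, and the pattern uses [ \t]* to stand for "any mix of spaces and tabs" rather than the looser .* used later in the action command.

```python
import re

# Ordered pattern: the label, then any run of spaces/tabs, then the userid.
ORDERED = re.compile(r"Account Name:[ \t]*a22907sec")

def two_scans(description: str) -> bool:
    """Model two independent substring tests (assumed *SCAN-like behavior)."""
    return "Account Name:" in description and "a22907sec" in description

def ordered_match(description: str) -> bool:
    """True only when the userid directly follows the label."""
    return ORDERED.search(description) is not None

# Hypothetical descriptions, not taken from a real event log.
wanted_skip = "Account Name: \ta22907sec logged on"
false_hit = "Request from a22907sec. Account Name: bob"

print(two_scans(wanted_skip), ordered_match(wanted_skip))
print(two_scans(false_hit), ordered_match(false_hit))
```

The second sample satisfies both substring tests even though the userid belongs to a different field, which is the kind of false exclusion two *SCANs could not rule out.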
The solution was to create two situations. The first one looked like this:
pdt: *IF *VALUE NT_Event_Log.Event_ID *EQ 99 *AND
*VALUE NT_Event_Log.Log_Name_U *EQ 'Application'
The action command follows and will need some explanation:
echo &{NT_Event_Log.Description} |
findstr /R /C:"^.*Account Name:.*a22907sec.*$" >nul 2>&1 ||
EVENTCREATE /T ERROR /ID 902 /L APPLICATION /D " &{NT_Event_Log.Description} "
That one long line is three separate commands linked together with Windows command line meta characters. Here is the meaning of each section.
1) echo &{NT_Event_Log.Description} |
This command takes the Description of the current Windows Event Log entry and writes it to standard output. The pipe character [|] at the end lets the next command read it as standard input.
2) findstr /r /c:"^.*Account Name:.*a22907sec.*$" >nul 2>&1 ||
The findstr command is available in all Windows environments. /r means the search string is a regular expression. The /c:"…" form passes the quoted text as a single search string, so the embedded spaces are not treated as separators between several search strings.
Regular expressions are a world apart. I will pick this one apart:
^ Anchor to beginning of line
.* Match any number of characters
Account Name: Match these characters
.* Match any number of characters
a22907sec Match these characters
.* Match any number of characters
$ Anchor to end of line
The Windows findstr command has some limitations. See this findstr reference
The >nul 2>&1 suppresses any output
The || at the end means that the next command will be processed only if the exit code from this command was non-zero. If there was a match the exit code is zero. That means the events to be skipped are not processed further.
3) EVENTCREATE /T ERROR /ID 902 /L APPLICATION /D " &{NT_Event_Log.Description} "
The last command is EVENTCREATE, a Windows command that creates a new log event. The ID in this case is set to an unused number. There was some environmental work to do to allow the Windows OS Agent userid to run the EVENTCREATE command.
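The control flow of the three linked commands can be sketched in Python. This is only a model under stated assumptions: the derived event log is represented by a plain list, helper_action is a hypothetical name, and the sketch relies on the small set of regex constructs used here (^, ., *, $) behaving the same in Python's re module as in findstr.

```python
import re

# Same pattern text that the action command passes to findstr /R.
SKIP_PATTERN = re.compile(r"^.*Account Name:.*a22907sec.*$")

DERIVED_EVENT_ID = 902  # the unused event ID chosen in the article

def helper_action(description: str, derived_log: list) -> bool:
    """Model of: echo DESC | findstr PATTERN >nul 2>&1 || EVENTCREATE ...

    Returns True when a derived event was created.
    """
    matched = SKIP_PATTERN.search(description) is not None
    exit_code = 0 if matched else 1  # findstr exits 0 on a match, non-zero otherwise
    if exit_code != 0:  # '||' runs the next command only on a non-zero exit code
        derived_log.append((DERIVED_EVENT_ID, description))
        return True
    return False

# Hypothetical descriptions covering the three separator variants plus one other account.
derived = []
for text in ["Account Name: a22907sec",
             "Account Name:\ta22907sec",
             "Account Name: \ta22907sec",
             "Account Name: bob"]:
    helper_action(text, derived)

print(derived)
```

Only the description for the other account ends up in the derived list, which is what the second situation then sees under event ID 902.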
A second situation was created:
pdt: *IF *VALUE NT_Event_Log.Event_ID *EQ 902 *AND
*VALUE NT_Event_Log.Log_Name_U *EQ 'Application'
That second situation supplied just the needed events since it had been pre-filtered.
The first situation is a helper situation. Thus it is defined to NOT be sent to any event receivers and NOT be associated with the Portal Client Navigator.
This shows how a derivative log and a second situation achieved the customer goal.
Case 2: Unix Log Agent causes remote TEMS crashes
A customer had 21 situations with formula like this:
pdt: *IF *VALUE Oracle_Alert_Log_Details.Message_ID *EQ *4031 *AND
*TIME Oracle_Alert_Log_Details.Message_Timestamp *GT 'Local_Time.Timestamp -1H'
The Oracle log involved had many events arriving all the time.
The underlying ITM issue was that when *TIME is present, there is *no* filtering done at the agent: every single log entry causes a result to be sent to the remote TEMS. All filtering is done as part of the TEMS Data Server [SQL] processing. There were 21 such situations and together they were delivering 12.6 megabytes of results data to the remote TEMS every minute. That was far too much data for the remote TEMS, so a backlog built up and the remote TEMS crashed every couple of days.
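To put the 12.6 megabytes per minute in perspective, here is a small Python calculation. It assumes the load is spread evenly across the 21 situations and holds steady for a full day, and it uses 1 GB = 1000 MB. Those are assumptions for illustration, not measurements.

```python
TOTAL_MB_PER_MIN = 12.6  # figure quoted in the article
SITUATIONS = 21          # number of situations using the *TIME test

# Assumption: every situation contributes an equal share.
per_situation = TOTAL_MB_PER_MIN / SITUATIONS

# Assumption: the rate is constant over 24 hours; 1 GB = 1000 MB.
per_day_gb = TOTAL_MB_PER_MIN * 60 * 24 / 1000

print(f"{per_situation:.2f} MB/min per situation")  # 0.60 MB/min per situation
print(f"{per_day_gb:.1f} GB/day in total")          # 18.1 GB/day in total
```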
The solution was to create a secondary log using a helper situation.
The helper situation formula looked like this:
pdt: *IF *VALUE Oracle_Alert_Log_Details.Message_ID *EQ '*4031'
And the action command was an echo to a derivative log file with the attributes needed.
The second situation was a Unix Log Agent situation on the derivative log which had the identical test [under different attribute names]. The benefit was that only rows with ID ending in 4031 were transmitted to the remote TEMS. The reduction was roughly 250 to 1 compared to the original. The remote TEMS stopped crashing.
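The effect of moving the ID test in front of the *TIME test can be modeled in a few lines of Python. Everything here is a sketch: the rows are synthetic, the 1-in-250 mix of matching IDs is chosen only to mirror the ratio quoted above, and the function names, the message_id field and the sample ID values are hypothetical.

```python
def rows_sent_original(rows):
    """Original design: *TIME in the formula, so every row goes to the TEMS."""
    return list(rows)

def write_derivative_log(rows):
    """Helper situation: only rows whose ID ends in 4031 reach the derivative log."""
    return [r for r in rows if r["message_id"].endswith("4031")]

def rows_sent_derivative(rows):
    """Second situation reads the derivative log; all of ITS rows go to the TEMS."""
    return list(write_derivative_log(rows))

# Synthetic log: 1 row in 250 has an ID ending in 4031 (assumed mix, not measured).
rows = [{"message_id": "ORA-04031" if i % 250 == 0 else "ORA-00001"}
        for i in range(5000)]

before = len(rows_sent_original(rows))
after = len(rows_sent_derivative(rows))
print(before, after, before // after)
```

The second situation still carries the *TIME test, so it still sends every row it sees to the TEMS. The saving comes entirely from the derivative log holding far fewer rows.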
Summary
The Derivative Log Pattern compounds the power of situations and action commands. You can achieve greater efficiency and greater power of expression.
If you create an interesting example, please add a comment here.