John Alvord, IBM Corporation
Draft #29 – 23 August 2020 – Level 1.48000
The ITM Situation Audit tool performs a static analysis on ITM Situation definitions and creates a report showing which situations are probably filtered at the agent and which are probably filtered at the TEMS the agent reports to. Agent filtering is low impact, while TEMS filtering can be high impact and sometimes causes TEMS instability.
From 1.03000 about 20 cases of possible problem situations are identified in Advisory messages. The accompanying Word file contains an exact description of the issue, the background and a recommended recovery action.
From 1.04000 the FULLNAME is used in most report locations.
From 1.05000 the “filter at agent” section is optional, distribution by situation group is monitored, an optional distribution report is available and two rare problem situations have advisories.
From 1.07000 there is a new advisory on situations which run action commands at every interval. Also, the absence of an action command is handled correctly.
From 1.12000 there is a new advisory when an action command uses substitutions that are not in fact available based on the Situation Formula.
From 1.13000 there are four new advisories on UADVISOR [history data collection] issues and one for TEMS_Alert being available but not running. One of the UADVISOR warnings is only enabled with the -sth24 option. I consider a 24 hour export time a best practice but that is not a consensus view yet.
From 1.14000 a new ini control skip is added [multiples allowed]. The named advisory codes will not be part of the Situation Audit Report, although the advisories will be part of the overall summary count.
From 1.26000 the -lst input option is added. It is a good way to get data from the hub TEMS for processing on another system. That is important in this case because one of the CPAN modules, Marpa, needs extra work to install on older Perl levels. It is relatively easier to install on a Windows workstation. The technique used is faster and more reliable than the SOAP method.
From 1.30000 an advisory is created for the case of a Situation formula using a multiple row attribute group and not having a DisplayItem. That can result in missing situation events.
A zip file containing the program objects is here. In the History section toward the end there are links to prior levels if needed.
ITM Situation Audit Report
Here is an example of a situation audit on a test ITM environment with 400+ situations.
The columns show the situation name, the TEMS/agent filter status and the PDT [the situation formula]. The TEMS filtered cases are shown first.
By default, only Run at Startup situations are shown and UADVISOR situations are always skipped. Run at Startup situations that are not distributed are ignored by default.
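The default selection rules above can be sketched in a few lines of Python. This is only an illustration and not the logic in sitaudit.pl: the record fields `name`, `autostart` and `distributed`, the `runall` parameter, and the test for a name starting with "UADVISOR" are all assumptions made for the example.

```python
def select_for_report(situations, runall=False):
    """Return the situations a default report would show (illustrative only)."""
    selected = []
    for sit in situations:
        # UADVISOR situations are always skipped (name prefix test is an assumption).
        if sit["name"].startswith("UADVISOR"):
            continue
        # By default only Run at Startup situations are shown.
        if not runall and not sit["autostart"]:
            continue
        # Situations that are not distributed are ignored by default.
        if not sit["distributed"]:
            continue
        selected.append(sit)
    return selected


# Hypothetical sample records, not taken from a real environment.
sample = [
    {"name": "Disk_Full", "autostart": True, "distributed": True},
    {"name": "UADVISOR_KNT_WTLOGCLDSK", "autostart": True, "distributed": True},
    {"name": "Manual_Check", "autostart": False, "distributed": True},
    {"name": "Orphan_Sit", "autostart": True, "distributed": False},
]
shown = [s["name"] for s in select_for_report(sample)]
```

With the sample records only `Disk_Full` is kept; passing `runall=True` also keeps `Manual_Check`.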
ITM Situation Audit Installation
The ITM Situation audit package includes one Perl program that uses CPAN modules. All the testing has been done on Windows with http://www.activestate.com at the Perl 5.20.1 level. One of the packages used is Marpa::R2 which is an advanced high speed parsing/lexing system. The testing has been done with Marpa::R2 level 2.104000. That may not be on the Perl Package Manager database and so you will have to do a command line install. Here is a guide for Windows:
First make the Perl install directory the current directory
Start the cpan program
First the mingw64 package has to be installed to get a C compiler and make program
Next install marpa – don’t be surprised that other packages will be installed.
The other CPAN modules are all in Perl Package Manager
If you are unsure, just start the program and see what complaints are made.
I would *not* attempt to run this on a system where you do not have the ability to install and/or update Perl.
One of the main inputs is the set of tacmd bulkexportsit -d output files, so there is no need to run this on the TEMS systems. At level 1.26000 the -lst option was added as another way to collect the data for processing on a different system.
The supplied files are
1) program is sitaudit.pl
2) a model initialization file sitaudit.ini
3) sitaudit.cmd command file for getting Windows TEMS data [-lst option]
4) sitaudit.sh command file for getting Linux/Unix TEMS data [-lst option]
5) sitaudit.tar file holding (4) to avoid line end problems between Windows and Linux/Unix [-lst option]
ITM Situation Audit Configuration and Usage
The ITM Situation Audit package has controls to match installation requirements but the defaults work in most cases. Some controls are in the command line options and some are in the sitaudit.ini file. Following is a full list of the controls.
You will have to specify exactly one of the lst or bulk or txt or soap options, since that is where the data comes from.
The following table shows all options. All command line options except -h and -ini and three debug controls can be entered in the ini file. The command line takes precedence if both are present. In the following table, a blank means the option will not be recognized in the context. All controls are lower case only.
|Command line||Ini file||Default||Description|
|-agent_filter||agent_filter||off||Produce Agent filter report|
|-log||log||./sitaudit.log||Name of log file|
|-ini|| ||./sitaudit.ini||Name of ini file|
|-debuglevel|| ||90||Control message volume|
|-debug|| ||off||Turn on some debug points|
|-dist||dist||off||Produce distribution report|
|-v||verbose||off||Messages on console also|
|-vt||traffic||off||Create traffic.txt [large]|
|-runall||runall||off||Report on non-autostart agents|
|-nohdr||nohdr||off||Skip header for regression testing|
|-nodist||nodist||off||Warn on sits without distribution|
|n/a||skip||null||Advisory codes to skip in report|
|-sth24||sth24||Off||Warn on historical data collection less than 24 hours|
|n/a||soap_timeout||180||Wait for soap|
|-a||a||Off||Create sit_atr.txt file|
|-o||o||./sitaudit.csv||Output report file|
|-workpath||workpath||<null>||Directory to store output files|
|-bulk <path>||bulk <path>||<null>||Directory to bulkexportsit files|
|-lst||n/a||<null>||Name of lst files|
|-txt||txt <file1> <file2> <file3>||<null>||Name of txt files|
|n/a||soap||<null>||SOAP access information|
|-std||std||Off||Userid/password in stdin|
|n/a||user||<null>||Userid to access SOAP|
|n/a||passwd||null||Password to access SOAP|
You must specify exactly one of lst or bulk or txt or soap in the ini file or command line parameters.
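The two rules stated so far, that the command line takes precedence over the ini file and that exactly one data source must be given, can be sketched as follows. This is a hedged illustration rather than the code in sitaudit.pl: the "control value" line layout of the ini file, the "#" comment handling, and the function names are assumptions.

```python
SOURCES = ("lst", "bulk", "txt", "soap")


def parse_ini(text):
    """Parse 'control value' lines; repeated controls (such as skip) accumulate."""
    controls = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # comment syntax is an assumption
            continue
        parts = line.split(None, 1)
        value = parts[1] if len(parts) > 1 else "on"
        controls.setdefault(parts[0], []).append(value)
    return controls


def merge(ini_controls, cli_controls):
    """Overlay command line controls on ini controls; the command line wins."""
    merged = dict(ini_controls)
    merged.update(cli_controls)
    return merged


def data_source(controls):
    """Return the single data source, or raise if zero or several are set."""
    chosen = [name for name in SOURCES if name in controls]
    if len(chosen) != 1:
        raise ValueError("specify exactly one of lst, bulk, txt or soap; got %r" % chosen)
    return chosen[0]


# Hypothetical ini content; CODE_A and CODE_B stand in for real advisory codes.
ini = parse_ini("bulk sits\nskip CODE_A\nskip CODE_B\n")
merged = merge(ini, {"bulk": ["other_dir"]})
source = data_source(merged)
```

Here `source` is `bulk` and the command line value `other_dir` replaces the ini value `sits`. Adding a soap control on top of bulk would raise an error.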
lst specifies that data will be found in the current directory in the form of QA1*.DB.LST files. See the section on the -lst option for how to create them.
bulk specifies the path to the directory which holds the tacmd bulkexportsit files. If you specify a full path, use forward slashes. If there are no forward slashes in the path, the current directory is prepended to the supplied name. I expect this to be the most common usage. Use tacmd bulkexportsit -d to include distribution data. The bulk report will not contain all the advisories but will still be very useful.
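The path rule for the bulk control can be pictured like this. It is a minimal sketch of the rule as described, not the program's own code, and the function name `resolve_bulk_path` and its `cwd` parameter are made up for the example.

```python
import os


def resolve_bulk_path(path, cwd=None):
    """Apply the bulk path rule: a name without '/' is relative to the current directory."""
    if "/" in path:
        return path  # treated as a full path and used as given
    if cwd is None:
        cwd = os.getcwd()
    # Normalize to forward slashes so the result follows the same convention.
    return cwd.replace("\\", "/").rstrip("/") + "/" + path
```

For example, `resolve_bulk_path("sits", cwd="C:\\work")` gives `C:/work/sits`, while `resolve_bulk_path("/tmp/sits")` is returned unchanged.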
soap specifies how to access the SOAP process with the name or ip address of the server running the hub TEMS. See next section for a discussion.
soap_timeout controls how long the SOAP process will wait for a response. One of the agent failure modes is to not respond to real time data requests. This default is 180 seconds. It might need to be made longer in some complex environments. A value of 90 seconds resulted in a small number of failures [2 agents] in a test environment with 6000 agents.
The txt option makes use of an IBM internal tool and is not otherwise documented here.
Getting the EIB Data Using -lst Option
The following shows how to get data from the EIB [Enterprise Information dataBase] using the sitaudit.cmd or sitaudit.sh files. Here is an example where the work is being done in the existing default tmp directory for Linux/Unix where the TEPS is running. If the product is not installed in the default directory, set the environment variable to the actual install directory. Any directory on the system running TEPS could be used.
a) copy sitaudit.tar to /opt/IBM/ITM/tmp
b) tar -xf sitaudit.tar
c) If not using default install directory configure like this: export CANDLEHOME=/opt/IBM/ITM
d) sh sitaudit.sh
e) The .LST files are created and should be moved to where the Situation Audit will be done
Here is an example where the work is being done in the existing default tmp directory for Windows where the TEPS is running.
b) cd c:\IBM\ITM
c) md tmp
d) cd tmp
e) move the sitaudit.cmd to this directory
f) If not using default install directory configure like this: SET CANDLE_HOME=c:\IBM\ITM
g) sitaudit.cmd
h) The .LST files are created and should be moved to where the Situation Audit will be done
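After moving the files, a quick check that they landed in the directory where the audit will run can save a failed run. The sketch below only looks for names matching the QA1*.DB.LST form mentioned earlier. The function name is hypothetical and the check says nothing about whether the file contents are complete.

```python
import glob
import os


def find_lst_files(directory="."):
    """List files named like QA1*.DB.LST in the given directory."""
    return sorted(glob.glob(os.path.join(directory, "QA1*.DB.LST")))


if __name__ == "__main__":
    found = find_lst_files()
    if found:
        print("Found %d LST files" % len(found))
    else:
        print("No QA1*.DB.LST files in the current directory")
```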
ITM Situation Audit Package soap control
The soap control specifies how to access the SOAP process. For a simple ITM installation using default communication controls, specify the name or ip address of the server running the hub TEMS. If you know the primary hub TEMS a single soap control is least expensive.
If the ITM installation is configured with hot standby or FTO there are two hub TEMS. At any one time one TEMS will have the primary role and the other TEMS will have the backup role. If the TEMS maintenance level is ITM 622 or later, set two soap controls which specify the name or ip address of each hub TEMS server. The TEMS with the primary role will be determined dynamically.
Before ITM 622 you should determine ahead of time which TEMS is running as the primary and set the single soap control appropriately.
Connection processing follows the tacmd login logic. It will first use https protocol on port 3661 and then use http protocol on 1920. If the SOAP server is not present on that ITM process, a virtual index.xml file is retrieved and the port that SOAP is actually using is retrieved and used if it exists.
Various failure cases can occur.
- The target name or IP address may be incorrect.
- Communication outages can block access to the servers.
- The TEMS task may not be running and there is no SOAP process.
- The TEMS may be a remote TEMS which does not run the SOAP process.
- The SOAP process may use an alternate port and firewall rules block access.
The recovery actions for the various errors are pretty clear. If the last case is in effect [SOAP on an alternate port blocked by firewall rules], consider running the situation audit package on a server which is not affected by firewall rules. Alternatively, always make sure that the hub TEMS is the first process started. If it must be recycled, then stop all other ITM processes first and restart them after the TEMS recycle. See this blog post which shows how to configure a stable SOAP port at the hub TEMS.
If the protocol is specified in the soap control only that protocol will be tried.
When the port number is specified in the soap control, 3661 will force https protocol and 1920 will force http protocol.
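The protocol and port rules described so far can be summarized as a small function that lists which connections would be attempted, in order. This is a sketch under assumptions, not the tool's implementation: the accepted forms of the control value (`host`, `host:port`, `protocol://host`, `protocol://host:port`) are assumed for the example, and trying both protocols on an unrecognized port is a guess rather than documented behavior.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"https": 3661, "http": 1920}
PORT_PROTOCOL = {3661: "https", 1920: "http"}


def soap_candidates(control):
    """Return (protocol, host, port) tuples in the order they would be tried."""
    # Prefix '//' so a bare 'host:port' is parsed as a network location.
    parts = urlsplit(control if "://" in control else "//" + control)
    host, port, proto = parts.hostname, parts.port, parts.scheme
    if proto:
        if proto not in DEFAULT_PORTS:
            raise ValueError("unsupported protocol: %s" % proto)
        # A protocol in the control means only that protocol is tried.
        return [(proto, host, port or DEFAULT_PORTS[proto])]
    if port in PORT_PROTOCOL:
        # Port 3661 forces https and port 1920 forces http.
        return [(PORT_PROTOCOL[port], host, port)]
    if port is not None:
        # Assumption: an alternate port with no protocol is tried both ways.
        return [("https", host, port), ("http", host, port)]
    # Default order: https on 3661 first, then http on 1920.
    return [("https", host, 3661), ("http", host, 1920)]
```

For a plain host name such as `hub1` the result is https on 3661 followed by http on 1920, and `hub1:1920` yields a single http attempt.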
The ITM environment can be configured to use alternate internal web server access ports using the HTTP and HTTPS protocol modifiers. For this case you can specify the ports to be used
or if both have been altered
The logic generally follows tacmd login processing. There are two differences: ipv6 is not supported and port following ITM 6.1 style is not included. SOAP::Lite does not support ipv6 at present. ITM 6.1 logic could be added but is relatively rare and was not available for testing.
ITM Situation Audit Debug trace
When the sitaudit.pl program does not produce correct results or stops unexpectedly, you should gather additional documentation. The -debuglevel 300 option will generate an extensive log trace.
If SOAP is used, the -vt or traffic option dumps the http data to a traffic.txt file. This can be extremely large and should be used only on a limited basis.
Factors forcing a TEMS Filtering
The following column functions will force TEMS filtering.
There are other factors that can cause TEMS Filtering such as the Filter Object Too Big case. In that circumstance one or both of the two binary blobs which define the filter are too large to transmit. The situation formula must be reduced in size.
Most formulas can be set up to use indexed attributes to reduce the data sent to the TEMS. See this post: Put Your Situations on a Diet Using Indexed Attributes
A *TIME comparison is never filtered at the agent and so can have a serious impact on TEMS performance and stability. A *TIME compare to an absolute value shows as a *VALUE test.
*SCAN can usually be replaced with *VALUE. The following two work identically but *VALUE is agent filtered and so works much better:
*SCAN attribute *EQ 'abc'
*VALUE attribute *EQ '*abc*'
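If many formulas need this change, the rewrite can be done mechanically. The following sketch handles only the simple `*SCAN attribute *EQ 'text'` form shown above, with straight quotes and no wildcard already in the text. The function name and the sample attribute name are hypothetical, and any rewritten formula should be reviewed before it is imported back.

```python
import re

# Matches: *SCAN <attribute> *EQ '<text>' where <text> has no quote or asterisk.
SCAN_EQ = re.compile(r"\*SCAN\s+(\S+)\s+\*EQ\s+'([^'*]*)'")


def scan_to_value(formula):
    """Rewrite simple *SCAN ... *EQ tests as wildcarded *VALUE ... *EQ tests."""
    return SCAN_EQ.sub(r"*VALUE \1 *EQ '*\2*'", formula)


# Hypothetical formula used only to show the transformation.
before = "*IF *SCAN Some_Group.Some_Attribute *EQ 'abc'"
after = scan_to_value(before)
```

Formulas that do not contain the simple form are returned unchanged.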
For sampled situations you can always increase the sampling interval. In some cases once a day or once a week gets the needed effect.
*PLEASE NOTE* This guidance was created based on recent testing on an ITM 630 FP1 environment and long experience. If you discover something contrary to what I documented, please send me full details so I can perfect this project. Using the standard workload trace is a good way to investigate such cases.
tacmd login ….
tacmd settrace -m <temsname> -p KBB_RAS1 -o 'error (unit:kpxrpcrq,Entry="IRA_NCS_Sample" state er)'
Advisories and Parse Errors
From version 1.02000 advisory messages may be seen. See the included ITM Situation Audit.doc Word file to explain the 47 messages. Each advisory has been assigned an impact level. Impact 100 reflects the highest concern – which could mean performance problems or total failure to ever run as expected.
Some of the “syntax” errors are actual problems in the situation definitions. Others might be a problem in the Situation Audit tool itself. If it is not obvious please let the author know.
Please give feedback on this feature.
The -a option to create a sit_atr.txt file
This file is an input to a future project to evaluate Situations and Attribute and Catalog files for errors and disuse.
How to guide an efficiency improvement effort
The TEMS Audit is the best and only way to determine how agents are impacting a TEMS right now. This ITM Situation Audit will be useful for larger scale checks and pre-production efficiency checks.
ITM Situation Audit Limitations
SOAP access via ipv6 does not work because SOAP::Lite does not yet support http6 or https6 protocols.
Parts of the ITM Situation Audit tool were derived from Agent Health Survey. Many thanks and kudos to Jeffrey Kegler and his band of assistants who make this work possible. New advisories are added from time to time as new issues are identified.
Please report back experience and suggestions. If ITM Situation Audit does not work well in your environment, repeat the test and add “-debuglevel 300” and after a retry send the sitaudit.log [compressed] for analysis.
History and Earlier versions
If the current version of the ITM Situation Audit tool does not work, you can try recent published binary object zip files. At the same time please contact me to resolve the issues. If you discover an issue try intermediate levels to isolate where the problem was introduced.
Add advisory for mixed *NEs and *ORs
Add advisories for inconsistent attribute usage
Add -purettl to create shell files to remove *UNTIL/*TTL when more than 15 minutes
Add advisory for pure situations *TTL more than 15 minutes
Add advisory for pure situations with *COUNT test
Update a report example text.