John Alvord, IBM Corporation
Draft #1 – 4 May – Level 0.51000
The ITM Agent Historical Data Export Survey tool reports cases where the historical data export process has experienced a failure. The data is available when the OS Agent on the system is at the ITM 630 maintenance level.
ITM agents can collect historical data. At a user-configured interval, the data for specific attributes is collected in Short Term Historical [STH] files. In most environments, best practice is to collect the STH files at the agent. Periodically, the data is exported to a Warehouse Proxy Agent [WPA], which in turn relays the data to a data warehouse such as DB2 or Oracle. This generally works quite reliably. However, there is a large collection of potential failure cases. The agent may not be able to connect to the WPA. The local file system may fill. The STH file might be broken after an unexpected system stoppage. There are 154 error codes, and just one indicates success.
One recent project, Discovering Historical Data Export Problems at Agent, showed how to create a situation to alert on problem cases. That is quite useful in maintaining a stable environment. However, when starting out to clear all issues, working through individual alerts can be inefficient. A report on all the problem cases would be much better. At ITM 630, access to this data was provided by the Agent Support Library, or TEMA.
The following project presents a historical data export report for all the agents using an ITM 630 TEMA.
It is based on the Agent Health Survey project, which identifies potentially unhealthy ITM agents: agents that appear online but are unable to provide real time data or even run situations. That project is used in many large ITM installations, and its basic framework was reused here.
ITM Agent Historical Data Export Survey Report
Here is an example of a test on a fairly large set of Windows OS Agents.
A review of the agents found there was some issue getting the data: the diagnostic log showed errors. As a result, these OS Agents had no NTPROCESS data. When the export process found no data, it recorded a “Metafile not found” error. Normally there would be an NTPROCESS.hdr file and an NTPROCESS file. Neither was present, and so the error code was set. This affected only 20-odd of 5000 Windows agents, but achieving 100% data capture is an excellent goal.
The post Sitworld: Discovering Historical Data Export Problems at Agent includes a list of all the error codes. Involve IBM Support to determine the meaning of a specific error and how to recover from it.
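The symptom above, an STH table with neither its data file nor its .hdr metafile, can be spotted with a simple presence check. The following is a minimal Python sketch, not part of the survey tool. The history directory you pass in is a placeholder, and the naming rule (data file named after the table, metafile with a .hdr suffix) is taken only from the NTPROCESS example above. It checks file presence, not the export status code the agent records.

```python
from pathlib import Path


def missing_sth_files(history_dir, tables):
    """Return {table: [missing file names]} for each table that lacks
    its STH data file and/or its .hdr metafile in history_dir."""
    base = Path(history_dir)
    problems = {}
    for table in tables:
        expected = [table, table + ".hdr"]  # e.g. NTPROCESS and NTPROCESS.hdr
        missing = [name for name in expected if not (base / name).is_file()]
        if missing:
            problems[table] = missing
    return problems
```

For example, missing_sth_files(path_to_history, ["NTPROCESS"]) returns an empty dict when both files exist, and lists both names when neither is present, as in the case described above.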
ITM Agent Historical Data Export Survey Installation
The agent historical data export survey package includes one Perl program that uses CPAN modules. The program has been tested in several environments. Windows had the most intensive testing, and it was also tested on AIX. Many Perl 5 levels and CPAN package levels will be usable. Here are the details of the testing environments.
The ActiveState Perl used is 5.20. If you make use of the CPAN Library for Perl Projects package described below, use the 5.20 version of that package.
1) ActiveState Perl in the Windows environment, which can be found here: http://www.activestate.com/activeperl/downloads
This is perl 5, version 20, subversion 1 (v5.20.1) built for MSWin32-x64-multi-thread (with 1 registered patch, see perl -V for more detail)
2) Perl on AIX 5.3
# perl -v
This is perl, v5.8.2 built for aix-thread-multi
(with 3 registered patches, see perl -V for more detail)
CPAN is a collection of free-to-use packages. Your Perl environment may already have some CPAN modules installed, and the survey program may need more. Here are the modules used.
Getopt::Long in CPAN Getopt-Long 2.42
LWP::UserAgent in libwww-perl 6.02
HTTP::Request::Common in CPAN HTTP-Message 6.06
XML::TreePP in CPAN XML-TreePP 0.43
You might discover the need for other CPAN modules when the program is run for the first time. The program will likely work at other CPAN module levels, but these are the levels most recently tested.
The Windows ActiveState Perl environment uses the Perl Package Manager to acquire the needed CPAN modules. The Agent Survey technote has an appendix showing usage of that manager program with screen captures.
Please note: In some environments installing new CPAN packages is a major problem. Internet access may not be available, or Perl may be a shared resource which you do not have the right to change. Changing such packages could negatively affect other programs.
To manage this case, see the CPAN Library for Perl Projects post, which provides a package that avoids changing the installed Perl libraries.
The supplied program is itm_sth_survey.pl, delivered with a model sthsurvey.ini file in the zip file itm_sth_survey.0.51000.
To install this package, unzip or untar the file contents into a convenient directory. The soap control is required [see later for discussion]. In this case the sthsurvey.ini file looks like this:
The user and password credentials may be supplied from standard input. This increases security by ensuring that no user or password is kept in any permanent disk file. In this case the sthsurvey.ini file would look like this:
The std option can also be supplied on the command line as -std. In either case, a program must supply the userid and password on standard input in this form:
-user <userid> -passwd <password>
The program invocation would be something like this
mycreds | perl …
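The mycreds program is whatever you choose to supply. As a hedged sketch, assuming the credentials are held in environment variables (the names ITM_SOAP_USER and ITM_SOAP_PASSWD are hypothetical), a small Python helper could build the single required line:

```python
import os


def credential_line(env=os.environ):
    """Return the '-user <userid> -passwd <password>' line that the survey
    reads from standard input when the std option is set."""
    user = env.get("ITM_SOAP_USER")      # hypothetical variable name
    passwd = env.get("ITM_SOAP_PASSWD")  # hypothetical variable name
    if user is None or passwd is None:
        raise ValueError("ITM_SOAP_USER and ITM_SOAP_PASSWD must both be set")
    return f"-user {user} -passwd {passwd}"
```

Ending such a mycreds.py script with print(credential_line()) would let it be piped in, for example as python mycreds.py | perl itm_sth_survey.pl -std, so the password never lands in a disk file.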
ITM Agent Historical Data Export Survey Configuration and Usage
The Agent Historical Data Export Survey package has controls to match installation requirements, but the defaults work in most cases. Some controls are command line options and some are in the sthsurvey.ini file. Following is a full list of the controls.
The following table shows all options. All command line options except -h, -ini, and the three debug controls [-debuglevel, -debug, -dpr] can be entered in the ini file. The command line takes precedence if both are present. In the following table, a blank means the option is not recognized in that context. All controls are lower case only.
|Command line||ini file||Default||Description|
|-log||log||./sthsurvey.log||Name of log file|
|-ini|| ||./sthsurvey.ini||Name of ini file|
|-debuglevel|| ||90||Control message volume|
|-debug|| ||off||Turn on some debug points|
|-dpr|| ||off||Dump internal data arrays|
|-v||verbose||off||Messages on console also|
|-vt||traffic||off||Create traffic.txt [large]|
|-pc||pc||<null>||Limit survey by agent types|
|-tems||tems||<null>||Limit survey by TEMSes|
|-agent||agent||<null>||Agents to survey|
|-agent_list||agent_list||<null>||Text file with agents to survey|
|-ignore_list||ignore_list||<null>||Text file with agents to ignore|
|-all||all||off||Produce report of all agents|
|-agent_timeout||agent_timeout||50||TEMS to Agent wait|
|n/a||soap_timeout||180||Wait for soap|
|-o||o||./sthsurvey.csv||Output report file|
|-workpath||workpath||<null>||Directory to store output files|
|n/a||soap||<required>||SOAP access information|
|n/a||soapurl||<null>||Recognized – use soap|
|-std||std||off||Userid/password in stdin|
|-user||user||<required>||Userid to access SOAP|
|-passwd||passwd||<null>||Password to access SOAP|
Many of the command line entries and ini controls are self-explanatory. The following options can be set multiple times: -pc, -tems, and soap. All time-based settings are in seconds.
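A hedged Python sketch of how these precedence rules could combine: the command line wins over the ini file, repeatable options keep every value, and -agent values from both sources accumulate (the text states this for -agent and -agent_list). The one-keyword-per-line ini syntax parsed here is an assumption for illustration, and whether -pc, -tems, or soap values combine across the two sources is not stated, so the sketch lets the command line replace them.

```python
# Options documented as repeatable; their values are kept as lists.
REPEATABLE = {"pc", "tems", "soap", "agent"}
# Combined across ini file and command line (stated for -agent in the text).
CUMULATIVE = {"agent"}


def parse_ini(text):
    """Parse hypothetical 'keyword value' lines into key -> list of values.
    Blank lines and lines starting with '#' are skipped (an assumption)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(" ")
        settings.setdefault(key, []).append(value.strip())
    return settings


def merge(ini, cli):
    """Merge ini and command-line settings, both given as key -> list of values."""
    merged = {}
    for key in set(ini) | set(cli):
        if key in CUMULATIVE:
            values = ini.get(key, []) + cli.get(key, [])
        else:
            values = cli.get(key) or ini.get(key, [])  # command line wins
        if not values:
            continue
        merged[key] = values if key in REPEATABLE else values[-1]
    return merged
```

A flag such as std with no value parses to an empty string, which the sketch treats as "set".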
soap specifies how to access the SOAP process with the name or ip address of the server running the hub TEMS. See next section for a discussion.
soapurl specifies how to access the SOAP process including the protocol and port number and target.
soap_timeout controls how long the survey waits for a response from the SOAP process. One of the agent failure modes is to not respond to real time data requests. The default is 180 seconds. It might need to be made longer in some complex environments: a value of 90 seconds resulted in a small number of failures [2 agents] in a test environment with 6000 agents.
-agent specifies specific agents to survey and can be set multiple times. -agent_list gives a filename which contains agents to survey. If both are present, on the command line and/or in the ini file, the effect is cumulative. If -agent or -agent_list is used, you usually do NOT want to use -tems or -pc, since those will eliminate some of the specified agents.
If an -agent_list entry begins with a circumflex ^ [shift 6], the entry is treated as a regular expression. The ^ character is the beginning-of-line anchor. If you specify ^abc, then the managed systems whose names begin with “abc” will be considered of interest. If you wanted Linux OS Agents whose names begin with abc, you would use ^abc.*:LZ. That allows you to create a report on agents of interest based just on the name.
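The matching rule can be sketched in Python. This is an illustration of the rule as described, not the tool's Perl code: entries starting with ^ become regular expressions anchored at the start of the managed system name, and all other entries are exact names. Python and Perl regular expressions agree for simple patterns like ^abc.*:LZ.

```python
import re


def load_selectors(lines):
    """Split agent_list entries into exact names and compiled regexes.
    Entries beginning with '^' are treated as regular expressions."""
    names, patterns = set(), []
    for raw in lines:
        entry = raw.strip()
        if not entry:
            continue
        if entry.startswith("^"):
            patterns.append(re.compile(entry))
        else:
            names.add(entry)
    return names, patterns


def selected(agent, names, patterns):
    """True if the managed system name matches an exact entry or a regex."""
    return agent in names or any(p.match(agent) for p in patterns)
```

With ^abc.*:LZ in the list, abc123:LZ is selected while abc123:NT and xabc:LZ are not.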
Controls to include [like -pc and -tems] and exclude [like -ignore_list] will operate independently. It is best to minimize the number of controls and test thoroughly so you can avoid surprising results.
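To make the interaction concrete, here is a hypothetical Python sketch in which each include control (-pc, -tems, and an explicit agent set from -agent or -agent_list) and the -ignore_list exclusion are applied independently, so an agent must pass every active filter. The Agent record and its fields are illustrative assumptions. The second test case shows the surprise mentioned earlier: combining an explicit agent list with -pc drops agents of other types.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str  # managed system name, e.g. "abc01:LZ"
    pc: str    # product code, e.g. "lz"
    tems: str  # TEMS the agent reports to


def survey_set(agents, pcs=(), tems=(), wanted=None, ignore=()):
    """Return names of agents that pass every active include filter
    and are not in the ignore list."""
    chosen = []
    for a in agents:
        if pcs and a.pc not in pcs:
            continue
        if tems and a.tems not in tems:
            continue
        if wanted is not None and a.name not in wanted:
            continue
        if a.name in ignore:
            continue
        chosen.append(a.name)
    return chosen
```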
Command lines supplied are printed in the report; however, the -user and -passwd values are replaced by UUUUUUUU and PPPPPPPP.
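The masking can be sketched in Python as a pass over the argument list that replaces the value following each credential option. This is a minimal sketch of the behavior described, not the tool's code.

```python
def mask_credentials(argv):
    """Return a copy of argv with the values after -user and -passwd masked."""
    masks = {"-user": "UUUUUUUU", "-passwd": "PPPPPPPP"}
    masked = list(argv)
    for i, arg in enumerate(argv[:-1]):
        if arg in masks:
            masked[i + 1] = masks[arg]
    return masked
```

Joining the result with spaces gives a command line that is safe to print in the report.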
ITM Agent Historical Data Export Survey Package soap control
The soap control specifies how to access the SOAP process. For a simple ITM installation using default communication controls, specify the name or IP address of the server running the hub TEMS. If you know which hub TEMS is primary, a single soap control is least expensive.
If the ITM installation is configured with hot standby or FTO there are two hub TEMS. At any one time one TEMS will have the primary role and the other TEMS will have the backup role. If the TEMS maintenance level is ITM 622 or later, set two soap controls which specify the name or ip address of each hub TEMS server. The TEMS with the primary role will be determined dynamically.
Before ITM 622 you should determine ahead of time which TEMS is running as the primary and set the single soap control appropriately.
Connection processing follows the tacmd login logic. It first tries the https protocol on port 3661 and then the http protocol on port 1920. If the SOAP server is not present on that ITM process, a virtual index.xml file is retrieved to find the port SOAP is actually using, and that port is used if it exists.
Various failure cases can occur:
1) The target name or IP address may be incorrect.
2) Communication outages can block access to the servers.
3) The TEMS task may not be running and there is no SOAP process.
4) The TEMS may be a remote TEMS which does not run the SOAP process.
5) The SOAP process may use an alternate port and firewall rules block access.
The recovery actions for the various errors are pretty clear. If the SOAP process uses an alternate port that firewall rules block [case 5], consider running the survey package on a server which is not affected by those rules. Alternatively, always make sure that the hub TEMS is the first process started. If it must be recycled, stop all other ITM processes first and restart them after the TEMS recycle. See this blog post which shows how to configure a stable SOAP port at the hub TEMS.
If the protocol is specified in the soap control, only that protocol will be tried.
When the port number is specified in the soap control, 3661 will force https protocol and 1920 will force http protocol.
The ITM environment can be configured to use alternate internal web server access ports using the HTTP and HTTPS protocol modifiers. For this case you can specify the ports to be used:
or if both have been altered
The logic generally follows tacmd login processing, with two differences: ipv6 is not supported, and ITM 6.1-style port following is not included. SOAP::Lite does not support ipv6 at present. ITM 6.1 logic could be added, but such environments are relatively rare and none was available for testing.
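The connection order described in this section can be sketched in Python as a function that expands one soap control value into (protocol, host, port) attempts. Several points are assumptions: the URL-style syntax accepted here (for example https://hub1 or hub1:1920), trying https and then http when a non-standard port is given without a protocol, and defaulting an explicit protocol to its standard port. The index.xml port lookup and FTO primary detection are not modeled.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"https": 3661, "http": 1920}


def candidate_endpoints(soap):
    """Expand a soap control value into ordered (protocol, host, port) attempts."""
    # Prefix '//' when no scheme is given so urlsplit reads host:port as netloc.
    parts = urlsplit(soap if "://" in soap else "//" + soap)
    host, protocol, port = parts.hostname, parts.scheme, parts.port
    if protocol:  # explicit protocol: only that protocol is tried
        return [(protocol, host, port or DEFAULT_PORTS[protocol])]
    if port == 3661:  # 3661 forces https
        return [("https", host, 3661)]
    if port == 1920:  # 1920 forces http
        return [("http", host, 1920)]
    if port is not None:  # other port, no protocol: assumption, try both
        return [("https", host, port), ("http", host, port)]
    return [("https", host, 3661), ("http", host, 1920)]  # default order
```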
ITM Agent Historical Data Export Survey Install Validation Test
Start with a short run. The goals here are:
- Ensure Perl is installed with the needed CPAN packages
- Validate SOAP communication controls
- Access and review of the hub TEMS tables
- Access and review of agent operations logs.
- Clear observed problems
Here is an example command
perl itm_sth_survey.pl -v -tems <tems_name> -pc ux
The -v option writes all the log messages to the screen. The -tems option specifies a TEMS that agents report to. The -pc option specifies which agent types to study. Later on you can specify multiple -tems and -pc options.
Here is a second example command where the externally supplied CPAN modules have been installed in the directory inc. In addition all the output files are written into the /tmp directory.
perl -Iinc itm_sth_survey.pl -v -tems <tems_name> -pc ux -workpath /tmp
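When the survey is scripted, for example to pass several -tems and -pc options, the argument list can be built programmatically. A minimal Python sketch follows. The function name and parameters are hypothetical, and the function only assembles the options shown in the examples above.

```python
def survey_command(tems=(), pcs=(), workpath=None, verbose=True, lib_dir=None):
    """Build an argv list for itm_sth_survey.pl with repeated -tems/-pc options."""
    cmd = ["perl"]
    if lib_dir:
        cmd.append("-I" + lib_dir)  # directory holding externally supplied CPAN modules
    cmd.append("itm_sth_survey.pl")
    if verbose:
        cmd.append("-v")
    for name in tems:
        cmd += ["-tems", name]
    for pc in pcs:
        cmd += ["-pc", pc]
    if workpath:
        cmd += ["-workpath", workpath]
    return cmd
```

The resulting list could be passed to subprocess.run(cmd, check=True), which avoids shell quoting issues with TEMS names.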
ITM Agent Historical Data Export Survey Intensive Debug Trace
When the itm_sth_survey.pl program does not produce correct results or stops unexpectedly, you should gather additional documentation. The -debuglevel 300 option will generate an extensive log trace. The sthsurvey.log will be much larger than normal, so the survey should be limited in scope.
The -vt or traffic option dumps the http data to a traffic.txt file. This can be extremely large and should be used only on a limited basis. In one case a 10,000 agent survey generated a 2 gigabyte file.
ITM Agent Historical Data Export Survey Limitations
http6 and http6s protocols are not yet supported.
The Agent Historical Data Export Survey tool was derived from Agent Health Survey.
Please report back experience and suggestions. If the Agent Historical Data Export Survey does not work well in your environment, repeat the test adding “-debuglevel 300” and send the resulting sthsurvey.log [compressed] for analysis.
History and Earlier versions
If the current version of this survey tool does not work, you can try earlier published binary object zip files. At the same time, please contact me to resolve the issues. If you discover an issue, try intermediate levels to isolate where the problem was introduced.
Photo Note: Sun Pillar during sunset off Carmel Highlands [ref]. Credit to my neighbor Stephen Adair who took the photograph.