John Alvord, IBM Corporation
jalvord@us.ibm.com
Draft #1 – 9 June 2014 – Level 0.75000
Introduction
I was recently asked about a TEMS and SOAP oriented health check for a TEMS environment. There is a related project, ITM Agent Health Survey, which focuses on ITM Agents. The goal for this project is to confirm that these major ITM functions are working:
1) The hub TEMS
2) SOAP to the hub TEMS
3) The remote TEMS
4) The backup hub TEMS [if configured with FTO]
This project provides awareness of major types of outages. Other TEMS issues are not checked.
After testing, the survey should be run periodically via a Linux/Unix crontab task or a Windows AT command. It can be configured to run a user specified command, such as sending an email, whenever a problem condition is detected.
ITM TEMS Health Survey Report
Here is a sample TEMS Health report from an ITM environment with four TEMS, one of which is offline. Column definitions follow.
The Type column explains the context of the failure report.
Type | Comments |
FAIL | Initialization failure |
HUB | Unable to connect to hub TEMS |
SOAP | SOAP error |
TEMS | A hub or remote TEMS |
The TEMS column is the nodeid of the TEMS. Some types of errors do not have an associated TEMS.
The error code gives more specific details about the failure report.
Error Code | Comments |
0 | Normal condition |
100 | TEMS responded but slower than time_nominal |
101 | TEMS experienced a timeout |
102 | TEMS did not time out but returned no data |
201 | No working connection to primary hub TEMS found |
202 | Error detected in configuration |
203 | SOAP failure |
999 | TEMS is offline |
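The per-TEMS codes in this table suggest a simple decision order for one request for the TEMS time. Here is a minimal Python sketch, assuming a hypothetical probe result made of an online flag, a timeout flag, the elapsed seconds and the returned timestamp. The order of the checks is an assumption and is not taken from itm_tems_health.pl; the environment-level codes 201 to 203 are not covered.

```python
from typing import Optional

# Per-TEMS codes from the Error Code table.
NORMAL = 0
SLOW = 100
TIMEOUT = 101
NO_DATA = 102
OFFLINE = 999


def classify_probe(online: bool, timed_out: bool, elapsed: float,
                   tems_time: Optional[str], time_nominal: float = 30) -> int:
    """Map one hypothetical 'get TEMS UTC time' probe to an error code.

    The check order below is an assumption made for illustration.
    """
    if not online:
        return OFFLINE      # TEMS is reported offline
    if timed_out:
        return TIMEOUT      # no answer within review_survey_timeout
    if not tems_time:
        return NO_DATA      # answered in time but returned nothing
    if elapsed > time_nominal:
        return SLOW         # answered, but slower than time_nominal
    return NORMAL
```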
The Elapsed column is the time in seconds taken to retrieve the TEMS UTC time.
The Timestamp column is the TEMS current UTC time in ITM timestamp format. [UTC is Coordinated Universal Time.]
The Error Text column records the type of error.
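When post-processing the report, the Timestamp value can be converted to a normal date and time. This Python sketch assumes the usual 16-character ITM layout CYYMMDDHHMMSSmmm, where C is a century digit (0 for 19xx, 1 for 20xx) and mmm is milliseconds; check it against your own report before relying on it.

```python
from datetime import datetime, timezone


def parse_itm_timestamp(stamp: str) -> datetime:
    """Convert an assumed CYYMMDDHHMMSSmmm ITM timestamp to a UTC datetime."""
    if len(stamp) != 16 or not stamp.isdigit():
        raise ValueError(f"not a 16-digit ITM timestamp: {stamp!r}")
    century = 1900 + 100 * int(stamp[0])  # C=0 -> 19xx, C=1 -> 20xx
    return datetime(
        year=century + int(stamp[1:3]),
        month=int(stamp[3:5]),
        day=int(stamp[5:7]),
        hour=int(stamp[7:9]),
        minute=int(stamp[9:11]),
        second=int(stamp[11:13]),
        microsecond=int(stamp[13:16]) * 1000,  # milliseconds -> microseconds
        tzinfo=timezone.utc,  # the survey reports TEMS UTC time
    )
```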
ITM TEMS Health Survey Installation
The TEMS Health Survey package includes one Perl program that uses CPAN modules. The program has been tested in two environments: Windows received the most intensive testing, and it was also tested on Linux. Many Perl 5 levels and CPAN package levels will likely be usable. Here are the details of the tested environments.
1) ActiveState Perl on Windows, available from www.activestate.com
perl -v
This is perl 5, version 16, subversion 3 (v5.16.3) built for MSWin32-x86-multi-thread (with 1 registered patch, see perl -V for more detail)
2) Perl on Linux (s390x)
# perl -v
This is perl, v5.10.0 built for s390x-linux-thread-multi
CPAN is a collection of free-to-use Perl packages. Your Perl environment may already have some CPAN modules installed, and the TEMS Health Survey may need more. Here are the modules used:
Getopt::Long in CPAN Getopt-Long 2.42
SOAP::Lite in CPAN SOAP-Lite 1.11
HTTP::Message in CPAN HTTP-Message 6.06
XML::TreePP in CPAN XML-TreePP 0.42
You might discover the need for other CPAN modules when the program is run for the first time. The program will likely work at other CPAN module levels, but these are the most recently tested levels.
The Windows ActiveState Perl environment uses the Perl Package Manager to acquire the needed CPAN modules. The Agent Survey technote has an appendix showing usage of that manager program with screen captures.
Please note: In some environments installing CPAN packages is a major problem. Internet access may not be available, or Perl may be a shared resource which you do not have the right to change. Changing such packages could negatively affect other programs.
To handle this case, see the CPAN Library for Perl Projects, which provides a package that avoids changing the installed Perl libraries.
See Appendix 1 for Perl 5.18 testing results.
Package contents
The binary package contains a Perl program itm_tems_health.pl and a model ini file named itm_soap.ini. The binary object is temshealth.0.75000.
To install the TEMS Health Survey package, unzip the file contents into a convenient directory. On Linux/Unix, use chmod/chown/chgrp to make the file executable and to set a usable owner and group. The soap control is required [see later for discussion]. The userid and password may be supplied in the ini file. In this case the ini file looks like this:
soap <server_name>
user <user>
passwd <password>
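The ini file is a list of keyword value lines. A minimal Python sketch of reading such a file follows. It is not the tool's own parser: how the real program treats comments or unusual whitespace is not described here, so those details are assumptions.

```python
def parse_ini(text: str) -> dict:
    """Parse 'keyword value' control lines (a sketch, not the tool's parser).

    - soap may appear twice (two hub TEMS with FTO), so it is kept as a list
    - a keyword with no value, such as std, becomes True
    - the value is the rest of the line, so a cmd control keeps its blanks
    """
    config: dict = {"soap": []}
    for raw in text.splitlines():
        parts = raw.strip().split(None, 1)
        if not parts:
            continue  # skip blank lines
        keyword = parts[0]  # controls are lower case only, so no case folding
        value = parts[1].strip() if len(parts) > 1 else ""
        if keyword == "soap":
            config["soap"].append(value)
        else:
            config[keyword] = value if value else True
    return config
```

For example, parse_ini("soap myhub\nuser sysadmin\npasswd secret\n") yields soap ["myhub"], user "sysadmin" and passwd "secret" (the host and credentials here are placeholders).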
The user and password credentials may instead be supplied from standard input. This improves security because no user or password is kept in any permanent disk file. In this case the ini file would look like this:
soap <server_name>
std
The std option can also be supplied on the command line as -std. In either case, another program must write the userid and password to standard input in this form:
-user <userid> -passwd <password>
The program invocation would be something like this, where mycreds stands for a program you supply that writes the credentials:
mycreds | perl …
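On the receiving side, parsing the credential line is straightforward. This Python sketch assumes the whole standard input holds the -user and -passwd tokens separated by whitespace, and treats a missing password as empty to match the null default in the options table. It illustrates the format only; it is not the Perl tool's code.

```python
from typing import Optional, Tuple


def parse_std_credentials(text: str) -> Tuple[str, str]:
    """Parse '-user <userid> -passwd <password>' as read from stdin.

    Usage sketch: parse_std_credentials(sys.stdin.read())
    Tokens are split on whitespace, so a password containing blanks
    is not handled here.
    """
    flags = ("-user", "-passwd")
    found: dict = {}
    tokens = text.split()
    i = 0
    while i < len(tokens):
        if tokens[i] in flags:
            has_value = i + 1 < len(tokens) and tokens[i + 1] not in flags
            found[tokens[i]] = tokens[i + 1] if has_value else ""
            i += 2 if has_value else 1
        else:
            i += 1  # ignore anything unexpected
    user: Optional[str] = found.get("-user")
    if not user:
        raise ValueError("expected: -user <userid> -passwd <password>")
    return user, found.get("-passwd", "")  # passwd default is null (empty)
```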
ITM TEMS Health Survey Configuration and Usage
The TEMS Health Survey package has controls to match installation requirements, but the defaults work in most cases. Some controls are command line options and some are in the ini file.
The following table shows all options. All command line options except -h, -ini and the debug controls can also be entered in the ini file. The command line takes precedence if both are present. In the table, a blank means the option is not recognized in that context. All controls are lower case only.
command | ini file | default | notes |
-log | log | ./itm_soap.log | Name of log file |
-ini | | ./itm_soap.ini | Name of ini file |
-debuglevel | | 90 | Control message volume |
-debug | | off | Turn on some debug points |
| review_survey_timeout | 60 | SOAP timeout delay (seconds) |
| time_nominal | 30 | Maximum elapsed seconds to get the time from a TEMS before a warning code is produced |
-h | | <null> | Help messages |
-v | verbose | off | Log messages on console |
-o | o | ./temshealth.csv | Output report file name |
-std | std | off | Userid/password in stdin |
| soap | <required> | Hostname of system running hub TEMS |
-user | user | <required> | Userid to access SOAP |
-passwd | passwd | null | Password to access SOAP |
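The precedence rule (command line over ini file over built-in default) can be sketched as a dictionary merge. This Python sketch copies the defaults from the table and assumes the cli and ini dictionaries are already parsed; values are used as given, without type conversion, and the user check reflects the std option described earlier.

```python
# Defaults copied from the options table (passwd "null" taken as empty).
DEFAULTS = {
    "log": "./itm_soap.log",
    "review_survey_timeout": 60,
    "time_nominal": 30,
    "verbose": False,
    "o": "./temshealth.csv",
    "std": False,
    "passwd": "",
}

# Command line flags whose ini keyword differs from the flag name.
CLI_TO_INI = {"-v": "verbose"}


def effective_options(cli: dict, ini: dict) -> dict:
    """Merge defaults < ini controls < command line options."""
    merged = dict(DEFAULTS)
    merged.update(ini)
    for flag, value in cli.items():
        merged[CLI_TO_INI.get(flag, flag.lstrip("-"))] = value
    if not merged.get("soap"):
        raise ValueError("the soap control is required")
    if not merged.get("std") and not merged.get("user"):
        raise ValueError("user is required unless std is in effect")
    return merged
```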
Many of the command line entries and ini controls are self-explanatory.
soap specifies how to access the SOAP process, using the name or IP address of the server running the hub TEMS. See the next section for a full discussion.
review_survey_timeout controls how long the SOAP process will wait for a response. One of the TEMS failure modes is to not respond to real time data requests. The default is 60 seconds. It might need to be made longer in some complex environments.
ITM TEMS Health Survey Package soap control
The soap control specifies how to access the SOAP process. For a simple ITM installation using default communication controls, specify the name or IP address of the server running the hub TEMS. If you know which hub TEMS is the primary, a single soap control is the least expensive choice.
When the ITM installation is configured with hot standby or FTO, there are two hub TEMS. At any one time one TEMS has the primary role and the other has the backup role. If the TEMS maintenance level is ITM 622 or later, set two soap controls which specify the name or IP address of each hub TEMS server. The TEMS with the primary role is determined dynamically.
Before ITM 622, determine ahead of time which TEMS is running as the primary and set the single soap control accordingly.
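With two soap controls, choosing the hub comes down to asking each one for its role. A minimal Python sketch follows. reports_primary is a hypothetical probe (True for primary, False for backup, None when unreachable), since the exact SOAP request the tool uses to learn the role is not described here.

```python
from typing import Callable, Iterable, Optional


def pick_primary_hub(hubs: Iterable[str],
                     reports_primary: Callable[[str], Optional[bool]]) -> Optional[str]:
    """Return the first hub whose (hypothetical) probe says it is primary."""
    for hub in hubs:
        if reports_primary(hub):  # False (backup) and None (unreachable) are skipped
            return hub
    return None  # no primary found; compare error code 201 in the report
```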
Connection processing follows the tacmd login logic. It first tries the https protocol on port 3661 and then the http protocol on port 1920. If the SOAP server is not present on the ITM process answering that port, a virtual index.xml file is retrieved to find the port that SOAP is actually using, and that port is used if it exists and can be accessed.
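The fallback order can be sketched in Python as follows. soap_works and lookup_soap_port are hypothetical callbacks standing in for a real SOAP request and for reading the SOAP port out of index.xml; URL paths are omitted because they are not described in this post.

```python
from typing import Callable, Optional, Sequence, Tuple

# Order described above: https on 3661 first, then http on 1920.
DEFAULT_ORDER: Sequence[Tuple[str, int]] = (("https", 3661), ("http", 1920))


def find_soap_endpoint(
    host: str,
    soap_works: Callable[[str], bool],
    lookup_soap_port: Callable[[str], Optional[int]],
    order: Sequence[Tuple[str, int]] = DEFAULT_ORDER,
) -> Optional[str]:
    """Return the first base URL where SOAP answers, or None."""
    for protocol, port in order:
        base = f"{protocol}://{host}:{port}"
        if soap_works(base):
            return base
        # SOAP is not on this process: ask index.xml where SOAP really listens.
        alt_port = lookup_soap_port(base)
        if alt_port is not None and alt_port != port:
            alt_base = f"{protocol}://{host}:{alt_port}"
            if soap_works(alt_base):
                return alt_base
    return None  # no working connection found
```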
Various failure cases can occur.
1) The target name or IP address may be incorrect.
2) Communication outages can block access to the servers.
3) The TEMS task may not be running and there is no SOAP process.
4) The TEMS may be a remote TEMS which does not run the SOAP process.
5) The SOAP process may use an alternate port and firewall rules block access.
The recovery actions for the various errors are pretty clear. If (5) is in effect, consider running the survey package on a server which is not affected by the firewall rules. Alternatively, always make sure that the hub TEMS is the first ITM process started, so that SOAP is served on the standard ports rather than an alternate port. If the hub TEMS must be recycled, stop all other ITM processes first and restart them after the TEMS recycle. See this blog post which shows how to configure a stable SOAP port at the hub TEMS.
If the protocol is specified in the soap control, only that protocol is tried:
soap https://<servername>
When a port number is specified in the soap control, 3661 forces the https protocol and 1920 forces the http protocol:
soap <servername>:1920
The ITM environment can be configured to use alternate internal web server ports with the HTTP and HTTPS protocol modifiers. In this case, specify the ports to be used:
soap https://<servername>:4661
or, if both have been altered:
soap https://<servername>:4661
soap http://<servername>:2920
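The soap control rules above can be captured in a small parser that returns the host and the (protocol, port) pairs to try, in order. This Python sketch follows the cases described here; one case is not covered by the text, a non-standard port with no protocol, and the sketch assumes https and then http are tried on that port.

```python
from typing import List, Optional, Tuple

DEFAULT_PORTS = {"https": 3661, "http": 1920}
PORT_TO_PROTOCOL = {3661: "https", 1920: "http"}


def soap_candidates(control: str) -> Tuple[str, List[Tuple[str, int]]]:
    """Turn one soap control value into (host, [(protocol, port), ...])."""
    rest = control.strip()
    protocol: Optional[str] = None
    if "://" in rest:
        protocol, rest = rest.split("://", 1)
        protocol = protocol.lower()
        if protocol not in DEFAULT_PORTS:
            raise ValueError(f"unsupported protocol: {protocol}")
    # IPv6 is not supported, so the first ':' separates host and port.
    host, sep, port_text = rest.partition(":")
    port = int(port_text) if sep else None

    if protocol is not None:
        # Protocol given: only that protocol is tried.
        return host, [(protocol, port if port is not None else DEFAULT_PORTS[protocol])]
    if port in PORT_TO_PROTOCOL:
        # 3661 forces https, 1920 forces http.
        return host, [(PORT_TO_PROTOCOL[port], port)]
    if port is not None:
        # Assumption: other port without protocol -> try https, then http.
        return host, [("https", port), ("http", port)]
    # Plain host name: the tacmd login order.
    return host, [("https", 3661), ("http", 1920)]
```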
This logic follows the tacmd login processing with two differences: IPv6 is not supported, and ITM 6.1 style port handling is not included. SOAP::Lite does not support IPv6 at present. The ITM 6.1 logic could be added, but such environments are relatively rare and none was available for testing.
ITM TEMS Health Survey User Specified Command
The TEMS survey tool can be configured to run a command for each possibly unhealthy TEMS. This is specified only in the ini file. Here is an example ini file entry:
cmd echo Possible problem type[${msg_type}] tems[${msg_tems}] code[${msg_errcode}] >>c:\temp\test.log
The command should be appropriate for the platform where it will run. Each command will run one at a time.
These are the available substitutions for the cmd control:
${msg_type}
${msg_tems}
${msg_errcode}
${msg_elap}
${msg_time}
${msg_errtext}
The data comes from the node status and some calculations.
msg_type uses the definitions of the Type column in the sample report.
msg_tems is the TEMS nodeid, if available.
msg_errcode uses the definitions of the Error Code column in the sample report.
msg_elap is the elapsed time to get the TEMS UTC time. It is calculated in whole seconds and so often shows as zero.
msg_time is the TEMS ITM timestamp.
msg_errtext presents additional data about an error condition.
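The ${msg_*} placeholders happen to match the syntax of Python's string.Template, which makes the substitution easy to sketch. This illustrates the idea and is not the Perl tool's code; each problem row is assumed to be a dictionary keyed by the substitution names.

```python
import subprocess
from string import Template


def expand_cmd(cmd_template: str, problem: dict) -> str:
    """Fill ${msg_*} placeholders; unknown placeholders are left as written."""
    return Template(cmd_template).safe_substitute(problem)


def run_for_problems(cmd_template: str, problems: list) -> None:
    """Run the expanded command once per problem row, one at a time."""
    for problem in problems:
        # shell=True so redirections such as >> work as in the ini example;
        # subprocess.run waits for each command before starting the next.
        subprocess.run(expand_cmd(cmd_template, problem), shell=True, check=False)
```

safe_substitute is used rather than substitute so that a mistyped placeholder is passed through unchanged instead of raising KeyError.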
One interesting possibility is to use the Netcool nco_postmsg command to send an alert directly to Netcool.
ITM TEMS Health Survey Exit Code
The TEMS Health Survey program exits with code 1 if there were error cases and 0 otherwise. That can be used like this:
perl itm_tems_health.pl -v || <command to email or to take recovery actions>
The || means to run the second command only if the first command had a non-zero exit code.
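If the scheduling is driven from a Python script rather than a shell, the same || behavior can be sketched with subprocess. report_problem is a hypothetical placeholder for the email or recovery action.

```python
import subprocess
import sys
from typing import Callable, Sequence


def report_problem(exit_code: int) -> None:
    """Hypothetical placeholder reaction; replace with email or recovery logic."""
    print(f"TEMS health survey reported problems (exit code {exit_code})",
          file=sys.stderr)


def survey_and_react(survey_cmd: Sequence[str],
                     on_problem: Callable[[int], None] = report_problem) -> int:
    """Run the survey and call on_problem only on a non-zero exit code, like ||."""
    result = subprocess.run(list(survey_cmd), check=False)
    if result.returncode != 0:
        on_problem(result.returncode)
    return result.returncode
```

For example, survey_and_react(["perl", "itm_tems_health.pl", "-v"]) runs the survey and calls report_problem only when the survey exits with a non-zero code.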
ITM TEMS Health Survey Intensive Debug trace
When the itm_tems_health.pl program does not produce correct results or stops unexpectedly, you should gather additional documentation. The -debuglevel 300 option generates an extensive log trace. The log file (itm_soap.log by default) will be much larger than normal, so such debug runs should be kept to a minimum.
ITM TEMS Health Survey Limitations
SOAP access via IPv6 does not work because SOAP::Lite does not yet support the http6 or https6 protocols.
Summary
The TEMS Health Survey tool was derived from ITM Agent Health Survey.
Feedback Wanted!!
Please report back experience and suggestions. If the TEMS Health Survey does not work well in your environment, repeat the test with “-debuglevel 300” added and send the resulting itm_soap.log [compressed] for analysis.
History and Earlier versions
If the current version of the TEMS Health Survey tool does not work, you can try previously published TEMS Health Survey binary object zip files. At the same time, please contact me to resolve the issues. If you discover an issue, try intermediate levels to isolate where the problem was introduced.
Initial Release
Photo Note: Magic on Guard Duty – Big Sur Summer 2009
Appendix 1 – Perl 5.18
There has been some testing with Windows Perl 5.18.2 from ActiveState. It mostly worked but has some anomalies.
1) SOAP::Lite is not visible in the Perl Package Manager.
It seems one of the post install tests failed and that propagated to the PPM database such that it was not present during my testing.
The workaround was to use the cpan command and do “force install SOAP::Lite”. After that things worked fine.
2) Using SOAP::Lite produced an unexpected error message, even though the survey ran without problems.
2014-06-08 11:30:33 0 ERROR 0 SOAP Failure –
Can't locate object method "new" via package "LWP::Protocol::https::Socket" at C:/Perl64-5.18.2/lib/LWP/Protocol/http.pm line 31.
It looks like the logic tried to use a new facility for SSL connections, and when that failed it fell back to one that worked. This will be worked on and resolved.
I have done no testing on the just released Perl 5.20 level.