Sitworld: ITM TEMS Health Survey

MagictheGuardCat

John Alvord, IBM Corporation

jalvord@us.ibm.com

Draft #1 – 9 June 2014 – Level 0.75000


Introduction

I was recently asked about a TEMS- and SOAP-oriented health check for a TEMS environment. There is a related project, ITM Agent Health Survey, which focuses on ITM Agents. The goal of this project is to confirm that these major ITM functions are working:

1) The hub TEMS

2) SOAP to the hub TEMS

3) The remote TEMS

4) The backup hub TEMS [if configured with FTO]

This project provides awareness of major types of outages. Other TEMS issues are not checked.

After testing, this should run periodically via a Linux/Unix crontab task or a Windows AT command. It can be configured to run a user-specified command, such as sending an email, whenever a problem condition is detected.

ITM TEMS Health Survey Report

Here is a sample TEMS Health report on an ITM environment with 4 TEMS, one of them offline. The column definitions follow.

temshealth

The Type column explains the context of the failure report.

Type    Comments
FAIL    Initialization failure
HUB     Unable to connect to hub TEMS
SOAP    SOAP error
TEMS    A hub or remote TEMS

The TEMS column is the nodeid of the TEMS. Some types of errors do not have an associated TEMS.

The error code gives more specific details about the failure report.

Error Code    Comments
0             Normal condition
100           TEMS responded but slower than time_nominal
101           TEMS experienced timeout
102           TEMS no timeout however no data returned
201           No working connection to primary hub TEMS found
202           Error detected in configuration
203           SOAP failure
999           TEMS is offline

The Elapsed column is a calculation of the time in seconds to retrieve the TEMS UTC time.

The Timestamp column is the TEMS UTC current time in ITM datestamp format. [UTC is Coordinated Universal Time].

The Error Text column records the type of error.
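As a minimal sketch, the error codes above can be held in a lookup table so that a script reading the report can turn a code into text. This is not part of the package. The names ERROR_CODES, describe, and classify_response are hypothetical, and the order of the checks in classify_response is an assumption; only the code meanings come from the Error Code table.

```python
# Error codes and comments copied from the Error Code table.
ERROR_CODES = {
    0: "Normal condition",
    100: "TEMS responded but slower than time_nominal",
    101: "TEMS experienced timeout",
    102: "TEMS no timeout however no data returned",
    201: "No working connection to primary hub TEMS found",
    202: "Error detected in configuration",
    203: "SOAP failure",
    999: "TEMS is offline",
}


def describe(code):
    """Return the table comment for a code, or a marker for unknown codes."""
    return ERROR_CODES.get(int(code), "Unknown error code")


def classify_response(elapsed, timed_out, has_data, time_nominal=30):
    """Pick a 0/100/101/102 code for one TEMS time request.

    Assumption: timeout is checked first, then missing data, then slowness.
    """
    if timed_out:
        return 101
    if not has_data:
        return 102
    if elapsed > time_nominal:
        return 100
    return 0
```

For example, describe(999) gives the text for an offline TEMS, and classify_response(45, False, True) flags a slow but working TEMS with code 100.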

ITM TEMS Health Survey Installation

The TEMS Health Survey package includes one Perl program that uses CPAN modules. The program has been tested in two environments. Windows had the most intense testing. It was also tested on Linux. Many Perl 5 levels and CPAN package levels will likely be usable. Here are the details of the tested environments.

1) ActiveState Perl in the Windows environment, which can be found at www.activestate.com

perl -v

This is perl 5, version 16, subversion 3 (v5.16.3) built for MSWin32-x86-multi-thread (with 1 registered patch, see perl -V for more detail)

2) Perl on Linux [s390x]

# perl -v

This is perl, v5.10.0 built for s390x-linux-thread-multi

CPAN is a collection of free to use packages. In your Perl environment, there may be some installed CPAN modules and TEMS health survey may need more. Here are the modules used.

Getopt::Long              in CPAN Getopt-Long 2.42

SOAP::Lite                in CPAN SOAP-Lite 1.11

HTTP::Message             in CPAN HTTP-Message 6.06

XML::TreePP               in CPAN XML-TreePP 0.42

You might discover the need for other CPAN modules as the programs are run for the first time. The programs will likely work at other CPAN module levels but this is what was most recently tested.

The Windows ActiveState Perl environment uses the Perl Package Manager to acquire the needed CPAN modules. The Agent Survey technote has an appendix showing usage of that manager program with screen captures.

Please note: In some environments installing CPAN packages is a major problem. Internet access may not be available, or Perl may be a shared resource which you do not have the right to change. Changing such packages could negatively affect other programs.

To manage this case please see the CPAN Library for Perl Projects which has a package which can eliminate changing the installed Perl libraries.

See Appendix 1 for Perl 5.18 testing results.

Package contents

The binary package contains a Perl program itm_tems_health.pl and a model ini file named itm_soap.ini. The binary objects are temshealth.0.75000.

To install the TEMS Health Survey package, unzip the file contents into a convenient directory. On Linux/Unix you will need to use chmod/chown/chgrp to make the program file executable and to set a usable owner and group. The package also includes a model ini file. The soap control is required [see later for discussion]. The userid and password may be supplied in the ini file. In this case the ini file looks like this

soap <server_name>

user <user>

passwd <password>

The user and password credentials may be supplied from standard input. This improves security by ensuring that no user or password is kept in any permanent disk file. In this case the ini file would look like this:

soap <server_name>

std

The std option can also be supplied on the command line as -std. In either case, a program must supply the userid and password on standard input in this form

-user <userid> -passwd <password>

The program invocation would be something like this

mycreds | perl …
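A helper such as mycreds can be very small. The following Python sketch is one hypothetical way to write it, not something shipped in the package. It reads the credentials from two environment variables, TEMS_USER and TEMS_PASSWD (names invented for this example), and writes them in the form shown above. A real helper would more likely fetch them from whatever secure store the site uses.

```python
import os
import sys


def format_credentials(userid, password):
    """Build the single line the survey program reads when std is set."""
    return "-user {} -passwd {}".format(userid, password)


def main(environ=os.environ, out=sys.stdout):
    # TEMS_USER and TEMS_PASSWD are hypothetical variable names.
    userid = environ.get("TEMS_USER")
    password = environ.get("TEMS_PASSWD")
    if not userid or password is None:
        sys.stderr.write("TEMS_USER and TEMS_PASSWD must be set\n")
        return 1
    out.write(format_credentials(userid, password) + "\n")
    return 0
```

Ending the script with sys.exit(main()) turns it into a program whose output can be piped into the survey.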

ITM TEMS Health Survey Configuration and Usage

The TEMS Health Survey package has controls to match installation requirements, but the defaults work in most cases. Some controls are command line options and some are in the ini file.

The following table shows all options. All command line options except -h, -ini, and three debug controls can be entered in the ini file. The command line takes precedence if both are present. In the following table, (none) means the option is not recognized in that context. All controls are lower case only.

command       ini file                default            notes
-log          log                     ./itm_soap.log     Name of log file
-ini          (none)                  ./itm_soap.ini     Name of ini file
-debuglevel   (none)                  90                 Control message volume
-debug        (none)                  off                Turn on some debug points
(none)        review_survey_timeout   60                 SOAP timeout delay (seconds)
(none)        time_nominal            30                 Max elapsed time (seconds) to get time from TEMS, else produce warning code
-h            (none)                  <null>             Help messages
-v            verbose                 off                Log messages on console
-o            o                       ./temshealth.csv   Output report file name
-std          std                     off                Userid/password in stdin
(none)        soap                    <required>         Hostname of system running hub TEMS
-user         user                    <required>         Userid to access SOAP
-passwd       passwd                  null               Password to access SOAP

Many of the command line entries and ini controls are self-explanatory.

soap specifies how to access the SOAP process, using the name or ip address of the server running the hub TEMS. See the next section for a full discussion.

review_survey_timeout controls how long the SOAP process will wait for a response. One of the TEMS failure modes is to not respond to real time data requests. The default is 60 seconds. It might need to be made longer in some complex environments.
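The precedence rule can be pictured as three layers: built-in defaults, then ini file values, then command line values. The sketch below is not the program's own code. parse_ini and merge_controls are hypothetical names, and it assumes ini lines have the form of a keyword followed by an optional value, as in the examples earlier.

```python
# Defaults taken from the options table.
DEFAULTS = {
    "log": "./itm_soap.log",
    "review_survey_timeout": "60",
    "time_nominal": "30",
    "o": "./temshealth.csv",
}


def parse_ini(text):
    """Parse 'keyword value' lines; soap may repeat, so it is kept as a list."""
    controls = {"soap": []}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if not parts:
            continue  # skip blank lines
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        if key == "soap":
            controls["soap"].append(value)
        else:
            controls[key] = value  # a bare keyword such as std gets ""
    return controls


def merge_controls(ini_controls, cli_controls):
    """Later layers win: defaults, then ini file, then command line."""
    merged = dict(DEFAULTS)
    merged.update(ini_controls)
    merged.update(cli_controls)
    return merged
```

With an ini file that sets o and a command line that also passes -o, merge_controls keeps the command line value, while controls that appear in neither place fall back to the defaults.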

ITM TEMS Health Survey Package soap control

The soap control specifies how to access the SOAP process. For a simple ITM installation using default communication controls, specify the name or ip address of the server running the hub TEMS. If you know the primary hub TEMS a single soap control is least expensive.

When the ITM installation is configured with hot standby or FTO there are two hub TEMS. At any one time one TEMS will have the primary role and the other TEMS will have the backup role. If the TEMS maintenance level is ITM 622 or later, set two soap controls which specify the name or ip address of each hub TEMS server. The TEMS with the primary role will be determined dynamically.

Before ITM 622 you should determine ahead of time which TEMS is running as the primary and set the single soap control appropriately.

Connection processing follows the tacmd login logic. It first tries the https protocol on port 3661 and then the http protocol on port 1920. If the SOAP server is not present on that ITM process, a virtual index.xml file is retrieved to find the port that SOAP is actually using, and that port is used if it exists and can be accessed.

Various failure cases can occur.

  1. The target name or IP address may be incorrect.
  2. Communication outages can block access to the servers.
  3. The TEMS task may not be running and there is no SOAP process.
  4. The TEMS may be a remote TEMS which does not run the SOAP process.
  5. The SOAP process may use an alternate port and firewall rules block access.

The recovery actions for the various errors are pretty clear. If (5) is in effect, consider running the survey package on a server which is not affected by firewall rules. Alternatively, always make sure that the hub TEMS is the first process started. If it must be recycled, then stop all other ITM processes first and restart them after the TEMS recycle. See this blog post which shows how to configure a stable SOAP port at the hub TEMS.

If the protocol is specified in the soap control only that protocol will be tried.

soap https://<servername>

When the port number is specified in the soap control, 3661 will force https protocol and 1920 will force http protocol.

soap <servername>:1920

The ITM environment can be configured to use alternate internal web server access ports using the HTTP and HTTPS protocol modifiers. For this case you can specify the ports to be used

soap https://<servername>:4661

or if both have been altered

soap https://<servername>:4661

soap http://<servername>:2920

The logic closely follows the tacmd login processing. There are two differences: ipv6 is not supported and ITM 6.1 style port following is not included. SOAP::Lite does not support ipv6 at present. ITM 6.1 logic could be added, but that level is relatively rare and was not available for testing.
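The rules in this section for turning one soap control into connection attempts can be summarized in a short sketch. This is an illustration in Python, not the Perl logic in itm_tems_health.pl. The function name soap_candidates is hypothetical, and the handling of a bare non-default port with no protocol is a guess, since the text does not cover that case. The index.xml port lookup is left out.

```python
HTTPS_PORT = 3661  # default https port named in the text
HTTP_PORT = 1920   # default http port named in the text


def soap_candidates(soap):
    """Return (protocol, host, port) tuples in the order they would be tried."""
    protocol = None
    rest = soap.strip()
    if "://" in rest:
        protocol, rest = rest.split("://", 1)
        protocol = protocol.lower()
    if protocol not in (None, "http", "https"):
        raise ValueError("unsupported protocol: " + protocol)
    # ipv6 is not supported, so a single colon always separates host and port.
    host, _, port_text = rest.partition(":")
    port = int(port_text) if port_text else None

    if protocol == "https":
        return [("https", host, port or HTTPS_PORT)]
    if protocol == "http":
        return [("http", host, port or HTTP_PORT)]
    if port is None:
        # No protocol and no port: https first, then http.
        return [("https", host, HTTPS_PORT), ("http", host, HTTP_PORT)]
    if port == HTTPS_PORT:
        return [("https", host, port)]
    if port == HTTP_PORT:
        return [("http", host, port)]
    # Assumption: the text does not cover a bare non-default port; try both.
    return [("https", host, port), ("http", host, port)]
```

A plain server name yields two attempts, while a control such as https://<servername>:4661 yields exactly one.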

ITM TEMS Health Survey User Specified Command

The TEMS survey tool can be configured to run a command for each possible unhealthy TEMS. This is specified only in the ini file. Here is an example ini file entry

cmd echo Possible problem type[${msg_type}] tems[${msg_tems}] code[${msg_errcode}] >>c:\temp\test.log

The command should be appropriate for the platform where it will run. Each command will run one at a time.

These are the available substitutions for the user specified command.

${msg_type}

${msg_tems}

${msg_errcode}

${msg_elap}

${msg_time}

${msg_errtext}

The data comes from the node status and some calculations.

msg_type see definitions from the example report Type column

msg_tems is the TEMS nodeid if available.

msg_errcode see definitions from the example report Error Code column

msg_elap is the elapsed time in seconds to get a TEMS UTC time. This is calculated in seconds and so often shows as zero.

msg_time is the TEMS ITM timestamp.

msg_errtext presents added data concerning an error condition.
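To make the substitution step concrete, here is a minimal Python sketch of how a cmd template could be expanded before it is run. It is an illustration only and not the Perl code in the package. expand_command is a hypothetical name, and leaving unknown placeholders untouched is an assumption.

```python
import re

# Matches ${msg_type}, ${msg_tems}, ${msg_errcode}, and so on.
PLACEHOLDER = re.compile(r"\$\{(msg_[a-z]+)\}")


def expand_command(template, values):
    """Replace each ${msg_...} with its value; unknown names are left as-is."""
    def replace(match):
        name = match.group(1)
        return str(values.get(name, match.group(0)))
    return PLACEHOLDER.sub(replace, template)
```

Because the expanded text is a command line, values that can contain spaces or quotes, such as msg_errtext, may need quoting in the template.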

One interesting possibility is to use the Netcool nco_postmsg command to send an alert directly to Netcool.

ITM TEMS Health Survey Exit Code

The TEMS Health survey program exits with code 1 if there were error cases and 0 otherwise. That can be used like this

perl  itm_tems_health.pl -v  || <command to email or to take recovery actions>

The || means to run the second command only if the first command had a non-zero exit code.
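The same exit code test can be made from a wrapper script when more than a one-line follow-up is needed. The Python sketch below assumes only what is stated above, that the survey exits non-zero when error cases were found. The function names are hypothetical and the follow-up command is whatever the site chooses.

```python
import subprocess


def survey_found_problems(command):
    """Run the survey command and report whether it exited non-zero."""
    completed = subprocess.run(command)
    return completed.returncode != 0


def run_with_followup(command, followup):
    """Python equivalent of 'command || followup'."""
    if survey_found_problems(command):
        subprocess.run(followup)
        return True
    return False
```

A call such as run_with_followup(["perl", "itm_tems_health.pl", "-v"], followup) would run the follow-up command list only when the survey reports a problem.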

ITM TEMS Health Survey Intensive Debug trace

When the itm_tems_health.pl program does not produce correct results or stops unexpectedly, you should gather additional documentation. The -debuglevel 300 option will generate an extensive log trace. The log file will be much larger than normal, so such runs should be limited.

ITM TEMS Health Survey Limitations

SOAP access via ipv6 does not work because SOAP::Lite does not yet support http6 or https6 protocols.

Summary

The TEMS Health Survey tool was derived from ITM Agent Health Survey.

Sitworld: Table of Contents

Feedback Wanted!!

Please report back experience and suggestions. If  TEMS Health Survey does not work well in your environment, repeat the test and add  “-debuglevel 300” and after a retry send the itm_soap.log [compressed] for analysis.

History and Earlier versions

If the current version of the TEMS Health Survey tool does not work, you can try recent published TEMS Health Survey binary object zip files. At the same time please contact me to resolve the issues.  If you discover an issue try intermediate levels to isolate where the problem was introduced.

temshealth.0.75000

Initial Release

Photo Note: Magic on Guard Duty – Big Sur Summer 2009

Appendix 1 – Perl 5.18

There has been some testing with Windows Perl 5.18.2 from ActiveState. It mostly worked but has some anomalies.

1) The SOAP::Lite is not visible in the Perl Package Manager.

It seems one of the post install tests failed and that propagated to the PPM database such that it was not present during my testing.

The workaround was to use the cpan command and do “force install SOAP::Lite”. After that things worked fine.

2) SOAP::Lite use showed an uncommon error message while running even though it ran without problems.

2014-06-08 11:30:33 0 ERROR 0 SOAP Failure –

Can’t locate object method “new” via package “LWP::Protocol::https::Socket” at C:/Perl64-5.18.2/lib/LWP/Protocol/http.pm line 31.

It looks like the logic tried to use a new facility for SSL connections and when that failed it fell back to one that worked. That will be worked on and resolved.

I have done no testing on the just released Perl 5.20 level.

 
