Sitworld: Scrubbing Out Windows Agent Malconfiguration Remotely

John Alvord, IBM Corporation

jalvord@us.ibm.com

Draft #1 – 6 February 2019 – Level 1.00000

Introduction

Sometimes a mistake is made in ITM Windows Agent configuration. It is made with good intentions, but the result is an ITM Agent that constantly loses its connection to the TEMS it is configured to use and then reconnects, over and over. This prevents normal monitoring operations at that agent. It also triggers heavy TEMS activity and can even result in TEMS crashes if enough agents have the same incorrect configuration.

Background

ITM Agents for Windows get most of their control information from the Windows registry. There is also a file, KXXENV, which can contain environment variables; XX stands for the agent's product code, so the Windows OS Agent uses KNTENV. The install-time configuration sets up both sources of data. For 64-bit agents the default location of the KXXENV file is C:\IBM\ITM\TMAITM6_x64.

The issue arises when you want to alter the communications string. One example would be to disable the internal web server by adding HTTP_SERVER:N. The original communication string from the install-time configuration might look like this [copied from a registry entry for the Windows OS Agent]:

[HKEY_LOCAL_MACHINE\SOFTWARE\Candle\KNT\Ver610\Primary\Environment]

"KDC_FAMILIES"="IP.PIPE PORT:1918 IP use:n SNA use:n IP.SPIPE use:n IP6 use:n IP6.PIPE use:n IP6.SPIPE use:n"

and the need is to run with this:

"KDC_FAMILIES"="http_server:n IP.PIPE PORT:1918 IP use:n SNA use:n IP.SPIPE use:n IP6 use:n IP6.PIPE use:n IP6.SPIPE use:n"

You can read all about changes to communication strings in the post on Protocol Modifiers. There are enough of them to induce sleep.

Distributed agents use only KDC_FAMILIES to define communication protocol etc. There is another parallel environment variable KDE_TRANSPORT which is used in z/OS TEMS and Agents.

The problem comes when distributed agents are configured with both KDC_FAMILIES and KDE_TRANSPORT. This combination does not play nice together and almost always causes problems. The only exception is when you can arrange for both to have identical settings. That is very difficult, however, since the Windows Registry entry is created during the Windows agent installation, while the KXXENV file is also created at install time but can be changed by hand.

When both are defined, usually one in Windows Registry and another in the KXXENV, things go horribly wrong. Different parts of ITM at the agent can use one or the other and if they are different that results in communication outages.

Most importantly: don’t do that. Do not set KDC_FAMILIES or KDE_TRANSPORT in the KXXENV file. Don’t even think about it… you will waste weeks of effort suffering the consequences and then weeks of effort undoing that change. You may have read that this is a possible way to go, but it is a terrible plan. The two values do not play nice together unless they are identical. Usually they fight like mad and waste everyone’s time. They do not magically merge values, they brawl like ruffians.
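To make the warning concrete, here is a minimal sketch of how you might scan the text of a KXXENV file for the two communication variables. The function name and the NAME=VALUE parsing are assumptions for illustration and are not part of ITM; the sketch also assumes that variable names should be compared without regard to case.

```python
# Hypothetical helper, not an ITM tool: report communication variables
# found in the text of a KXXENV file (for example KNTENV).
COMM_VARIABLES = ("KDC_FAMILIES", "KDE_TRANSPORT")


def find_comm_settings(env_text):
    """Return {NAME: value} for each communication variable set in env_text."""
    found = {}
    for line in env_text.splitlines():
        # Split "NAME=VALUE" at the first '='; lines without '=' are ignored.
        name, sep, value = line.strip().partition("=")
        # Assumption: names are matched case-insensitively.
        if sep and name.strip().upper() in COMM_VARIABLES:
            found[name.strip().upper()] = value.strip()
    return found


if __name__ == "__main__":
    sample = (
        "KBB_IGNOREHOSTENVIRONMENT=N\n"
        "KDE_TRANSPORT=HTTP_SERVER:N HTTP_CONSOLE:N EPHEMERAL:Y\n"
        "GSK_PROTOCOL_SSLV2=OFF\n"
    )
    for name, value in find_comm_settings(sample).items():
        print(f"{name} is set in the env file: {value}")
```

Any non-empty result means the file carries a setting that belongs only in the registry.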

Agents are in trouble, what do you do?

First make sure that whatever caused the problem is stopped. In most cases that is a post-install script that finds and updates the KXXENV file. Change that script so it no longer does that.

For the agents in trouble, here is a procedure that was worked out at a site that had 5,500 problem agents. The example is for the Windows OS Agent and you need to adapt it to the agent that needs working on. In this case it was known that KDE_TRANSPORT was present. If KDC_FAMILIES is also present, it needs to be deleted also.

1. Check the KNTENV file on the system to make sure it has the problem.

    tacmd executecommand -m Primary:VA33VTWSFC003B:NT -c "type \IBM\itm\TMAITM6_x64\KNTENV" -l -o -v -e -r

Results :

KDEBE_KEYRING_STASH=C:\IBM\ITM\keyfiles\keyfile.sth

KDEBE_KEY_LABEL=IBM_Tivoli_Monitoring_Certificate

KBB_IGNOREHOSTENVIRONMENT=Y

JAVA_HOME=C:\IBM\ITM\java\java50\jre

KBB_IGNOREHOSTENVIRONMENT=N

KDE_TRANSPORT=HTTP_SERVER:N HTTP_CONSOLE:N EPHEMERAL:Y

GSK_PROTOCOL_SSLV2=OFF

GSK_PROTOCOL_SSLV3=ON

GSK_V3_CIPHER_SPECS=352F0A

KUIEXC000I: Executecommand request was performed successfully. The return value of the command run on the remote systems is 0

2. Run the following command to remove KDE_TRANSPORT from KNTENV :

    tacmd executecommand -m Primary:VA33VTWSFC003B:NT -c "type C:\IBM\ITM\TMAITM6_x64\KNTENV | findstr /v KDE_TRANSPORT= >c:\temp\KNTENV.NEW && copy /Y c:\temp\KNTENV.NEW C:\IBM\ITM\TMAITM6_x64\KNTENV >NUL" -l -o -v -e -r

3.  Check Results:

    tacmd executecommand -m Primary:VA33VTWSFC003B:NT -c "type \IBM\itm\TMAITM6_x64\KNTENV" -l -o -v -e -r

Results :

KDEBE_KEYRING_STASH=C:\IBM\ITM\keyfiles\keyfile.sth

KDEBE_KEY_LABEL=IBM_Tivoli_Monitoring_Certificate

KBB_IGNOREHOSTENVIRONMENT=Y

JAVA_HOME=C:\IBM\ITM\java\java50\jre

KBB_IGNOREHOSTENVIRONMENT=N

GSK_PROTOCOL_SSLV2=OFF

GSK_PROTOCOL_SSLV3=ON

GSK_V3_CIPHER_SPECS=352F0A

KUIEXC000I: Executecommand request was performed successfully. The return value of the command run on the remote systems is 0

4. Fix KDC_FAMILIES in the registry

$CANDLEHOME/bin/tacmd setagentconnection -n Primary:VA33VTWSFC003B:NT -t NT -e KDC_FAMILIES="HTTP_SERVER:N EPHEMERAL:Y @Protocol@"

Check Results:

Validate that the agent restarted.

Check kntcma.ini to confirm that the override settings are in place:

tacmd executecommand -m Primary:VA33VTWSFC003B:NT -c "type \IBM\itm\TMAITM6_x64\kntcma.ini" -l -o -v -e -r

ExitDLL=@EtcPath@\KNTCTRD.DLL

PostProcess=DllRegisterUnregisterServer

[KIN64BIT]

[Override Local Settings]

KDC_FAMILIES=HTTP_SERVER:N EPHEMERAL:Y @Protocol@

CTIRA_HIST_DIR=@LogPath@\History\@CanProd@

KUIEXC000I: Executecommand request was performed successfully. The return value of the command run on the remote systems is 0

5. Finally, check that HTTP is disabled by pointing a Web browser at the system where the agent is running: http://xx.xx.xx.xx:1920/ where you substitute the actual system IP address for xx.xx.xx.xx

If the connection fails, the internal web server is not running, which is the intended result.
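With thousands of agents, the browser check is easier to script. The sketch below tries a plain TCP connection to port 1920, the port taken from the URL above. The function name and the timeout value are assumptions for illustration. A refused or timed-out connection can also mean that a firewall is in the way, so treat the result as a hint rather than proof.

```python
import socket


def web_server_reachable(host, port=1920, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder address: substitute the agent system's IP address.
    host = "127.0.0.1"
    if web_server_reachable(host):
        print(f"{host}: port 1920 still answers, web server is running")
    else:
        print(f"{host}: port 1920 does not answer")
```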

Real Life Variations

The KXXENV file might have a KDC_FAMILIES also, which also needs to be removed.

The KXXENV KDE_TRANSPORT value might include other protocol modifiers that are still needed, such as EPHEMERAL:Y. In that case the tacmd setagentconnection command above needs to be adjusted to carry those modifiers.
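Both variations can be handled in one pass over the file contents. The following is a sketch only, with a hypothetical function name: it drops the KDE_TRANSPORT and KDC_FAMILIES lines and hands back what it removed, so that modifiers such as EPHEMERAL:Y can be copied into the setagentconnection value. Unlike the findstr filter in step 2, which drops any line containing the search string, this version matches the variable name exactly.

```python
def strip_comm_settings(env_text, names=("KDE_TRANSPORT", "KDC_FAMILIES")):
    """Return (cleaned_text, removed_lines) for KXXENV-style text."""
    kept = []
    removed = []
    # keepends=True preserves the original line endings (CRLF on Windows).
    for line in env_text.splitlines(keepends=True):
        name, sep, _ = line.partition("=")
        # Assumption: variable names are matched case-insensitively.
        if sep and name.strip().upper() in names:
            removed.append(line.rstrip("\r\n"))
        else:
            kept.append(line)
    return "".join(kept), removed


if __name__ == "__main__":
    text = (
        "KBB_IGNOREHOSTENVIRONMENT=N\r\n"
        "KDE_TRANSPORT=HTTP_SERVER:N HTTP_CONSOLE:N EPHEMERAL:Y\r\n"
        "GSK_PROTOCOL_SSLV2=OFF\r\n"
    )
    cleaned, removed = strip_comm_settings(text)
    print("Removed:", removed)
    print(cleaned, end="")
```

Reviewing the removed lines before running setagentconnection avoids silently losing a modifier that the site depended on.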

In one case, the Windows Registry itself had been updated to contain a KDE_TRANSPORT. In that case you need to go back in manually and remove that setting:

1) Start MTEMS [Manage Tivoli Enterprise Monitoring Services]. Note that this procedure will restart the agent.

2) Right click on Agent line

3) Select Advanced

4) Select Edit Variables…

5) Locate the added problem variable KDE_TRANSPORT, select it, and click Delete.

Summary

This documents a best practice method of removing a problem configuration case from an ITM Agent running on Windows.

Sitworld: Table of Contents

History and Earlier versions

1.00000

Initial publication

Photo Note:  Cruise Ship Under Construction – Control Room

 
