Sitworld: DetectorRecycler for ITM Windows OS Agent

Cat_gargoyle

John Alvord, IBM Corporation

jalvord@us.ibm.com


Inspiration

Several months ago in this post, I documented sample situations and action commands that detected a high CPU condition on Linux/Unix OS Agents and initiated a recycle of the agent. This is the Windows version of the same function.

The Linux/Unix detector/recycler solution was more complex because of an Agent issue that was corrected in ITM 6.3. Before that maintenance level, the attributes which calculate process CPU time always returned zero for the main agent CPU utilization. The parallel attribute in Windows has always worked as expected, so a single situation suffices. For completeness, a double situation solution is documented later.

Single Situation solution

IBM_NTAgent_HighCPU_Recycle_W

Formula:

*IF *VALUE NT_Process_64.Process_Name *EQ kntcma *AND

*VALUE NT_Process_64.%_Processor_Time *GT 15

Action Command

cd %CANDLE_HOME%\InstallITM & KinConfg -n -pKNT -Lnul & ( for /f %F in ('DIR /B /S %CANDLE_HOME%\^| findstr /r /c:.*NT_agntstat.sta /c:.*NT_thresholds.xml /c:psit.*_NT.str') do del /q %F ) & KinConfg -n -sKNT -Lnul

Here is a detailed explanation of the action command:

cd %CANDLE_HOME%\InstallITM &

==> Set the current directory to where KinConfg is located. In Windows OS Agent action commands the installation directory is available in the CANDLE_HOME environment variable.

KinConfg -n -pKNT -Lnul &

==> Stop the running Windows OS Agent. With the -L flag the log is written to the nul file and discarded. The log option is needed so that KinConfg does not complete until the Windows OS Agent has actually stopped.

( for /f %F in ('DIR /B /S %CANDLE_HOME%\^| findstr /r /c:.*NT_agntstat.sta /c:.*NT_thresholds.xml /c:psit.*_NT.str') do del /q %F ) &

==> Search the whole install directory tree for certain files that should be erased to avoid certain high CPU conditions. This is the payload which can relieve the high CPU in some cases. The same stop, payload, start pattern could carry a different payload for other purposes.
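To see which files the findstr patterns would select before letting an action command delete anything, the matching can be reproduced off the agent. The following is a minimal Python sketch, not part of the original solution. It assumes findstr's default behavior of case-sensitive, unanchored regular-expression matching against each line of the DIR /B /S listing, and the sample paths are made up for illustration.

```python
import re

# The three findstr /r /c: patterns from the action command, unchanged.
# In findstr and in Python alike, "." matches any single character
# and ".*" matches any run of characters.
PATTERNS = [
    r".*NT_agntstat.sta",
    r".*NT_thresholds.xml",
    r"psit.*_NT.str",
]

def files_to_delete(listing):
    """Return the paths from a DIR /B /S style listing that any pattern matches."""
    compiled = [re.compile(p) for p in PATTERNS]
    # findstr matches anywhere in the line, so use search(), not fullmatch().
    return [path for path in listing if any(c.search(path) for c in compiled)]

# Hypothetical listing; real file names and locations vary by host and install.
sample = [
    r"C:\IBM\ITM\some\dir\HOST1_NT_agntstat.sta",
    r"C:\IBM\ITM\some\dir\HOST1_NT_thresholds.xml",
    r"C:\IBM\ITM\some\dir\psit_HOST1_NT.str",
    r"C:\IBM\ITM\some\dir\kntcma.exe",
]

for path in files_to_delete(sample):
    print(path)
```

Because the dots in the patterns are unescaped, they are slightly looser than the literal file names: any character is accepted where the dot stands.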

KinConfg -n -sKNT -Lnul

==> Start the Windows OS Agent. With the -L flag the log is written to the nul file and discarded. This is needed so that KinConfg does not complete until the Windows OS Agent has actually started.

Double Situation Solution

For completeness, this is parallel to the original scheme – a Detector situation that does the detection without using Windows OS Agent Attribute values and a Recycler situation that performs the recycle based on a marker file. It probably isn’t needed for this purpose but it is an interesting model for this sort of work.

Detector Situation: IBM_NTAgent_HighCPU_Detect_W

Formula:

*IF *VALUE KCA_Agent_Availability_Management_Status.PAS_Agent_Name *EQ 'Monitoring Agent for Windows OS' *AND

*SCAN KCA_Agent_Availability_Management_Status.Agent_Version *EQ '23'

==> The goal is to run only on the ITM 623 level Windows OS Agents.

Action Command

cd C:\IBM\ITM\logs & start /w wmic /output:wmic.out path Win32_PerfFormattedData_PerfProc_Process where (Name = 'kntcma' and PercentProcessorTime ^>= 15) get Name /format:list & ( type wmic.out | findstr /c:kntcma && copy /y nul agt_cpu.high || del /q agt_cpu.high )

Here is a detailed explanation of the action command:

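cd %CANDLE_HOME%\logs &

==> Set the current directory to where the marker file will be created. In Windows OS Agent action commands the installation directory is available in the CANDLE_HOME environment variable. The full command above spells the path out as C:\IBM\ITM\logs; whichever form is used must resolve to the Watch_Directory named in the Recycler situation formula.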

start /w

==> Start a new process and wait until it completes before continuing. This applies to the wmic command that follows; the commands after the & run once wmic has finished.

wmic /output:wmic.out path Win32_PerfFormattedData_PerfProc_Process where (Name = 'kntcma' and PercentProcessorTime ^>= 15) get Name /format:list &

==> A command to print the Windows OS Agent process name if it is using 15% CPU or more. The ^ above is an escape character so the “>” will not be treated as a redirection control.

( type wmic.out | findstr /c:kntcma &&

==> Type the output and use findstr to check whether the Windows OS Agent is listed. The “type” command is required because wmic.out is written in a Unicode (UTF) encoding that findstr cannot search directly. The && that follows means the next command is run only if the exit code is zero.

copy /y nul agt_cpu.high ||

==> This command is run only if exit code from prior stage is zero. Create the zero length marker file to trigger a recycle.

del /q agt_cpu.high )

==> This command is run only if the exit code from the prior stage is non-zero. Delete the marker file since the agent is using less than 15% CPU.
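The detector's decision logic can be tried out without running wmic. The sketch below is an illustration in Python rather than part of the solution. It assumes wmic.out is UTF-16 encoded, which would explain why findstr needs “type” in front of it; the function names and the sample outputs are hypothetical.

```python
import os
import tempfile

def agent_listed(raw: bytes) -> bool:
    """Mimic `type wmic.out | findstr /c:kntcma` on UTF-16 wmic output."""
    text = raw.decode("utf-16")   # the decoding step that `type` provides
    return "kntcma" in text

def update_marker(raw: bytes, marker: str) -> bool:
    """Create an empty marker if the agent is listed, otherwise remove it."""
    if agent_listed(raw):
        open(marker, "wb").close()        # like: copy /y nul agt_cpu.high
        return True
    if os.path.exists(marker):
        os.remove(marker)                 # like: del /q agt_cpu.high
    return False

# Sample outputs (assumed shape of the /format:list output).
hot = "\r\nName=kntcma\r\n".encode("utf-16")
idle = "\r\n".encode("utf-16")            # no matching process listed

# A plain byte search misses the name because UTF-16 interleaves zero bytes.
print(b"kntcma" in hot)                                      # False

marker = os.path.join(tempfile.mkdtemp(), "agt_cpu.high")
print(update_marker(hot, marker), os.path.getsize(marker))   # True 0
print(update_marker(idle, marker), os.path.exists(marker))   # False False
```

In cmd, `a && b || c` also runs c when b fails, not only when a fails; the Python version keeps the two branches explicit.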

Recycler Situation: IBM_NTAgent_HighCPU_Recycle_W

Formula:

*IF *VALUE NT_FILE_TREND.Watch_Directory *EQ 'C:\IBM\ITM\logs' *AND

*VALUE NT_FILE_TREND.Watch_File *EQ 'agt_cpu.high' *AND

*VALUE NT_FILE_TREND.Current_Size_64 *EQ 0
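Taken together, the three conditions amount to a check for a zero-length file with a given name in a given directory. A minimal Python sketch of that check follows, which can be handy for confirming by hand that the marker is in the state the situation expects. The function name is hypothetical and the sketch does not use any ITM interface.

```python
import os
import tempfile

def marker_would_fire(watch_directory: str, watch_file: str) -> bool:
    """True when the marker file exists and is zero bytes long."""
    path = os.path.join(watch_directory, watch_file)
    return os.path.isfile(path) and os.path.getsize(path) == 0

# A temporary directory stands in for the watched logs directory.
logs = tempfile.mkdtemp()
print(marker_would_fire(logs, "agt_cpu.high"))   # False: no marker yet
open(os.path.join(logs, "agt_cpu.high"), "wb").close()
print(marker_would_fire(logs, "agt_cpu.high"))   # True: empty marker present
```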

Action command:

cd %CANDLE_HOME%\InstallITM & KinConfg -n -pKNT -Lnul & ( for /f %F in ('DIR /B /S %CANDLE_HOME%\^| findstr /r /c:.*NT_agntstat.sta /c:.*NT_thresholds.xml /c:psit.*_NT.str') do del /q %F ) & del /q %CANDLE_HOME%\logs\agt_cpu.high & KinConfg -n -sKNT -Lnul

Here is a detailed explanation of the action command:

cd %CANDLE_HOME%\InstallITM &

==> Set the current directory to where KinConfg is located. In Windows OS Agent action commands the installation directory is available in the CANDLE_HOME environment variable.

KinConfg -n -pKNT -Lnul &

==> Stop the running Windows OS Agent. With the -L flag the log is written to the nul file and discarded. This is needed so that KinConfg does not complete until the Windows OS Agent has actually stopped.

( for /f %F in ('DIR /B /S %CANDLE_HOME%\^| findstr /r /c:.*NT_agntstat.sta /c:.*NT_thresholds.xml /c:psit.*_NT.str') do del /q %F ) &

==> Search the whole install directory tree for certain files that should be erased to avoid certain high CPU conditions. This is the payload which can relieve the high CPU in some cases.

del /q %CANDLE_HOME%\logs\agt_cpu.high &

==> Erase the marker file for high CPU.

KinConfg -n -sKNT -Lnul

==> Start the Windows OS Agent. With the -L flag the log is written to the nul file and discarded. This is needed so that KinConfg does not complete until the Windows OS Agent has actually started.
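The single-situation and Recycler action commands differ only by the marker-file delete, so it can help to keep the steps as separate strings and join them when pasting into the situation editor. The following Python sketch is one way to do that. It is an illustration only: the helper name is hypothetical, and the step strings are copied from the commands in this post.

```python
# Steps copied from the action commands; backslashes are doubled for Python.
STOP = "KinConfg -n -pKNT -Lnul"
START = "KinConfg -n -sKNT -Lnul"
CLEANUP = (
    "( for /f %F in ('DIR /B /S %CANDLE_HOME%\\^| findstr /r "
    "/c:.*NT_agntstat.sta /c:.*NT_thresholds.xml /c:psit.*_NT.str') "
    "do del /q %F )"
)
DELETE_MARKER = "del /q %CANDLE_HOME%\\logs\\agt_cpu.high"

def build_recycle_command(delete_marker: bool) -> str:
    """Join the steps with ' & ' in the order stop, cleanup, [marker], start."""
    steps = ["cd %CANDLE_HOME%\\InstallITM", STOP, CLEANUP]
    if delete_marker:
        steps.append(DELETE_MARKER)   # only the Recycler variant needs this
    steps.append(START)
    return " & ".join(steps)

print(build_recycle_command(delete_marker=False))  # single-situation variant
print(build_recycle_command(delete_marker=True))   # Recycler variant
```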

Success Story

A customer implemented this solution and thereby alleviated a small number of cases where Windows OS Agents were using excessive CPU time. Recycling the agents restored normal function in most cases. A very small number of Windows systems still needed careful investigation and work.

Credits

Two Citibank people did important work on completing and testing this solution.

Nathan Posey, nathan.posey@citi.com

Ian Plunkett, ian.plunkett@citi.com

Summary

This completes the work started in Spring 2013 on detecting and automatically recycling OS Agents which were consuming high CPU in a large scale ITM environment.


Note: The cat protector in my house.

 
