Sitworld: Auditing TEMS for Improved Performance


By John Alvord, IBM Corporation
jalvord@us.ibm.com

The following information has been updated and included in the new TEMS Audit distribution, which is documented here:

Sitworld: TEMS Audit Process and Tool

Inspiration

People mostly ignore TEMS performance issues until the TEMS crashes or agents start going wildly online and offline. Until 2010 this was a constant challenge, and issues would linger for months. At the end of 2010, I worked on a perfect example. The customer had a single situation that, as it happened, drove six AIX servers to 95% utilization or higher. A diagnostic trace showed which situation was involved and why. An ad hoc summarization program demonstrated that, on each of the six AIX systems, 93% of the incoming workload came from that one situation.

To help explain this to the customer, I exported the data from the analysis program into a comma separated file so the customer could view the impact as a spreadsheet. The customer agreed to split the situation into three smaller situations that together had the identical effect. Utilization on all six AIX systems then dropped to 10% or lower: not by 10%, but from 95% down to 10%! This was a case of a “Too Big” situation, where the WHERE clause was too large to send to the agent, so the filtering was done at the TEMS instead of at the agent.
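The original summarization program was ad hoc and is not reproduced here. The Python sketch below is only a rough illustration of the idea. It tallies result rows per situation, computes each situation's share of the total, and writes the summary as CSV for a spreadsheet. It assumes the diagnostic trace has already been parsed into (situation name, row count) pairs; that parsing step is not shown. The function names, situation names and sample numbers are all hypothetical.

```python
import csv
import io
from collections import Counter


def summarize_workload(results):
    """Tally rows per situation and return (situation, rows, percent) tuples.

    `results` is an iterable of hypothetical (situation_name, row_count)
    pairs, assumed to be parsed already from a TEMS diagnostic trace.
    Output is sorted by row count, largest first.
    """
    totals = Counter()
    for situation, rows in results:
        totals[situation] += rows
    grand_total = sum(totals.values())
    summary = []
    for situation, rows in totals.most_common():
        pct = 100.0 * rows / grand_total if grand_total else 0.0
        summary.append((situation, rows, round(pct, 1)))
    return summary


def to_csv(summary):
    """Render the summary as comma separated text for a spreadsheet."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["situation", "rows", "percent"])
    writer.writerows(summary)
    return buf.getvalue()


# Invented sample data, for illustration only.
sample = [("Sit_A", 930), ("Sit_B", 50), ("Sit_C", 20)]
print(to_csv(summarize_workload(sample)))
```

Sorting by row count puts the dominant situation at the top of the spreadsheet, which is usually the first thing worth discussing with a customer.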

A month later I took on a long-running issue. The summarization program pointed to one situation, which alerted on SAP documents that were late in being processed. The situation itself was quite reasonable. The customer told me they had 10 SAP servers: one seemed very overloaded, another somewhat overloaded, and the other eight under-loaded. The overloaded SAP servers had thousands of late documents. When the customer reconfigured the SAP workload, the number of late documents decreased and the associated TEMS load was almost eliminated.

Overview 

These experiences resulted in a technote published in spring 2011:

Auditing TEMS for High Impact Workloads

That material is now maintained in a blog post: Sitworld: TEMS Audit Process and Tool

There have been several years of experimentation and improvement. The technote contains a full description of the process and a Perl program to perform the analysis. Changes introduced over time were in response to specific customer cases or to performance concerns.

A number of customer and IBM sites run TEMS Audit regularly to identify issues before they cause high impact.

On 20 May 2013 I published version 1.00000. That version identifies the diagnostic log segments on the fly and copies them to a work directory to handle rare cases where segments are reused. With this level, you can run it periodically using a crontab entry [Linux/Unix] or an AT command [Windows] and get regular reports on emerging issues.

On 27 July 2013 I published version 1.05000. See below for a list of improvements. The trace report was added to measure a customer TEMS environment that had accidentally been set up to run continuously with maximum tracing.

On 12 August 2013 I published version 1.10000, which added the Advisory section.

12 August 2013 improvements

1) Add Advisory section
2) Fix hands-off logic with Windows logs
3) Add 16meg truncated result advisory

27 July 2013 improvements

1) Report on trace lines and trace size per minute
2) Add -inplace to skip copying diagnostic segments when not needed
3) Add count of remote SQL failures
4) Count “Filter object too big” messages in hands-off mode and display counts
5) Correct a defect when there are a few scattered historical data export messages
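The per-minute trace report in item 1 can be sketched by grouping trace lines into one-minute buckets and counting lines and bytes in each. This is not the TEMS Audit Perl code. The sketch assumes each trace line has already been paired with its timestamp in epoch seconds; extracting that timestamp from a real diagnostic log is not shown. The function names are hypothetical.

```python
from collections import defaultdict


def trace_rate_per_minute(lines):
    """Return {minute_index: (line_count, byte_count)} for (epoch_seconds, text) pairs.

    minute_index is epoch_seconds // 60. Byte counts include one newline
    per line, assuming UTF-8 encoded log text.
    """
    buckets = defaultdict(lambda: [0, 0])
    for epoch_seconds, text in lines:
        minute = int(epoch_seconds // 60)
        buckets[minute][0] += 1
        buckets[minute][1] += len(text.encode("utf-8")) + 1  # +1 for the newline
    return {m: tuple(v) for m, v in sorted(buckets.items())}


def average_rate(rates):
    """Average (lines, bytes) per minute over minutes that had any trace output."""
    if not rates:
        return (0.0, 0.0)
    n = len(rates)
    total_lines = sum(count for count, _ in rates.values())
    total_bytes = sum(size for _, size in rates.values())
    return (total_lines / n, total_bytes / n)


rates = trace_rate_per_minute([(0, "abc"), (30, "de"), (61, "f")])
print(rates)                # {0: (2, 7), 1: (1, 2)}
print(average_rate(rates))  # (1.5, 4.5)
```

Averaging only over minutes that had output is one design choice. Averaging over the full time span of the log would give lower figures when tracing is bursty.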

Earlier Improvements

1) Reports on historical exporting from TEMS
2) Reports on SOAP usage
3) Reports on historical exporting from Agent
4) Report on “Too Big” situations
5) Hands-off operation: operate directly on the active logs directory
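A “Too Big” count can be sketched as a scan of each diagnostic log segment for the “Filter object too big” message. The sketch assumes the segments have already been read into lists of lines keyed by segment name. The exact spelling and case of the message in real logs is also an assumption, since the matching here is a plain case-sensitive substring test. The function name is hypothetical.

```python
# Message text taken from the improvement list; exact log spelling is assumed.
TOO_BIG_TEXT = "Filter object too big"


def count_too_big(segments):
    """Count lines containing the too-big filter message in each log segment.

    `segments` maps a segment name to an iterable of its lines.
    Returns (per_segment_counts, total), keeping only segments with hits.
    """
    per_segment = {}
    for name, lines in segments.items():
        hits = sum(1 for line in lines if TOO_BIG_TEXT in line)
        if hits:
            per_segment[name] = hits
    return per_segment, sum(per_segment.values())


counts, total = count_too_big({
    "segment-1": ["... Filter object too big ...", "normal line"],
    "segment-2": ["normal line"],
})
print(counts, total)  # {'segment-1': 1} 1
```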

Summary

I suggest everyone get this package and run it regularly. Everyone prefers to run without a crisis. Smooth running is best for everyone.  If you have any interesting ideas on how to extend this work, I am always interested. Add a comment here or send an email.

Sitworld: Table of Contents

 
