Sitworld: Statistics After 50,000 Views

WoberJade

John Alvord, IBM Corporation

jalvord@us.ibm.com


Introduction

The Wonderful World of Situations blog recently passed 50,000 total views. Here are some statistics, pointers to items you might have missed, and a list of the posts I consider most important.

The Total views list is biased toward blog posts that have been available longest. The Rate view is biased toward more recent blog posts.

My “Most Important” posts are the ones with the biggest potential to reduce the cost of customer and IBM Support by eliminating problems or speeding up recovery. I have already seen a case where a blog post allowed a customer to recover without any added support. That post was “Running TEMS without SITMON” and everybody benefited. I expect there are other such cases that I have not heard about.

Top 10 by total views

Sorted by total views

Views  Days   Rate  Title
 3531   392   9.01  MS_Offline – Myth and Reality
 3376   250  13.50  ITM Agent Health Survey
 3369   412   8.18  ITM Silver Blaze – Agent Responsiveness Checker
 2777   420   6.61  ITM TEMS Stress Tester Experiment
 2459   351   7.01  Mixed Up Situations
 2016   317   6.36  Detecting and Recovering from High Agent CPU Usage
 1997   405   4.93  Auditing TEMS for Improved Performance
 1876   371   5.06  Rational Choices for Situation Sampling Intervals
 1871   306   6.11  Best Practice TEMS Database Backup and Recovery
 1700   263   6.46  Sampling Interval and Time Tests

Top 10 By View Rate

Sorted by rate of views per day

Views  Days   Rate  Title
  375    16  23.44  A Situation By Any Other Name…
 1081    55  19.65  ITM Situation Audit
  516    37  13.95  Running TEMS without SITMON
 3376   250  13.50  ITM Agent Health Survey
 3531   392   9.01  MS_Offline – Myth and Reality
 1122   134   8.37  Situation Limits
 3369   412   8.18  ITM Silver Blaze – Agent Responsiveness Checker
  129    16   8.06  Do It Yourself TEMS Table Display
 2459   351   7.01  Mixed Up Situations
 2777   420   6.61  ITM TEMS Stress Tester Experiment
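
The Rate column is simply Views divided by Days, which is why the two rankings differ: total views favors older posts, and rate favors newer ones. Here is a minimal Python sketch of that arithmetic using four rows copied from the lists above. The names `posts`, `rate`, `by_total` and `by_rate` are mine, chosen for illustration only.

```python
# (views, days available, title) for a few posts, copied from the lists above
posts = [
    (3531, 392, "MS_Offline – Myth and Reality"),
    (3376, 250, "ITM Agent Health Survey"),
    (375, 16, "A Situation By Any Other Name…"),
    (1081, 55, "ITM Situation Audit"),
]


def rate(views, days):
    """Average views per day since the post was published."""
    return views / days


# Ranking by total views rewards posts that have been available longest
by_total = sorted(posts, key=lambda p: p[0], reverse=True)

# Ranking by views per day rewards more recent posts
by_rate = sorted(posts, key=lambda p: rate(p[0], p[1]), reverse=True)

for views, days, title in by_rate:
    print(f"{views:5d} {days:4d} {rate(views, days):6.2f}  {title}")
```

With only sixteen days of history, the newest post leads the rate ranking even though it has the fewest total views of the four.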

Top 5 By Importance [My Prejudiced View]

Best Practice TEMS Database Backup and Recovery

The most costly support cases are the ones where a customer does not have a proper backup. One memorable case came after a Storage Area Network device lost power and the most recent backup was over a year old. I talk to people every day who use TSM to make copies of the TEMS databases, and that is almost always insufficient. This post was written jointly by a top L3 engineer and me. If everyone followed it, the time to recover would drop substantially.

MS_Offline – Myth and Reality

MS_Offline type situations are extremely weighty and cause problems “at a distance”. For example, a recent case with 9545 agents and 22 MS_Offline situations with a 5 minute sampling interval spawned multiple IBM Support interactions. They all came back to this one issue. If Persist>1 is set, the problems are much worse. The blog photo there shows a California Condor [VERY LARGE VULTURE] lurking outside a window. Treat MS_Offline type situations as dangerous creatures and you will reduce your risk of injury and pain.

TEMS Audit Process and Tool

This has been available for 4+ years. It is a wonderful way to examine the dynamic impact of workload [situations, SOAP, real time data requests, etc.] on a TEMS. With that knowledge you can make changes to avoid problem conditions. I have one customer who runs this on every TEMS each weekend and, if “advisory messages” are present [noted via a non-zero exit code], sends the report to an analyst for review. The rate of emergency IBM Support meetings has dropped to near zero… at least for this area.
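
That weekend routine could be scripted roughly as follows. This is a minimal sketch, not the customer's actual setup. The audit command line is not given in this post, so the hypothetical helper `run_audit` takes whatever command you use, and the example substitutes a stand-in Python one-liner that exits non-zero. The only behavior taken from the post is that a non-zero exit code signals advisory messages.

```python
import subprocess
import sys


def run_audit(command):
    """Run an audit command and report whether it needs analyst review.

    Returns (needs_review, report_text). needs_review is True when the
    command exits with a non-zero code, the signal described above for
    "advisory messages" being present.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode != 0, result.stdout


# ASSUMPTION: stand-in for the real audit command, which this post does not
# show. It prints one line of "report" and exits with code 1.
stand_in = [sys.executable, "-c",
            "import sys; print('advisory: example'); sys.exit(1)"]

needs_review, report = run_audit(stand_in)
if needs_review:
    # A real job would send the report to an analyst here
    print("Send to analyst for review:")
    print(report)
```

Checking only the exit code keeps the wrapper independent of the report layout, so it would not need changes if the report format evolves.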

ITM Agent Health Survey

This tool provides a view of agents that are online but possibly non-responsive. In cases like this, real time data response is slow and partially missing, situations are not running, and historical data is not being recorded. These are things everyone should worry about. This exposes the guard dog that doesn’t bark.

ITM Situation Audit

This is the most recent project. It performs a static analysis on all distributed situations and produces a report of warning messages. It also reports which situations need TEMS filtering [instead of Agent filtering], which is a prime performance killer. Together with TEMS Audit you can really increase efficiency, reducing the cost of monitoring. It also gives early warning of situations with problems. Surprisingly, 50 of 51,000 situations studied actually had syntax errors, like VALUE instead of *VALUE. Anyway, I expect this to be an important tool over time.

Summary

This is a fifteen-month review of the blog posts.

Sitworld: Table of Contents

Photo Note: 400 pound Jade piece by Don Wobber

 
