John Alvord, IBM Corporation
jalvord@us.ibm.com
Inspiration
The second section lists all the posts, each with a very short comment.
Top 6 By Importance [My Prejudiced View]
ITM Database Health Checker
It is common to see a TEMS database [often called the EIB] with problems that cause confusion or, sometimes, a loss of monitoring. This project identifies and documents 50+ advisories that will make things better.
Best Practice TEMS Database Backup and Recovery
The most costly support cases are those where a customer does not have a proper backup. One memorable case came after a Storage Area Network device lost power and the most recent backup was over a year old. I talk every day to people who use TSM to make copies of the TEMS databases, and that is almost always insufficient. This post was written jointly by a top L3 engineer and me. If everyone followed it, the time to recover would drop substantially.
MS_Offline – Myth and Reality
MS_Offline type situations are extremely heavyweight and cause problems “at a distance”. For example, a recent case with 9545 agents and 22 MS_Offline situations on a 5-minute sampling interval has spawned multiple IBM Support interactions, and they all come back to this one issue. When Persist>1 is set, the problems are much worse. The blog photo shows a California Condor [VERY LARGE VULTURE] lurking outside a window. Treat MS_Offline type situations as dangerous creatures and you will reduce your risk of injury and pain.
TEMS Audit Process and Tool
This has been available since 2012. It is a perfect way to examine the dynamic impact of workload [situations, SOAP, real-time data requests, etc.] on a TEMS. With that knowledge you can make changes to avoid problem conditions. I have one customer who runs this on every TEMS each weekend and, if “advisory messages” are present [signaled by a non-zero exit code], sends the report to an analyst for review. The rate of emergency IBM Support meetings has dropped to near zero… at least for this area.
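The weekend routine described above can be sketched as a small wrapper that runs an audit command and routes its output to a reviewer only when the exit code is non-zero. This is a minimal illustration, not the customer's actual script: the function name `run_audit`, the `notify` callback, and the idea of treating captured standard output as the report are all assumptions, and the real audit command line is deliberately left to the caller.

```python
import subprocess
from typing import Callable, Sequence


def run_audit(cmd: Sequence[str], notify: Callable[[str], None]) -> bool:
    """Run an audit command and forward its output when advisories exist.

    Assumption for this sketch: a non-zero exit code means advisory
    messages are present, and the captured stdout stands in for the report.
    Returns True when the report was forwarded.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Hypothetical hand-off: email, ticket, or a shared folder.
        notify(result.stdout)
        return True
    return False
```

A scheduler such as cron could call `run_audit` once per TEMS each weekend, passing whatever command line the site actually uses for the audit and a `notify` function that delivers the report to the analyst.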
ITM Agent Health Survey
This tool provides a view of agents that are online but possibly non-responsive. In cases like this, real-time data response is slow and partially missing, situations are not running, and historical data is not being recorded. These are things everyone should worry about. This identifies the guard dog that doesn’t bark.
ITM Situation Audit
This tool performs a static analysis on all distributed situations and produces a report of warning messages. It also reports which situations need TEMS filtering [instead of Agent filtering], which is a prime performance killer. Together with TEMS Audit you can really increase efficiency – reducing the cost of monitoring. It also gives early warning of situations with problems. Surprisingly, 50 of 51,000 situations studied actually had syntax errors – like VALUE instead of *VALUE. Anyway – I expect this to be an important tool over time.
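To make the VALUE versus *VALUE example concrete, here is a toy check that flags formula keywords written without their leading asterisk. It is only a sketch of the idea and not how the Situation Audit tool works: the keyword list is a small assumed subset, the function name `bare_keywords` is made up for illustration, and the sample formulas are hypothetical.

```python
import re
from typing import List

# Assumed subset of formula keywords that should carry a leading asterisk.
KEYWORDS = ("IF", "VALUE", "AND", "OR", "EQ", "NE", "GT", "GE", "LT", "LE")

# Match a keyword that is not preceded by '*' or a word character and is
# not followed by a word character or '.', so attribute names are skipped.
BARE = re.compile(r"(?<![*\w])(" + "|".join(KEYWORDS) + r")(?![\w.])")


def bare_keywords(formula: str) -> List[str]:
    """Return keywords in the formula that are missing the leading '*'."""
    # Blank out quoted literals so their contents are never flagged.
    without_literals = re.sub(r"'[^']*'", "''", formula)
    return BARE.findall(without_literals)
```

For a hypothetical formula such as `*IF VALUE NT_Process.Process_Name *EQ 'notepad'`, the check reports the bare `VALUE`; the same formula with `*VALUE` reports nothing.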
Sitworld All Posts – Most recent first
Sitworld: Eliminating Duplicate Agents | 5/29/2020 | Eliminating Duplicate Agents
Sitworld: Summarization and Pruning Audit | 3/23/2020 | Summarization and Pruning Audit
Sitworld: ITM Permanent Configuration Best Practices | 1/17/2020 | ITM Permanent Configuration Best Practices
Sitworld: Scrubbing Out Windows Agent Malconfiguration Remotely | 2/6/2019 | Scrubbing Out Windows Agent Malconfiguration Remotely
Sitworld: Agent Diagnostic Log Communications Summary | 8/20/2018 | Agent Diagnostic Log Communications Summary
Sitworld: Adventures in Communications #1 | 7/2/2018 | Adventures in Communications #1
Event History #15 High Results Situation to No Purpose | 5/25/2018 | High Results Situation to No Purpose |
Event History #14 Lodging Problems | 5/21/2018 | Lodging Problems |
Event History #13 Delay Delay Delay | 5/10/2018 | Delay Delay Delay |
Event History #12 High Impact Situations And Much More | 5/1/2018 | High Impact Situations And Much More |
Event History #11 Detailed Attribute differences on first two merged results | 4/27/2018 | Detailed Attribute differences on first two merged results |
Event History #10 lost events because DisplayItem missing or null Atoms | 4/24/2018 | lost events because DisplayItem missing or null Atoms |
Event History #9 Two Open Or Close Events In A Row | 4/22/2018 | Two Open Or Close Events In A Row |
Event History #8 Situation Events Opening And Closing Frequently | 4/21/2018 | Situation Events Opening And Closing Frequently |
Event History #7 Events Created But Not Forwarded | 4/19/2018 | Events Created But Not Forwarded |
Event History #6 Lost events with Multiple Results with same DisplayItem at same TEMS second | 4/17/2018 | Lost events with Multiple Results with same DisplayItem at same TEMS second |
Event History #5 Multiple Results Same DisplayItem Same Second | 4/16/2018 | Multiple Results Same DisplayItem Same Second |
Event History #4 Conflict Between DisplayItem and Attributes | 4/13/2018 | Conflict Between DisplayItem and Attributes |
Event History #3 Lost Events Because DisplayItem has Duplicate Atoms | 4/13/2018 | DisplayItem has Duplicate Atoms |
Event History #2 Duplicate DisplayItems At Same Second | 4/10/2018 | Duplicate DisplayItems At Same Second |
Event History #1 The Situation That Fired Oddly | 4/4/2018 | The Situation that cried Wolf |
Event History Audit | 4/3/2018 | Examine Event History in detail |
Policing the Hatfields and the McCoys | 6/5/2016 | Advanced Base/Until Sits
TEMS Audit Tracing Guide Tracing Guide Appendix | 7/7/2017 | TEMS Audit Tracing |
ITM 6 Interface Guide Using KDEB_INTERFACELIST | 6/30/2017 | Document usage of KDEB_INTERFACELIST |
ITM Agent Historical Data Export Survey | 5/4/2017 | Detect historical export issues at agents |
FTO Configuration Audit | 3/9/2017 | Detect FTO configuration issues |
Portal Client [TEP] on Windows Using a Private Java Install | 12/28/2016 | Avoid issues with system Java updates |
TEMS Database Repair | 11/18/2016 | Recover from some broken TEMS database files |
The Encyclopedia of ITM Tracing and Trace Related Controls | 9/19/2016 | Document tracing controls |
ITM2SQL Database Utility | 6/19/2016 | Create TEMS database table report files |
Real Time Detection of Duplicate Agent Names | 3/23/2016 | Duplicate Agent Live Detection |
Portal Client Java Web Start JNLP File Cloner | 3/18/2016 | Create JNLP clone files for different types of TEP users |
TEPSI Interface Guide | 3/18/2016 | Learn about TEPS Interfaces |
Diagnostic Snapshot Utility | 1/4/2016 | Capture diagnostics on the fly |
tacmd logs summary | 12/31/2015 | Summarize tacmd diagnostic logs |
Restore Usability to ITCAM YN Custom Situations | 12/24/2015 | Fix some user custom situation affinities |
TEPS Audit | 9/15/2015 | Report on Potential Duplicate Agent names |
Re-re-re-mem-ember Situation Status Cache Growth Analysis | 8/1/2015 | Identify pure situation w/changing DisplayItems |
Attribute and Catalog Health Survey | 4/19/2015 | Check for missing or mis-used cat/atr files |
ITM Database Health Checker | 3/24/2015 | Check TEMS database for issues |
Suppressing Situation Events By Time Schedule | 3/13/2015 | Simple example of Until with timer schedule |
Alerting on Daylight Savings Time Truants | 2/27/2015 | Situation alert on time differences
Report on Daylight Savings Time Truants | 2/20/2015 | Report on Daylight Savings Time problems |
Situation Formula with Calculations | 1/28/2015 | How to effectively calculate a formula |
ITM Agent Census Scorecard | 11/24/2014 | Report avoidable TEMA defects |
ITM Protocol Usage and Protocol Modifiers | 10/21/2014 | How to increase SOAP ports and much more |
Agent Workload Audit | 10/08/2014 | What is actually happening at Agents |
Situation Distribution Report | 7/11/2014 | What Situations are running where |
CPAN Library for Perl Projects | 7/11/2014 | Using Perl without changing system |
ITM Virtual Table Termite Control Project | 6/17/2014 | Recover from Performance Issue |
ITM TEMS Health Survey | 6/9/2014 | Verify TEMS central services are working |
The Situation That Cried Wolf | 6/1/2014 | Craft a situation for good practical results |
Statistics After 50,000 Views | 5/19/2014 | Summary to date |
*MIN and *MAX – the Little Column Functions That Couldn’t | 5/15/2014 | Two broken Column functions
A Situation By Any Other Name… | 4/28/2014 | Discovering situation names |
Do It Yourself TEMS Table Display | 4/28/2014 | Do It Yourself – Run SQL |
Running TEMS without SITMON | 4/7/2014 | Recovery when TEMS very broken |
ITM Situation Audit | 3/20/2014 | Compiler or Lint for Situation Formulas |
SOAP Flash Flood | 2/1/2014 | tacmd bulkexportsit -d stresses TEMS |
Sample EIF Listener project | 1/17/2014 | Do It Yourself Event listener |
Situation Limits | 12/31/2013 | Situations have many limits |
Put Your Situations on a Diet Using Indexed Attribute | 12/19/2013 | Performance boost for some Situations |
Sampled Situations and Until Situations | 11/25/2013 | Until Processing exposé
TEMS Audit Process and Tool | 11/16/2013 | Measure Agent stress on TEMS |
Detector/Recycler for ITM Windows OS Agent | 11/2/2013 | Windows OS Agent recycler high CPU |
1997 Kasparov vs. Deep Blue Chess Match | 9/17/2013 | Virtual Table hub Update hidden issue |
ITM Agent Health Survey | 9/6/2013 | Discover unhealthy agents |
Sampled Situation Blinking Like a Neon Light | 9/4/2013 | When situation events auto-close |
Sampling Interval and Time Tests | 8/24/2013 | Sampled situations and time to event |
TEMS Audit Advisory Messages | 8/13/2013 | Included in TEMS Audit Process and Tool |
Situations Caused Domain Name Server Overload | 7/24/2013 | Situation generated emails hurt DNS |
Configuring a Stable SOAP Port | 7/16/2013 | Best Practice when SOAP is vital |
Best Practice TEMS Database Backup and Recovery | 7/12/2013 | If you don’t have a backup plan read this |
Action Command Wars – A New Beginning | 7/9/2013 | Running lots of action commands |
Detecting and Recovering from High Agent CPU Usage | 7/1/2013 | Linux/Unix OS Agent High CPU recover |
An Efficient Design for Starting a Background Process | 6/20/2013 | Elegant hack |
Adding Environmental Data to Action Command Emails | 6/12/2013 | When attributes are not enough |
Situation Managing Other Situations | 6/5/2013 | Situation creates MSL |
Mixed Up Situations | 5/28/2013 | Multiple Attribute Situation issues |
Efficient Situation for Two Missing Processes | 5/22/2013 | Elegant efficiency solution |
Getting a Good Nights Sleep | 5/15/2013 | Creating events to keep operators happy |
Rational Choices for Situation Sampling Intervals | 5/8/2013 | Best Practice Interval choices |
The Derivative Log Pattern | 5/1/2013 | Two stage situation logic |
Super Duper Situations | 4/28/2013 | Understanding _Z_ situations |
MS_Offline – Myth and Reality | 4/17/2013 | Everything about MS_Offlines |
Auditing TEMS for Improved Performance | 4/4/2013 | Included in TEMS Audit Process and Tool |
ITM Silver Blaze – Agent Responsiveness Checker | 3/28/2013 | Replaced by ITM Agent Health Survey
ITM TEMS Stress Tester Experiment | 3/20/2013 | ITM Analytics experiment |
Summary
Wonderful World of Situations Table of Contents.