Sitworld: Agent Diagnostic Log Communications Summary


John Alvord, IBM Corporation

jalvord@us.ibm.com

Draft #3 – 6 August 2020 – Level 0.63000


Inspiration

I was working through a case where an agent kept losing its connection to a remote TEMS. Seeing the big picture was very tough because the raw data was scattered through many diagnostic log instances. After spending a day collecting cut/paste notes from diagnostic logs, I realized that an earlier project, Sitworld: Agent Workload Audit, had accomplished something vaguely similar but more complex. So I spent a few days cloning that project and writing this communications summary report.

ITM Agent Diagnostic Log Communications Summary Installation

The Agent Diagnostic Log Communications Summary package includes one Perl program, logcomm.pl, contained in the zip file logcomm.0.63000. The program has been tested in several environments using data from other environments. Windows has had the most intense testing, and it was also tested on Linux. Many Perl 5 levels should be usable. Here are the details of the testing environments.

1) Strawberry Perl 5.26.1

perl -v

This is perl 5, version 26, subversion 1 (v5.26.1) built for MSWin32-x64-multi-thread

2) Perl on Linux on Z

perl -v

This is perl, v5.10.0 built for s390x-linux-thread-multi

Copyright 1987-2007, Larry Wall

Agent Diagnostic Log Communications Summary Configuration

The Agent Diagnostic Log Communications Summary package has controls to match installation requirements, but the defaults work well. All controls are command line options.

The following table shows all options. All command line options except -h, -ini, and three debug controls can also be entered in the ini file. The command line takes precedence if both are present. In the following table, a blank means the option will not be recognized in that context. All controls are lower case only.

command   default      notes
-z        off          Log is RKLVLOG from z/OS agent
-o        logcomm.csv  Report file name
-h        <null>       Help messages
-v        off          Messages on console also
-nohdr    off          Do not print report header files
-logpath  off          Path to log files
-pc       off          Agent product code involved
-allinv   off          Use with -pc to generate a separate report for each diagnostic log collection. Also creates a merge.csv of all summary report sections.

The remaining parameter is the log file name specification. It can be a single file name or a partial diagnostic file name. For example, if a diagnostic log is named nmp180_lz_klzagent_5421d2ef-01.log, the filename specifier is nmp180_lz_klzagent_5421d2ef.
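The relationship between a full diagnostic log name and its filename specifier can be sketched in Python. This is an illustration only, not code from logcomm.pl. The helper name log_specifier is hypothetical, and the sketch assumes the segment suffix is always a hyphen followed by digits and .log, as in the example.

```python
import re


def log_specifier(log_name: str) -> str:
    """Strip a trailing -NN.log segment suffix from a diagnostic log name.

    Assumption: segments are named <specifier>-<digits>.log, as in the
    example in the text. Names that do not match are returned unchanged.
    """
    return re.sub(r"-\d+\.log$", "", log_name)


print(log_specifier("nmp180_lz_klzagent_5421d2ef-01.log"))
# prints nmp180_lz_klzagent_5421d2ef
```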

The diagnostic log segments wrap around in a regular pattern. The program calculates the correct analysis order. In some cases that order is incorrect and a manual collection must be created. This usually shows up when values in the report show a negative time value.
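One way to spot a wrongly ordered collection is to look for report rows whose time steps backward. The Python sketch below is an assumption about how such a check could be done, not part of logcomm.pl. It takes a list of LocalTime values in the YYYYMMDDhhmmss form used in the sample report later in this post; the function name backward_steps and the sample rows are hypothetical.

```python
from datetime import datetime


def backward_steps(local_times):
    """Return (index, previous, current) for each row earlier than the row before it."""
    parsed = [datetime.strptime(t, "%Y%m%d%H%M%S") for t in local_times]
    return [
        (i, local_times[i - 1], local_times[i])
        for i in range(1, len(parsed))
        if parsed[i] < parsed[i - 1]
    ]


# Hypothetical LocalTime values; the third row steps backward in time.
rows = ["20180808115304", "20180808120935", "20180808115305"]
print(backward_steps(rows))
# prints [(2, '20180808120935', '20180808115305')]
```

An empty result suggests the segments were read in a consistent order; any hit is a hint to build the collection by hand.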

Note: The -z option for z/OS agent logs will be validated later. You are welcome to try it now and if there are issues please contact the author. The basic logic has worked “forever” in TEMS Audit but testing is always an important step.

Agent Diagnostic Log Communications Summary Usage

There are no special configuration options needed for this tool.

z/OS Agent Configuration

This is not tested yet. If you are interested please contact me.

Usage

Make the agent logs directory the current directory.

1) Run against a specific log file

perl logcomm.pl hpcnvhc1_lz_klzagent_5b6b11e0-01.log

output will be in logcomm.csv

2) Run against a specific agent type

perl logcomm.pl -pc lz

output will be in logcomm_lz.csv

3) Run against all logs recorded in the inventory file, in this case hpcnvhc1_lz_klzagent.inv

perl logcomm.pl -pc lz -allinv

Individual reports will be created and also a merge.csv file which sometimes goes back a year!

Agent Diagnostic Log Communications Summary report

Advisory Message Report – *NOTE* See advisory notes at report end

Impact,Advisory Code,Object,Advisory,

90,COMMAUDIT1001W,COMM,Activity Not in Call count [62]

90,COMMAUDIT1002W,COMM,Invalid Transport Correlator error count [32]

COMMREPORT001: Timeline of TEMS connectivity

LocalTime,Hextime,Line,Advisory/Report,Notes,

20180808115304,Log,Start

20180808115305,REMOTE_odibmp003,ip.spipe:#151.171.86.23[3660],Connecting to TEMS,

20180808120935,REMOTE_odibmp003,ip.spipe:#151.171.86.23[3660],reconnect to TEMS REMOTE_odibmp003 without obvious comm failure after 0/00:16:30,

20180808120935,REMOTE_odibmp003,ip.spipe:#151.171.86.23[3660],Connecting to TEMS,

20180808121105,REMOTE_odibmp003,ip.spipe:#151.171.86.23[3660],reconnect to TEMS REMOTE_odibmp003 without obvious comm failure after 0/00:01:30,

20180808121105,REMOTE_odibmp003,ip.spipe:#151.171.86.23[3660],Connecting to TEMS,

……
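The "after 0/00:16:30" and "after 0/00:01:30" notes can be cross-checked from the LocalTime column. The Python sketch below is an illustration, not the logic logcomm.pl uses. It assumes the first comma-separated field is a YYYYMMDDhhmmss timestamp and measures the time between consecutive "Connecting to TEMS" rows; the function name connect_gaps is hypothetical.

```python
from datetime import datetime

# Rows copied from the COMMREPORT001 excerpt above.
timeline = """\
20180808115305,REMOTE_odibmp003,ip.spipe:#151.171.86.23[3660],Connecting to TEMS,
20180808120935,REMOTE_odibmp003,ip.spipe:#151.171.86.23[3660],Connecting to TEMS,
20180808121105,REMOTE_odibmp003,ip.spipe:#151.171.86.23[3660],Connecting to TEMS,
"""


def connect_gaps(text):
    """Return the elapsed time between consecutive 'Connecting to TEMS' rows."""
    times = [
        datetime.strptime(line.split(",")[0], "%Y%m%d%H%M%S")
        for line in text.splitlines()
        if "Connecting to TEMS" in line
    ]
    return [later - earlier for earlier, later in zip(times, times[1:])]


for gap in connect_gaps(timeline):
    print(gap)
# prints 0:16:30 and then 0:01:30
```

The two gaps match the durations reported in the reconnect rows of the excerpt.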

COMMREPORT002: Timeline of Communication events

LocalTime,Hextime,Line,Advisory/Report,Notes,

20180808115304,5B6B11E0,18,Log,Start,

20180808115304,5B6B11E0,70,EnvironmentVariables,KDE_TRANSPORT=KDC_FAMILIES="HTTP_CONSOLE:N HTTP_SERVER:N HTTP:0 ip.spipe port:3660 ip.pipe use:n sna use:n ip use:n ip6.pipe use:n ip6.spipe use:n ip6 use:n HTTP_SERVER:N",

20180808115304,5B6B11E0,74,EnvironmentVariables,KDEB_INTERFACELIST="!151.171.33.235",

20180808115305,5B6B11E1,1149,ANIC,14fe484587be.42.02.97.ab.21.eb.7e.b5: 1,1,5B4B1265,5B4B1265,

20180808115305,5B6B11E1,1167,ANIC,14fe4845886c.42.02.97.ab.21.eb.7e.b5: 1,1,5B4B1265,5B4B1265,

20180808115305,5B6B11E1,1258,OPLOG,Connecting to CMS REMOTE_odibmp003,

20180808115305,5B6B11E1,1261,Communications,Successfully connected to CMS REMOTE_odibmp003 using ip.spipe:#151.171.86.23[3660],

20180808115305,5B6B11E1,1261a,Communications,3660,

20180808115305,5B6B11E1,1603,ANIC,14fe4845badc.42.02.97.ab.21.eb.7e.b5: 1,1,5B4B1265,5B4B1265,

20180808115305,5B6B11E1,1703,ANIC,14fe4845bea2.42.02.97.ab.21.eb.7e.b5: 1,1,5B4B1265,5B4B1265,

…..
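The Hextime column appears to be the number of seconds since the Unix epoch written in hexadecimal. That is an inference from this excerpt rather than something the report states: 5B6B11E0 decodes to 15:53:04 UTC on 8 August 2018, which lines up with the LocalTime value 20180808115304 if the agent's clock is four hours behind UTC. A minimal Python sketch under that assumption, with a hypothetical function name:

```python
from datetime import datetime, timezone


def hextime_to_utc(hextime: str) -> datetime:
    """Interpret a Hextime value as hexadecimal seconds since the Unix epoch.

    Assumption: this interpretation is inferred from the sample report,
    not taken from the logcomm.pl source.
    """
    return datetime.fromtimestamp(int(hextime, 16), tz=timezone.utc)


print(hextime_to_utc("5B6B11E0").isoformat())
# prints 2018-08-08T15:53:04+00:00
```

Converting to UTC this way makes it easier to line up agent events with TEMS logs or network incident records from systems in other time zones.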

COMMAUDIT1002W

Text: Invalid Transport Correlator error count [count]

Tracing: error

+5B6B15BF.0001     e-secs: 0                  mtu: 944         KDE1_stc_t: 1DE0004D

Meaning: This is a strong signal of a duplicate agent case.

ITM uses remote procedure calls to do most of communications

and this error means that the partner in the communication process

rejected the attempted communication because the type of communication

did not match. For example a ip.pipe communication was sent

but the partner knew it needed a ip.spipe. It could also be a

conflict between a simple connection and a EPHEMERAL:Y connection

or many other cases.

Recovery plan: Investigate the TEMS the agent connects

to for evidence of duplicate agents – especially this one –

and resolve the issue.

What to do with the Report

It is most important to correlate logged events with agent configuration and network incidents. This report will summarize what happened but will usually raise more questions than it answers. The specific report excerpt above was associated with a case of duplicate agent names. When the agent configurations were changed so each agent had a unique name, as ITM expects, the agent stopped losing connection.

Summary

The Agent Diagnostic Log Communications Summary was presented.

Sitworld: Table of Contents

History and Earlier versions

There is a distribution at https://github.com/jalvo2014/logcomm which may be somewhat less tested than the point releases. If the current version of the Agent Diagnostic Log Communications Summary tool does not work, you can try the recently published binary object zip files. At the same time, please contact me to resolve the issues. If you discover an issue, try intermediate levels to isolate where the problem was introduced.

logcomm.0.63000
Handle instanced logs

logcomm.0.62000
Make KDE_TRANSPORT/KDC_FAMILIES check work on Windows

logcomm.0.61000
Add hostname/installer/gskit_level when cinfo.info is available

logcomm.0.60000
Add advisory for different CTIRA_HOSTNAME and CTIRA_SYSTEM_NAME

logcomm.0.59000
Add in KDC_PARTITION checking – rare and usually an error

logcomm.0.58000
Add in ENV checking if the files are present

logcomm.0.57000
Add in system name and some CTIRA variables if present

logcomm.0.56000
Add Default host address to timeline

logcomm.0.55000
Advisory on mixed KDC_FAMILIES and KDE_TRANSPORT

logcomm.0.54000
Capture Port Scanning type messages

logcomm.0.53100
Collect data from RPC-Lost messages

Photo Note: Cruise Ship Energy Storage – 2017

 
