John Alvord, IBM Corporation
jalvord@us.ibm.com
Draft #3 – 6 August 2020 – Level 0.63000
Inspiration
I was working through a case where an Agent kept losing connection with a remote TEMS. Seeing the big picture was very tough because the raw data was scattered through many diagnostic log instances. After spending a day collecting cut/paste notes from diagnostic logs, I realized that an earlier project, Sitworld: Agent Workload Audit, had accomplished something vaguely similar but more complex. So I spent a few days cloning that project and writing this communications summary report.
ITM Agent Diagnostic Log Communications Summary Installation
The Agent Diagnostic Log Communications Summary package includes one Perl program logcomm.pl. It is contained in a zip file logcomm.0.63000. The program has been tested in several environments using data from other environments. Windows has had the most intense testing. It was also tested on Linux. Many Perl 5 levels will be usable. Here are the details of the testing environments.
1) Perl on Windows
perl -v
This is perl 5, version 26, subversion 1 (v5.26.1) built for MSWin32-x64-multi-thread
2) Perl on Linux on Z
# perl -v
This is perl, v5.10.0 built for s390x-linux-thread-multi
Copyright 1987-2007, Larry Wall
Agent Diagnostic Log Communications Summary Configuration
The Agent Diagnostic Log Communications Summary package has controls to match installation requirements, but the defaults work well. All controls are command line options.
The following table shows all options. All command line options except -h, -ini, and three debug controls can also be entered in the ini file. The command line takes precedence if both are present. In the following table, a blank means the option will not be recognized in the context. All controls are lower case only.
command | default | notes |
-z | off | Log is RKLVLOG from z/OS agent |
-o | logcomm.csv | Report file name |
-h | <null> | Help messages |
-v | off | Messages on console also |
-nohdr | off | Do not print report header files |
-logpath | off | Path to log files |
-pc | off | Agent product code to process, for example lz |
-allinv | off | Use with -pc to generate reports for each diagnostic log collection in separate reports. Will also create a merge.csv of all summary report sections. |
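The precedence rule described above (defaults, then ini file, then command line) can be sketched in a few lines. This is only an illustration of the layering and not how logcomm.pl is implemented. The dictionary keys and the `resolve_options` helper are hypothetical names chosen for this example, and the ini file format itself is not modeled.

```python
# Hypothetical illustration of option layering; not taken from logcomm.pl.
# Defaults mirror three rows of the options table above.
DEFAULTS = {"o": "logcomm.csv", "v": False, "nohdr": False}


def resolve_options(ini_options, cli_options):
    """Merge options so that later layers override earlier ones."""
    merged = dict(DEFAULTS)      # start with the built-in defaults
    merged.update(ini_options)   # ini file values override defaults
    merged.update(cli_options)   # command line overrides the ini file
    return merged


# The ini file asks for one report name, the command line for another.
options = resolve_options({"o": "from_ini.csv", "v": True}, {"o": "from_cli.csv"})
print(options["o"])  # from_cli.csv
print(options["v"])  # True
```

Applying the layers in that order means an option set only in the ini file still takes effect, while anything repeated on the command line wins.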
The remaining parameter is the log file name specification. That can be a single file name or a partial diagnostic file name. For example, if a diagnostic log name is nmp180_lz_klzagent_5421d2ef-01.log, the filename specifier is nmp180_lz_klzagent_5421d2ef.
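As a small illustration of the naming rule, the following Python sketch derives the filename specifier from a segment name. It assumes that every segment name ends in a hyphen, a segment number, and .log, as in the example above. The `filename_specifier` helper is a hypothetical name and is not part of logcomm.pl.

```python
import re


def filename_specifier(log_name):
    """Strip the trailing -NN segment number and the .log extension."""
    match = re.fullmatch(r"(?P<spec>.+)-\d+\.log", log_name)
    if match is None:
        # Not a segmented diagnostic log name; return it unchanged.
        return log_name
    return match.group("spec")


print(filename_specifier("nmp180_lz_klzagent_5421d2ef-01.log"))
# nmp180_lz_klzagent_5421d2ef
```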
The diagnostic log segments wrap around in a regular pattern. The program calculates the correct analysis order. In some cases that order is incorrect and a manual collection must be created. This usually shows up when values in the report show a negative time value.
Note: The -z option for z/OS agent logs will be validated later. You are welcome to try it now and if there are issues please contact the author. The basic logic has worked “forever” in TEMS Audit but testing is always an important step.
Agent Diagnostic Log Communications Summary Usage
There are no special configuration options needed for this tool.
z/OS Agent Configuration
This is not tested yet. If you are interested please contact me.
Usage
Make the agent logs directory be the current directory.
1) Run against a specific log file
perl logcomm.pl hpcnvhc1_lz_klzagent_5b6b11e0-01.log
output will be in logcomm.csv
2) Run against a specific agent type
perl logcomm.pl -pc lz
output will be in logcomm_lz.csv
3) Run against all logs recorded in the inventory file – in this case hpcnvhc1_lz_klzagent.inv
perl logcomm.pl -pc lz -allinv
Individual reports will be created, along with a merge.csv file whose data sometimes goes back a year!
Agent Diagnostic Log Communications Summary report
Advisory Message Report – *NOTE* See advisory notes at report end
Impact,Advisory Code,Object,Advisory,
90,COMMAUDIT1001W,COMM,Activity Not in Call count [62]
90,COMMAUDIT1002W,COMM,Invalid Transport Correlator error count [32]
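For anyone post-processing the report, the advisory lines above can be split into fields with a short Python sketch. It assumes the four-column layout shown in the header line and a trailing [count] in the advisory text. The `parse_advisory` helper is a hypothetical name and is not part of the tool.

```python
import re

# Impact, advisory code, object, and free-form advisory text.
ADVISORY = re.compile(r"^(?P<impact>\d+),(?P<code>[A-Z0-9]+),(?P<object>[^,]*),(?P<text>.*)$")


def parse_advisory(line):
    """Return the advisory fields as a dict, or None for non-advisory lines."""
    match = ADVISORY.match(line.strip())
    if match is None:
        return None  # for example the column header line
    text = match.group("text").rstrip(",")
    count = re.search(r"\[(\d+)\]\s*$", text)
    return {
        "impact": int(match.group("impact")),
        "code": match.group("code"),
        "object": match.group("object"),
        "text": text,
        "count": int(count.group(1)) if count else None,
    }


row = parse_advisory("90,COMMAUDIT1002W,COMM,Invalid Transport Correlator error count [32]")
print(row["code"], row["count"])  # COMMAUDIT1002W 32
```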
COMMREPORT001: Timeline of TEMS connectivity
LocalTime,Hextime,Line,Advisory/Report,Notes,
20180808115304,Log,Start
20180808115305,REMOTE_odibmp003,ip.spipe:#151.171.86.23[3660],Connecting to TEMS,
20180808120935,REMOTE_odibmp003,ip.spipe:#151.171.86.23[3660],reconnect to TEMS REMOTE_odibmp003 without obvious comm failure after 0/00:16:30,
20180808120935,REMOTE_odibmp003,ip.spipe:#151.171.86.23[3660],Connecting to TEMS,
20180808121105,REMOTE_odibmp003,ip.spipe:#151.171.86.23[3660],reconnect to TEMS REMOTE_odibmp003 without obvious comm failure after 0/00:01:30,
20180808121105,REMOTE_odibmp003,ip.spipe:#151.171.86.23[3660],Connecting to TEMS,
……
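The reconnect intervals in the timeline can be cross-checked from the LocalTime column. The Python sketch below assumes that the "0/00:16:30" notation means days/HH:MM:SS, which is an interpretation of the sample and not something stated by the tool. The `elapsed` helper is a hypothetical name.

```python
from datetime import datetime

STAMP = "%Y%m%d%H%M%S"  # LocalTime layout used in the report


def elapsed(start, end):
    """Format the gap between two LocalTime stamps as days/HH:MM:SS."""
    delta = datetime.strptime(end, STAMP) - datetime.strptime(start, STAMP)
    total = int(delta.total_seconds())
    if total < 0:
        # A negative gap suggests the log segments were read out of order.
        raise ValueError("end time is earlier than start time")
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}/{hours:02d}:{minutes:02d}:{seconds:02d}"


print(elapsed("20180808115305", "20180808120935"))  # 0/00:16:30
print(elapsed("20180808120935", "20180808121105"))  # 0/00:01:30
```

Both results match the intervals reported in the reconnect lines above.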
COMMREPORT002: Timeline of Communication events
LocalTime,Hextime,Line,Advisory/Report,Notes,
20180808115304,5B6B11E0,18,Log,Start,
20180808115304,5B6B11E0,70,EnvironmentVariables,KDE_TRANSPORT=KDC_FAMILIES="HTTP_CONSOLE:N HTTP_SERVER:N HTTP:0 ip.spipe port:3660 ip.pipe use:n sna use:n ip use:n ip6.pipe use:n ip6.spipe use:n ip6 use:n HTTP_SERVER:N",
20180808115304,5B6B11E0,74,EnvironmentVariables,KDEB_INTERFACELIST="!151.171.33.235",
20180808115305,5B6B11E1,1149,ANIC,14fe484587be.42.02.97.ab.21.eb.7e.b5: 1,1,5B4B1265,5B4B1265,
20180808115305,5B6B11E1,1167,ANIC,14fe4845886c.42.02.97.ab.21.eb.7e.b5: 1,1,5B4B1265,5B4B1265,
20180808115305,5B6B11E1,1258,OPLOG,Connecting to CMS REMOTE_odibmp003,
20180808115305,5B6B11E1,1261,Communications,Successfully connected to CMS REMOTE_odibmp003 using ip.spipe:#151.171.86.23[3660],
20180808115305,5B6B11E1,1261a,Communications,3660,
20180808115305,5B6B11E1,1603,ANIC,14fe4845badc.42.02.97.ab.21.eb.7e.b5: 1,1,5B4B1265,5B4B1265,
20180808115305,5B6B11E1,1703,ANIC,14fe4845bea2.42.02.97.ab.21.eb.7e.b5: 1,1,5B4B1265,5B4B1265,
…..
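The Hextime column appears to be the event time as seconds since 1970-01-01 UTC, written in hexadecimal. That reading is an assumption drawn from the sample and not from ITM documentation. Under that assumption, the Python sketch below converts a Hextime value to a UTC timestamp. The `hextime_to_utc` helper is a hypothetical name.

```python
from datetime import datetime, timezone


def hextime_to_utc(hextime):
    """Interpret a Hextime value as hexadecimal epoch seconds (assumed)."""
    return datetime.fromtimestamp(int(hextime, 16), tz=timezone.utc)


stamp = hextime_to_utc("5B6B11E0")
print(stamp.strftime("%Y%m%d%H%M%S"))  # 20180808155304
```

The result is exactly four hours ahead of the LocalTime value 20180808115304 on the same report line, which is consistent with an agent running at UTC-4.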
COMMAUDIT1002W
Text: Invalid Transport Correlator error count [count]
Tracing: error
+5B6B15BF.0001 e-secs: 0 mtu: 944 KDE1_stc_t: 1DE0004D
Meaning: This is a strong signal of a duplicate agent case.
ITM uses remote procedure calls to do most of communications
and this error means that the partner in the communication process
rejected the attempted communication because the type of communication
did not match. For example a ip.pipe communication was sent
but the partner knew it needed a ip.spipe. It could also be a
conflict between a simple connection and a EPHEMERAL:Y connection
or many other cases.
Recovery plan: Investigate the TEMS the agent connects
to for evidence of duplicate agents – especially this one –
and resolve the issue.
What to do with the Report
It is most important to correlate logged events with agent configuration and network incidents. This report will summarize what happened but will usually raise more questions than it answers. The specific report excerpt above was associated with a case of duplicate agent names. When the agent configurations were changed so each agent had a unique name, as ITM expects, the agent stopped losing connection.
Summary
The Agent Diagnostic Log Communications Summary was presented.
Sitworld: Table of Contents
History and Earlier versions
There is a distribution here https://github.com/jalvo2014/logcomm which may be somewhat less tested than the point releases. If the current version of the Agent Diagnostic Log Communications Summary tool does not work, you can try recent published binary object zip files. At the same time, please contact me to resolve the issues. If you discover an issue, try intermediate levels to isolate where the problem was introduced.
logcomm.0.63000
Handle instanced logs
logcomm.0.62000
Make KDE_TRANSPORT/KDC_FAMILIES check work on Windows
logcomm.0.61000
Add hostname/installer/gskit_level when cinfo.info is available
logcomm.0.60000
Add advisory for different CTIRA_HOSTNAME and CTIRA_SYSTEM_NAME
logcomm.0.59000
Add in KDC_PARTITION checking – rare and usually an error
logcomm.0.58000
Add in ENV checking if the files are present
logcomm.0.57000
Add in system name and some CTIRA variables if present
logcomm.0.56000
Add Default host address to timeline
logcomm.0.55000
Advisory on mixed KDC_FAMILIES and KDE_TRANSPORT
logcomm.0.54000
Capture Port Scanning type messages
Collect data from RPC-Lost messages
Photo Note: Cruise Ship Energy Storage – 2017