Optimizing IT and Service Management

"Ask The Expert Series Q&A - IBM Tivoli Network Manager(ITNM) High Availability Functional Walk Through - v4.1.1 only"

Q1) Does high availability work at the poller level or only at the domain level?

A1) It always works at the component level (e.g. Poller). The components are defined as the
dependency list for the ncp_virtualdomain process in $NCHOME/etc/precision/CtrlServices.DOMAIN.cfg.
For example (excerpt):
insert into services.inTray
[ "-domain" , "$PRECISION_DOMAIN" , "-latency" , "200000", "-debug", "0", "-messagelevel", "warn" ],
[ "ncp_poller(default)", "ICMPonly", "SNMPonly", "ncp_g_event" ],
In the above example, if the ‘ICMPonly’ poller fails, failover triggers automatically within the
configured interval.
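
The dependency-list idea can be illustrated with a small Python model. This is only a sketch of the concept, not how ncp_virtualdomain is implemented: the function names and the status dictionary are hypothetical, and only the dependency names are taken from the example above.

```python
from typing import Dict, List

def failed_dependencies(depends_on: List[str], running: Dict[str, bool]) -> List[str]:
    """Return the dependencies that are not reported as running.

    A dependency missing from the status map is treated as failed.
    """
    return [name for name in depends_on if not running.get(name, False)]

def should_fail_over(depends_on: List[str], running: Dict[str, bool]) -> bool:
    """In this simplified model, one failed dependency is enough to fail over."""
    return bool(failed_dependencies(depends_on, running))

# Dependency list copied from the CtrlServices example above.
depends_on = ["ncp_poller(default)", "ICMPonly", "SNMPonly", "ncp_g_event"]

# Hypothetical status snapshot: everything is up except the ICMPonly poller.
status = {name: True for name in depends_on}
status["ICMPonly"] = False

print(failed_dependencies(depends_on, status))
print(should_fail_over(depends_on, status))
```

The point of the model is that health is evaluated per listed component, so a single failed poller is sufficient to mark the domain unhealthy even if every other process is running.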

Q2) Will failover be affected if we disable an OMNIbus trigger such as deduplication or delete clears?

A2) No, failover shouldn’t be affected if you disable either of these two triggers, because we always
look at the LastOccurrence of ItnmHealthCheck events. The only side effect is noise: without the
deduplication trigger running, you would end up with a large number of clear events.

Q3) Will running auto discovery at both primary and backup at the same time create any conflict?

A3) Ideally, you would never run discovery on the Backup; in fact, running discovery on the Backup
isn’t officially supported. However, there are circumstances where customers would like to run
discovery on the Backup to keep the topology up to date when the Primary is unavailable for an
extended period of time. Quite a few manual steps are involved in running discovery on the Backup
server. If you need details, please contact kmkodali@us.ibm.com until the feature is officially
supported in the product.

Q4) How can we rediscover a device after it has been rebooted?

A4) You can run a partial rediscovery manually. Alternatively, if you want to automate the partial
discovery based on an event (e.g. a cold start trap), verify that the ‘disco’ plug-in, which is
available for the ‘ncp_g_event’ process, is enabled, then follow the caveats listed at
https://ibm.biz/BdH2Fg. You can verify the plug-in status using the following command:
$NCHOME/precision/scripts/perl/scripts/ncp_gwplugins.pl -domain <DOMAINNAME> -list

Q5) How can we speed up the transfer of the model or otherwise decrease the time to fail over and back?

A5) These are two independent questions. The first concerns the effect of a WAN link: when ITNM Core
and the NCIM database are connected via a WAN, the topology migration from Model to NCIM takes longer
(possibly hours or days instead of minutes), depending on the size of the topology. A second impact
is that the Poller start-up time also increases from seconds to minutes (e.g. 20 minutes or more).
If a given environment has such restrictions, it is best to go with Distributed ITNM to reduce the
topology migration window; more at https://ibm.biz/BdxtD9 .
The time between failover and failback is configured in
$NCHOME/etc/precision/VirtualDomainSchema.cfg (the attribute is m_FailoverTime), and the default is
300 seconds. We wouldn’t recommend a lower value, for various reasons, unless you have an SLA that
requires monitoring at such short intervals (e.g. a 1-minute interval).
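
To make the timing concrete, here is a rough Python sketch of a timeout check built on the two facts given in this Q&A: failover looks at the LastOccurrence of ItnmHealthCheck events, and m_FailoverTime defaults to 300 seconds. The function and its exact comparison are assumptions for illustration; the real decision logic inside ncp_virtualdomain may differ.

```python
DEFAULT_FAILOVER_TIME = 300  # seconds; default m_FailoverTime quoted in the answer above

def failover_due(last_occurrence: float, now: float,
                 failover_time: int = DEFAULT_FAILOVER_TIME) -> bool:
    """Return True when the newest health check event is older than failover_time.

    last_occurrence and now are timestamps in seconds (e.g. Unix epoch).
    Hypothetical helper: it models the idea only, not the product's code.
    """
    return (now - last_occurrence) > failover_time

# Last health check seen at t=1000 s.
print(failover_due(1000, 1200))                    # 200 s of silence, default window
print(failover_due(1000, 1301))                    # 301 s of silence, default window
print(failover_due(1000, 1061, failover_time=60))  # 61 s of silence, 1-minute window
```

The last call shows the trade-off behind the recommendation: with a 60-second window, a single delayed health check event is already enough to trigger failover.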

Q6) When deploying a full multi-tiered architecture, which ObjectServers should ITNM HA connect to:
the Aggregation layer or the Collection layer?

A6) As a best practice, ITNM should point to the Aggregation layer.

Q7) I would like a rough estimate of the number of devices that a poll policy can handle.

A7) There aren’t any specific limits, but you should follow best practice when setting up the number
of Pollers: per domain, set up 3 Pollers (one Admin Poller, one dedicated ICMP Poller and one SNMP
Poller), and add more SNMP Pollers if necessary based on the number of monitors etc. For more
information, see the Monitoring best practice guide at https://ibm.biz/BdXCY8.
If you need the ITNM performance metrics collection guide, please contact your IBM Account
Management to obtain a copy.

Q8) In certain cases the discovery gets stuck at the “Interrogating Devices” phase. I did not
define any scope; it's just a seed-based discovery with 5-6 devices.

A8) It’s best to follow the discovery troubleshooting technote; more at https://ibm.biz/BdH2Xi.
Check that there are no dying or coring agents (ncp_agent.*) or helpers (ncp_dh_*), which could halt
the discovery process. Last but not least, check the ncp_d_helpserv.DOMAIN.log/trace files located
in $NCHOME/log/precision (Helper Server logs), which should list any timeouts caused by discovery
agents during discovery, even when not running in debug mode.
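
As a starting point for that last check, the following Python sketch scans the Helper Server log for timeout messages. The log directory and file name pattern come from the answer above; everything else is an assumption: the exact wording of timeout messages is not documented here, so the search is a plain case-insensitive match on "timeout" and "timed out", and the fallback NCHOME path and the domain name NCOMS are placeholders to adjust for your installation.

```python
import os
from pathlib import Path
from typing import List

def helper_server_log(domain: str, nchome: str) -> Path:
    """Build the Helper Server log path: $NCHOME/log/precision/ncp_d_helpserv.DOMAIN.log."""
    return Path(nchome) / "log" / "precision" / f"ncp_d_helpserv.{domain}.log"

def find_timeouts(log_path: Path) -> List[str]:
    """Return log lines that mention a timeout.

    Assumption: timeout messages contain 'timeout' or 'timed out' in some
    letter case. Adjust the keywords to match what your logs actually show.
    """
    matches = []
    with open(log_path, "r", errors="replace") as handle:
        for line in handle:
            lowered = line.lower()
            if "timeout" in lowered or "timed out" in lowered:
                matches.append(line.rstrip("\n"))
    return matches

if __name__ == "__main__":
    # Placeholder defaults; set NCHOME and PRECISION_DOMAIN for your environment.
    nchome = os.environ.get("NCHOME", "/opt/IBM/tivoli/netcool")
    domain = os.environ.get("PRECISION_DOMAIN", "NCOMS")
    log_path = helper_server_log(domain, nchome)
    if log_path.exists():
        for hit in find_timeouts(log_path):
            print(hit)
    else:
        print(f"Log file not found: {log_path}")
```

The same function can be pointed at the corresponding trace file, which may contain more detail when the process runs at a higher debug level.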

