How to disconnect a range callpoint when daemon stops

kboyapa1 · August 9, 2021, 4:57am

In our application, two different daemons (daemon-1 and daemon-2) connect to ConfD using a range callpoint. We are seeing an issue: when one daemon (e.g. daemon-2) stops for any reason, the callbacks registered by daemon-2 still exist in ConfD. When the daemon comes back up, the callbacks are registered again under a different daemon ID. Is there any way to de-register the range callbacks with ConfD (or close the worker and control sockets) when the daemon is stopped or restarted?

Note: The callbacks are disconnected only when we query the data on these callpoints. This happens because the queryTimeout is exceeded, so ConfD considers the daemon dead and closes its control and worker sockets.

              #############
              ### ConfD ###
              #############


##############              ##############
## Daemon-1 ##              ## Daemon-2 ##
##############              ##############

Range callpoint details before the daemon restarts

id=mgmt-ip-fixed path=/oc-sys:system/mgmt-ip/state/fixed-addresses/fixed-address
{1} - {1} daemonId=51 daemonName=vcc-ha-mgmt-ip-fixed-1 callbacks=get_next,get_elem
{2} - {2} daemonId=72 daemonName=vcc-ha-mgmt-ip-fixed-2 callbacks=get_next,get_elem

Range callpoint details while the daemon is restarting

id=mgmt-ip-fixed path=/oc-sys:system/mgmt-ip/state/fixed-addresses/fixed-address
{1} - {1} daemonId=51 daemonName=vcc-ha-mgmt-ip-fixed-1 callbacks=get_next,get_elem
{2} - {2} daemonId=72 daemonName=vcc-ha-mgmt-ip-fixed-2 callbacks=get_next,get_elem

Range callpoint details after the daemon restarts

id=mgmt-ip-fixed path=/oc-sys:system/mgmt-ip/state/fixed-addresses/fixed-address
{1} - {1} daemonId=51 daemonName=vcc-ha-mgmt-ip-fixed-1 callbacks=get_next,get_elem
{2} - {2} daemonId=78 daemonName=vcc-ha-mgmt-ip-fixed-2 callbacks=get_next,get_elem

mnovak · August 9, 2021, 8:34am

Did you try setting the CONFD_DAEMON_FLAG_REG_REPLACE_DISCONNECT flag after you call confd_init_daemon? Set it in both daemons (or at least in the one that stops first).

The function is int confd_set_daemon_flags(struct confd_daemon_ctx *dx, int flags); see confd_lib_dp.

CONFD_DAEMON_FLAG_REG_REPLACE_DISCONNECT

    By default, if one daemon replaces a callpoint registration made by another daemon, this is only logged, and no
action is taken towards the daemon that has "lost" its registration. This can be useful in some scenarios, e.g. it is 
possible to have an "initial default" daemon providing "null" data for many callpoints, until the actual data 
provider daemons have registered. If a daemon uses the CONFD_DAEMON_FLAG_REG_REPLACE_DISCONNECT 
flag, it will instead be disconnected from ConfD if any of its registrations are replaced by another daemon, and 
can take action as appropriate.

kboyapa1 · August 9, 2021, 3:20pm

I tried your suggestion. It did not help.

The callpoint registration still hangs in ConfD until the daemon reconnects, at which point the daemon ID is overwritten.

The other way is to perform a "show" query that triggers the callpoint. Since there is no daemon to serve this callpoint, the query times out. ConfD then treats the daemon as dead and clears the callpoint registration. This "show" query returns "application error".

kboyapa1 · September 6, 2021, 5:45am

The issue is resolved after updating tcp_keepalive_intvl, tcp_keepalive_time and tcp_keepalive_probes to lower values.

mnovak · September 20, 2021, 8:01am

Thanks for the feedback.