ConfD callback registration fails after some days of usage

Hello Team,

We are using ConfD version 7.3.2.

We use the ConfD Java API in our Java application, which registers for the ConfD callpoints.

After 2+ days of usage we run into an issue and see the following in devel.log:

23-Feb-2021::19:23:15.601 nbiservice-854db99b8b-lc8ks confd[9]: - ConfD started vsn: 7.3.2
26-Feb-2021::05:14:03.052 nbiservice-854db99b8b-lc8ks confd[9]: - Daemon nbi_service died

After this, our callbacks stop working. We have checked the application state: it is up and running. We also do not see any communication between the application and ConfD in the time frame where the error occurred.

Could you please help me understand what could be the reason for this?

Regards,
Karthik

Hi Karthik,

You get that “Daemon nbi_service died” message when the data provider/external database daemon has closed its control socket, i.e. the socket that is used for the “init” and “finish” transaction callbacks.

You should have something like this in your Java code:

        // create new control socket
        Socket ctrlSocket = new Socket("10.9.8.7", Conf.PORT);

        // init and connect control socket
        Dp dp = new Dp("nbi_service", ctrlSocket);

        // register the stats callbacks
        dp.registerAnnotatedCallbacks(new myTrans());
        ...

        dp.registerDone();
        // read input from the control socket
        try {
            while (true) dp.read();
        } catch (Exception e) {
            System.out.println("ConfD terminated");
        }

So if you close the above socket from your application, or if for example the TCP connection is terminated, you will see that “Daemon nbi_service died” entry in the developer log once ConfD is made aware that the control socket has closed.

If the TCP connection closed for some reason (e.g. a timeout), your Java application should notice that too, and for example re-establish the connection.

Best regards,
Conny

Hello Conny,

Thanks for your elaborate response.

We already have one thread that continuously reads from the Dp and retries the registration in case of failure.

Following is our current code.

public void run() {
    log.info("thread run is called");

    while(isActive) {
        try {
            while(isActive) {
                dp.read();
            }
        } catch (Exception e) {
            log.error("Error while dp.read() [{}]", e.getMessage(),e );
        } finally {
            closeDp();
        }

        setup();
        long rebindAfterDpFailureWaitIntervalMillis = 3000L;
        try {
            Thread.sleep(rebindAfterDpFailureWaitIntervalMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
    if(!isActive) {
        log.error("Error in Dp creation. Exiting dp read");
    }
}

Are we doing anything wrong? Do you suggest any changes in the above code?

Regards,
Karthik

If your closeDp() and setup() functions are doing what they are supposed to be doing you are good. You didn’t include them so not much to check.

Adding the missing setup() and closeDp() implementations.

public void setup() {
    String host = System.getenv("host_name");
    try {
        Socket socket = new Socket();
        SocketAddress socketAddress = new InetSocketAddress(InetAddress.getByName(host), Conf.PORT);
        socket.connect(socketAddress);
        Maapi maapiObj = new Maapi(socket);
        maapiObj.waitStart(2);

        Socket ctrlSocket = new Socket(host, Conf.PORT);
        this.dp = dataProviderFactory.createDataProvider("nbi_service", ctrlSocket, false, Integer.MAX_VALUE, 6000L, true, true);
        // maapiObj is used to attach and detach the MAAPI session during the INIT and FINISH phases
        dp.registerAnnotatedCallbacks(new TransactionCallBack(maapiObj));
        dp.registerAnnotatedCallbacks(new DataCallBack());
        dp.registerDone();
        log.info("Dp register done");

    } catch (IOException | ConfException e) {
        log.error("Exception in Dp register or Maapi creation [{}]", e.getMessage(),e);
        closeDp();
        isActive = false;
    }
}


private void closeDp(){
    log.info("Close Dp socket and maapi session");
    try {
        if(dp != null && !dp.getCtrlSocket().isClosed()) {
            dp.getCtrlSocket().close();
            dp.closeAllWorkerSockets();
            maapiObj.endUserSession();
            maapiObj.getSocket().close();
            log.debug(ConfDConstants.SOCKET_DISCONNECT_MESSAGE);
        }
    } catch (Exception e) {
        log.error("Error in CloseDp. Unable to close socket or maapiSession. Exception is [{}]", e.getMessage());
    }
}

Regards,
Karthik

What does your developer log (preferably with developerLogLevel set to trace) and application log (see log4j2.xml preferably at least “info” level) say?

We have collected the logs using the following steps.

  1. set the developer log level to trace (under /confdConfig/logs) for the most verbose level of logging:
    <developerLogLevel>trace</developerLogLevel>
  2. start your system
  3. run command: confd --status > status_before.txt
  4. reproduce the issue
  5. run command: confd --status > status_after.txt

We also changed the run method of our connection monitoring thread to the implementation below.

while(isActive) {
    try {
        while(isActive) {
            if(Objects.isNull(dp.getCtrlSocket()) || dp.getCtrlSocket().isClosed()) {
                log.error("socket connection has failed");
            }
            dp.read();
            log.info("DP read is called");
        }
    } catch (Exception e) {
        log.error("Error while dp.read() [{}]", e.getMessage(),e );
    } finally {
        log.info("Close dp is called");
        closeDp();
    }
    setup();
    log.info("dp.read() failed, setup is called again");
    long rebindAfterDpFailureWaitIntervalMillis = 3000L;
    try {
        Thread.sleep(rebindAfterDpFailureWaitIntervalMillis);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}
if(!isActive) {
    log.error("Error in Dp creation. Exiting dp read");
}

The thread is running fine but has not detected any socket close.

The devel log only prints the following:

<DEBUG> 9-Mar-2021::00:04:59.315 nbiservice-76cfb59945-qftwx confd[7]: devel-c new_usess db request daemon id: 0
<ERR> 9-Mar-2021::00:04:59.316 nbiservice-76cfb59945-qftwx confd[7]: devel-c new_usess error {external, ""}
<ERR> 9-Mar-2021::00:04:59.317 nbiservice-76cfb59945-qftwx confd[7]: devel-c set_elem error {external, ""} for callpoint 'data-connectivity-context-subtree' path /tapi-common:context/connectivity-context/connectivity-service{92be41b41079cae3df608433cfab1d20efe4f04fa43ba4237eaa17da26a2b041}/is-exclusive
<ERR> 9-Mar-2021::00:14:26.837 nbiservice-76cfb59945-qftwx confd[7]: devel-c no registration found for callpoint data-connectivity-context-subtree/set_elem of type=external path /tapi-common:context/connectivity-context/connectivity-service{00001b41079cae3df608433cfab1d20efe4f04fa43ba4237eaa17da26a2b041}/is-exclusive

Do you have several threads that read the same control socket? Use one thread per control socket.

We have only one thread doing this. Our observation is that the problem appears after about 2 days of uptime, mostly with no requests sent over this channel. Is there any API to check the health of the DataProvider object?

If the socket was closed by the peer then, as you can see in the ConfInternal.java file, socket.getInputStream().read() will return -1 and an exception will be thrown out of your dp.read() call.
The Java application API source code is available in $CONFD_DIR/java/jar/conf-api-src-$CONFD_VERSION.jar (use jar xvf conf-api-src-$CONFD_VERSION.jar to extract it).
For more on this, see for example: https://stackoverflow.com/questions/10240694/java-socket-api-how-to-tell-if-a-connection-has-been-closed
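
The difference between a close by the peer and a local isClosed() check can be reproduced with plain JDK sockets on the loopback interface. The following is a minimal sketch and not ConfD code: the class PeerCloseDemo and its members are made up for illustration, and a plain ServerSocket stands in for the ConfD daemon.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class; a local ServerSocket plays the role of the ConfD daemon.
class PeerCloseDemo {

    /** What the client side observes after the peer has closed the connection. */
    static final class Result {
        final int readValue;     // value returned by InputStream.read()
        final boolean isClosed;  // value returned by Socket.isClosed() on the client

        Result(int readValue, boolean isClosed) {
            this.readValue = readValue;
            this.isClosed = isClosed;
        }
    }

    static Result readAfterPeerClose() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket peer = server.accept()) {
            // Orderly close on the "daemon" side: a FIN is sent to the client.
            peer.close();
            // read() blocks until data or end of stream arrives; after the FIN it returns -1.
            int n = client.getInputStream().read();
            // isClosed() only reflects a local close() call, so it is still false here.
            return new Result(n, client.isClosed());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Result r = readAfterPeerClose();
        System.out.println("read() returned " + r.readValue + ", isClosed() = " + r.isClosed);
    }
}
```

Socket.isClosed() only reports whether close() was called on that Socket object in the local JVM, so a check like dp.getCtrlSocket().isClosed() cannot detect that the other end went away. Only a read (or write) on the socket can reveal that.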

dp.read() is a blocking call. It doesn’t return until there is a request on the callback. When there is a request, we see the current dp.read() return and the next dp.read() start. What is the behaviour of dp.read() if there is no request on the callback for 2+ days?

See my previous answer, where I provided a pointer to the ConfD Java API source code and a link with info on how to check a socket for its status.
As I wrote, the Java API used by your application calls socket.getInputStream().read() to read from the socket, and expects that call to return -1 if the peer, here the ConfD daemon, closes the socket, or to throw an exception if there is for example a TCP timeout. As you describe your issue, the underlying Linux socket does not detect any problem with the connection. The problem lies with the Linux socket, not the ConfD Java API. See for example the link I provided.
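
If the connection is dropped silently while it is idle (for example because a firewall or NAT entry between the application and ConfD expires, which is an assumption and not something confirmed in this thread), no FIN reaches the application and read() keeps blocking. TCP keepalive is one common way to let the kernel notice such a dead connection. Below is a minimal sketch, assuming the application creates the control socket itself before handing it to the Dp as in the earlier example; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical helper; not part of the ConfD Java API.
class KeepAliveControlSocket {

    /** Opens a TCP connection with SO_KEEPALIVE enabled. */
    static Socket connect(String host, int port, int connectTimeoutMillis) {
        Socket socket = new Socket();
        try {
            // Ask the kernel to probe the peer when the connection is idle.
            socket.setKeepAlive(true);
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMillis);
            return socket;
        } catch (IOException e) {
            try {
                socket.close();
            } catch (IOException ignored) {
                // Nothing more to do; the connect failure is reported below.
            }
            throw new UncheckedIOException(e);
        }
    }

    /** Self-check against a throwaway loopback server instead of a real ConfD daemon. */
    static boolean keepAliveEnabledOnLoopback() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket socket = connect(server.getInetAddress().getHostAddress(),
                                     server.getLocalPort(), 2000)) {
            return socket.getKeepAlive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("SO_KEEPALIVE enabled: " + keepAliveEnabledOnLoopback());
    }
}
```

The returned socket would then be passed on as the control socket in place of new Socket(host, Conf.PORT). Note that with the Linux default settings the first keepalive probe is only sent after two hours of idle time (net.ipv4.tcp_keepalive_time = 7200), so detection can take a little over two hours unless the kernel parameters are tuned.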