Our application registers for the ConfD callpoints.
We are observing an issue after 2+ days of usage: the following error appears in devel.log.
23-Feb-2021::19:23:15.601 nbiservice-854db99b8b-lc8ks confd[9]: - ConfD started vsn: 7.3.2
26-Feb-2021::05:14:03.052 nbiservice-854db99b8b-lc8ks confd[9]: - Daemon nbi_service died
After this, our callbacks stop working. We have checked the application state, and it is up and running. We also see no communication between the application and ConfD in the time frame where the error occurred.
Could you please help me understand what could be the reason for this?
You get “Daemon nbi_service died” when the data provider/external database daemon closes its control socket, i.e. the socket that is used for the “init” and “finish” transaction callbacks.
You should have something like this in your Java code:
// create new control socket
Socket ctrlSocket = new Socket("10.9.8.7", Conf.PORT);
// init and connect control socket
Dp dp = new Dp("nbi_service", ctrlSocket);
// register the stats callbacks
dp.registerAnnotatedCallbacks(new myTrans());
...
dp.registerDone();
// read input from the control socket
try {
    // dp.read() blocks; it throws once the control socket is closed
    while (true) {
        dp.read();
    }
} catch (Exception e) {
    System.out.println("ConfD terminated");
}
So if you close the above socket from your application, or if, for example, the TCP connection is terminated, you will see the “Daemon nbi_service died” entry in the developer log once ConfD becomes aware that the control socket has closed.
If the TCP connection closes for some reason (e.g. a timeout), your Java application should notice that too and, for example, re-establish the connection.
What do your developer log (preferably with developerLogLevel set to trace) and application log (see log4j2.xml, preferably at least “info” level) say?
We have only one thread doing this. Our observation is that this problem appears after about 2 days of a longevity run (mostly with no requests sent over this channel). Is there any API to check the health of the DataProvider object?
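To illustrate what "re-establish the connection" could look like, here is a minimal sketch of a retry loop. It deliberately does not use the ConfD API: Session, runWithReconnect, maxAttempts and backoffMillis are hypothetical names made up for this example. The assumption is that one "session" wraps the whole sequence from the snippet above (create the control socket, create the Dp, register the callbacks, loop on dp.read()) and throws when the socket is reported closed.

```java
class ReconnectLoop {
    /** Hypothetical hook: connect, register callbacks, then read until the socket fails. */
    interface Session {
        void run() throws Exception;
    }

    /** Runs the session, starting it again after each failure; returns the number of failed attempts. */
    static int runWithReconnect(Session session, int maxAttempts, long backoffMillis) {
        int failures = 0;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                session.run();
                return failures;              // session ended without an error
            } catch (Exception e) {
                failures++;
                System.err.println("attempt " + attempt + " failed: " + e.getMessage());
                try {
                    Thread.sleep(backoffMillis);   // short pause before reconnecting
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return failures;
                }
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Stand-in session that fails twice, then ends normally.
        int failed = runWithReconnect(() -> {
            if (calls[0]++ < 2) {
                throw new java.io.IOException("control socket closed");
            }
        }, 5, 10);
        System.out.println("failed attempts: " + failed);
    }
}
```

Note that such a loop only helps if the read actually fails; it does nothing for a connection that is dead but still looks open to the application.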
If the socket was closed by the peer, as you can see in the ConfInternal.java file, the socket.getInputStream().read() call will return -1 and an exception will be thrown from your dp.read() call.
The Java application API source code is available in $CONFD_DIR/java/jar/conf-api-src-$CONFD_VERSION.jar (use jar xvf conf-api-src-$CONFD_VERSION.jar to extract it).
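The -1 behaviour is plain Java socket behaviour and can be seen without ConfD. The sketch below uses only java.net and a throwaway server on the loopback interface; PeerCloseDemo and readAfterPeerClose are names invented for this example. It shows the case where the peer closes in an orderly way, so the close actually reaches the client.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class PeerCloseDemo {
    /** Connects to a local server that closes immediately and returns what read() reports. */
    static int readAfterPeerClose() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket peer = server.accept()) {
            peer.close();                          // orderly close by the peer
            InputStream in = client.getInputStream();
            return in.read();                      // blocks until data or end of stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read() returned " + readAfterPeerClose());   // prints "read() returned -1"
    }
}
```

If the close never reaches the client, for example because something between the two ends silently dropped an idle connection, read() has nothing to report and simply keeps blocking.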
See further for example here: https://stackoverflow.com/questions/10240694/java-socket-api-how-to-tell-if-a-connection-has-been-closed
dp.read() is a blocking call. It doesn’t return until there is a request on the callback. When there is a request, the current dp.read() returns and the next dp.read() starts. What is the behaviour of dp.read() if there is no request on the callback for 2+ days?
See my previous answer where I provided a pointer to the ConfD Java API source code and a link with info on how to check a Linux socket for status.
As I wrote, the Java API used by your application calls socket.getInputStream().read(), which reads from the underlying Linux socket, and expects it to return -1 if the peer (here the ConfD daemon) closes the socket or if there is, for example, a TCP socket timeout. As you describe your issue, your Linux socket does not detect any problem with the connection. The problem lies with the Linux socket, not the ConfD Java API. See for example the link I provided.
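One standard way to make an otherwise idle socket notice a dead connection is TCP keepalive, which the standard java.net.Socket class exposes through setKeepAlive. This is a suggestion, not something confirmed in this thread to fix the issue, and KeepAliveSetup and newKeepAliveSocket are hypothetical names. A minimal sketch:

```java
import java.net.Socket;
import java.net.SocketException;

class KeepAliveSetup {
    /** Creates an unconnected socket with SO_KEEPALIVE enabled. */
    static Socket newKeepAliveSocket() {
        Socket socket = new Socket();
        try {
            socket.setKeepAlive(true);   // ask the OS to probe the peer while the connection is idle
        } catch (SocketException e) {
            throw new IllegalStateException("could not enable keepalive", e);
        }
        return socket;
    }

    public static void main(String[] args) throws Exception {
        try (Socket socket = newKeepAliveSocket()) {
            System.out.println("keepalive enabled: " + socket.getKeepAlive());
        }
    }
}
```

The same setKeepAlive(true) call can be made on an already connected socket, such as the ctrlSocket in the snippet earlier, before it is handed to Dp. How soon the first probe is sent is decided by the operating system; on Linux the default idle time is two hours (net.ipv4.tcp_keepalive_time), so it may need tuning if something in the network path drops idle connections sooner.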