cdb_connect() is blocked on a read() call

Hi!

We have code like this:
int socket = ::socket(PF_INET, SOCK_STREAM, 0);
if (socket < 0) { return false; }
status = ::cdb_connect(socket, CDB_DATA_SOCKET,
(struct sockaddr*)&address_, sizeof(struct sockaddr_in));

Sometimes (very rarely) it hangs, and gdb shows that it is blocked in a read() call inside cdb_connect():
(gdb) bt
#0 0xb32cb5c1 in read () from /lib/i386-linux-gnu/libpthread.so.0
#1 0xb6d34eb3 in read_fill () from /opt/confd/lib/libconfd.so
#2 0xb6d35052 in confd_do_connect () from /opt/confd/lib/libconfd.so
#3 0xb6d25491 in cdb_connect_name () from /opt/confd/lib/libconfd.so
#4 0xb6d2558b in cdb_connect () from /opt/confd/lib/libconfd.so

Multithreading is not the cause here, because the socket is a local variable and cannot be shared between threads. lsof shows that the socket exists and that the connection to ConfD is established, so I assume it is waiting for a reply from the ConfD server during some kind of handshake.

Do you have any clue about that?
Maybe it’s a known issue that was fixed in a more recent version?
Does /confdConfig/capi/connectTimeout influence this?

ConfD version: 5.2.4.1-10
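
One generic way to keep a stalled handshake like the one in the backtrace above from blocking forever is to arm a receive timeout on the socket before handing it to cdb_connect(). The sketch below uses only standard POSIX calls; set_recv_timeout is a hypothetical helper, not part of libconfd, and the 5000 ms value is arbitrary. Whether libconfd then returns an error from cdb_connect() or retries the interrupted read is an assumption that would need to be checked against your ConfD version.

```cpp
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>
#include <cerrno>

// Hypothetical helper (not part of libconfd): arm a receive timeout on a
// socket so that a blocking read()/recv() on it fails with
// EAGAIN/EWOULDBLOCK after timeout_ms instead of waiting forever.
bool set_recv_timeout(int fd, int timeout_ms) {
    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return ::setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0;
}

// Intended use with the code from the question (not compiled here, since it
// needs libconfd). Assumption: libconfd reports the timed-out read() as a
// failure of cdb_connect() rather than retrying it.
//
//   int socket = ::socket(PF_INET, SOCK_STREAM, 0);
//   if (socket < 0) { return false; }
//   set_recv_timeout(socket, 5000);
//   status = ::cdb_connect(socket, CDB_DATA_SOCKET,
//       (struct sockaddr*)&address_, sizeof(struct sockaddr_in));
```

If the connect then fails instead of hanging, the process can at least log the event and retry, which also gives a timestamp to match against devel.log.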

Hello,

Yes, ConfD 5.2.4 is relatively old.

Is there any error information in the ConfD logs (confd.log or devel.log)?

You should enable the trace level for the developer log in confd.conf like this (see the last line):

<developerLog>
  <enabled>true</enabled>
  <file>
    <enabled>true</enabled>
    <name>./devel.log</name>
  </file>
  <syslog>
    <enabled>false</enabled>
  </syslog>
</developerLog>

<developerLogLevel>trace</developerLogLevel>

and then search devel.log for

devel-cdb connect from

You may also try using cdb_connect_name() so that your call is easier to identify.
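
A minimal sketch of how that could look, assuming the name passed to cdb_connect_name() is what shows up for the client in devel.log. The helper make_cdb_client_name, the "myapp" prefix and the per-call counter are hypothetical and only illustrate one way to make each connection attempt distinguishable.

```cpp
#include <string>
#include <unistd.h>

// Hypothetical helper (not part of libconfd): build a per-call client name
// of the form "<prefix>-<pid>-<seq>", e.g. "myapp-1234-7".
std::string make_cdb_client_name(const std::string& prefix, unsigned long seq) {
    return prefix + "-" + std::to_string(static_cast<long>(::getpid())) +
           "-" + std::to_string(seq);
}

// Intended use with the code from the question (not compiled here, since it
// needs libconfd):
//
//   static unsigned long seq = 0;
//   std::string name = make_cdb_client_name("myapp", ++seq);
//   status = ::cdb_connect_name(socket, CDB_DATA_SOCKET,
//       (struct sockaddr*)&address_, sizeof(struct sockaddr_in),
//       name.c_str());
```

Logging the same name on the application side just before the call makes it possible to tell which attempt never completed when the hang occurs.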

OK, thank you, we will try that, though the problem is difficult to reproduce.
We suspect it happens because we repeat the same sequence several times in quick succession: create a socket, call cdb_connect(), read some values, call cdb_close(), and repeat. We’re going to try keeping the socket open between those calls to see whether that helps.

Yes, I agree, this kind of issue is difficult to troubleshoot. It would be best if you could reproduce it with a small example that can be tested with a newer version of ConfD (e.g. ConfD Basic 6.6) and reported to the ConfD developers.