We have recently upgraded from ConfD-22.214.171.124 to ConfD-6.4.2. We have a subscriber program that tries to connect to the running CDB to read a piece of configuration data. The sequence it uses is this:
if (CONFD_OK != cdb_connect(m_socket(), CDB_DATA_SOCKET, &m_confdAddr, sizeof(m_confdAddr)))
    printf("Failed to cdb_connect, confd_errno=%d, %s\n", confd_errno, confd_lasterr());
if (CONFD_OK != cdb_start_session2(m_socket(), CDB_RUNNING, 0))
    printf("Failed to cdb_start_session2, confd_errno=%d, %s\n", confd_errno, confd_lasterr());
if (CONFD_OK != cdb_set_namespace(m_socket(), hashedNamespace))
    printf("Failed to cdb_set_namespace, confd_errno=%d, %s\n", confd_errno, confd_lasterr());
The call to cdb_connect() succeeds, but when cdb_start_session2() is called, the program segfaults with a traceback like this:
(gdb) bt -10
#524117 0x000000000042af33 in strncpy ()
#524118 0x000000000042af33 in strncpy ()
#524119 0x00007ffff7ecbd2a in erl_eatom_init_latin1 () from /opt/confd/lib/libconfd.so
#524120 0x00007ffff7ecc088 in erl_mk_eatom () from /opt/confd/lib/libconfd.so
#524121 0x00007ffff7eaa00f in cdb_start_session2 () from /opt/confd/lib/libconfd.so
I’m sure it’s not really the cdb_start_session2() function which is segfaulting, but what could be causing this at this particular point? Perhaps there’s some subtle change that we missed in the upgrade from 126.96.36.199 to 6.4.2? I’ve tried many things with no joy so far…
Uhhh… - what do you have in frames 0-524116??? With such a huge stack, I wouldn’t be surprised if the segfault is due to exceeding the maximum stack size. But in that case of course the question remains why that is happening…
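To put a rough number on the stack-size theory, here is a minimal sketch that reads the soft stack limit with getrlimit(RLIMIT_STACK) and divides a stack size by a frame count. The helper names are made up for illustration, and the 8 MiB figure used in the note below is only the common Linux default, not something known about the poster's system.

```c
#include <sys/resource.h>

/* Hypothetical helper: soft stack limit in bytes,
 * 0 if the limit is unlimited, -1 if getrlimit() fails. */
long long stack_soft_limit_bytes(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 0;
    return (long long)rl.rlim_cur;
}

/* Hypothetical helper: average bytes available per frame if `frames`
 * frames were to fill a stack of `stack_bytes` bytes; -1 on bad input. */
long long avg_bytes_per_frame(long long stack_bytes, long long frames)
{
    if (frames <= 0)
        return -1;
    return stack_bytes / frames;
}
```

Assuming an 8 MiB limit and at least 524122 frames (#0 through #524121 in the trace), avg_bytes_per_frame(8388608, 524122) gives 16, so frames as small as a return address plus a saved frame pointer would be enough to run into the limit at about that depth. That doesn't explain why the frames are there, of course.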
As far as I can tell, all the rest of those are duplicates of #524117, the strncpy() call. One time, I just typed “bt” to get a backtrace, and it gave me that line over and over until I aborted after 800 of them, or so.
That’s bizarre… - if it was “the truth”, it would imply that strncpy() is recursively calling itself over and over, and although I can with some effort imagine a recursive implementation, I can’t believe that any C library has one. Also, while any crashes in string manipulation make me suspect non-NUL-terminated “runaway” strings, all the calls in cdb_start_session2() that will end up in strncpy() pass string literals, which obviously won’t have this problem, e.g. erl_mk_atom("lock_session");, with
(the call to erl_mk_atom() is apparently optimized away). I have come across cases where gdb just doesn’t seem to be able to provide any sensible information from a core dump, but your stack trace looks fine besides the gazillion strncpy()s. Anyway, you should have the libconfd sources available in confd-6.4.2.libconfd.tar.gz; I would suggest that you a) build and use them and b) apply whatever C debugging technique you prefer to try to figure out what is happening.
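For reference, this is the kind of “runaway” string I mean. It is a generic sketch, not libconfd code, and both helper names are invented for illustration: strncpy() does not write a terminating NUL when the source is at least as long as the size it is given, so anything that later treats the buffer as a C string can run off the end.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if buf holds a NUL within its first n bytes. */
int is_terminated(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}

/* Hypothetical helper: copy src into dst (capacity dst_size), always
 * NUL-terminating. Returns 1 if src had to be truncated, 0 otherwise. */
int copy_terminated(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return 1;
    strncpy(dst, src, dst_size - 1);   /* may leave dst unterminated... */
    dst[dst_size - 1] = '\0';          /* ...so terminate explicitly */
    return strlen(src) >= dst_size;
}
```

A plain strncpy(buf, "lock_session", 4) into a 4-byte buffer leaves it without a NUL, while copy_terminated() on the same buffer stores "loc" and reports truncation. A string literal as the source is always terminated, which is why the calls in cdb_start_session2() look safe in this respect.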
OK… My only other advice, which you may of course already have applied, would be to test with a very minimal program that basically does nothing other than what you showed initially. And of course I can’t think of anything that has changed between 4.0 and 6.4 (actually not a huge lot in libconfd) that would cause this kind of result if it wasn’t taken into account by application code.