Multiple panics in Dcmd after upgrading the software

The customer has seen multiple Dcmd crashes after upgrading the software to a patch release. There are no major changes between the releases.
The release only contains fixes for customer-found defects, and none of those fixes touch Dcmd or ConfD.
Please find the backtrace below.

confderr.log.1

^@^@^Ak^@,Internal error: failed to load AAA from CDB

j^@^@^D^[bWLA<83>h^Bk^@^Y13-Oct-2024::08:14:40.281h^Cd^@^Eerrorgd^@^Mnonode@nohost^@^@^@^\^@^@^@^@^@h^Cgd^@^Mnonode@nohost^@^@^D¤^@^@^@^@^@k^@failed too load AAA ~p fromCDB: ~p~nl^@^@^@^Bl^@^@^@^Al^@^@^@^Ad^@^\http://tail-f.com/ns/aaa/1.1d^@^Caaajh^Bd^@^Ofunction_clausel^@^@^@^Kh^Dd^@

confd.log

<DEBUG> 13-Oct-2024::12:59:24.381 st-intra-acc1Gsw01 confd[1145]: - Loading file /etc/confd/zone.ccl
<INFO> 13-Oct-2024::12:59:25.104 st-intra-acc1Gsw01 confd[1145]: - Starting to listen for Internal IPC on 127.0.0.1:4565
<INFO> 13-Oct-2024::12:59:27.844 st-intra-acc1Gsw01 confd[1145]: - ConfD phase0 started
<CRIT> 13-Oct-2024::13:04:29.493 st-intra-acc1Gsw01 confd[1145]: - Internal error: failed to load AAA from CDB
<INFO> 13-Oct-2024::13:04:41.130 st-intra-acc1Gsw01 confd[3813]: - Enabling error log
<INFO> 13-Oct-2024::13:04:41.139 st-intra-acc1Gsw01 confd[3813]: - Writing error log to /var/confd/log/confderr.log
<INFO> 13-Oct-2024::13:04:41.175 st-intra-acc1Gsw01 confd[3813]: - Writing daemon log to /var/confd/log/confd.log
<INFO> 13-Oct-2024::13:04:41.175 st-intra-acc1Gsw01 confd[3813]: - Writing NETCONF log to /var/confd/log/netconf.log
<INFO> 13-Oct-2024::13:04:41.176 st-intra-acc1Gsw01 confd[3813]: - Writing audit log to /var/confd/log/audit.log
<INFO> 13-Oct-2024::13:04:41.176 st-intra-acc1Gsw01 confd[3813]: - Writing developer log to /var/confd/log/devel.log
<INFO> 13-Oct-2024::13:04:41.176 st-intra-acc1Gsw01 confd[3813]: - Daemon logging started
<DEBUG> 13-Oct-2024::13:04:41.190 st-intra-acc1Gsw01 confd[3813]: - Loading file libconfd.so
<DEBUG> 13-Oct-2024::13:04:41.217 st-intra-acc1Gsw01 confd[3813]: - Loading file /etc/confd/confd_types.so
<INFO> 13-Oct-2024::08:02:52.225 st-intra-coresw02 confd[4720]: - ConfD phase0 started
<CRIT> 13-Oct-2024::08:02:52.693 st-intra-coresw02 confd[4720]: - no registration found for callpoint user_cp/get_next of type=external
<CRIT> 13-Oct-2024::08:02:52.717 st-intra-coresw02 confd[4720]: - Internal error: failed to load AAA from CDB
<INFO> 13-Oct-2024::08:05:02.717 st-intra-coresw02 confd[8228]: - Enabling error log
<INFO> 13-Oct-2024::08:05:02.727 st-intra-coresw02 confd[8228]: - Writing error log to /var/confd/log/confderr.log
<INFO> 13-Oct-2024::08:05:02.766 st-intra-coresw02 confd[8228]: - Writing daemon log to /var/confd/log/confd.log
<INFO> 13-Oct-2024::08:05:02.767 st-intra-coresw02 confd[8228]: - Writing NETCONF log to /var/confd/log/netconf.log
<INFO> 13-Oct-2024::08:05:02.767 st-intra-coresw02 confd[8228]: - Writing audit log to /var/confd/log/audit.log
<INFO> 13-Oct-2024::08:05:02.767 st-intra-coresw02 confd[8228]: - Writing developer log to /var/confd/log/devel.log

(gdb) bt
#0  0x059e2d64 in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:67
#1  0x059e8260 in *__GI_abort () at abort.c:88
#2  0x07179d94 in fos_sighandler (signum=318922584) at /vobs/projects/springboard/build/swbd1000/fabos/bccb/lib/utils/signals.c:212
#3  <signal handler called>
#4  0x059e2d64 in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:67
#5  0x059e8260 in *__GI_abort () at abort.c:88
#6  0x059da6f0 in *__GI___assert_fail (assertion=0xfcef43c "0", file=0xfc524cc "Framework/ObjectModel/WaveObjectManager.cpp", line=6537, 
    function=0xfc52118 "virtual void WaveNs::WaveObjectManager::prismAssert(bool, const char*, WaveNs::UI32)") at assert.c:78
#7  0x0f3e00d0 in WaveNs::WaveObjectManager::prismAssert (this=0x108441b8, isAssertNotRequired=<value optimized out>, pFileName=<value optimized out>, 
    lineNumber=<value optimized out>) at Framework/ObjectModel/WaveObjectManager.cpp:6537
#8  0x0f4cd698 in WaveNs::prismAssert (isAssertNotRequired=<value optimized out>, pFileName=0xfcc1ea8 "Framework/OsLayer/PrismOsLayer.cpp", lineNumber=305)
    at Framework/Utils/AssertUtils.cpp:17
#9  0x0f95409c in WaveNs::sigSegvHandler (signal=<value optimized out>) at Framework/OsLayer/PrismOsLayer.cpp:305
#10 <signal handler called>
#11 0x05a315c0 in strlen () from ./lib/libc.so.6
#12 0x0739e68c in erl_mk_atom (s=0x0) at legacy/erl_eterm.c:140
#13 0x07362e34 in confd_notification_send (nctx=0x121cb288, time=0x1b506a58, values=0x1133a5f0, nvalues=<value optimized out>) at confd_lib.c:5586
#14 0x0b4d5ec8 in DcmNs::NetconfNotificationStreamObjectManager::netconfNotificationHandler (this=0x1130b748, pNotifyMessage=0x16404120)
    at ConfdGateway/NetconfNotificationStreamObjectManager.cpp:320
#15 0x0f3d3888 in WaveNs::WaveObjectManager::PrismOperationMapContext::executeMessageHandler (this=<value optimized out>, pPrismMessage=<value optimized out>)
    at Framework/ObjectModel/WaveObjectManager.cpp:303
#16 0x0f3df8ac in WaveNs::WaveObjectManager::handlePrismMessage (this=0x1130b748, pPrismMessage=0x16404120) at Framework/ObjectModel/WaveObjectManager.cpp:1448
#17 0x0f96c1e4 in WaveNs::PrismThread::start (this=0x3) at Framework/MultiThreading/PrismThread.cpp:121
#18 0x0f96fcd0 in WaveNs::PrismPosixThread::pthreadStartMethod (pPrismPoixThread=0x0) at Framework/MultiThreading/PrismPosixThread.cpp:157
#19 0x0dfd9e98 in start_thread (arg=<value optimized out>) at pthread_create.c:302
#20 0x05a8f554 in clone () from ./lib/libc.so.6
Previous frame inner to this frame (corrupt stack?)

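It looks like DCMD is generating a NETCONF notification and passing it to ConfD via confd_notification_send().
From the gdb output below, we suspect that the notification context DCMD passes is invalid. pNotificationStreamObject->m_pNotificationContext itself is not NULL (it is 0x121cb288, the same nctx seen in frame #13), but its name and ctx_name fields are NULL. This is consistent with one of those NULL strings reaching erl_mk_atom() (s=0x0 in frame #12), where strlen() dereferences the NULL pointer and the resulting SIGSEGV takes Dcmd down. The remaining fields (e.g. fd = 1728805214) also look like garbage, which suggests the context was either never initialized properly or was freed and overwritten.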
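As a stopgap while the root cause is investigated, Dcmd could refuse to send when the context is obviously broken, instead of crashing inside strlen(). The C sketch below is illustrative only: `struct notif_ctx_view`, `notif_ctx_looks_valid()` and `send_notification_checked()` are hypothetical names, and the struct mirrors just the `name`, `ctx_name` and `fd` fields that gdb printed for m_pNotificationContext (real code would use `struct confd_notification_ctx` from the ConfD headers).

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* HYPOTHETICAL view of the notification context: only the fields seen
 * in the gdb dump. Real code would use struct confd_notification_ctx. */
struct notif_ctx_view {
    const char *name;     /* NULL in the crashing core */
    const char *ctx_name; /* also NULL in the core */
    int fd;               /* 1728805214 in the core */
};

/* Returns true only if the context passes basic sanity checks.
 * It cannot detect a dangling pointer to freed memory; it only
 * catches contexts that are visibly broken. */
static bool notif_ctx_looks_valid(const struct notif_ctx_view *ctx)
{
    if (ctx == NULL || ctx->name == NULL)
        return false;
    /* fcntl(F_GETFD) fails (EBADF) if fd is not an open descriptor. */
    if (ctx->fd < 0 || fcntl(ctx->fd, F_GETFD) == -1)
        return false;
    return true;
}

/* HYPOTHETICAL call site: log and drop instead of crashing in strlen(). */
static int send_notification_checked(const struct notif_ctx_view *ctx)
{
    if (!notif_ctx_looks_valid(ctx)) {
        fprintf(stderr, "notification context invalid, dropping notification\n");
        return -1;
    }
    /* ... the real code would call confd_notification_send() here ... */
    return 0;
}
```

Such a check only catches contexts that are visibly broken, like the one in this core. It cannot tell whether a non-NULL pointer refers to freed memory, so it does not replace finding out why the context went bad.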

(gdb) p *pNotificationStreamObject
$35 = {_vptr.NotificationStreamObject = 0xdc5f890, m_pNotificationContext = 0x121cb288, m_lastNotificationTime = {year = 2024, month = 10 '\n', day = 13 '\r', 
    hour = 7 '\a', min = 30 '\036', sec = 27 '\033', micro = 923974, timezone = 0 '\0', timezone_minutes = 0 '\0'}, m_replayCreationTime = {year = 2024, month = 10 '\n', 
    day = 13 '\r', hour = 7 '\a', min = 28 '\034', sec = 16 '\020', micro = 752762, timezone = 0 '\0', timezone_minutes = 0 '\0'}, m_isNotificationPresent = true}
(gdb)
(gdb) p *pNotificationStreamObject->m_pNotificationContext
**$34 = {name = 0x0, ctx_name = 0x0,** **fd = 1728805214**, dx = 0x8a7b3, error = {code = 118, apptag = {tag = 262899572, ns = 0}, str = 0xc8c <Address 0xc8c out of bounds>, 
    info = 0x10897158}, cb_opaque = 0x14, live_fd = 7629167, subid = 41, flags = 1, src_addr = {af = 306444224, ip = {v4 = {s_addr = 297128288}, v6 = {__in6_u = {
          __u6_addr8 = "\021µÑ`\022+Kp\000\000\000\024\000\000\000«", __u6_addr16 = {4533, 53600, 4651, 19312, 0, 20, 0, 171}, __u6_addr32 = {297128288, 304827248, 20, 
            171}}}}}, seen_reply = 1734694009, query_ref = 1885667328}
(gdb)
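The `fd` value in the context dump is not a plausible file descriptor. Read as a Unix timestamp, it decodes to a time on the morning of the crash, which can be checked with the small C helper below (`format_as_utc()` is a hypothetical name used only for this check).

```c
#include <stddef.h>
#include <time.h>

/* Formats a raw integer as a UTC timestamp "YYYY-MM-DD HH:MM:SS".
 * Used here only to inspect the suspicious fd value from the core dump.
 * Returns 0 on success, -1 on failure. */
static int format_as_utc(long long raw, char *buf, size_t len)
{
    time_t t = (time_t)raw;
    struct tm *tm_utc = gmtime(&t); /* not thread-safe; fine for a one-off check */
    if (tm_utc == NULL)
        return -1;
    return strftime(buf, len, "%Y-%m-%d %H:%M:%S", tm_utc) == 0 ? -1 : 0;
}
```

With 1728805214 this gives 2024-10-13 07:40:14 UTC, about ten minutes after m_lastNotificationTime (07:30:27). This may be a coincidence, but it fits the theory that the context memory was freed and reused for other data rather than simply left uninitialized.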

Hello,

If I understand correctly, DCMD is your application that uses the ConfD library (libconfd) to communicate with ConfD.
As we are not familiar with its internals or codebase, it is hard to give precise hints.

Since you mention an upgrade, could a change between ConfD versions have caused the old implementation to behave differently?

Right, the notification context should not be NULL, so you may need to check what is being done with it and verify that your notification handling flow matches the examples and user guide for the ConfD version you are using.
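One part of that flow worth checking is what happens to the cached context when the connection to ConfD goes away. Your confd.log excerpt shows new ConfD pids right after the AAA errors (confd[1145] is followed by confd[3813], and confd[4720] by confd[8228]), so ConfD appears to have restarted, and a context obtained from the old instance should not be used again. Below is a minimal C sketch of that bookkeeping, assuming a hypothetical `struct notif_stream` holder in DCMD; `nctx` stands in for the pointer filled in by confd_register_notification_stream(), and how to reconnect and re-register should follow the user guide for your ConfD version.

```c
#include <stdbool.h>
#include <stddef.h>

/* HYPOTHETICAL holder for the stream state inside DCMD. `nctx` stands in
 * for the struct confd_notification_ctx * obtained at registration. */
struct notif_stream {
    void *nctx;      /* only meaningful while `registered` is true */
    bool registered;
};

/* Call after a successful (re)registration with ConfD. */
static void stream_on_registered(struct notif_stream *s, void *nctx)
{
    s->nctx = nctx;
    s->registered = true;
}

/* Call when the daemon connection is closed or ConfD restarts:
 * the old context must never be used again. */
static void stream_on_disconnected(struct notif_stream *s)
{
    s->nctx = NULL;
    s->registered = false;
}

/* Returns the context to send on, or NULL if the caller must
 * re-register (or queue the notification) first. */
static void *stream_ctx_for_send(const struct notif_stream *s)
{
    return s->registered ? s->nctx : NULL;
}
```

The point of the design is that the send path never holds on to a raw context across a reconnect; it always asks the holder, which returns NULL once the connection that produced the context is gone.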

This might indicate that ConfD did not start all of its components properly (your logs show "Internal error: failed to load AAA from CDB"), which could cause your otherwise correct notification code to misbehave. Alternatively, your code may behave correctly, but ConfD may reject DCMD's notification connections because of AAA-related issues.
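To confirm whether the DCMD crashes line up with a bad ConfD start, it can help to scan each host's confd.log for AAA load failures and pid changes. The C helper below is a hypothetical sketch, not a ConfD tool: `confd_pid()` and `scan_confd_log()` are illustrative names, they match the "failed to load AAA from CDB" text and the `confd[<pid>]` tag exactly as they appear in your log excerpt, and they count any pid change as a restart.

```c
#include <stdlib.h>
#include <string.h>

/* Extracts the pid from a confd.log line ("... confd[1145]: - ..."),
 * or returns -1 if the line has no confd[<pid>] tag. */
static long confd_pid(const char *line)
{
    const char *p = strstr(line, "confd[");
    if (p == NULL)
        return -1;
    char *end;
    long pid = strtol(p + 6, &end, 10); /* 6 == strlen("confd[") */
    return (end != p + 6 && *end == ']') ? pid : -1;
}

/* Counts AAA load failures and pid changes (treated as ConfD restarts)
 * in the log lines of a single host. */
static void scan_confd_log(const char *const *lines, size_t n,
                           int *aaa_failures, int *restarts)
{
    long last_pid = -1;
    *aaa_failures = 0;
    *restarts = 0;
    for (size_t i = 0; i < n; i++) {
        if (strstr(lines[i], "failed to load AAA from CDB") != NULL)
            (*aaa_failures)++;
        long pid = confd_pid(lines[i]);
        if (pid > 0) {
            if (last_pid > 0 && pid != last_pid)
                (*restarts)++;
            last_pid = pid;
        }
    }
}
```

Run it on one host's log at a time: your excerpt mixes st-intra-acc1Gsw01 and st-intra-coresw02, and the pid change between hosts would otherwise be counted as a restart.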