Closed unexpectedly (while delivering subscription notification)

Hi Team,

I am observing the below issue sporadically.
Please find the devel logs for it below.

[root@du-oam-server-886587659-gg7pc run]# tail -f devel.log |grep -i notification
<DEBUG> 30-Mar-2023::11:08:48.280 du-oam-server-886587659-gg7pc confd[31]: confd progress usid=2782 tid=1429 datastore=running context=netconf subsystem=cdb all prepare subscription notifications acknowledged
<ERR> 30-Mar-2023::11:08:48.282 du-oam-server-886587659-gg7pc confd[31]: devel-cdb subscriber EVENT_API closed unexpectedly (while delivering subscription notification)
<DEBUG> 30-Mar-2023::11:08:48.282 du-oam-server-886587659-gg7pc confd[31]: confd progress usid=2782 tid=1429 datastore=running context=netconf subsystem=cdb delivering commit subscription notifications at prio 3
<ERR> 30-Mar-2023::11:08:48.282 du-oam-server-886587659-gg7pc confd[31]: devel-cdb subscriber odu_sys_mgr closed unexpectedly (while delivering subscription notification)
<ERR> 30-Mar-2023::11:08:48.282 du-oam-server-886587659-gg7pc confd[31]: devel-cdb subscriber DU closed unexpectedly (while delivering subscription notification)
<DEBUG> 30-Mar-2023::11:08:48.282 du-oam-server-886587659-gg7pc confd[31]: confd progress usid=2782 tid=1429 datastore=running context=netconf subsystem=cdb all commit subscription notifications acknowledged

Can you please let me know what this error means?

When this issue occurred, I observed the behavior of “EVENT_API”, “odu_sys_mgr”, and “DU” at that moment; all of these applications appeared normal and kept running.

So how does ConfD detect that the socket has been closed unexpectedly?

Note:
We are running all our applications, including the NETCONF server, in a Kubernetes environment.
Currently the NETCONF server runs as a service, and all other applications communicate with it using the service IP and port.

Does ConfD support a cloud-native solution? If yes, do you have any specific charts/YAML configuration for it?

It would be helpful if you could tell us how to debug this sporadic issue.

Thanks
Nishant Kumar


Attached are a few screenshots which clearly show the connection is in the ESTABLISHED state.

The subscriber application, here “DU”, that had been notified of configuration changes by ConfD closed the connection (socket) to ConfD before replying back to ConfD with cdb_sync_subscription_socket().
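
For reference, the sequence ConfD expects from a CDB subscriber looks roughly like the sketch below. This is a minimal illustration, not the DU application; the session name, priority, and the subscription to "/" are made-up examples:

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <confd_lib.h>
#include <confd_cdb.h>

int main(void)
{
    struct sockaddr_in addr = { .sin_family = AF_INET };
    int subsock, spoint;

    confd_init("sketch-subscriber", stderr, CONFD_SILENT);

    addr.sin_addr.s_addr = inet_addr("127.0.0.1");
    addr.sin_port = htons(CONFD_PORT);

    if ((subsock = socket(PF_INET, SOCK_STREAM, 0)) < 0)
        exit(1);
    if (cdb_connect(subsock, CDB_SUBSCRIPTION_SOCKET,
                    (struct sockaddr *)&addr, sizeof(addr)) != CONFD_OK)
        exit(1);

    /* Illustrative subscription to everything below the root; a real
     * subscriber passes its namespace and a specific path. */
    if (cdb_subscribe(subsock, 3, 0, &spoint, "/") != CONFD_OK)
        exit(1);
    if (cdb_subscribe_done(subsock) != CONFD_OK)
        exit(1);

    for (;;) {
        int points[1], n;

        /* Blocks until ConfD delivers a subscription notification. */
        if (cdb_read_subscription_socket(subsock, points, &n) != CONFD_OK)
            exit(1);

        /* ... read the changes here, e.g. with cdb_diff_iterate() ... */

        /* The reply ConfD waits for. If the process exits, returns, or
         * closes subsock before this call, ConfD logs
         * "subscriber ... closed unexpectedly (while delivering
         * subscription notification)" in the developer log. */
        if (cdb_sync_subscription_socket(subsock, CDB_DONE_PRIORITY) != CONFD_OK)
            exit(1);
    }
}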

Hi,
Thanks for your response. Can you please take a look at the design below? According to my understanding of this design, the socket cannot be closed before we reply back to ConfD with cdb_sync_subscription_socket().

   while (!(is_deinit))
   {
        struct confd_notification n;
        static int poll_fail_counter = 0;
        struct pollfd set[3];

        set[0].fd = confd_subs_sock;
        set[0].events = POLLIN;
        set[0].revents = 0;
        set[1].fd = poll_wakeup_fd;
        set[1].events = POLLIN;
        set[1].revents = 0;
        set[2].fd = heartbeat_notif_socket;
        set[2].events = POLLIN;
        set[2].revents = 0;
        if (poll(&set[0], 3, -1) < 0)
        {
            //TODO to LOGGING
            perror("Poll failed:");

            if (++poll_fail_counter < 10)
            {
                continue;
            }

            //TODO to LOGGING
            fprintf(stderr, "Too many poll failures, terminating\n");
            return;
        }

        if (set[2].revents & POLLIN)
        {
           switch (confd_read_notification(heartbeat_notif_socket, &n)) {
           case CONFD_OK:
                break;
           case CONFD_EOF:
                std::cout<<"Hearbeat Notification socket closed"<<std::endl;
                return;
           default:
                std::cout<<"Failed to read Heartbeat notification"<<std::endl;
                return;
            }
        }

        poll_fail_counter = 0;
        if (set[0].revents & POLLIN)
        {
            int sub_points;
            int reslen = 0;
            int status = 0;

            status = cdb_read_subscription_socket(confd_subs_sock,
                                                  &sub_points, &reslen);

            if (status != CONFD_OK)
            {
                //TODO to LOGGING
                fprintf(stderr, "terminate sub_read from set[0]: %d\n", status);
                return;
            }

            if (subscription_point_3gpp == sub_points)
            {
                allocated_and_return_config_request();
                dynamic_data(sub_points, confd_subs_sock);
                send_to_app();
                cdb_sync_subscription_socket(confd_subs_sock, CDB_DONE_PRIORITY);
            }
            else if ((sub_points == subscription_point_nrcelldu_op) || (sub_points == subscription_point_nrcelldu_av))
            {
                allocated_and_return_cell_control_cfg_ind();
                dynamic_data(sub_points, confd_subs_sock);
                send_cell_control_cfg_ind_to_app();
                cdb_sync_subscription_socket(confd_subs_sock, CDB_DONE_OPERATIONAL);
            }
        }
        else if (set[1].revents & POLLIN) // this will wake up from poll
        {
            char buffer[2];
            read(poll_wakeup_fd, buffer, 2);
        }
    }
}

Debug your application to find out why it either exits/returns or closes confd_subs_sock before calling cdb_sync_subscription_socket().
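
Note that the loop above never checks the return value of cdb_sync_subscription_socket(). One way to narrow the problem down is to make every call that can end the loop or close the socket leave a trace in the application log, and then compare timestamps with devel.log. A sketch of such wrappers (the names sync_sub_logged() and close_sub_logged() are illustrative; the idea is that they replace the direct calls in your loop):

#include <stdio.h>
#include <time.h>
#include <confd_lib.h>
#include <confd_cdb.h>

static void trace_call(const char *what, int ret)
{
    fprintf(stderr, "[%ld] %s -> %d (%s)\n",
            (long)time(NULL), what, ret,
            ret == CONFD_OK ? "ok" : confd_lasterr());
}

/* Replaces direct calls to cdb_sync_subscription_socket() in the loop. */
static int sync_sub_logged(int sock, enum cdb_subscription_sync_type st)
{
    int ret = cdb_sync_subscription_socket(sock, st);
    trace_call("cdb_sync_subscription_socket", ret);
    return ret;
}

/* Replaces direct calls to cdb_close() on the subscription socket. */
static int close_sub_logged(int sock)
{
    int ret = cdb_close(sock);
    trace_call("cdb_close(confd_subs_sock)", ret);
    return ret;
}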

Hi
As per your previous comment, if anything like an exit/return had happened, the socket connection should at least not be seen in the ESTABLISHED state. As I shared in the screenshot above, it clearly appears to be ESTABLISHED. Please let me know if my understanding is not correct.

Regarding the closing of confd_subs_sock: I checked the code, and wherever we call cdb_close(confd_subs_sock); we print a log message, which we are also not able to see.

There is one place in the ConfD source code that generates the log entry (%s == the session name):

subscriber %s closed unexpectedly (while delivering subscription notification)

The log entry is generated when the OS TCP stack informs the Erlang VM, which ConfD runs on, that the TCP socket was closed while ConfD was trying to send a subscription notification during the commit phase of the transaction. The commit does not fail and the changes are stored in ConfD’s CDB, but ConfD did not get a sync subscription message back from the subscriber, hence the log entry in the developer log.

Why is the connection still established? Likely because the clients/daemons automatically restart/re-establish, and they then work as expected.
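
To make the mechanism concrete, the sketch below is a deliberately misbehaving subscriber that, based on the explanation above, should trigger exactly this log entry: it reads the notification but closes the socket instead of replying with cdb_sync_subscription_socket(). The session name and the subscription to "/" are illustrative only:

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <confd_lib.h>
#include <confd_cdb.h>

int main(void)
{
    struct sockaddr_in addr = { .sin_family = AF_INET };
    int sock, spoint, points[1], n;

    addr.sin_addr.s_addr = inet_addr("127.0.0.1");
    addr.sin_port = htons(CONFD_PORT);

    confd_init("misbehaving-subscriber", stderr, CONFD_SILENT);
    if ((sock = socket(PF_INET, SOCK_STREAM, 0)) < 0 ||
        cdb_connect(sock, CDB_SUBSCRIPTION_SOCKET,
                    (struct sockaddr *)&addr, sizeof(addr)) != CONFD_OK ||
        cdb_subscribe(sock, 3, 0, &spoint, "/") != CONFD_OK ||
        cdb_subscribe_done(sock) != CONFD_OK)
        exit(1);

    /* Blocks until a configuration change is committed. */
    if (cdb_read_subscription_socket(sock, points, &n) == CONFD_OK) {
        /* No cdb_sync_subscription_socket() reply here: closing the socket
         * instead should make ConfD log "subscriber misbehaving-subscriber
         * closed unexpectedly (while delivering subscription notification)". */
        cdb_close(sock);
    }
    return 0;
}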

Hi,
Thanks for the valuable comment. But regarding your statement below,

"The log entry generates when the OS TCP stack informs the Erlang VM, which ConfD is running on, that the TCP socket was closed while ConfD tries to send a subscription notification during the commit phase of the transaction. "

Can the above statement be confirmed via some log, so that we can conclude whether the problem is in the OS TCP stack, the Erlang VM, or the application (DU)?

Another important point: we are running ConfD in a Kubernetes environment.
Has Cisco verified ConfD's behavior in cloud infrastructure?

Something broke the connection, and it was not ConfD and likely not the Erlang VM that ConfD integrates (see the ConfD error log to confirm).

Yes

Hi

Thanks for your comment,

If Cisco has verified ConfD in cloud infrastructure, can you please provide the YAML configuration (mainly the resource allocation, memory/CPU) which is being used?

In our application code we observe that we are creating multiple socket connections towards ConfD for various operations, such as:

  1. Subscription notifications (we do not close this socket; it runs throughout the life span of the app).
  2. Reading data using cdb_diff_iterate (closed as soon as the task completes).
  3. Writing data to ConfD (closed as soon as the task completes).

For each one of the above actions we are calling confd_init("DU", debug_fd, debug_level) with the same name (DU).

Will it impact anything? Can we create multiple sockets with the same name towards ConfD from the same application?

Hi,

See the ConfD UG section “The Protocol and a Library Threads Discussion”, last paragraph:

When we use multiple threads, it is important to remember that threads can not “share” socket connections to ConfD. For the data provider API, this is basically fulfilled automatically, as we will not have multiple threads polling the same socket. But when we use e.g. the CDB or MAAPI APIs, the application must make sure that each thread has its own sockets. I.e. the ConfD API functions are thread-safe as such, but multiple threads using them with the same socket will have unpredictable results, just as multiple threads using the read() and write() system calls on the same file descriptor in general will. In the ConfD case, one thread may end up getting the response to a request from another, or even a part of that response, which will result in errors that can be very difficult to debug.
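
In practice this means each purpose (and each thread) gets its own socket towards ConfD, while the library is typically initialized once per process with confd_init(). A minimal sketch of that pattern; only the "DU" session name is taken from your description, and the helper new_cdb_socket() is illustrative:

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <confd_lib.h>
#include <confd_cdb.h>

/* Opens a dedicated CDB socket of the given type towards ConfD. */
static int new_cdb_socket(enum cdb_sock_type type)
{
    struct sockaddr_in addr = { .sin_family = AF_INET };
    int sock;

    addr.sin_addr.s_addr = inet_addr("127.0.0.1");
    addr.sin_port = htons(CONFD_PORT);

    if ((sock = socket(PF_INET, SOCK_STREAM, 0)) < 0)
        return -1;
    if (cdb_connect(sock, type, (struct sockaddr *)&addr,
                    sizeof(addr)) != CONFD_OK)
        return -1;
    return sock;
}

int main(void)
{
    confd_init("DU", stderr, CONFD_SILENT);   /* library init, once per process */

    /* One long-lived subscription socket, used only by the subscriber thread. */
    int subs_sock = new_cdb_socket(CDB_SUBSCRIPTION_SOCKET);

    /* Separate short-lived sockets for reads and writes, each owned by a
     * single thread and closed as soon as the task completes. */
    int read_sock = new_cdb_socket(CDB_READ_SOCKET);
    /* ... cdb_start_session()/reads on read_sock here ... */
    cdb_close(read_sock);

    /* The subscription loop then runs on subs_sock for the application
     * lifetime, never sharing it with other threads. */
    (void)subs_sock;
    return 0;
}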

Example issue as a reference:

ConfD consumes no CPU and very little memory until your application adds YANG models and data and starts using the APIs. Hence, the CPU and memory consumption depends on the application and how it integrates with ConfD.

See, for example, this demo application run.sh script for an example of how a simple shell script can be used to run test cases and measure wall-clock time for different northbound agent read requests, including ConfD's memory usage (RSS) at the end of the test case and the high-water mark (HWM), i.e., peak memory usage for the test case.