Confd does not release netconf port after successful netconf-disable commit

Hi,
I have two links from my schema to the ConfD schema:

    leaf enabled {
        tailf:info "Netconf status";
        type boolean;
        tailf:link "/dyncfg:confdConfig/dyncfg:netconf/dyncfg:enabled";
    }

    leaf enabled {
        tailf:info "Restconf status";
        type boolean;
        tailf:link "/dyncfg:confdConfig/dyncfg:restconf/dyncfg:enabled";
    }

It looks like sometimes, after successfully committing a change to disable netconf, confd does not close its port.
Then, when re-enabling netconf, confd fails because that port is already in use.
Please see attached logs.
Here are the main parts:

  • webui commit ‘netconf disabled’. We don’t see the change itself in the audit log, because disabled is the default value:

<INFO> 18-Apr-2023::09:50:42.016 u22c_rx29 confd[1186980]: audit user: admin/57 JSON-RPC: 'commit' with JSON params {"th":91}
<INFO> 18-Apr-2023::09:50:42.060 u22c_rx29 confd[1186980]: audit user: admin/57 commit thandle 174398 begin
<INFO> 18-Apr-2023::09:50:42.062 u22c_rx29 confd[1186980]: audit user: admin/57 commit thandle 174398 /general:general/touch set to "592"
<INFO> 18-Apr-2023::09:50:42.063 u22c_rx29 confd[1186980]: audit user: admin/57 commit thandle 174398 end
<INFO> 18-Apr-2023::09:50:42.076 u22c_rx29 confd[1186980]: audit user: admin/57 WebUI commit succeeded

In this case, confd doesn’t stop using that port, in contrast to the earlier cases you can see in the logs.

  • webui commit ‘netconf enabled’:
<INFO> 18-Apr-2023::09:51:09.469 u22c_rx29 confd[1186980]: audit user: admin/57 JSON-RPC: 'commit' with JSON params {"th":92}
<INFO> 18-Apr-2023::09:51:09.501 u22c_rx29 confd[1186980]: audit user: admin/57 commit thandle 174400 begin
<INFO> 18-Apr-2023::09:51:09.503 u22c_rx29 confd[1186980]: audit user: admin/57 commit thandle 174400 /system:system/netconf/enabled set to "true"
<INFO> 18-Apr-2023::09:51:09.504 u22c_rx29 confd[1186980]: audit user: admin/57 commit thandle 174400 /confd_dyncfg:confdConfig/netconf/enabled set to "true"
<INFO> 18-Apr-2023::09:51:09.504 u22c_rx29 confd[1186980]: audit user: admin/57 commit thandle 174400 /general:general/touch set to "593"
<INFO> 18-Apr-2023::09:51:09.505 u22c_rx29 confd[1186980]: audit user: admin/57 commit thandle 174400 end
<INFO> 18-Apr-2023::09:51:09.526 u22c_rx29 confd[1186980]: audit user: admin/57 WebUI commit succeeded

In this case, we can see in confd.log that confd fails to bind the netconf port (2022) because it’s already in use. In other cases, confd does release the port on disable and logs this:

<INFO> 18-Apr-2023::09:50:36.087 u22c_rx29 confd[1186980]: - Stopping to listen for NETCONF SSH on :::2022
<INFO> 18-Apr-2023::09:50:36.091 u22c_rx29 confd[1186980]: - Stopping to listen for NETCONF SSH on 0.0.0.0:2022

Now it didn’t do it. So we end up with this:

<INFO> 18-Apr-2023::09:51:09.630 u22c_rx29 confd[1186980]: - Starting to listen for NETCONF SSH on 0.0.0.0:2022
<CRIT> 18-Apr-2023::09:51:09.642 u22c_rx29 confd[1186980]: - Cannot bind to NETCONF socket 0.0.0.0:2022 : address already in use (eaddrinuse)

Then confd does not start.
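
The pattern described above (a "Starting to listen" line with no earlier "Stopping to listen" line for the same address, followed by a bind failure) can be searched for mechanically. The following is only a minimal sketch: it assumes the confd.log line formats are exactly those in the excerpts above, and the function name `find_listener_anomalies` is made up for illustration.

```python
import re

# Patterns taken from the confd.log excerpts in this post; other ConfD
# versions may word these lines differently (assumption).
LISTEN_RE = re.compile(r"- (Starting|Stopping) to listen for NETCONF SSH on (\S+)")
BIND_FAIL_RE = re.compile(r"Cannot bind to NETCONF socket (\S+)")


def find_listener_anomalies(lines):
    """Return (line_number, message) pairs for suspicious listener events.

    Flags a start on an address that was last reported as listening
    (i.e. no stop line was logged in between) and every bind failure.
    """
    listening = set()  # addresses last reported as "Starting to listen"
    anomalies = []
    for number, line in enumerate(lines, start=1):
        match = LISTEN_RE.search(line)
        if match:
            action, addr = match.groups()
            if action == "Starting":
                if addr in listening:
                    anomalies.append(
                        (number, f"start on {addr} without a preceding stop")
                    )
                listening.add(addr)
            else:
                listening.discard(addr)
            continue
        match = BIND_FAIL_RE.search(line)
        if match:
            anomalies.append((number, f"bind failure on {match.group(1)}"))
    return anomalies
```

Feeding it the lines of confd.log (for example `find_listener_anomalies(open(path).read().splitlines())`, with `path` pointing at your own log file) lists the line numbers where a disable commit apparently did not release the listener.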

Thanks

This is confd 8.0.3, BTW

Hi,

> Now it didn’t do it. So we end up with this:

And if you attempt a bit later the issue persists, i.e. the port is never closed when disabling NETCONF?

> Then confd does not start.

Restarting ConfD?

The link should not be the issue? I.e. if you read the /dyncfg:confdConfig/dyncfg:netconf config it is set to what is expected?

> And if you attempt a bit later the issue persists, i.e. the port is never closed when disabling NETCONF?

Right, the port never gets closed.
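
One way to confirm from outside ConfD that the port is still held is a bind probe, which fails with the same "address already in use" error that confd.log reports. This is a minimal sketch, assuming Linux socket semantics and that it runs on the ConfD host; `port_is_bound` is a hypothetical helper name, and 2022 is the port from the logs above.

```python
import errno
import socket


def port_is_bound(host, port):
    """Return True if binding host:port fails with EADDRINUSE."""
    family = socket.AF_INET6 if ":" in host else socket.AF_INET
    with socket.socket(family, socket.SOCK_STREAM) as probe:
        # With SO_REUSEADDR set, Linux still refuses the bind while another
        # socket is listening on the port, but ignores leftover TIME_WAIT
        # connections, so only a live listener is reported (Linux assumption).
        probe.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise
    return False
```

Calling `port_is_bound("0.0.0.0", 2022)` a few seconds after the disable commit should return False when the port was released and True in the failing case.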

> Restarting ConfD?

Restarting confd does work. I want to avoid that and find out how to keep confd from crashing.

> The link should not be the issue? I.e. if you read the /dyncfg:confdConfig/dyncfg:netconf config it is set to what is expected?

Right, the values read from the actual db match my links.

Prior to that, are there any related errors in the ConfD error log?

I can’t say for sure.
confderr.log.1 isn’t empty, but it doesn’t grow once this happens.
This is what I see from confd --printlog /var/confd/log/error/confderr.log.1:

2-May-2023::22:12:54.143 <0.47.0> <0.423.0> event_mgr:1479: Health check failed: [{netconf_server,
                                                                                   {'EXIT',
                                                                                    {noproc,
                                                                                     {gen_server,
                                                                                      call,
                                                                                      [netconf_server,
                                                                                       health_check,
                                                                                       infinity]}}}}]

[{event_mgr,'-health_checker/3-fun-1-',0,[{file,"event_mgr.erl"},{line,1479}]},
 {event_mgr,health_checker,3,[{file,"event_mgr.erl"},{line,1479}]},
 {proc_lib,init_p,3,[{file,"proc_lib.erl"},{line,234}]}]
2-May-2023::22:13:20.169 <0.47.0> <0.423.0> event_mgr:1479: Health check failed: [{netconf_server,
                                                                                   {'EXIT',
                                                                                    {noproc,
                                                                                     {gen_server,
                                                                                      call,
                                                                                      [netconf_server,
                                                                                       health_check,
                                                                                       infinity]}}}}]

[{event_mgr,'-health_checker/3-fun-1-',0,[{file,"event_mgr.erl"},{line,1479}]},
 {event_mgr,health_checker,3,[{file,"event_mgr.erl"},{line,1479}]},
 {proc_lib,init_p,3,[{file,"proc_lib.erl"},{line,234}]}]
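
For reference, a `noproc` exit from `gen_server:call` in Erlang means that no process is registered under the called name, here `netconf_server`. To see when these failures started relative to the commits, the entry headers can be pulled out of the printed log. This is a minimal sketch, assuming the `confd --printlog` output has the layout shown above; `health_check_failures` is a hypothetical name.

```python
import re

# Header line of each entry as printed above, for example:
# "2-May-2023::22:12:54.143 <0.47.0> <0.423.0> event_mgr:1479: Health check failed: [{netconf_server,"
FAILURE_RE = re.compile(r"^(\S+)\s.*Health check failed: \[\{(\w+),")


def health_check_failures(text):
    """Return (timestamp, server_name) for every health check failure header."""
    failures = []
    for line in text.splitlines():
        match = FAILURE_RE.match(line)
        if match:
            failures.append((match.group(1), match.group(2)))
    return failures
```

The first timestamp returned can then be compared with the audit log entries to see whether the server was already gone before the disable commit.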

The NETCONF server seems to go down before NETCONF is disabled, which would explain why the port is still open.
Looking at the error log that you uploaded to the Tail-f support ticket, there is a lot going on in that log before that.
The issue seems to cascade from an action being called early in start phase 1 using JSON-RPC. What is that action doing that can cause an error?

I’m not sure. I don’t see why I would call an action while confd is in phase 1. Generally our application starts afterwards, when confd is already running.
The confderr.log.1 I’ve attached above is from a different reproduction than the one I uploaded to the tailf-support site, and it is the whole log, so I guess it’s a cleaner reproduction.
I can reproduce it again and upload the logs if you want. I’m just not sure where, because I don’t see any way to upload files here.

The action may originate from the ConfD JSON-RPC agent (not an application) trying to reset the loaded schema at startup, but something goes wrong.

Perhaps you want to try to find a way to work around the issue, or wait and see if someone in this forum can figure out what the root cause could be.

Right.
While committing netconf/restconf enable/disable, I have my webui open, which runs some queries in the background.

I’m not sure how I can work around it - confd doesn’t close its netconf port, while it should.

As I wrote, the NETCONF server is killed (and can’t release the port from the dead) for some reason that needs to be investigated.

Right.
I just don’t understand who should investigate it.
Is it me? I do not have the confd code for checking this scenario.

You, the ConfD User Community, or ConfD support (requires a subscription) can investigate.

You have the error logs etc., and the application to work with, so modifying the application YANG model, configuration, code, etc., will change the outcome. However, since no one else (ever) seems to have had this issue, you can perhaps try to align your setup with what “others” are doing.