Subscription Stuck Issue

Hi team,
I have an issue with subscriptions. In my setup, there are several NETCONF clients; all of them have created subscriptions and perform NETCONF operations in parallel. Sometimes, when I try to create a new subscription, the request times out. Is this a known issue, or is there something I need to configure? Please help, thanks.

Here is my confd.conf:

  <capabilities>
    ....
    <notification>
      <enabled>true</enabled>
      <interleave>
        <enabled>true</enabled>
      </interleave>
    </notification>
  </capabilities>

  <notifications>
    <eventStreams>
      <stream>
        <name>NETCONF</name>
        <description>Default NETCONF event stream</description>
        <replaySupport>true</replaySupport>
        <builtinReplayStore>
          <enabled>true</enabled>
          <dir>./cur-cdb/replay</dir>
          <maxSize>S512K</maxSize>
          <maxFiles>6</maxFiles>
        </builtinReplayStore>
      </stream>
    </eventStreams>
  </notifications>

This is obviously a bug somewhere. Perhaps netconf.log and/or devel.log (with the trace log level) would help; did you have a look there?

I looked into netconf.log and found two sessions that were not closed.

Session id 21 was created but never closed, and later another session with id 21 was created from a different client (different IP address).
How is the session id incremented?
In this case, will the previous session 21 stay open forever?
Could this cause later subscription creation to fail?

#The following is part of netconf.log
The log start time is 1-May-2019::18:49:53.81
Two NETCONF sessions were not closed: one is id=21, the other is id=110.

#The first log entry for id=21, created from 10.10.96.232
1-May-2019::18:49:54.323 confd[1604]: netconf id=21 new ssh session for user "Test" from 10.10.96.232
1-May-2019::18:49:54.353 confd[1604]: netconf id=21 create-subscription stream='NETCONF' attrs: nc:message-id="1"
1-May-2019::18:49:54.355 confd[1604]: netconf id=21 sending rpc-reply, attrs: nc:message-id="1"

… many notification event log entries, omitted here

1-May-2019::19:20:42.114 confd[1604]: netconf id=21 sending notification {http://openconfig.net/yang/event}event-notification
1-May-2019::19:20:45.604 confd[1604]: netconf id=21 sending notification {http://openconfig.net/yang/event}event-notification

#The last log entry for id=21. Why is the netconf id 21 again? The previous id=21 was never closed, and this session seems to come from a different IP address.
2-May-2019::00:13:29.195 confd[1590]: netconf id=21 new ssh session for user "Test" from 10.10.96.89

#All 3 entries for id=110 (from 10.10.103.144)
1-May-2019::23:05:04.994 confd[1604]: netconf id=110 new ssh session for user "Test" from 10.10.103.144
1-May-2019::23:05:05.004 confd[1604]: netconf id=110 got rpc: {urn:ietf:params:xml:ns:netconf:notification:1.0}create-subscription attrs: nc:message-id="1"
1-May-2019::23:05:05.005 confd[1604]: netconf id=110 create-subscription stream='NETCONF' attrs: nc:message-id="1"

Did anything interesting happen between 1-May-2019::19:20:45.604 and 2-May-2019::00:13:29.195? For instance, if ConfD restarted, that would explain both why you don’t see session 21 being closed and why session id 21 is used again.

I’m not sure what “This” refers to here, but the fact that a client that stops reading its notifications will prevent notification delivery to other clients (described in @oak’s original post, but since removed by an edit) is a known side effect of an intentional design; see “Improving performance of notification streams”. The failure to create a new subscription in this scenario sounds like a bug, though.
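To see why a single non-reading client can hold up delivery, here is a minimal sketch that uses a local socket pair as a stand-in for a NETCONF session. This is an assumption for illustration only: ConfD's actual SSH transport and internal buffering are not modeled. Once the kernel buffers between the writer and the non-reading peer fill up, the next write blocks; the short timeout exists only so the demo stops instead of hanging. `bytes_written_before_block` is a hypothetical helper name.

```python
import socket


def bytes_written_before_block(chunk_size: int = 4096, timeout: float = 0.2) -> int:
    """Write to a peer that never reads, until a write stalls.

    The socketpair stands in for a session whose client has stopped
    reading notifications (illustrative assumption, not ConfD internals).
    Returns how many bytes the kernel accepted before the write blocked.
    """
    sender, lazy_reader = socket.socketpair()
    # Without a timeout, the final send() would block forever.
    sender.settimeout(timeout)
    written = 0
    try:
        while True:
            # send() returns the number of bytes accepted by the kernel;
            # once the buffers are full it waits and then raises socket.timeout.
            written += sender.send(b"x" * chunk_size)
    except socket.timeout:
        pass
    finally:
        sender.close()
        lazy_reader.close()
    return written


if __name__ == "__main__":
    n = bytes_written_before_block()
    print(f"writer stalled after {n} bytes; the peer never read anything")
```

The exact byte count depends on the operating system's socket buffer sizes, so it varies between machines.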

Right. Since the reported PID of ConfD changes between those entries, it definitely did restart:

1-May-2019::19:20:45.604 confd[1604]:
2-May-2019::00:13:29.195 confd[1590]:
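To spot restarts like this in a long netconf.log, a small script can flag every point where the confd PID changes and group the opened session ids by PID. A minimal sketch, assuming the line shape quoted in this thread (`<timestamp> confd[<pid>]: netconf id=<n> <message>`); `restarts_and_sessions` and `SAMPLE` are hypothetical names, and other ConfD versions may format the log differently.

```python
import re
from typing import Dict, List, Optional, Tuple

# Line shape as quoted in this thread, e.g.
# 1-May-2019::18:49:54.323 confd[1604]: netconf id=21 new ssh session ...
LINE_RE = re.compile(r"^(\S+) confd\[(\d+)\]: netconf id=(\d+) (.*)$")

# Entries copied from the thread, placed in chronological order.
SAMPLE = [
    '1-May-2019::18:49:54.323 confd[1604]: netconf id=21 new ssh session for user "Test" from 10.10.96.232',
    "1-May-2019::19:20:45.604 confd[1604]: netconf id=21 sending notification {http://openconfig.net/yang/event}event-notification",
    '1-May-2019::23:05:04.994 confd[1604]: netconf id=110 new ssh session for user "Test" from 10.10.103.144',
    '2-May-2019::00:13:29.195 confd[1590]: netconf id=21 new ssh session for user "Test" from 10.10.96.89',
]


def restarts_and_sessions(
    lines: List[str],
) -> Tuple[List[Tuple[str, str, str]], Dict[str, List[int]]]:
    """Return (restarts, sessions_by_pid) for netconf.log lines.

    restarts: (timestamp, old_pid, new_pid) wherever the confd PID changes,
    i.e. ConfD was restarted between two consecutive entries.
    sessions_by_pid: ids of sessions opened ("new ssh session") under each PID.
    """
    restarts: List[Tuple[str, str, str]] = []
    sessions: Dict[str, List[int]] = {}
    last_pid: Optional[str] = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # comments, blank lines, other log formats
        ts, pid, sid, msg = m.groups()
        if last_pid is not None and pid != last_pid:
            restarts.append((ts, last_pid, pid))
        last_pid = pid
        if msg.startswith("new ssh session"):
            sessions.setdefault(pid, []).append(int(sid))
    return restarts, sessions


if __name__ == "__main__":
    restarts, sessions = restarts_and_sessions(SAMPLE)
    for ts, old, new in restarts:
        print(f"restart at {ts}: pid {old} -> {new}")
    print(sessions)
```

On the sample it reports one restart (PID 1604 to 1590 at 2-May-2019::00:13:29.195) and shows id 21 opened under both PIDs, which is consistent with the session id counter starting over when ConfD restarts.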

Thanks @per @mvf
Yes, you are right. I double-checked the log and found that ConfD restarted between 1-May-2019::19:20:45.604 and 2-May-2019::00:13:29.195.

Let me summarize the issue.
In the original post, one of the client terminals did not close its session, and that caused the other clients to stop receiving notifications. @per pointed out “Improving performance of notification streams”; I will try configuring /confdConfig/netconf/writeTimeout and verify again.

There is another issue, which I described in the edited post: all clients can receive notifications, but I can’t create a new subscription. There are no further clues in netconf.log.

BTW, I did not see any ConfD log entries when the application’s attempt to create a subscription timed out. Is there any debug trace you would suggest configuring in confd.conf?

I found an easy way to reproduce this issue and observed that if a subscription is blocked, creating a new subscription also times out. If I kill the client that is not reading its subscription, everything recovers and works well.
I also set the timeout, and it works.

Thanks all for your help!