Subscription failed

Hi team,
I have one issue of subscription failed to connect to confd.
At the beginning, there were several active session (including config & subscription) connecting to confd server, some of the subscriber got events replayed, like session 93, 96, 89 etc,
but session 81, it got none event replayed since connecting to confd server(pls refer log Line 1435 to Line 1734 “replayComplete”)
After that, all other subscription failed to connect to confd server.
It could not be recovered till I kill the confd and restart it.

Here is the log:

  	Line 1713: Jun  2 15:59:13  daemon.info confd[1589]: netconf id=89 sending notification {urn:ietf:params:xml:ns:yang:ietf-netconf-notifications}netconf-session-start
	Line 1715: Jun  2 15:59:13  daemon.info confd[1589]: netconf id=98 sending notification {urn:ietf:params:xml:ns:yang:ietf-netconf-notifications}netconf-session-start
	Line 1717: Jun  2 15:59:13  daemon.info confd[1589]: netconf id=93 sending notification {urn:ietf:params:xml:ns:yang:ietf-netconf-notifications}netconf-session-start
	Line 1719: Jun  2 15:59:13  daemon.info confd[1589]: netconf id=96 sending notification {urn:ietf:params:xml:ns:yang:ietf-netconf-notifications}netconf-session-start
	Line 1724: Jun  2 15:59:13  daemon.info confd[1589]: netconf id=102 new ssh session for user "admin" from 10.10.103.144
	Line 1725: Jun  2 15:59:13  daemon.info confd[1589]: netconf id=97 sending notification {urn:ietf:params:xml:ns:yang:ietf-netconf-notifications}netconf-session-start
	Line 1727: Jun  2 15:59:13  daemon.info confd[1589]: netconf id=89 sending notification {urn:ietf:params:xml:ns:yang:ietf-netconf-notifications}netconf-session-start
	Line 1729: Jun  2 15:59:13  daemon.info confd[1589]: netconf id=98 sending notification {urn:ietf:params:xml:ns:yang:ietf-netconf-notifications}netconf-session-start
	Line 1730: Jun  2 15:59:13  daemon.info confd[1589]: netconf id=93 sending notification {urn:ietf:params:xml:ns:yang:ietf-netconf-notifications}netconf-session-start
	Line 1734: Jun  2 15:59:13  daemon.info confd[1589]: netconf id=81 sending notification replayComplete
	Line 1735: Jun  2 15:59:13  daemon.info confd[1589]: netconf id=102 got rpc: {urn:ietf:params:xml:ns:netconf:notification:1.0}create-subscription attrs: message-id="1"
	Line 1736: Jun  2 15:59:13  daemon.info confd[1589]: netconf id=102 create-subscription stream='NETCONF' attrs: message-id="1"

    Line 1435: Jun  2 15:59:12  daemon.info confd[1589]: netconf id=81 new ssh session for user "admin" from 10.10.96.84
	Line 1461: Jun  2 15:59:12  daemon.info confd[1589]: netconf id=81 got rpc: {urn:ietf:params:xml:ns:netconf:notification:1.0}create-subscription attrs: message-id="1"
	Line 1463: Jun  2 15:59:12  daemon.info confd[1589]: netconf id=81 create-subscription stream='NETCONF' attrs: message-id="1"
	Line 1474: Jun  2 15:59:12  daemon.info confd[1589]: netconf id=81 sending rpc-reply, attrs: message-id="1"
	Line 1734: Jun  2 15:59:13  daemon.info confd[1589]: netconf id=81 sending notification replayComplete
	Line 1737: Jun  2 15:59:13  daemon.info confd[1589]: netconf id=81 sending notification {urn:ietf:params:xml:ns:yang:ietf-netconf-notifications}netconf-session-start
	Line 2960: Jun  2 16:01:21  daemon.info confd[1589]: netconf id=81 ssh transport closed
  • When the above issue happened, I tried to do netconf get operation, it will be success except to get “netconf-state/streams”. If I tried to do edit config, it will be stucked either.
  • There are some static in-rpcs and out-rpc-erros, pls refer to the output:

localhost:xml[ppc-linux-bob_work-main]$netconf-console --port 830 -u admin -p admin123 --get -x “netconf-state/statistics”

<?xml version="1.0" encoding="UTF-8"?>
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="1">
  <data>
    <netconf-state xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring">
      <statistics>
        <netconf-start-time>2018-12-27T04:31:42-00:00</netconf-start-time>
        <in-bad-hellos>0</in-bad-hellos>
        <in-sessions>119</in-sessions>
        <dropped-sessions>4</dropped-sessions>
        <in-rpcs>147</in-rpcs>
        <in-bad-rpcs>10</in-bad-rpcs>
        <out-rpc-errors>15</out-rpc-errors>
        <out-notifications>7262</out-notifications>
      </statistics>
    </netconf-state>
  </data>
</rpc-reply>

If wihout this issue, I can get the stream information as below:

*localhost:xml[ppc-linux-bob_work-main]$netconf-console --port 830 -u admin -p admin123 --get -x “netconf-state/streams” *
<?xml version="1.0" encoding="UTF-8"?>

  • *
  • *
  •  <streams xmlns="http://tail-f.com/yang/netconf-monitoring">*
    
  •    <stream>*
    
  •      <name>NETCONF</name>*
    
  •      <description>Default NETCONF event stream for both notification of both alarms and events</description>*
    
  •      <replay-support>true</replay-support>*
    
  •      <replay-log-creation-time>2020-06-04T06:12:39+00:00</replay-log-creation-time>*
    
  •    </stream>*
    
  •  </streams>*
    
  • *
  • *

Hi,

What’s the rpc-reply something like below?

$ netconf-console --port 830 -u admin -p admin123 --get -x /confd-state/internal 
<?xml version="1.0" encoding="UTF-8"?>
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="42">
  <data>
    <confd-state xmlns="http://tail-f.com/yang/confd-monitoring">
      <internal>
...
      </internal>
    </confd-state>
  </data>
</rpc-reply>

Isn’t this the same problem that you described in the Subscription Stuck Issue thread?

Here it is,

<?xml version="1.0" encoding="UTF-8"?>
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="1">
  <data>
    <confd-state xmlns="http://tail-f.com/yang/confd-monitoring">
      <internal>
        <callpoints>
          <callpoint>
            <id>CpuState</id>
            <daemon>
              <id>1</id>
              <name>TransHdlr</name>
              <callbacks>get_object</callbacks>
              <callbacks>get_elem</callbacks>
            </daemon>
          </callpoint>
          ...
          <validationpoint>
            <id>sys_hostN</id>
            <daemon>
              <id>1</id>
              <name>TransHdlr</name>
            </daemon>
          </validationpoint>
           ...
          <actionpoint>
            <id>activate_file</id>
            <daemon>
              <id>2</id>
              <name>ActHdlr</name>
            </daemon>
          </actionpoint>
          <notification-stream-replay>
            <name>NETCONF</name>
            <replay-support>builtin</replay-support>
          </notification-stream-replay>
          <authorization-callbacks>
            <enabled>false</enabled>
            <daemon>
              <id>1</id>
              <name>TransHdlr</name>
            </daemon>
            <error>UNKNOWN</error>
          </authorization-callbacks>
        </callpoints>
        <cdb>
          <datastore>
            <name>running</name>
            <transaction-id>1545-917667-660066</transaction-id>
            <filename>./cur-cdb/A.cdb</filename>
            <disk-size>7408</disk-size>
            <ram-size>24814</ram-size>
            <read-locks>0</read-locks>
            <write-lock-set>false</write-lock-set>
            <waiting-for-replication-sync>false</waiting-for-replication-sync>
          </datastore>
          <datastore>
            <name>operational</name>
            <filename>./cur-cdb/O.cdb</filename>
            <disk-size>4</disk-size>
            <ram-size>72</ram-size>
            <subscription-lock-set>false</subscription-lock-set>
          </datastore>
          <client>
            <name>ConfdAgent</name>
            <info>1694/16</info>
            <type>inactive</type>
          </client>
          <client>
            <name>ConfdAgent</name>
            <info>1694/15</info>
            <type>subscriber</type>
            <subscription>
              <datastore>running</datastore>
              <priority>3</priority>
              <id>7</id>
              <path>/system</path>
            </subscription>
            ...
          </client>
        </cdb>
      </internal>
    </confd-state>
  </data>
</rpc-reply>

it is different problem. The subscription stuck issue was resolved with the option /confdConfig/netconf/writeTimeout

As you are using the builtin replay support for the NETCONF stream, what’s the output if you print that NETCONF replay file?
I.e.

$ ls -l NETCONF.*
-rw-r--r--  1 guest  staff  66248 Jun  5 12:28 NETCONF.1
-rw-r--r--  1 guest  staff     18 Jun  5 11:48 NETCONF.idx
-rw-r--r--  1 guest  staff     13 Jun  5 11:48 NETCONF.siz
$ confd --printlog NETCONF.1 
{timestamps2,{63758569259,813997},{63758569259,813997},not_found}
{notif,{63758569272,548670},
       {{aaa_server,chk_notification_access,
                    ['urn:ietf:params:xml:ns:yang:ietf-netconf-notifications',
                     'netconf-session-start']},
...

Also, what’s the value of your /confdConfig/netconf/writeTimeout?