Issue in starting ConfD phase2

Hi Team,

In our application, we are trying to upgrade ConfD from 6.6.1 to
8.0.4.

While starting ConfD from phase1 to phase2, we are facing the below exception,

Maapi maapi = null;
try (Socket cdbSocket = new Socket("127.0.0.1", 4565)) {
        maapi = new Maapi(cdbSocket);
        maapi.startPhase(2, true);   //Exception is thrown here
} catch (ConfException e) {
        log.error("Exception while starting ConfD Phase " + phase, e);
} catch (IOException e) {
        log.error("Unable to start ConfD Phase " + phase, e);
}
> 2023-05-19 06:51:45.708 ERROR Exception while starting ConfD Phase 2
> **com.tailf.conf.ConfException: unexpected end of file**
>         at com.tailf.conf.ConfInternal.readFill(ConfInternal.java:422)
>         at com.tailf.conf.ConfInternal.termRead(ConfInternal.java:192)
>         at com.tailf.conf.ConfInternal.requestTerm(ConfInternal.java:548)
>         at com.tailf.conf.ConfInternal.requestTerm(ConfInternal.java:677)
>         at com.tailf.maapi.Maapi.startPhase(Maapi.java:10409)
>         at 
> com.cisco.esc.yang.util.ConfigDataUtil.startConfDPhase(ConfigDataUtil.java:201)
>         at com.cisco.esc.ESCStatus.run(ESCStatus.java:148)
> Caused by: java.io.EOFException: null

Could you please help us on this?

We do see the below issue in the confd log,

<INFO> 19-May-2023::07:31:08.997 dnd-st-esc-5-11-0-26-2 confd[55788]: - Starting to listen for Internal IPC on 127.0.0.1:4565
<INFO> 19-May-2023::07:31:09.443 dnd-st-esc-5-11-0-26-2 confd[55788]: - ConfD phase0 started
<NOTICE> 19-May-2023::07:31:33.512 dnd-st-esc-5-11-0-26-2 confd[55788]: - The notification replay store log 'escEvent' has been repaired
<NOTICE> 19-May-2023::07:31:33.520 dnd-st-esc-5-11-0-26-2 confd[55788]: - The notification replay store log 'lwmonEvent' has been repaired
<INFO> 19-May-2023::07:31:33.597 dnd-st-esc-5-11-0-26-2 confd[55788]: - ConfD phase1 started
<INFO> 19-May-2023::07:31:43.002 dnd-st-esc-5-11-0-26-2 confd[55788]: - Starting to listen for NETCONF SSH on 0.0.0.0:830
<CRIT> 19-May-2023::07:31:43.056 dnd-st-esc-5-11-0-26-2 confd[55788]: - Cannot bind to NETCONF socket 0.0.0.0:830 : permission denied (eacces)

Content of developer log,

<INFO> 19-May-2023::07:10:23.421 dnd-st-esc-5-11-0-26-2 confd[<0.69.0>]: confd embedded apps in early_phase0:
        []
<INFO> 19-May-2023::07:10:23.768 dnd-st-esc-5-11-0-26-2 confd[<0.99.0>]: confd mmap_schema handle_info got msg: timeout
<INFO> 19-May-2023::07:10:23.789 dnd-st-esc-5-11-0-26-2 confd[<0.9.0>]: confd embedded apps in phase0:
        []
<INFO> 19-May-2023::07:10:23.935 dnd-st-esc-5-11-0-26-2 confd[<0.137.0>]: devel-cdb Loaded schema file: ../esc_database/confd-cdb/C.cdb (v5 from ConfD version 8.0.4)
<INFO> 19-May-2023::07:10:23.979 dnd-st-esc-5-11-0-26-2 confd[<0.136.0>]: devel-cdb Loaded oper data from ../esc_database/confd-cdb/O.cdb (4 bytes data in 0.000s)
<INFO> 19-May-2023::07:10:23.980 dnd-st-esc-5-11-0-26-2 confd[<0.137.0>]: devel-cdb Loaded configuration from ../esc_database/confd-cdb/A.cdb (2.16 KiB data, 1 transaction in 0.000s)
<DEBUG> 19-May-2023::07:10:23.986 dnd-st-esc-5-11-0-26-2 confd[<0.136.0>]: devel-cdb cdb_op_lock_manager:grab_lock:178 mode=write id=#Ref<0.2041844420.3969384449.215258>
<INFO> 19-May-2023::07:10:23.986 dnd-st-esc-5-11-0-26-2 confd[<0.137.0>]: devel-cdb Compacted CDB journal file: 2 ms (149 nodes in memory, disk size 2.16 KiB -> 2.16 KiB)
<DEBUG> 19-May-2023::07:10:23.986 dnd-st-esc-5-11-0-26-2 confd[<0.136.0>]: devel-cdb cdb_op_lock_manager:log_lockpaths:943 path: []
<DEBUG> 19-May-2023::07:10:23.987 dnd-st-esc-5-11-0-26-2 confd[<0.131.0>]: devel-cdb cdb_op_lock_manager:reply:915 id=#Ref<0.2041844420.3969384449.215258> reply ok
<DEBUG> 19-May-2023::07:10:23.987 dnd-st-esc-5-11-0-26-2 confd[<0.136.0>]: devel-cdb cdb_op_lock_manager:release_lock:197 id=#Ref<0.2041844420.3969384449.215258>
<INFO> 19-May-2023::07:10:24.060 dnd-st-esc-5-11-0-26-2 confd[<0.9.0>]: confd Erlang emulator PID: 48762
<DEBUG> 19-May-2023::07:10:27.963 dnd-st-esc-5-11-0-26-2 confd[<0.133.0>]: devel-cdb connect from /opt/cisco/esc/confd/bin/confd_get_phase
<DEBUG> 19-May-2023::07:10:27.963 dnd-st-esc-5-11-0-26-2 confd[<0.133.0>]: devel-cdb client /opt/cisco/esc/confd/bin/confd_get_phase has pid 48817/3
<DEBUG> 19-May-2023::07:10:27.964 dnd-st-esc-5-11-0-26-2 confd[<0.133.0>]: devel-cdb inactive client /opt/cisco/esc/confd/bin/confd_get_phase closed
<DEBUG> 19-May-2023::07:10:27.974 dnd-st-esc-5-11-0-26-2 confd[<0.133.0>]: devel-cdb connect from /opt/cisco/esc/confd/bin/confd_get_phase
<DEBUG> 19-May-2023::07:10:27.974 dnd-st-esc-5-11-0-26-2 confd[<0.133.0>]: devel-cdb client /opt/cisco/esc/confd/bin/confd_get_phase has pid 48818/3
<DEBUG> 19-May-2023::07:10:27.975 dnd-st-esc-5-11-0-26-2 confd[<0.133.0>]: devel-cdb inactive client /opt/cisco/esc/confd/bin/confd_get_phase closed
<DEBUG> 19-May-2023::07:10:37.945 dnd-st-esc-5-11-0-26-2 confd[<0.133.0>]: devel-cdb connect from /opt/cisco/esc/confd/bin/confd_get_phase
<DEBUG> 19-May-2023::07:10:37.945 dnd-st-esc-5-11-0-26-2 confd[<0.133.0>]: devel-cdb client /opt/cisco/esc/confd/bin/confd_get_phase has pid 48874/3
<DEBUG> 19-May-2023::07:10:37.946 dnd-st-esc-5-11-0-26-2 confd[<0.133.0>]: devel-cdb inactive client /opt/cisco/esc/confd/bin/confd_get_phase closed
<DEBUG> 19-May-2023::07:11:05.747 dnd-st-esc-5-11-0-26-2 confd[<0.133.0>]: devel-cdb connect from Check_CDB_Phase
<DEBUG> 19-May-2023::07:11:05.750 dnd-st-esc-5-11-0-26-2 confd[<0.133.0>]: devel-cdb inactive client Check_CDB_Phase closed
<INFO> 19-May-2023::07:11:06.602 dnd-st-esc-5-11-0-26-2 confd[<0.68.0>]: confd encryptedStrings keys provided from config
<INFO> 19-May-2023::07:11:06.603 dnd-st-esc-5-11-0-26-2 confd[<0.169.0>]: confd embedded apps in phase1:
        []
<INFO> 19-May-2023::07:11:06.631 dnd-st-esc-5-11-0-26-2 confd[<0.172.0>]: confd initializing kickers
<INFO> 19-May-2023::07:11:06.648 dnd-st-esc-5-11-0-26-2 confd[<0.172.0>]: confd initializing kickers: ok (0.017 s)
<DEBUG> 19-May-2023::07:11:06.736 dnd-st-esc-5-11-0-26-2 confd[<0.205.0>]: devel-cdb Loaded AAA data from CDB
<INFO> 19-May-2023::07:11:06.743 dnd-st-esc-5-11-0-26-2 confd[<0.238.0>]: confd delete "../esc_database/confd-cdb/rollback0.confirm" resulted in enoent

<INFO> 19-May-2023::07:11:06.743 dnd-st-esc-5-11-0-26-2 confd[<0.238.0>]: confd delete "../esc_database/confd-cdb/rollback0.confirm.bup" resulted in enoent

<INFO> 19-May-2023::07:11:06.766 dnd-st-esc-5-11-0-26-2 confd[<0.169.0>]: confd embedded apps in phase1_delayed:
        []
<DEBUG> 19-May-2023::07:11:13.004 dnd-st-esc-5-11-0-26-2 confd[<0.133.0>]: devel-cdb connect from esc_cdb_sub_daemon
<DEBUG> 19-May-2023::07:11:13.004 dnd-st-esc-5-11-0-26-2 confd[<0.133.0>]: devel-cdb connect from esc_cdb_sub_daemon
<DEBUG> 19-May-2023::07:11:13.022 dnd-st-esc-5-11-0-26-2 confd[<0.133.0>]: devel-cdb new subscriber session for esc_cdb_sub_daemon
<DEBUG> 19-May-2023::07:11:13.022 dnd-st-esc-5-11-0-26-2 confd[<0.133.0>]: devel-cdb new subscriber session for esc_cdb_sub_daemon
<DEBUG> 19-May-2023::07:11:13.031 dnd-st-esc-5-11-0-26-2 confd[<0.124.0>]: devel-c New daemon connected (name: esc_opdata_daemon_ca72a33c-ea66-4202-b060-753fcd14ed58, daemon id: 0)
<DEBUG> 19-May-2023::07:11:13.032 dnd-st-esc-5-11-0-26-2 confd[<0.124.0>]: devel-c New daemon connected (name: escnotifsender1684480273027, daemon id: 1)
<DEBUG> 19-May-2023::07:11:13.032 dnd-st-esc-5-11-0-26-2 confd[<0.124.0>]: devel-c New daemon connected (name: esc_action_daemon_cdbfa031-a412-41ae-a423-d9b7803aaf89, daemon id: 2)
<DEBUG> 19-May-2023::07:11:13.038 dnd-st-esc-5-11-0-26-2 confd[<0.124.0>]: devel-c New worker connected (daemon id: 1 worker id: 2)
<DEBUG> 19-May-2023::07:11:16.018 dnd-st-esc-5-11-0-26-2 confd[<0.133.0>]: devel-cdb connect from Check_CDB_Phase
<DEBUG> 19-May-2023::07:11:16.019 dnd-st-esc-5-11-0-26-2 confd[<0.133.0>]: devel-cdb inactive client Check_CDB_Phase closed
<INFO> 19-May-2023::07:11:16.021 dnd-st-esc-5-11-0-26-2 confd[<0.263.0>]: confd embedded apps in phase2:
        []

We are checking on the permission issue. Meanwhile, is the above end of file
issue related to this permission issue.

Regards,
Srini

Try resolving the permission issue → my guess is it’s the root reason of problem you see.
All ports < 1024 are restricted to root access on linux by default…

Thanks for the reply. Sure, let me try to resolve the permission issue.

Just thought of understanding if both are two different issues.

Thanks. We have fixed the permission issue by running ConfD as root user, after which we didn’t face the ‘unexcepted EOF’ file.

However, now we are facing an exception in Maapi as below, when trying to do a transaction using Maapi.

Code snippet:

Socket s = new Socket(confd_ip, 4565);
Maapi maapi = new Maapi(s);

maapi.waitStarted();
maapi.startUserSession(username, "localhost", "maapi",
		new String[] { "groupname" }, MaapiUserSessionFlag.PROTO_SSH);

int th = maapi.startTrans(Conf.DB_RUNNING, Conf.MODE_READ_WRITE);
String datamodelXml = ManagedObjectUtil.datamodelToString(datamodel);	//Value of datamodelXml is an XML format

maapi.loadConfigCmds(th, EnumSet.of(MaapiConfigFlag.XML_FORMAT, MaapiConfigFlag.MAAPI_CONFIG_MERGE), datamodelXml, "");
maapi.validateTrans(th, false, false);
maapi.prepareTrans(th);		//Throws exception
maapi.commitTrans(th);
maapi.finishTrans(th);
s.close();

Exception:

com.tailf.maapi.MaapiException: undefined
        at com.tailf.maapi.MaapiException.mk(MaapiException.java:62)
        at com.tailf.maapi.MaapiException.mk(MaapiException.java:51)
        at com.tailf.maapi.Maapi.prepareTrans(Maapi.java:1961)
        at com.tailf.maapi.Maapi.prepareTrans(Maapi.java:1924)
        at com.cisco.esc.yang.util.ConfigDataUtil.initializeEscSystemConfigData(ConfigDataUtil.java:144)
        at com.cisco.esc.confd.YangESC.syncEscSystemConfigToCdb(YangESC.java:272)
        at com.cisco.esc.ESCStatus.run(ESCStatus.java:177)

The same code was working with old ConfD. Let us know if you have any idea.

Below is from ConfD developer logs,

<DEBUG> 23-May-2023::10:48:56.248 dnd-st-esc-5-11-0-26-2 confd[<0.551.0>]: confd progress usid=24 tid=28047 datastore=running context=maapi check data kickers (0.000 s)
<DEBUG> 23-May-2023::10:48:56.249 dnd-st-esc-5-11-0-26-2 confd[<0.551.0>]: confd progress usid=24 tid=28047 datastore=running context=maapi leaving write-start phase (0.002 s)
<DEBUG> 23-May-2023::10:48:56.249 dnd-st-esc-5-11-0-26-2 confd[<0.551.0>]: confd progress usid=24 tid=28047 datastore=running context=maapi entering prepare phase
<DEBUG> 23-May-2023::10:48:56.249 dnd-st-esc-5-11-0-26-2 confd[<0.551.0>]: confd progress usid=24 tid=28047 datastore=running context=maapi subsystem=cdb prepare
<DEBUG> 23-May-2023::10:48:56.261 dnd-st-esc-5-11-0-26-2 confd[<0.133.0>]: confd progress usid=24 tid=28047 datastore=running context=maapi subsystem=cdb delivering prepare subscription notifications at prio 1
<DEBUG> 23-May-2023::10:48:56.410 dnd-st-esc-5-11-0-26-2 confd[<0.133.0>]: confd progress usid=24 tid=28047 datastore=running context=maapi subsystem=cdb all prepare subscription notifications acknowledged
<DEBUG> 23-May-2023::10:48:56.411 dnd-st-esc-5-11-0-26-2 confd[<0.133.0>]: confd progress usid=24 tid=28047 datastore=running context=maapi subsystem=cdb aborting transaction
<DEBUG> 23-May-2023::10:48:56.412 dnd-st-esc-5-11-0-26-2 confd[<0.551.0>]: confd progress usid=24 tid=28047 datastore=running context=maapi leaving prepare phase (0.174 s)
<DEBUG> 23-May-2023::10:48:56.412 dnd-st-esc-5-11-0-26-2 confd[<0.551.0>]: confd progress usid=24 tid=28047 datastore=running context=maapi entering abort phase
<INFO> 23-May-2023::10:48:56.412 dnd-st-esc-5-11-0-26-2 confd[<0.551.0>]: confd delete "../esc_database/confd-cdb/rollback0.confirm" resulted in enoent
<INFO> 23-May-2023::10:48:56.412 dnd-st-esc-5-11-0-26-2 confd[<0.551.0>]: confd delete "../esc_database/confd-cdb/rollback0.confirm.bup" resulted in enoent

<INFO> 23-May-2023::10:48:56.413 dnd-st-esc-5-11-0-26-2 confd[<0.551.0>]: confd delete "../esc_database/confd-cdb/rollback0.prepare" resulted in enoent

<INFO> 23-May-2023::10:48:56.413 dnd-st-esc-5-11-0-26-2 confd[<0.551.0>]: confd delete "../esc_database/confd-cdb/rollback0.confirm" resulted in enoent

<INFO> 23-May-2023::10:48:56.413 dnd-st-esc-5-11-0-26-2 confd[<0.551.0>]: confd delete "../esc_database/confd-cdb/rollback0.confirm.bup" resulted in enoent

<DEBUG> 23-May-2023::10:48:56.415 dnd-st-esc-5-11-0-26-2 confd[<0.551.0>]: confd progress usid=24 tid=28047 datastore=running context=maapi holding transaction lock (0.191 s)
<DEBUG> 23-May-2023::10:48:56.415 dnd-st-esc-5-11-0-26-2 confd[<0.551.0>]: confd progress usid=24 tid=28047 datastore=running context=maapi leaving abort phase (0.003 s)
<DEBUG> 23-May-2023::10:48:56.416 dnd-st-esc-5-11-0-26-2 confd[<0.551.0>]: confd progress usid=24 tid=28047 datastore=running context=maapi applying transaction: error (0.191 s)

Regards,
Srini

Hmm, I don’t use Java API myself much…
My initial guess from code excerpt is that “new” ConfD version maybe results in rejected transaction in your case (e.g. some bug fixed that previously allowed passing)…
Try checking whether some of the two-phase subscribers (confd-lib apps) hooked into your models result in error/rejection of prepare phase of the two phase commit.

Thanks for the reply.

Sorry, I’m newbie for ConfD and trying to understand on the hooks.

From our related yang model, I don’t see any hooks attached. Could you please bit explain or point any related doc for the same.

Sure no problem.
What I mean is to check whether your architecture includes some ConfD two-phase subscribers.
These are not visible from/in YANG model itself.

They are apps that connect to specific parts of CDB and register for notifications on changes - when config in that part of model is updated → two phase subscriber’s code/hooks are invoked and can “reject” config in the first/prepare phase of transaction.

This is what I thought might be happening due to logs that you listed:

<DEBUG> 23-May-2023::10:48:56.411 dnd-st-esc-5-11-0-26-2 confd[<0.133.0>]: confd progress usid=24 tid=28047 datastore=running context=maapi subsystem=cdb aborting transaction

I am not yet sure whether this is reason of the exception you see in java, or its consequence…

Sure, Thanks for the explanation.

Let me check for the subscribers and get back for any clarification.

Hi,

Yes, the CDB is subscribed (two phase) for notifications.

cdb = new Cdb(cdbDaemonName, socket);
subscription = cdb.newSubscription();
int subid = subscription.subscribe(CdbSubscriptionType.SUB_RUNNING_TWOPHASE, 1, namespace, fmt);
subscription.subscribeDone();

On processing notification of type SUB_PREPARE, due to an exception, abort transaction is done.

int[] points = subscription.read();
CdbNotificationType type = subscription.getLatestNotificationType();
switch (type) {
	case SUB_PREPARE:
		//exception in our processing
		catch () {
			subscription.abortTransaction(new CdbExtendedException(CdbExtendedException.ERRCODE_APPLICATION, cause.getMessage()));
		}

On checking for the root cause of the exception, the maapi.saveConfig() call doesn’t return a proper XML, instead it returns a dummy empty XML in newer Confd.

EnumSet<MaapiConfigFlag> maapiConfigFlags = EnumSet.of(MaapiConfigFlag.XML_FORMAT, MaapiConfigFlag.MAAPI_CONFIG_XML_PRETTY);

if (withDefaults) {
	maapiConfigFlags.add(MaapiConfigFlag.WITH_DEFAULTS);
}

try (MaapiInputStream is = maapi.saveConfig(th, maapiConfigFlags, confPath)) {
	byte[] buf = new byte[4096];
	int bytesRead;
	StringBuffer sb = new StringBuffer();
	while ((bytesRead = is.read(buf, 0, buf.length)) != -1) {
		sb.append(new String(buf, 0, bytesRead, "UTF-8"));
	}
	
	return sb.toString();	//This returns a proper XML in old confd, but not in new confd
}

Old confd returns something like as per our YANG model,

<config xmlns="http://tail-f.com/ns/config/1.0">
  <esc_system_config xmlns="http://www.mycompany.com/esc/esc">
  <vim_connectors>
  <vim_connector>
    <id>conn-id</id>
      <users>
      <user>
        <id>admin</id>
        <credentials>
          <properties>
            <property>
              <name>os_password</name>
              <value>mypassword</value>
            </property>
            <property>
              <name>os_user_domain_name</name>
              <value>default</value>
            </property>
          </properties>
        </credentials>
      </user>
      </users>
  </vim_connector>
  </vim_connectors>
  </esc_system_config>
</config>

New confd returns this,

<config xmlns="http://tail-f.com/ns/config/1.0">
</config>

I’m trying to check the reason for this.

Is this something that the CDBs are not created/populated properly in new ConfD? Let me know if you have any guess / idea.

Update:

I simply copied the *.cdb files from the working setup from old confd. Now the config XML is proper.

So I guess CDB is the issue.

Any idea on why ConfD (or CDB, I assume) doesn’t returns proper config XML in the below code in new version, while processing SUB_PREPARE notification,

EnumSet<MaapiConfigFlag> maapiConfigFlags = EnumSet.of(MaapiConfigFlag.XML_FORMAT, MaapiConfigFlag.MAAPI_CONFIG_XML_PRETTY);

if (withDefaults) {
	maapiConfigFlags.add(MaapiConfigFlag.WITH_DEFAULTS);
}

try (MaapiInputStream is = maapi.saveConfig(th, maapiConfigFlags, confPath)) {
	byte[] buf = new byte[4096];
	int bytesRead;
	StringBuffer sb = new StringBuffer();
	while ((bytesRead = is.read(buf, 0, buf.length)) != -1) {
		sb.append(new String(buf, 0, bytesRead, "UTF-8"));
	}
	
	return sb.toString();	//This returns a proper XML in old confd, but not in new confd
}

Any pointer on debugging this further would be helpful.