Init XML data missing after clean restart

Hi,
The problem is as follows (it is not reproducible most of the time; we only see it in our Jenkins runs):

  1. We have a set of init XML files for initial configuration, which includes some default users and domain configurations.
  2. Bring up our system, which includes confd. Login with the configured users works.
  3. Stop the system, clean up all the workspace directories to do a clean start.
  4. Start the system again. Confd comes up normally without any errors.
  5. However, login does not work: confd does not recognize the users.

We checked “confd.log” and it says it loaded all the init files. I can even see the user present in the “A.cdb” file.
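To rule out the init files themselves, it may help to list which user names each init XML file actually defines. The following is only a rough sketch: it assumes users appear as `<user>` elements with a `<name>` child (as in the standard ConfD AAA model), it ignores namespaces, and the sample XML is a hypothetical stand-in, not our real aaa_init.xml.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def user_names(xml_text):
    """Return the text of every <name> found directly under a <user> element."""
    root = ET.fromstring(xml_text)
    names = []
    for elem in root.iter():
        if local_name(elem.tag) != "user":
            continue
        for child in elem:
            if local_name(child.tag) == "name" and child.text:
                names.append(child.text.strip())
    return names


# Hypothetical sample for illustration only (assumed layout, not the real file).
SAMPLE = """<config xmlns="http://tail-f.com/ns/config/1.0">
  <aaa xmlns="http://tail-f.com/ns/aaa/1.1">
    <authentication>
      <users>
        <user><name>admin</name></user>
        <user><name>oper</name></user>
      </users>
    </authentication>
  </aaa>
</config>"""

print(user_names(SAMPLE))  # ['admin', 'oper']
```

Running the same function over the text of each init file from a failing Jenkins workspace would show whether “admin” is really defined in the files that confd reports loading.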

The audit log error is:

8-Oct-2018::09:21:46.465 jenkins-A–jenkins–j7-ssh1-3-5–0–2015fcee03 confd[8616]: audit user: admin/0 no such local user
8-Oct-2018::09:21:47.316 jenkins-A–jenkins–j7-ssh1-3-5–0–2015fcee03 confd[8616]: audit user: admin@system/0 no such local user
8-Oct-2018::09:21:48.165 jenkins-A–jenkins–j7-ssh1-3-5–0–2015fcee03 confd[8616]: audit user: admin@system/0 failed to login using externalauth: Request from RW.MgmtAgent
8-Oct-2018::09:21:48.165 jenkins-A–jenkins–j7-ssh1-3-5–0–2015fcee03 confd[8616]: audit user: admin@system/0 Provided bad password
8-Oct-2018::09:21:48.176 jenkins-A–jenkins–j7-ssh1-3-5–0–2015fcee03 confd[8616]: audit user: admin/0 failed to login using externalauth: denied
8-Oct-2018::09:21:48.176 jenkins-A–jenkins–j7-ssh1-3-5–0–2015fcee03 confd[8616]: audit user: admin/0 Provided bad password

It first tries local login. Since that doesn’t work (ideally it should have), it then tries external authentication, which fails as expected.

confd.log:

8-Oct-2018::09:02:49.744 jenkins-A–jenkins–j7-ssh1-3-5–0–2015fcee03 confd[8616]: - Starting to listen for Internal IPC on 127.0.0.1:4565
8-Oct-2018::09:02:50.459 jenkins-A–jenkins–j7-ssh1-3-5–0–2015fcee03 confd[8616]: - CDB load: processing file: /usr/rift/var/rift/persist.riftware/var/confd/cdb/aaa_init.xml
8-Oct-2018::09:02:50.473 jenkins-A–jenkins–j7-ssh1-3-5–0–2015fcee03 confd[8616]: - CDB load: processing file: /usr/rift/var/rift/persist.riftware/var/confd/cdb/rw-auth-system.init.xml
8-Oct-2018::09:02:50.474 jenkins-A–jenkins–j7-ssh1-3-5–0–2015fcee03 confd[8616]: - CDB load: processing file: /usr/rift/var/rift/persist.riftware/var/confd/cdb/rw-project.init.xml
8-Oct-2018::09:02:50.476 jenkins-A–jenkins–j7-ssh1-3-5–0–2015fcee03 confd[8616]: - CDB load: processing file: /usr/rift/var/rift/persist.riftware/var/confd/cdb/rw-rbac-platform.init.xml
8-Oct-2018::09:02:50.477 jenkins-A–jenkins–j7-ssh1-3-5–0–2015fcee03 confd[8616]: - CDB load: processing file: /usr/rift/var/rift/persist.riftware/var/confd/cdb/rw-user.init.xml
8-Oct-2018::09:02:50.518 jenkins-A–jenkins–j7-ssh1-3-5–0–2015fcee03 confd[8616]: - CDB load: processing file: /usr/rift/var/rift/persist.riftware/var/confd/cdb/rwlog-mgmt.init.xml
8-Oct-2018::09:02:50.622 jenkins-A–jenkins–j7-ssh1-3-5–0–2015fcee03 confd[8616]: - ConfD phase0 started
8-Oct-2018::09:02:51.081 jenkins-A–jenkins–j7-ssh1-3-5–0–2015fcee03 confd[8616]: - ConfD phase1 started
8-Oct-2018::09:03:09.345 jenkins-A–jenkins–j7-ssh1-3-5–0–2015fcee03 confd[8616]: - Starting to listen for NETCONF SSH on 0.0.0.0:2022
8-Oct-2018::09:03:09.363 jenkins-A–jenkins–j7-ssh1-3-5–0–2015fcee03 confd[8616]: - Starting to listen for NETCONF TCP on 127.0.0.1:2023
8-Oct-2018::09:03:10.567 jenkins-A–jenkins–j7-ssh1-3-5–0–2015fcee03 confd[8616]: - Starting to listen for CLI SSH on 127.0.0.1:2024
8-Oct-2018::09:03:10.594 jenkins-A–jenkins–j7-ssh1-3-5–0–2015fcee03 confd[8616]: - ConfD started vsn: 6.3

The only difference we see is in the devel.log.

Below is the devel.log when it works:

8-Oct-2018::11:22:08.724 amuralid-arnml-1 confd[30321]: confd embedded apps in early_phase0:
8-Oct-2018::11:22:09.660 amuralid-arnml-1 confd[30321]: confd mmap_schema handle_info got msg: timeout
8-Oct-2018::11:22:09.715 amuralid-arnml-1 confd[30321]: confd embedded apps in phase0:
8-Oct-2018::11:22:10.326 amuralid-arnml-1 confd[30321]: devel-cdb init files found in /localdisk/amuralid/container/ub/master/rift/.build/ub16_debug/install/usr/rift/var/rift/518baf02-7b5e-4318-8124-929ee6444bdd-mgmt-vm-lp-2/persist.riftware/var/confd/cdb: aaa_init.xml, rw-auth-system.init.xml, rw-project.init.xml, rw-rbac-platform.init.xml, rw-user.init.xml, rwlog-mgmt.init.xml
8-Oct-2018::11:22:12.039 amuralid-arnml-1 confd[30321]: confd embedded apps in phase1:
8-Oct-2018::11:22:22.024 amuralid-arnml-1 confd[30321]: devel-cdb Initiating CDB journal compaction
8-Oct-2018::11:22:22.027 amuralid-arnml-1 confd[30321]: devel-cdb Compacted CDB journal file: 2 ms (1071 nodes in memory, disk size 17.56 KiB → 16.46 KiB)
8-Oct-2018::11:22:37.049 amuralid-arnml-1 confd[30321]: confd embedded apps in phase2:
8-Oct-2018::11:22:49.781 amuralid-arnml-1 confd[30321]: devel-cdb Initiating CDB journal compaction
8-Oct-2018::11:22:49.785 amuralid-arnml-1 confd[30321]: devel-cdb Compacted CDB journal file: 3 ms (1924 nodes in memory, disk size 39.07 KiB → 36.09 KiB)

When it doesn’t work:

8-Oct-2018::09:02:49.838 jenkins-A–jenkins–j7-ssh1-3-5–0–2015fcee03 confd[8616]: confd embedded apps in phase0:
8-Oct-2018::09:02:50.459 jenkins-A–jenkins–j7-ssh1-3-5–0–2015fcee03 confd[8616]: devel-cdb init files found in /usr/rift/var/rift/persist.riftware/var/confd/cdb: aaa_init.xml, rw-auth-system.init.xml, rw-project.init.xml, rw-rbac-platform.init.xml, rw-user.init.xml, rwlog-mgmt.init.xml
8-Oct-2018::09:02:50.963 jenkins-A–jenkins–j7-ssh1-3-5–0–2015fcee03 confd[8616]: confd embedded apps in phase1:
8-Oct-2018::09:03:00.960 jenkins-A–jenkins–j7-ssh1-3-5–0–2015fcee03 confd[8616]: devel-cdb Initiating CDB journal compaction
8-Oct-2018::09:03:00.971 jenkins-A–jenkins–j7-ssh1-3-5–0–2015fcee03 confd[8616]: devel-cdb Compacted CDB journal file: 10 ms (1071 nodes in memory, disk size 17.56 KiB → 16.46 KiB)
8-Oct-2018::09:03:09.325 jenkins-A–jenkins–j7-ssh1-3-5–0–2015fcee03 confd[8616]: confd embedded apps in phase2:
8-Oct-2018::09:03:19.843 jenkins-A–jenkins–j7-ssh1-3-5–0–2015fcee03 confd[8616]: devel-cdb Initiating CDB journal compaction
8-Oct-2018::09:03:19.859 jenkins-A–jenkins–j7-ssh1-3-5–0–2015fcee03 confd[8616]: devel-cdb Compacted CDB journal file: 15 ms (1753 nodes in memory, disk size 34.60 KiB → 32.75 KiB)

The first journal compaction reports 1071 nodes in memory in both runs. The second compaction (after phase2) reports 1924 nodes when it works and 1753 nodes when it doesn’t, i.e. 171 fewer.
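Since the node count is the only signal we have so far, a Jenkins job could pull it out of devel.log and flag the first compaction where a run diverges from a known-good run. This is a minimal sketch that assumes the “Compacted CDB journal file” line format shown above; the function names are made up for illustration.

```python
import re

# Matches the devel.log compaction lines quoted in this post.
COMPACTION = re.compile(
    r"Compacted CDB journal file: \d+ ms \((\d+) nodes in memory"
)


def node_counts(devel_log_text):
    """Return the 'nodes in memory' value of each compaction, in log order."""
    return [int(m.group(1)) for m in COMPACTION.finditer(devel_log_text)]


def first_divergence(good, bad):
    """Return (index, good_count, bad_count) for the first differing compaction."""
    for i, (g, b) in enumerate(zip(good, bad)):
        if g != b:
            return i, g, b
    return None


# Message parts of the compaction lines from the two runs above.
GOOD = """devel-cdb Compacted CDB journal file: 2 ms (1071 nodes in memory, disk size 17.56 KiB → 16.46 KiB)
devel-cdb Compacted CDB journal file: 3 ms (1924 nodes in memory, disk size 39.07 KiB → 36.09 KiB)"""
BAD = """devel-cdb Compacted CDB journal file: 10 ms (1071 nodes in memory, disk size 17.56 KiB → 16.46 KiB)
devel-cdb Compacted CDB journal file: 15 ms (1753 nodes in memory, disk size 34.60 KiB → 32.75 KiB)"""

print(first_divergence(node_counts(GOOD), node_counts(BAD)))  # (1, 1924, 1753)
```

The counts only diverge at the second compaction, which suggests the missing data is written (or not written) somewhere between phase1 and that point, rather than during the initial init file load.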

Any idea what the issue could be, or how to debug it?

Thanks

No idea, but did you verify that aaa_init.xml is still present in the loadPath (usually where the *.fxs files are located)?
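A quick way to check this right before confd starts might look like the sketch below. The file names are the ones listed in the logs in the question; the helper name is hypothetical, and which directory to check (the cdb directory from confd.log or the loadPath) depends on your setup.

```python
from pathlib import Path

# Init files named in the confd.log / devel.log excerpts in the question.
EXPECTED = [
    "aaa_init.xml",
    "rw-auth-system.init.xml",
    "rw-project.init.xml",
    "rw-rbac-platform.init.xml",
    "rw-user.init.xml",
    "rwlog-mgmt.init.xml",
]


def missing_or_empty(directory, expected=EXPECTED):
    """Return (file name, problem) pairs for files that are absent or zero bytes."""
    base = Path(directory)
    problems = []
    for name in expected:
        path = base / name
        if not path.is_file():
            problems.append((name, "missing"))
        elif path.stat().st_size == 0:
            problems.append((name, "empty"))
    return problems


if __name__ == "__main__":
    # Directory taken from the failing run's confd.log; adjust as needed.
    cdb_dir = "/usr/rift/var/rift/persist.riftware/var/confd/cdb"
    for name, problem in missing_or_empty(cdb_dir):
        print(f"{name}: {problem}")
```

An empty result only shows the files exist and are non-empty at that moment; it does not prove their contents are complete, so comparing them against the copies from a working run would be a sensible next step.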