If I understand the strategy you are considering, I’d say a shared file system where a number of ConfD instances share the same CDB persistent file will not work.
ConfD CDB is an in-memory database that persists transactional changes to disk so that, in the event of a restart, the persisted CDB data can be read back into memory.
If one instance changes the persisted CDB data file, the other ConfD instances will be unaware of the change until a transaction modifies something in their own in-memory database. When they then attempt to persist those changes, they will discover that the data file on disk has changed, which is treated as a corrupt-persisted-data-file error.
A setup that will work better is to synchronize on transactions, similar to an HA active-standby setup, but here with multiple active nodes syncing with each other in an active-active arrangement.
A “basic” active-active option, which we can (and must) evolve from, is to set up a cluster of ConfD instances that keep 100% consistency across all instances at all times. On its own, such a cluster is not an improvement over a single ConfD instance: availability suffers because a transaction originating on one instance must first successfully take the transaction lock on every other instance in the cluster, so that all instances can be updated with the changes, before the locks can be released.
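The fully synchronous flow can be sketched in plain Python (no ConfD APIs; the `Instance` class, instance names, and the lock/commit helpers are all illustrative stand-ins for a ConfD instance, its transaction lock, and a MAAPI commit):

```python
import threading

class Instance:
    """Illustrative stand-in for one ConfD instance in the cluster."""
    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()  # stands in for the ConfD transaction lock
        self.config = {}

def commit_everywhere(cluster, change):
    """Commit 'change' on every instance, or on none of them."""
    taken = []
    try:
        for inst in cluster:
            # If any instance is busy or unreachable, the whole cluster
            # blocks here -- this is why availability suffers in the
            # fully synchronous setup.
            if not inst.lock.acquire(timeout=5):
                raise TimeoutError("could not lock " + inst.name)
            taken.append(inst)
        for inst in cluster:
            inst.config.update(change)  # stands in for committing the transaction
    finally:
        for inst in taken:
            inst.lock.release()

cluster = [Instance("confd-1"), Instance("confd-2"), Instance("confd-3")]
commit_everywhere(cluster, {"/system/hostname": "node-a"})
```

The point of the sketch is the lock-acquisition loop: no instance commits until every instance is locked, so one slow or unreachable node stalls the entire cluster.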
We need to sacrifice some consistency to make the instances more available.
One such improved strategy is to go for “eventual consistency”: for example, each ConfD instance uses a prepare-phase subscriber to pass each sync transaction to a sync-transaction queue that each peer ConfD instance owns. If the peer instance has no ongoing transaction and its queue is empty, the sync transaction is committed to that instance immediately.
If the peer instance is busy with another transaction, the sync transaction waits until the transaction lock is released and is then committed, and the outcome, success or failure (for example due to a validation-phase or prepare-phase error, a lost connection, etc.), is reported back to the ConfD instance that issued the sync transaction.
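A minimal sketch of the per-instance sync-transaction queue, again in plain Python with no ConfD APIs (in a real deployment the enqueue would come from a CDB prepare-phase subscriber and the commit would go through MAAPI; the class and callback shapes here are assumptions for illustration):

```python
import queue
import threading

class SyncQueueInstance:
    """Stand-in for one peer ConfD instance that owns a sync-transaction queue."""
    def __init__(self, name):
        self.name = name
        self.txn_lock = threading.Lock()  # stands in for the ConfD transaction lock
        self.config = {}
        self.q = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def enqueue_sync(self, change, done):
        """Called on behalf of the originating instance's prepare-phase subscriber.
        'done(name, ok)' reports the result back to the originator."""
        self.q.put((change, done))

    def _drain(self):
        while True:
            change, done = self.q.get()
            # Wait until no local transaction holds the lock, then commit.
            with self.txn_lock:
                try:
                    self.config.update(change)  # stands in for a MAAPI commit
                    done(self.name, True)
                except Exception:
                    done(self.name, False)
            self.q.task_done()

results = []
peer = SyncQueueInstance("confd-2")
peer.enqueue_sync({"/system/ntp/server": "10.0.0.1"},
                  lambda name, ok: results.append((name, ok)))
peer.q.join()  # wait until the queued sync transaction has been processed
```

The queue decouples the originator from the peer: the originator can proceed while the peer commits when it becomes idle, which is exactly the consistency/availability trade the text describes.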
There are more details of course. For example:
- To detect that the transaction lock has been released by an ongoing transaction, a callback or a subscriber registered with the lowest priority of all subscribers can be used.
- The ConfD instance that is syncing a transaction to the other instances goes ahead and completes the transaction before getting an OK back from the instances it synced to. It must therefore be able to roll back transactions committed after that synced transaction if the sync fails for some reason.
- You can use MAAPI or NETCONF to sync with the other ConfD instances. NETCONF gives better security through SSH, and the Tail-f proprietary :transaction capability can be useful if running is writable (i.e., running is not writable only through the candidate). JSON-RPC or RESTCONF may also be an option, though they offer less control than the other interfaces.
- There is an application note on the “basic” active-active option, including a demo. The “Edit Through a NETCONF Client” sequence diagram on page 10 will likely be a helpful starting point to evolve into an “eventual consistency” setup. See https://www.tail-f.com/application-note-confd-active-active-fully-synchronous-ha-clusters/
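To make the NETCONF option above more concrete, the fragment below sketches what syncing one transaction to a peer over the Tail-f proprietary :transaction capability could look like. The capability namespace and the RPC names (start-transaction, prepare-transaction, commit-transaction) are recalled from the ConfD documentation and should be verified against your ConfD version; the edit-config payload is an invented example.

```xml
<!-- Originating instance syncs one transaction to a peer over NETCONF.    -->
<!-- RPC names follow the Tail-f :transaction capability; verify against   -->
<!-- the ConfD user guide for your version.                                -->
<rpc message-id="1" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <start-transaction xmlns="http://tail-f.com/ns/netconf/transactions/1.0">
    <target><running/></target>
  </start-transaction>
</rpc>

<rpc message-id="2" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <edit-config>
    <target><running/></target>
    <config>
      <!-- the change being synced; contents are illustrative -->
      <system xmlns="http://example.com/ns/system">
        <hostname>node-a</hostname>
      </system>
    </config>
  </edit-config>
</rpc>

<rpc message-id="3" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <prepare-transaction xmlns="http://tail-f.com/ns/netconf/transactions/1.0"/>
</rpc>

<rpc message-id="4" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <commit-transaction xmlns="http://tail-f.com/ns/netconf/transactions/1.0"/>
</rpc>
```

An ok reply to message 4 is what the originating instance would treat as a successful sync; an rpc-error at the prepare step would trigger the rollback handling described above.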