Multiple ConfD instances running standalone with a shared CDB

We are thinking of having multiple ConfD instances running in a Kubernetes environment; each instance will be standalone, with a shared StorageOS ext4 file system housing the CDB and other files. Before going down that road, I wanted to run this by you guys. In front of these instances will be a load balancer distributing traffic round-robin to each instance.

If I understand the strategy you are considering, I’d say a shared file system where a number of ConfD instances share the same CDB persistent file will not work.

ConfD CDB is an in-memory database that persists transactional changes so that, in the event of a restart, the persisted CDB data can be read back into memory.
If one instance changes the persistent CDB data file, the other ConfD instances will be unaware of the change. The next time a transaction changes something in their in-memory database, they will attempt to persist those changes, discover that the persisted data file has changed underneath them, and treat that as a corrupt persisted data file error.

A setup that will work better is to synchronize on transactions, similar to an HA active-standby setup, but here with multiple active nodes syncing with each other in an active-active setup.

A “basic” active-active option, which we can (and must) evolve from, would be to set up a cluster of ConfD instances that keeps full 100% consistency across all instances at all times, with little or no gain compared to having just one ConfD instance. Such a cluster will not be an improvement over a single ConfD instance: availability suffers because a successful lock on all the other instances in the cluster is needed, so that every instance can be updated with the changes from a transaction originating on one instance before the lock can be released.
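
To make the availability cost concrete, here is a minimal sketch of what a fully synchronous commit would require, assuming a hypothetical `peers` array of already-established MAAPI sockets to the other cluster members:

```c
#include <confd_lib.h>
#include <confd_maapi.h>

/* Sketch only: before a transaction from one instance can commit, the
 * running datastore must be locked on every peer, the change replicated,
 * and the locks released. One unreachable peer stalls every commit,
 * which is why availability suffers. */
static int commit_everywhere(int *peers, int npeers)
{
    int i;

    for (i = 0; i < npeers; i++) {
        if (maapi_lock(peers[i], CONFD_RUNNING) != CONFD_OK) {
            /* One peer unavailable => the whole cluster cannot commit. */
            while (--i >= 0)
                maapi_unlock(peers[i], CONFD_RUNNING);
            return CONFD_ERR;
        }
    }

    /* ... replicate the transaction to every peer here ... */

    for (i = 0; i < npeers; i++)
        maapi_unlock(peers[i], CONFD_RUNNING);
    return CONFD_OK;
}
```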

We need to sacrifice some consistency to make the instances more available.

One such improved strategy would be to go for “eventual consistency”. For example, each ConfD instance uses a prepare-phase subscriber to pass each sync transaction to a sync transaction queue that the instance owns. If there is no transaction ongoing on that ConfD instance and the queue is empty, the sync transaction is committed to the instance immediately.
If the ConfD instance is busy with another transaction, the sync transaction waits until the transaction lock is released, then starts, and finally reports back to the ConfD instance that issued it whether it succeeded or failed (for example due to a validation-phase or prepare-phase error, a lost connection, etc.).
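
As a rough illustration (not the application note’s code), a two-phase CDB subscriber could feed such a queue from its prepare-phase notification; `enqueue_sync_transaction()` is a hypothetical hook into the queue described above:

```c
#include <stdlib.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <confd_lib.h>
#include <confd_cdb.h>

/* Hypothetical hook into this instance's sync transaction queue. */
extern void enqueue_sync_transaction(int subsock);

static void run_subscriber(struct sockaddr_in *confd_addr)
{
    int subsock, spoint;

    /* confd_init() is assumed to have been called at startup. */
    subsock = socket(PF_INET, SOCK_STREAM, 0);
    cdb_connect(subsock, CDB_SUBSCRIPTION_SOCKET,
                (struct sockaddr *)confd_addr, sizeof(*confd_addr));

    /* Two-phase subscription on the whole data tree: we are notified in
     * the prepare phase, before the transaction is committed to CDB.
     * The priority (100) is illustrative. */
    cdb_subscribe2(subsock, CDB_SUB_RUNNING_TWOPHASE, 0, 100,
                   &spoint, 0, "/");
    cdb_subscribe_done(subsock);

    for (;;) {
        enum cdb_sub_notification type;
        int flags, length, *spoints;

        cdb_read_subscription_socket2(subsock, &type, &flags,
                                      &spoints, &length);
        if (type == CDB_SUB_PREPARE) {
            /* Hand the change set to the sync queue. To reject the
             * transaction instead, reply with cdb_sub_abort_trans(). */
            enqueue_sync_transaction(subsock);
        }
        /* Acknowledge CDB_SUB_PREPARE / CDB_SUB_COMMIT / CDB_SUB_ABORT. */
        cdb_sync_subscription_socket(subsock, CDB_DONE_PRIORITY);
        free(spoints);
    }
}
```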

There are more details of course. For example:

  • To find out when the transaction lock is released by an ongoing transaction, a callback or a subscriber that has the lowest priority of all possible subscribers can be used.
  • The ConfD instance that is syncing a transaction to other instances goes ahead and completes the transaction before getting an ok back from the instances the transaction was synced to. It must therefore be able to roll back transactions committed after that synced transaction if the sync fails for some reason.
  • You can use MAAPI to sync with the other ConfD instances (see the sketch after this list) or NETCONF (better security using SSH, and the Tail-f proprietary :transaction capability can be useful if running is writable, i.e. running is not only writable through the candidate). JSON-RPC or RESTCONF (less control compared to the other interfaces) may also be an option.
  • There is an application note on the “basic” active-active option with a demo. The “Edit Through a NETCONF Client” sequence diagram on page 10 will likely be a helpful starting point to evolve into an “eventual consistency” setup. See https://www.tail-f.com/application-note-confd-active-active-fully-synchronous-ha-clusters/
  • etc.
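
For the MAAPI option, syncing one transaction to a peer could look roughly like this; the path, value, and addressing are placeholders, and error handling is mostly elided:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <confd_lib.h>
#include <confd_maapi.h>

/* Sketch: replay one synced transaction on a peer ConfD instance. */
static int sync_to_peer(struct sockaddr_in *peer_addr)
{
    int sock, th;
    struct confd_ip ip;
    const char *groups[] = { "admin" };

    sock = socket(PF_INET, SOCK_STREAM, 0);
    if (maapi_connect(sock, (struct sockaddr *)peer_addr,
                      sizeof(*peer_addr)) != CONFD_OK)
        return CONFD_ERR;

    ip.af = AF_INET;
    inet_pton(AF_INET, "127.0.0.1", &ip.ip.v4);
    maapi_start_user_session(sock, "admin", "system", groups, 1,
                             &ip, CONFD_PROTO_TCP);

    th = maapi_start_trans(sock, CONFD_RUNNING, CONFD_READ_WRITE);

    /* Replay the change set from the sync queue; one leaf write stands
     * in for it here ("/some/path" is a placeholder). */
    maapi_set_elem2(sock, th, "value-from-origin", "/some/path");

    /* Runs validate/prepare/commit on the peer; the result is what we
     * report back to the originating instance. */
    if (maapi_apply_trans(sock, th, 0) != CONFD_OK) {
        maapi_finish_trans(sock, th);
        maapi_end_user_session(sock);
        return CONFD_ERR;
    }
    maapi_finish_trans(sock, th);
    maapi_end_user_session(sock);
    return CONFD_OK;
}
```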

Thank you for the prompt response. I have looked at the basic active-active option and will take that direction and evolve from there.

Any input or additional examples would be useful.

Thanks,
Steve

Do you have the code presented in https://www.tail-f.com/application-note-confd-active-active-fully-synchronous-ha-clusters/ in an archive (zip, tar, etc.) format, and is it compilable?

The example code presented at https://www.tail-f.com/application-note-confd-active-active-fully-synchronous-ha-clusters/ establishes a connection with a single CDB instance and replicates from there? This appears similar to what active-standby is doing?

See https://github.com/ConfD-Developer/ConfD-Demos under “active-active”

I implemented active-active replication in our environment and it is working without issue. Now I am working on phased startup of ConfD and running into an issue with initializing one ConfD node with data from another. I initially used maapi_get_objects and found it only works with list and leaf nodes, so I tried maapi_get_object and it appears to NOT do what I want. I need to be able to copy the contents of A.cdb from the initializing node into a node doing a phased ConfD startup.

Is there a way of doing this programmatically with the ConfD API?

You can use, for example, maapi_save_config() + maapi_save_config_result() and maapi_load_config_cmds() to do the job of copying some or all content between ConfD nodes.

Taking a look at API calls now.

Thx!

After reading the documentation, what exactly does the function maapi_load_config_cmds() do?

It appears there is no way of obtaining the running configuration from another ConfD node? You either load the configuration from an XML file or construct the XML manually?

I probably should have pointed you to maapi_load_config_stream() + maapi_load_config_stream_result() instead.
maapi_load_config_cmds() loads config from a string (in memory) instead of from a file.
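
A minimal sketch of the full copy, assuming MAAPI sockets with user sessions already started on both nodes (`msock_src`/`msock_dst`) and the nodes’ ConfD IPC addresses (`src_addr`/`dst_addr`); error handling is mostly elided:

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <confd_lib.h>
#include <confd_maapi.h>

/* Sketch: stream the running config out of a source node and into a
 * target node. */
static int copy_config(int msock_src, int msock_dst,
                       struct sockaddr_in *src_addr,
                       struct sockaddr_in *dst_addr)
{
    int th_src, th_dst, save_id, load_id, ssock, lsock;
    char buf[BUFSIZ];
    ssize_t n;

    th_src = maapi_start_trans(msock_src, CONFD_RUNNING, CONFD_READ);
    th_dst = maapi_start_trans(msock_dst, CONFD_RUNNING, CONFD_READ_WRITE);

    /* Ask the source to serialize its config; the data arrives on a
     * separate stream socket identified by save_id. */
    save_id = maapi_save_config(msock_src, th_src, MAAPI_CONFIG_XML, "/");
    ssock = socket(PF_INET, SOCK_STREAM, 0);
    confd_stream_connect(ssock, (struct sockaddr *)src_addr,
                         sizeof(*src_addr), save_id, 0);

    /* Ask the target to accept config on another stream socket. */
    load_id = maapi_load_config_stream(msock_dst, th_dst,
                                       MAAPI_CONFIG_XML | MAAPI_CONFIG_MERGE);
    lsock = socket(PF_INET, SOCK_STREAM, 0);
    confd_stream_connect(lsock, (struct sockaddr *)dst_addr,
                         sizeof(*dst_addr), load_id, 0);

    /* Pump the XML from one stream into the other. */
    while ((n = read(ssock, buf, sizeof(buf))) > 0)
        write(lsock, buf, n);
    close(ssock);
    close(lsock);  /* EOF tells the target the config is complete. */

    if (maapi_save_config_result(msock_src, save_id) != CONFD_OK ||
        maapi_load_config_stream_result(msock_dst, load_id) != CONFD_OK)
        return CONFD_ERR;

    /* The loaded data lives in th_dst until the transaction is applied. */
    maapi_apply_trans(msock_dst, th_dst, 0);
    maapi_finish_trans(msock_dst, th_dst);
    maapi_finish_trans(msock_src, th_src);
    return CONFD_OK;
}
```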

Receiving the message “=ERROR REPORT==== 20-Apr-2021::05:55:39.258557 ===
Got unknown id 4 on stream socket {state,[],2}” directly after calling maapi_load_config_stream() and confd_stream_connect(). The id is correct and the calls are well within the 10-second window stated in the documentation, but ConfD reports the id as unknown? Can this be accomplished during phased startup?

I now have configuration XML being copied from the initializing CDB to the phased-startup CDB. I can’t help but notice there are some differences. I’m thinking it is maybe the flags; what are the flags I should be using? I don’t want to go down a series of trial-and-error sessions to find the correct flag settings for maapi_load_config_stream() and maapi_save_config().

Do you have any idea?

I am assuming you want to copy over config only.

For saving I would use:

  • MAAPI_CONFIG_XML (fastest)

For loading data:

  • MAAPI_CONFIG_MERGE (unless you want to delete the config on the target first)
  • MAAPI_CONFIG_XML
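
In terms of the stream-copy sketch above (same hypothetical handles), that amounts to:

```c
/* Save: config only, serialized as XML. */
save_id = maapi_save_config(msock_src, th_src, MAAPI_CONFIG_XML, "/");

/* Load: merge into the target's existing config rather than replacing it. */
load_id = maapi_load_config_stream(msock_dst, th_dst,
                                   MAAPI_CONFIG_XML | MAAPI_CONFIG_MERGE);
```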

Everything is working now. Thank you!