Is High Availability Framework (HAFW) a part of ConfD?

mazhar · February 6, 2019, 11:41am

Hi,

I am a bit confused here. Is HAFW a part of confD or is it some external cluster resource management software?

The UG states: "The remaining slaves must also be informed by the HAFW about the new master situation. ConfD will never take any actions regarding master/slave-ness by itself."
Which HAFW is it talking about?

It further states: "The only thing ConfD does is to replicate the CDB data amongst the members in the HA group. It doesn't perform any of the otherwise High-Availability related tasks such as running election protocols in order to elect a new master."
How do I achieve election protocol stuff for electing a new master?

I have a few more questions, but before asking them I would like to briefly describe my system architecture. I have two management blades in a chassis-based system, which are to be configured over a high availability network. I have tested configuring the management blades with ConfD high availability: the CDB is synced between them, and configuration changes are replicated.

Now my questions are:

1. How do I achieve election of a new master when the master/active node goes down or when the socket connection is closed?
2. How do I achieve a single point of access to the ConfD server for all of my managed object nodes (floating virtual IP)?
3. Do I have to re-initiate the CDB subscription to the new master node when the master/active node goes down?
4. What impact does this overall high availability switchover have on my managed objects?
5. Under what conditions do I need to make use of HA notifications?
6. Do I have to use any cluster resource management software (e.g. corosync, pacemaker, etc.) on top of ConfD?
7. If I do have to use an external HAFW, which HAFW fits well with ConfD?

Regards.

jjohansson · February 7, 2019, 2:25pm

Short answer, no.

First some background. As you’ve already noted, the UG states that ConfD is not an HA framework. In terms of HA, ConfD provides:

CDB replication from active ConfD instance (master) to standby instance(s) (slave(s))
internal supervision of ConfD instances that comprise a ConfD cluster
API calls to tell a ConfD instance what role to play in a cluster, and to subscribe to HA events describing each ConfD instance's view of the state of the cluster.

An external HA framework is needed to manage the HA properties of ConfD. The HA framework is expected to provide functions such as master election, callouts to inform ConfD instances about their role in the cluster, supervision of the ConfD instances that make up the cluster, and telling client applications (MOs) where the active ConfD instance is located.
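To make the division of labour concrete, here is a minimal Python sketch of the election and role-callout part of such a framework. Everything in it is an assumption for illustration: the `Node` class, the "lowest priority value wins" rule and the `set_role` callback are hypothetical and not part of ConfD. In a real deployment, `set_role` would wrap the role-setting calls documented in the confd_lib_ha(3) man page.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    name: str
    priority: int         # hypothetical election rule: lowest value wins
    healthy: bool = True  # maintained by the framework's own supervision


def elect_master(nodes: List[Node]) -> Optional[Node]:
    """Return the healthy node with the lowest priority value, or None."""
    candidates = [n for n in nodes if n.healthy]
    return min(candidates, key=lambda n: (n.priority, n.name), default=None)


def assign_roles(
    nodes: List[Node],
    set_role: Callable[[str, str, str], None],
) -> Dict[str, str]:
    """Elect a master and tell every healthy node which role to play.

    set_role(node_name, role, master_name) is a stand-in for whatever
    mechanism the framework uses to reach the ConfD instance on a node.
    """
    master = elect_master(nodes)
    roles: Dict[str, str] = {}
    for node in nodes:
        if not node.healthy:
            roles[node.name] = "none"  # unreachable, nothing to tell it
            continue
        role = "master" if node is master else "slave"
        set_role(node.name, role, master.name)
        roles[node.name] = role
    return roles


if __name__ == "__main__":
    blades = [Node("blade-1", 1), Node("blade-2", 2)]
    print(assign_roles(blades, lambda n, r, m: None))
    blades[0].healthy = False  # simulate the active blade going down
    print(assign_roles(blades, lambda n, r, m: None))
```

The point of the sketch is only that the decision (who is master) and the notification (telling each instance its role) both live outside ConfD. A production framework would also need fencing and split-brain protection, which this deliberately omits.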

Answers to questions:

1. Master election is outside the scope of ConfD; it is the responsibility of an external HA framework. ConfD is just a daemon running on a node. The HA framework figures out which node should be active, tells the ConfD instance on that node to be master, and tells any ConfD instance(s) on other nodes to be slave(s).
2. Floating IPs are also outside the scope of ConfD. ConfD just listens to a socket:
a. If you have a floating IP, client applications use the same IP address when connecting to ConfD regardless of where in the cluster the active instance is located.
b. If you don’t have a floating IP, the HA framework can inform client applications of where in the cluster the active instance is located (it already knows this).
c. A third option is to have ConfD slaves on all nodes (even though only the instances running on the management blades might be masters) and have all client applications connect to the local instance (localhost/127.0.0.1).
3. Applications that subscribed to the old master must re-issue their CDB subscriptions; CDB subscriptions are specific to a single ConfD instance.
4. The CDB connection to the old active instance is closed. MOs have to be informed by the HA framework where the new active instance is located, then reconnect and re-establish their subscriptions. Once connected, the client applications might check the CDB transaction id to ensure that no config changes have been made during the failover.
5. The HA events are intended for the HA framework that manages the cluster. The events inform the HA framework about the state of the cluster as seen by the ConfD instances. Events inform the framework about:
a. A slave lost contact with the master
b. The master lost contact with a slave
c. A slave is initialized (its database is synchronized with the master database and the slave is ready to take over should the master crash)
d. etc.
6 and 7. ConfD has been used with commercial HA frameworks, HA frameworks written in-house, and open source frameworks. From an HA perspective ConfD is very simple: the HA framework only has to be able to invoke a handful of API calls (we provide C, Java and Python language bindings) to tell ConfD instances about their HA role and to subscribe to events from ConfD.
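As a rough illustration of how an HA framework might react to the events listed in answer 5, here is a small Python sketch. The event labels (`slave_lost_master`, `master_lost_slave`, `slave_initialized`), the `ClusterView` class and the returned action strings are all made up for this example; the real event identifiers and their payloads are the ones described in the UG and the man page.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ClusterView:
    """The framework's own bookkeeping, built up from HA events."""
    master: Optional[str]
    synced_slaves: Set[str] = field(default_factory=set)


def handle_event(view: ClusterView, kind: str, node: str) -> List[str]:
    """Update the view for one event and return suggested follow-up actions.

    'kind' uses hypothetical labels; 'node' is the node the event concerns
    (the reporting slave for 'slave_lost_master').
    """
    if kind == "slave_initialized":
        # The slave's database is in sync, so it is a takeover candidate.
        view.synced_slaves.add(node)
        return [f"{node} is synchronized and may take over"]
    if kind == "master_lost_slave":
        view.synced_slaves.discard(node)
        return [f"check health of {node}"]
    if kind == "slave_lost_master":
        # A single report is not proof that the master is dead.
        return [f"verify that {view.master} is really down before any failover"]
    return [f"ignore unknown event {kind!r}"]


def pick_successor(view: ClusterView) -> Optional[str]:
    """Choose a synchronized slave to promote (alphabetical, arbitrary rule)."""
    return min(view.synced_slaves, default=None)
```

The design choice worth noting is that only slaves reported as initialized are considered for promotion, which matches the description in item c: a slave is ready to take over only once its database is synchronized.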

In addition to the main HA chapter in the UG (chapter 24), HA events are described in section 12.10 and the API calls are described in the confd_lib_ha(3) man page.

mazhar · February 8, 2019, 5:10am

Hi @jjohansson,

Thank you so much for taking the time to reply. I was eagerly waiting for a reply. Your explanation is sound, clear and astounding. I totally get it now. However, I have got a few more queries based on your reply.

First, I would prefer going with the floating IP as that seems more promising to me in terms of performance and simplicity.

Now, as you said, "Once connected, the client applications might check the CDB transaction id to ensure that no config changes have been made during the failover." How is my client application supposed to keep track of the transaction id?

As far as I know, an HA node keeps track of the transaction id. A slave node that goes down and later reconnects to the cluster first compares the master's transaction id with its own; if they differ, its CDB is updated. These things are taken care of internally by the ConfD server. How are my client applications supposed to keep track of the transaction id?

And regarding "ConfD has been used with commercial HA frameworks, HA frameworks written in-house and open source frameworks.", can you suggest a few commercial and open source frameworks that you think would work well with ConfD? I also want service support for a framework that can manage a floating IP service.

Thanks and Regards.

cohult · February 8, 2019, 7:22am

Regarding transaction id see for example:

See also the ConfD UG chapter “CDB - The ConfD XML Database”, section “Reconnect”, and cdb_get_txid(), cdb_get_replay_txids() and cdb_replay_subscriptions() in the confd_lib_cdb(3) man page.
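One way a client application could keep track of the transaction id is sketched below in Python. This is an assumption-laden illustration, not the documented procedure: `TxidTracker`, `on_reconnect`, `get_txid` and `resync` are hypothetical names. `get_txid` stands in for a call such as cdb_get_txid(), and `resync` stands in for whatever the application does to catch up (for example re-reading its configuration, or using the replay functions mentioned above). The transaction id is treated as an opaque value that is only compared for equality.

```python
from typing import Any, Callable, Optional


class TxidTracker:
    """Remembers the last transaction id the application has acted on."""

    def __init__(self) -> None:
        self.last_seen: Optional[Any] = None

    def record(self, txid: Any) -> None:
        # Call this after each configuration change has been fully handled.
        self.last_seen = txid

    def needs_resync(self, current: Any) -> bool:
        # Unknown history or a different id both mean the client may be stale.
        return self.last_seen is None or current != self.last_seen


def on_reconnect(
    tracker: TxidTracker,
    get_txid: Callable[[], Any],
    resync: Callable[[], None],
) -> bool:
    """Run after reconnecting to the new active instance.

    Returns True if a resync was needed, False if nothing changed
    while the client was disconnected.
    """
    current = get_txid()
    changed = tracker.needs_resync(current)
    if changed:
        resync()
    tracker.record(current)
    return changed
```

With this shape, the application stores the id after each handled change and, after a failover, only pays the cost of catching up when the id on the new active instance differs from the stored one.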