ConfD User Community

Starting confd in kubernetes stuck in phase0

Hello, I am building an active-active high-availability cluster on Kubernetes.
I’ve created a docker image based on examples from

https://info.tail-f.com/appnote-confd-and-kubernetes
I chose the Alpine distro with dumb-init, as its tiny footprint seems to fit nicely.
I've set up a load balancer in front of a cluster of confd instances.

If I start confd in the YAML file like this, then it's all fine: all instances come up with northbound interfaces up and ready, and netconf-console can connect and send queries.

  containers:
    - name: confd
      image: localhost:5000/confd
      command:
         - "sh"
         - "-c"
         - "/confd/bin/confd --foreground -c /confd/secret/confd.conf"

kubectl exec confd-0 -- sh -c '/confd/bin/confd_cmd -c "get_phase"'

phase: 2 flags: 0x0

Now I'm struggling with a phased confd startup, so that I can tell each instance whether it is master or slave. Even with something as simple as the command below, confd is stuck in phase0 with IPC as the only active interface.

      command:
         - "sh"
         - "-c"
         - "/confd/bin/confd --foreground --start-phase0 -c /confd/secret/confd.conf;
           /confd/bin/confd --start-phase2"

Now the instances are up and running, but they're stuck in phase0:

kubectl exec confd-0 -- sh -c '/confd/bin/confd_cmd -c "get_phase"'

phase: 0 flags: 0x1 INIT

I can manually instruct it to enter phase2:

kubectl exec confd-0 -- sh -c '/confd/bin/confd --start-phase2'
kubectl exec confd-0 -- sh -c '/confd/bin/confd_cmd -c "get_phase"'

phase: 2 flags: 0x0

Could the problem be Alpine, or Kubernetes? Any ideas?

The problem is that --foreground makes ConfD stay in the foreground. That is probably what you want in order to keep the container running, but it also means that the subsequent command with --start-phase2 never runs. A phased confd startup is only needed when you want or need to start some of your ConfD applications in earlier phases, so something as simple as confd --start-phase0; confd --start-phase2 does not make much sense anyway.

In your case you need to either start confd in the foreground, as you are doing now, and start everything else (including your application and --start-phase2) externally; or write a script or a watchdog-like application that runs the ConfD startup phases and your application, stays in the foreground, and serves as the container's entrypoint.
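As a rough sketch of the second option, a wrapper entrypoint could look something like the following. The paths mirror the ones used in this thread; the commented-out application line is a hypothetical placeholder, not something from the app note:

```shell
#!/bin/sh
# entrypoint.sh -- hypothetical container entrypoint that walks ConfD
# through its startup phases and then stays in the foreground.
set -e

# Phase 0: only the IPC socket is available; CDB is not yet started.
/confd/bin/confd --start-phase0 -c /confd/secret/confd.conf

# Phase 1: CDB comes up, but northbound interfaces are still down.
/confd/bin/confd --start-phase1

# Here is where an application that must register before the northbound
# interfaces open would be started (placeholder -- replace with your own):
# /confd/app/my_app &

# Phase 2: northbound interfaces (NETCONF, CLI, ...) come up.
/confd/bin/confd --start-phase2

# Keep the container alive by following a log in the foreground.
exec tail -n 100 -f /confd/var/confd/log/devel.log
```

Since confd itself daemonizes when started without --foreground, it is the final `exec tail` that keeps the container's PID 1 alive.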

@mvf, thank you for your answer; I didn't notice it until now.

You are correct, confd --start-phase0; confd --start-phase2 does not make much sense. I just wanted to start confd, then do something else (like tell each node whether to be master or slave), and then bring up the rest of the northbound interfaces.
I had missed the part of the command that keeps a process running in the foreground, like tail -f.

Now I run something like this:

      "ENV_ORDINAL=${HOSTNAME##*-};
      /confd/bin/confd -c /confd/secret/confd.conf --addloadpath /confd/confd-cdb;
      if [ ${ENV_ORDINAL} = 0 ]; then /confd/ctrl master node0;
      else /confd/ctrl slave node$ENV_ORDINAL node0 $(getent hosts confd-0.confd.default.svc.cluster.local | awk '{print $1}');
      fi;
      tail -n 100 -f /confd/var/confd/log/devel.log"

Now I'm able to start confd and then tell it to be master or slave; so far so good.
But next I'd like to run active-active synchronization between the nodes, so that I can scale the cluster up or down.

Like in this synchronizer app note:

This document states that active-active synchronization is complementary to an active-standby setup.
My question is: does this mean I have to enable the built-in synchronization with true in confd.conf and use ctrl (as in the dummy HA example) to tell each node its role?
I thought that if any node can be read from or written to, it should not matter whether it is master or slave.

In other words, do I still have to tell each node its role, as in the command above, when using the synchronizer-based active-active setup?

When you implement active-active using the mechanisms described in the application note, all data synchronization is performed by your synchronizer application; ConfD is not aware that it is part of an HA cluster. So no, you are not using ConfD's HA API.

Great, thank you for your help.