Hey @Ivo @f.barczyk,
I appreciate your input here.
I agree that you do have to think a bit differently from just “database” here. Rather, I believe you need to adjust to thinking in transactions:
- Why transactions are an essential part of network automation,
- and the steps involved in a transaction, so that you can pinpoint CPU and wall-clock time performance bottlenecks.
I.e. as a user you can get quite detailed insight into what’s going on in that “Erlang blob” by dicing ConfD and NSO transactions up into their distinct steps. For example the validate phase of the transaction, where the “must” expressions that @Ivo mentioned are evaluated against the new configuration in the transaction.
Focusing here on the steps of a ConfD transaction towards the running datastore (sidenote: NSO transactions towards the running datastore are similar), in short they are (see the timing sketch after this list):
- validate, where validation callbacks run and the “must”/“when” expressions are evaluated against the new configuration,
- write-start, where the transaction goes from read to write mode,
- prepare, the first phase of the two-phase commit towards CDB and any external databases,
- and commit (or abort if an earlier step failed), where the changes are committed to CDB and subscribers are notified.
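To see where the time goes, you can also drive these steps one by one over MAAPI and time each of them yourself. A minimal sketch in the style of the low-level Python API used by the examples.confd/intro/python examples; the /some/config/leaf path is a placeholder, and the exact function signatures should be double-checked against the API documentation for your ConfD version:

```python
# Minimal sketch: drive a write transaction step by step over MAAPI
# and time each step. Path and value below are placeholders.
import socket
import time
import _confd
import _confd.maapi as maapi

def timed(label, fn, *args):
    t0 = time.perf_counter()
    fn(*args)
    print(f'{label:>10}: {(time.perf_counter() - t0) * 1000:8.1f} ms')

sock = socket.socket()
maapi.connect(sock, '127.0.0.1', _confd.CONFD_PORT)
maapi.load_schemas(sock)
maapi.start_user_session(sock, 'admin', 'python', [], '127.0.0.1',
                         _confd.PROTO_TCP)
th = maapi.start_trans(sock, _confd.RUNNING, _confd.READ_WRITE)
maapi.set_elem2(sock, th, 'somevalue', '/some/config/leaf')  # placeholder

# The three steps of the commit, timed individually:
timed('validate', maapi.validate_trans, sock, th, False, False)
timed('prepare', maapi.prepare_trans, sock, th)
timed('commit', maapi.commit_trans, sock, th)

maapi.end_user_session(sock)
```

In practice you rarely need to split the phases yourself, since the progress trace described below gives you the same breakdown.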
The nice thing about the progress trace, in addition to the big picture you get from the developer log at trace level, is that the progress trace shows the timing of each of the transaction steps listed above, including the time consumed by, for example, a transaction-hook callback, a validation application callback, or the evaluation of some “must” statements.
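Since the progress trace can be written as a CSV file, it is also easy to post-process. A minimal sketch that sums the reported duration per trace message; the column names 'EVENT TYPE', 'DURATION', and 'MESSAGE' and the file name are assumptions here, so check the header row of the trace your version produces and adjust accordingly:

```python
# Minimal sketch: sum up the reported durations per message from a
# progress-trace CSV. Column and file names are assumptions; adjust
# to match the header row of the trace your ConfD/NSO version writes.
import csv
from collections import defaultdict

totals = defaultdict(float)
with open('progress-trace.csv') as f:  # hypothetical file name
    for row in csv.DictReader(f):
        # Durations are typically reported on the "stop" event of a span.
        if row.get('EVENT TYPE') == 'stop' and row.get('DURATION'):
            totals[row['MESSAGE']] += float(row['DURATION'])

# Print the most expensive steps first.
for msg, dur in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f'{dur:10.3f}s  {msg}')
```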
The picture below shows the transaction progress trace at the “normal” level of detail when committing a large list using the examples.confd/intro/python/10-transform example together with a subscriber that just prints the config changes:
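To reproduce something similar yourself, you can generate a large list as a ConfD XML config document and measure the wall-clock time of loading and committing it. A hedged sketch: the /items/item list is a made-up model, and the confd_load flags (-l to load, -m to merge) should be checked against the confd_load man page for your version:

```python
# Minimal sketch: generate a large list as XML and time how long it
# takes confd_load to commit it. The items/item model is hypothetical;
# substitute a list from your own YANG model.
import subprocess
import time

with open('large-list.xml', 'w') as f:
    f.write('<config xmlns="http://tail-f.com/ns/config/1.0">\n')
    f.write('  <items xmlns="http://example.com/items">\n')  # hypothetical
    for i in range(100000):
        f.write(f'    <item><name>item{i}</name><value>{i}</value></item>\n')
    f.write('  </items>\n</config>\n')

t0 = time.perf_counter()
subprocess.run(['confd_load', '-l', '-m', 'large-list.xml'], check=True)
print(f'commit took {time.perf_counter() - t0:.2f}s wall-clock')
```

Comparing that end-to-end number with the per-step timings in the progress trace tells you whether the time goes into validation, the subscriber, or somewhere else.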
The next picture describes how NSO (top box) and ConfD (bottom box) perform a successful NETCONF “network-wide transaction”, i.e. with the ConfD candidate datastore, candidate validation, and confirmed commit of the candidate to running capabilities all enabled. Not the fastest, but the most “robust” way of deploying a service to multiple devices/nodes.
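For reference, the device-side NETCONF sequence described above can be driven from Python with the ncclient library. A minimal sketch against a single device, where the host, port, credentials, and the config payload are placeholders:

```python
# Minimal sketch of the candidate + validate + confirmed-commit sequence
# over NETCONF using ncclient. Host, port, credentials, and the payload
# namespace/model are placeholders.
from ncclient import manager

config = '''<config xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <items xmlns="http://example.com/items">
    <item><name>item0</name><value>0</value></item>
  </items>
</config>'''

with manager.connect(host='127.0.0.1', port=2022, username='admin',
                     password='admin', hostkey_verify=False) as m:
    m.edit_config(target='candidate', config=config)
    m.validate(source='candidate')           # candidate validation
    m.commit(confirmed=True, timeout='30')   # confirmed commit to running
    # ...verify that the device is still reachable, then make it permanent:
    m.commit()                               # confirming commit
```

If the confirming commit never arrives, for example because the new configuration cut off management access, the device rolls back to the previous running configuration when the timeout expires, which is what makes this pattern robust across multiple devices.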
Knowing why and how ConfD processes transactions, combined with the developer log and the progress trace, will get you quite far in pinpointing performance bottlenecks in most cases.
The GitHub performance measurement demo that I pointed to above (accompanied by its slide presentation in that same GitHub repository) can also be helpful for further visualizing the wall-clock time and peak memory (high watermark) usage when running different loads.