What is recommended method to profile confd server?

hello,
we are running confd server on an embedded Linux device with dozen or more applications subscribing to and reading and writing into confd DB.

When operator commits a transaction into configuration DB then cpu load rises extensively for the confd process. I think It’s not as much effect of the user’s commit but the postprocessing done by all the client applications.

Or maybe I’m wrong?

Is it possible to somehow measure the load on the confd for each of the client applications to find out what is the reason for the cpu overload?

One idea that comes to my mind is to build a LD_PRELOAD approach with a wrapper library for most of the confd API what would forward the calls and gather profiling data.
But maybe there’s an easier method?

Best regards,
Filip

For profiling a transaction I suggest you use the ConfD Progress Trace.

  • Very useful information for debugging, diagnostics and profiling.
  • When a transaction or action is applied, ConfD emits timestamped progress events.
  • Print in the CLI, log to file, log as operational data in CDB, subscribe to notifications, maapi.reportProgress()
  • Control verbosity – normal, verbose, very verbose, debug
  • Filter on context and/or user
  • See ConfD’s User Guide chapter “Progress Trace” for more details.

I enabled it in some of the GitHub demos. For example in this one:

It occured to me that the LD_PRELOAD approach on the client side is still only that: profiling of the client app not profiling of the confd. I would know which of the client apps issue most of the requests to confd but I would not know the impact of the requests on the confd.

@cohult Thank you for pointing it out, I will check it out.

If you first want to get an overview of what’s going on between your applications and ConfD seen from the ConfD daemon side, in your ConfD configuration file, confd.conf, set the developer log level to TRACE. See the confd.conf(5) man page. I.e:

<logs>
  <enabled>true</enabled>
  <file>
    <enabled>true</enabled>
    <name>./devel.log</name>
  </file>
  <developerLogLevel>trace</developerLogLevel>
</logs>

yes, this I have enabled. We have not specified developerLogLevel thought so I need to try that.

But no success with progress trace:

1/ Added in confd.conf:

 <logs>
       <progressTrace>
              <dir>/var/log/confd_nr/progress</dir>
              <enabled>true</enabled>
        </progressTrace>

2/ restarted confd

3/ reconnected confd_cli:

root@host2:~> confd_cli -uuser
Welcome to the ConfD CLI
user connected from 127.0.0.1 using console on host2
user@jhost2 17:01:00>config
Entering configuration mode private
[ok][2020-03-25 17:01:03]

[edit]
user@host2 17:01:03%unhide debug
Password: ******
[ok][2020-03-25 17:01:16]

[edit]
user@host2 17:01:16%set progress
… ^
% Invalid input detected at ‘^’ marker.

^ marker refers to ‘progress’…

Use the C-style CLI:

Alas, in C-style confd_cli session progress doesn’t show either.

Is it possible it was disabled in YANG model or any other way?

Let’s summarize how to enable the progress trace for future readers and for you to double check:

  1. Add the following to your confd.conf:
<logs>
…
  <progressTrace>
    <enabled>true</enabled>
    <dir>.</dir>
  </progressTrace>
</logs>
<hideGroup>
  <name>debug</name>
</hideGroup>
  1. You can then run a script or similar that does something like this:
$ cat my-bash-script.sh
#!/bin/bash

${CONFD_DIR}/bin/confd_cli -u admin -C << EOF
config
unhide debug
progress trace trans-demo enabled verbosity normal destination file progress.trace format log
commit
EOF

NOTE: If you have the dynamic configuration feature enabled, your confd.conf settings may have been overridden at startup.

To check if the dyncfg is enabled check your confd.conf “runtimeReconfiguration” setting or do something like this:

$ confd --status |grep -i dyn
            http://tail-f.com/ns/confd_dyncfg/1.0 prefix:confd_dyncfg exported to: all
  module: confd_dyncfg revision: 2020-02-07
    namespace: http://tail-f.com/ns/confd_dyncfg/1.0
    prefix: confd_dyncfg

If enabled, check your progress trace settings:

$ confd_load -dd -Fp -p /confdConfig/logs/progressTrace
TRACE Connected (maapi) to ConfD
starting user session ctxt=system user=system groups=[system]
TRACE MAAPI_START_USER_SESSION  --> CONFD_OK
TRACE MAAPI_START_TRANS  --> CONFD_OK
TRACE MAAPI_SAVE_CONFIG  --> CONFD_OK
TRACE Connected (stream) to ConfD
<config xmlns="http://tail-f.com/ns/config/1.0">
  <confdConfig xmlns="http://tail-f.com/ns/confd_dyncfg/1.0">
  <logs>
  <progressTrace>
    <enabled>true</enabled>
    <dir>./</dir>
  </progressTrace>
  </logs>
  </confdConfig>
TRACE MAAPI_SAVE_CONFIG_RESULT  --> CONFD_OK
</config>
TRACE MAAPI_END_USER_SESSION  --> CONFD_OK

To reload the progress trace config:

$ cat progress_trace_init.xml 
<config xmlns="http://tail-f.com/ns/config/1.0">
  <confdConfig xmlns="http://tail-f.com/ns/confd_dyncfg/1.0">
    <logs>
      <progressTrace>
        <enabled>true</enabled>
        <dir>./</dir>
      </progressTrace>
    </logs>
    <hideGroup>
      <name>debug</name>
    </hideGroup>
  </confdConfig>
</config>

$ confd_load -dd -m -l progress_trace_init.xml 
TRACE Connected (maapi) to ConfD
starting user session ctxt=system user=system groups=[system]
TRACE MAAPI_START_USER_SESSION  --> CONFD_OK
TRACE MAAPI_START_TRANS  --> CONFD_OK
TRACE MAAPI_LOAD_CONFIG_FILE  --> CONFD_OK
TRACE MAAPI_APPLY_TRANS  --> CONFD_OK
TRACE MAAPI_END_USER_SESSION  --> CONFD_OK

Except buildin logs as @cohult mentioned I’d suggest to check the R/W performance of CDB itself.

As long time NSO/ConfD user I have to say you need to adjust your mindset to NSO (or try different product i.e. sysrepo/netopeer2, but not sure if it will be any better). It is not behaving like traditional databases and what in someSQL would be action of linear speed might be exponential in NSO.

First thing - the more complicated models are used, the lower R/W performance is and the more CPU is consumed. Every ‘must’ etc. Have impact on speed of R/W because ConfD/CDB must validate entries.

Second thing - despite the fact that ConfD core is written in Erlang it’s not the fastest and cpu utilization optimal when it comes to parallel R/W. There was even bug reported by us (for NSO) which caused parallel read to be magnitude slower than sequential, but this was fixed in NSO so I hope they fixed in ConfD too already.

Third thing - Erlang use it’s own ‘in-process’ threading so big part of the workload will internally be many erlang-threads while externally will be one single process on one single core. Once one core is fully utilized it becomes the bottleneck. That (erlang core) btw. means you’ll never get any more details why something is slow/CPU intensive inside NSO except the logs which are already exposed.

I know that this is not directly answering your question, but it’s more hint how to approach testing of ConfD performance.

Take pure ConfD, load the models, run R/W tests in sequence and in (massive) parallel, collect duration/CPU data, plot it, add the trendlines, take result as ‘ConfD behavior model’ (which btw. changes from version to version, usually towards better), either ‘get used to/count with it’ or change your plan.

1 Like

Hey @Ivo @f.barczyk,
I appreciate your input here.

I agree that you do have to think a bit different from just “database” here. Rather, I believe you need to adjust to thinking transactions:

  • Why transactions are an essential part of network automation
  • and the steps involved in a transaction to be able to pinpoint CPU wall-clock time performance bottlenecks.

I.e. as a user you can get quite detailed insights into what’s going on in that “Erlang blob” by dicing up ConfD and NSO transactions into their distinct steps. For example the validate phase of the transaction, where the “must” expressions that @Ivo mentioned are processed against the new configuration in the transaction.

Focusing here on the steps in a ConfD transaction to the running datastore (sidenote: NSO transactions to the running datastore are similar), in short they are:

The nice thing with the progress trace, in addition to the big picture from developer log at trace level, is that the progress trace shows the timing for each of the steps taken in the transaction shown in the picture above. Including the time consumed by for example a transaction-hook callback, a validation application callback, or some “must” statements being processed.

The picture below show the transaction progress trace with “normal” level of detail when committing a large list using the examples.confd/intro/python/10-transform example together with a subscriber that just print the config changes:

Next picture describe how NSO (top box) and ConfD (bottom) perform a successful NETCONF “network wide transaction”, i.e. the ConfD candidate datastore, candidate validation, and confirmed commit of the candidate to running capabilities are all enabled. Not the fastest, but the most “robust” way of deploying a service to multiple devices / nodes.

Knowledge about why and how transactions are processed by ConfD together with the developer and progress trace will get you quite far with pinpointing performance bottle-necks in most cases.
The GitHub performance measurement demo that I pointed to above (accompanied by its slide presentation in that GitHub same repository) can also be helpful to further visualize the wall-clock time and memory peak high watermark running different loads.

Fully agree, the first thing is to get progress trace to understand the steps in transactions.

The ‘blobish’ thing I referred to is that once you’re on trace level you’d see ‘phase start…’ / ‘phase end’ message, but you have no details about why duration of phase is so long/short and why CPU goes to the skies during that phase.

Then you really need the behavioral (black box) testing to understand impact on your service/application.

I.e. ‘validate’ is nice example where changing yang model to have less ‘must’ statements and solving constrains in code might have huge impact on performance.

“must” statements with their XPaths expressions are indeed a quite common cause of performance issues. So it is worth noting that after the XPath trace was introduced a year or two ago, users can get the details of what kind of processing those XPath expressions will result in when processed by the ConfD/NSO XPath engine. See the confd.conf(5) or ncs.conf(5) man pages for details.
So at least when it comes to the validation phase, you now have some detailed insight.

@cohult Dear Conny, no /progress nor /confdConfig are in my model :frowning:

confd_load -dd -Fp -p /confdConfig/logs/progressTrace
TRACE Connected (maapi) to ConfD
starting user session ctxt=system user=system groups=[system]
TRACE MAAPI_START_USER_SESSION → CONFD_OK
TRACE MAAPI_START_TRANS → CONFD_OK
TRACE MAAPI_SAVE_CONFIG DEBUG badly formatted or nonexistent path - Bad path element “confdConfig” after: /
→ CONFD_ERR
confd_load: 719: maapi_save_config() failed: badly formatted or nonexistent path (8): Bad path element “confdConfig” after: /

Where I go from here?

What ConfD version are you using?

confd --version
7.1.1.1

Stragely enough my list of commands available in C-stype cli doesn’t include “set” or “request”…

0 root@dev#-> confd_cli -C -uadmin
Welcome to the ConfD CLI
admin connected from 127.0.0.1 using console on dev
dev#config
Entering configuration mode terminal
dev(config)#unhide debug
Password: ******
dev(config)#set progress
--------------^
syntax error: unknown command
dev(config)#request
^
% Invalid input detected at ‘^’ marker.
dev(config)#request

Is the progress trace fxs file loaded?

$ confd --status | grep tailf-progress
  module: tailf-progress revision: 2018-06-14
$ pwd
/Users/tailf/confd-7.1.1
$ find . -name "tailf-progress*"
./etc/confd/tailf-progress.fxs
./src/confd/yang/tailf-progress.yang

1 root@dev#-> confd --status | grep tailf-progress
module: tailf-progress revision: 2018-06-14
/
0 root@dev#-> find / -name ‘tailf-progress*’
/mnt/storage/CONFD/…/confd/tailf-progress.fxs

yes, the fxs file is at the same location where our own fxs files are. And bunch of other tailf and ietf fxses.

Try configuring the progress trace over MAAPI then:

# First some sanity check
$ confd --status |grep "running modules"
running modules: backplane,netconf,cdb,cli,webui

$ confd --status |grep -A3 "module: tailf-progress"
  module: tailf-progress revision: 2018-06-14
    namespace: http://tail-f.com/ns/progress
    prefix: progress
    exported to: all

# create a file like this one:
$ cat enable-progress-trace.xml
<config xmlns="http://tail-f.com/ns/config/1.0">
  <progress xmlns="http://tail-f.com/ns/progress">
    <trace>
      <name>trans-demo</name>
      <destination>
        <file>progress.trace</file>
        <format>log</format>
      </destination>
      <enabled>true</enabled>
      <verbosity>normal</verbosity>
    </trace>
  </progress>
</config>

# Load it using MAAPI (e.g. using the confd_load tool)
$ confd_load -dd -m -l enable-progress-trace.xml
TRACE Connected (maapi) to ConfD
starting user session ctxt=system user=system groups=[system]
TRACE MAAPI_START_USER_SESSION  --> CONFD_OK
TRACE MAAPI_START_TRANS  --> CONFD_OK
TRACE MAAPI_LOAD_CONFIG_FILE  --> CONFD_OK
TRACE MAAPI_APPLY_TRANS  --> CONFD_OK
TRACE MAAPI_END_USER_SESSION  --> CONFD_OK

Done. Hope that helps