ConfD User Community

CDB XML DB performance


I’m checking the performance of the native XML CDB of ConfD 7.6.

Yang example:

	list line {
		key "key1 key2";

		leaf key1 {
			type uint16;
		}
		leaf key2 {
			type uint16;
		}
		leaf data1 {
			type uint16;
		}
		leaf data2 {
			type inet:ipv6-prefix;
		}
		leaf data3 {
			type uint32;
		}
	}
The following code (initial configuration read at startup) takes about 50 seconds to read about 50,000 configured list entries:

int p = cdb_num_instances(rsock, "/line");
confd_value_t *values = calloc(p * 5, sizeof(confd_value_t)); /* 5 leafs per entry */
cdb_get_objects(rsock, values, 5, 0, p, "/line"); /* read all p entries in one call */

Is there a more efficient way to retrieve a large quantity of data?


After more benchmarking, the reading time appears to grow exponentially with the number of entries. So it seems that even cdb_get_objects internally scans the whole list from the beginning to find each next item. Splitting the list into a two-level hierarchy therefore greatly improves the total reading time.

Good that you have found a solution that works for you. A few remarks that might help you or others reading this:

  • The CDB API may indeed not be optimal for traversing large lists due to how it handles indexes (though that would not explain exponential behavior, which would be a real cause for concern); you might want to have a look at MAAPI instead.

  • Using bulk methods (such as .._get_objects) improves performance only up to a point; it does not help to try to retrieve thousands of entries in one shot, and transferring large amounts of data at once might actually degrade performance once you hit memory issues. My rule of thumb is that (lower) hundreds are more than enough.

@SCadilhac: I.e. retrieve the data in chunks of, say, 100 objects / list entries at a time. For your use case, you will then likely notice a 10x improvement in wall-clock time to retrieve 50k entries, and the time will scale linearly with the number of entries.

Hi @cohult @mvf
Thanks for your feedback.

Here is an updated test to check the loading time vs the number of entries to read (reading 100 list entries at a time):

    #define CHUNK 100
    for (int max = 5000; max < 50000; max += 5000) {
      uint64_t before = get_time_ms();
      confd_value_t values[CHUNK * 5];
      int i = 0;
      while (i < max) {
        cdb_get_objects(rsock, values, 5, i, CHUNK, "/line");
        i += CHUNK;
      }
      uint64_t after = get_time_ms();
      printf("Loaded first %d entries in %" PRIu64 " ms\n", i, after - before);
    }

Which gives the following output:

Loaded first 5000 entries in 936 ms
Loaded first 10000 entries in 2499 ms
Loaded first 15000 entries in 5067 ms
Loaded first 20000 entries in 7699 ms
Loaded first 25000 entries in 12325 ms
Loaded first 30000 entries in 16217 ms
Loaded first 35000 entries in 23926 ms
Loaded first 40000 entries in 28852 ms
Loaded first 45000 entries in 38569 ms

Am I missing something?

Seems like the CDB API is not as smart as MAAPI, which is expected. Try using MAAPI. No chunking required unless you want to save some memory.

    int nobj = 50000; /* in: max entries to fetch; out: entries actually read */
    confd_value_t *v = malloc(sizeof(confd_value_t) * nobj * 5);
    struct maapi_cursor mc;
    maapi_init_cursor(maapisock, thandle, &mc, "/line");
    maapi_get_objects(&mc, &v[0], 5, &nobj);
    maapi_destroy_cursor(&mc);

That’s indeed 3-4 times faster using a single maapi_get_objects call. Thanks for the hint @cohult.
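For readers who do want to bound memory with MAAPI as well, the same cursor can be drained in chunks instead of one shot. A sketch (not a complete program; it assumes an open MAAPI socket `maapisock` and a started transaction `thandle`, and relies on the documented cursor convention that `mc.n == 0` after a fetch means the list is exhausted):

```c
#define CHUNK 100
confd_value_t values[CHUNK * 5];  /* 5 leafs per list entry */
struct maapi_cursor mc;
int n;

maapi_init_cursor(maapisock, thandle, &mc, "/line");
do {
    n = CHUNK;                             /* in: max objects to fetch      */
    maapi_get_objects(&mc, values, 5, &n); /* out: n = objects actually read */
    /* ... process n objects here ... */
} while (mc.n != 0);                       /* mc.n == 0 => list exhausted   */
maapi_destroy_cursor(&mc);
```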