Improving performance of notification streams

doronbl · February 13, 2020, 10:15pm

Hi Community,

I have a question regarding notification streams. We are using a single notification stream to which many clients are subscribed. We have noticed some weird behavior that suggests multiple subscribers may influence each other. It seems that when we send notifications, all the subscribers handle them at the same time. What I mean is that we know we have some ‘slow’ and ‘fast’ notification handlers, and it seems each notification is handled synchronously by all the stream subscribers.

As far as I understand, ConfD sends notifications asynchronously to each subscriber so it should not be influenced by the performance of ‘slow’ handlers.

Furthermore, it seems that handling a large batch of notifications (~100K) takes longer than expected.

I have a few questions:

1. What might explain the ‘synchronous’ behavior we are observing?
2. What can I do/configure to improve the performance of a single stream with many subscribers?
3. What exactly is the flow of sending a notification to a subscriber?
   a. Does ConfD signal the subscriber, which in turn pulls the notification? If so, how can each subscriber read its own notifications, given that some subscribers might be lagging behind others?
   b. Does ConfD push the notification to the subscriber? Is it done asynchronously?
   c. How does ConfD manage the stream queue for all the subscribers?
   d. Is there any performance difference between establish-subscription and create-subscription when sending notifications to many subscribers?

Doron

per · February 14, 2020, 12:03pm

If ConfD were to send “as fast as possible” to the fastest subscriber, it would have to queue notifications for the slower subscribers internally, and that queue could grow without bounds. Instead it sends the notifications “one at a time”, i.e. it doesn’t pick up a new notification from the application socket until the previous one has been sent to all subscribers, effectively pushing the queue to the socket buffers and eventually to the notification-sending application, which will block in confd_notification_send() / confd_notification_send_path() if it continuously sends notifications faster than the slowest subscriber reads them.

I.e. the throughput will indeed be limited by the slowest subscriber, as you observed. To handle the pathological case of a subscriber that remains connected but doesn’t actually read its notifications, the confd.conf parameter /confdConfig/netconf/writeTimeout should be useful.
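The "one at a time" behavior described above can be illustrated with a small deterministic model. This is only a sketch under stated assumptions, not ConfD's implementation: the function name `simulate_fanout`, the idea of counting socket buffer space in whole notifications (`buffer_slots`), and the millisecond read costs are all made up for illustration.

```python
def simulate_fanout(n, service_ms, buffer_slots):
    """Toy model of lock-step fan-out with bounded per-subscriber buffers.

    n            -- number of notifications the application sends
    service_ms   -- per-subscriber time (ms) needed to read one notification
    buffer_slots -- unread notifications one connection can hold (assumed)
    Returns (send_times, done_times) in integer milliseconds.
    """
    send_times = []
    done_times = [[] for _ in service_ms]
    for i in range(n):
        # Notification i goes out once the previous one is out and every
        # subscriber has finished reading notification i - buffer_slots.
        t = send_times[-1] if send_times else 0
        if i >= buffer_slots:
            t = max(t, max(done[i - buffer_slots] for done in done_times))
        send_times.append(t)
        for done, cost in zip(done_times, service_ms):
            prev = done[-1] if done else 0
            done.append(max(t, prev) + cost)
    return send_times, done_times


# One fast (1 ms) and one slow (10 ms) subscriber, 10 buffer slots each.
send, done = simulate_fanout(1000, [1, 10], 10)
print("last notification sent at", send[-1], "ms")    # 9900
print("fast subscriber finished at", done[0][-1], "ms")  # 9901
print("slow subscriber finished at", done[1][-1], "ms")  # 10000

# A burst that fits in the buffers never blocks the sender.
burst_send, burst_done = simulate_fanout(10, [1, 10], 10)
print(burst_send[-1], burst_done[0][-1], burst_done[1][-1])  # 0 10 100
```

In this model the sustained rate settles at the slow subscriber's 10 ms per notification, while a burst no larger than the buffer reaches the fast subscriber without any delay caused by the slow one.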

As for what you can do or configure to improve performance: as the question is stated, not much, I think.

There is no “signalling” as such - the connections are flow-controlled by the standard SSH/TCP mechanisms, meaning that a sender can’t send unless the receiver reads - and the receiver can read whenever there is something available.
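The flow-control principle can be observed on any stream socket. The sketch below uses a local `socket.socketpair()` as a stand-in for a subscriber connection (an assumption for illustration; a real NETCONF session runs over SSH/TCP, and buffer sizes depend on the operating system). The sender writes until the kernel buffers are full, and can only continue after the receiver has read.

```python
import socket

# A connected pair of local stream sockets stands in for one subscriber connection.
sender, receiver = socket.socketpair()
sender.setblocking(False)
receiver.setblocking(False)

chunk = b"x" * 4096
sent = 0
blocked = False
try:
    while True:                    # the receiver does not read in this phase
        sent += sender.send(chunk)
except BlockingIOError:
    blocked = True                 # buffers are full: the sender has to wait

# The receiver drains what is buffered, which frees space for the sender.
drained = 0
try:
    while True:
        drained += len(receiver.recv(65536))
except BlockingIOError:
    pass                           # nothing left to read

resumed = sender.send(chunk) > 0   # sending works again after the read
print("bytes sent before blocking:", sent)
print("blocked:", blocked, "resumed after read:", resumed)

sender.close()
receiver.close()
```

With a blocking socket the sender would simply stall in `send()` instead of getting `BlockingIOError`, which corresponds to the application blocking when it sends faster than the slowest subscriber reads.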

Yes, ConfD pushes the notification, and it is “asynchronous” in so far as there is no handshake or the like at the NETCONF application layer. (But of course the SSH/TCP flow control has “handshaking” of sorts, based on available space in the receiver’s buffers.)

Per above, there is no queue.

I’m not really familiar with the new establish-subscription, but I believe the notification sending uses the same mechanism.

doronbl · February 14, 2020, 12:53pm

10x, very clear answers to all questions.

doronbl · February 14, 2020, 3:35pm

Hi Per,

After some thought I realized I have a few more questions resulting from your answers:

1. I assume that the synchronous behavior is per stream, meaning that if my clients use different streams, I will be able to send to them in parallel. Is that correct?
2. Since the stream is not buffered on the server side, how exactly is replay implemented? Where are the notifications taken from (when working with CDB as the default operational store and no handlers registered to fetch the data)?

10x,
Doron

per · February 14, 2020, 3:59pm

I think it is more correct to say that the behavior (which I wouldn’t call “synchronous”) is per application worker socket rather than per stream. I.e. you can send notifications for multiple streams on one socket, but they will still be sent “one at a time”. With multiple sockets, you can have one “notification sending” per socket in progress - but of course the actual sending of two notifications in a given NB NETCONF session can never happen in parallel. In practice I don’t think you would see a significant throughput increase from either method - and multiple sockets may result in notifications arriving at subscribers out of order, which may or may not be a problem.

For replay, see e.g. /confdConfig/notifications/eventStreams/stream/replaySupport and the /confdConfig/notifications/eventStreams/stream/builtinReplayStore section in the confd.conf(5) man page.
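Conceptually, replay reads from a log that is kept separately from live delivery, which is why it does not need a per-subscriber queue. The sketch below is a hypothetical in-memory illustration of that idea and says nothing about how ConfD's builtin replay store is actually implemented; the class `ReplayLog` and its size limit are invented names. The start and stop times mirror the `startTime` and `stopTime` parameters of create-subscription.

```python
from collections import deque
from datetime import datetime, timedelta, timezone


class ReplayLog:
    """Hypothetical bounded replay log; not ConfD's builtin replay store."""

    def __init__(self, max_entries):
        # A deque with maxlen silently drops the oldest entry when full.
        self._entries = deque(maxlen=max_entries)

    def append(self, event_time, payload):
        self._entries.append((event_time, payload))

    def replay(self, start_time, stop_time=None):
        """Return logged payloads with start_time <= event time <= stop_time."""
        return [
            payload
            for event_time, payload in self._entries
            if event_time >= start_time
            and (stop_time is None or event_time <= stop_time)
        ]


t0 = datetime(2020, 2, 14, 12, 0, tzinfo=timezone.utc)
log = ReplayLog(max_entries=3)
for i in range(5):
    log.append(t0 + timedelta(seconds=i), f"event-{i}")

# Events 0 and 1 were evicted, so a replay from t0 starts at event-2.
print(log.replay(start_time=t0))
print(log.replay(start_time=t0 + timedelta(seconds=3)))
```

A bounded log like this means that a subscriber asking for a start time older than the oldest retained entry only gets what is still stored.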

per · February 14, 2020, 5:16pm

Well, thinking about this a bit more - with one subscriber per stream, and one stream per worker socket, there should be no dependency between the subscribers, and you should be able to send “as fast as possible” to the fastest subscriber. This comes at the cost of sending the same notifications from application to ConfD multiple times (once per subscriber), and still leaves the question of how you will deal with the subscribers that just can’t keep up with your sending.

per · February 14, 2020, 8:49pm

And despite that I’m reprimanded by the forum for posting multiple replies, I feel compelled to address this point again…

Even in the simple “single socket, single stream, multiple subscribers” case, the notifications are sent in parallel - i.e. any given notification is sent in parallel to all subscribers. If this isn’t obvious from my earlier replies, perhaps you need to re-read them, or ask for clarification. This works just fine for “bursts” of notifications, where the fast subscribers can receive notifications “as fast as possible”.

Only when you continuously send notifications at a rate that the slow subscribers can’t handle will the SSH/TCP flow control kick in for them, and limit the rate at which notifications can be sent. But even then, each notification is sent in parallel to all subscribers. Solving this “rate limiting” problem is hard - I can’t really think of a solution other than a) “somehow” requiring that subscribers can actually handle the steady-state rate of notifications, or b) providing “lite” versions of the streams for the slow subscribers, where fewer notifications are sent.
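Option b) could be approached on the application side by thinning out what goes to a separate, slower stream. The following is a minimal sketch, assuming a hypothetical `Notification` type with a numeric `severity` field and a made-up threshold; the actual filtering criterion, and the stream definitions it would feed, depend entirely on the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Notification:
    seq: int
    severity: int  # hypothetical field: higher means more important


def lite_stream(notifications, min_severity):
    """Yield only the notifications meant for the reduced ("lite") stream."""
    for n in notifications:
        if n.severity >= min_severity:
            yield n


full = [Notification(seq=i, severity=i % 5) for i in range(20)]
lite = list(lite_stream(full, min_severity=4))

print(len(full), len(lite))        # 20 4
print([n.seq for n in lite])       # [4, 9, 14, 19]
```

The full stream keeps all 20 notifications while the lite stream carries only 4 of them, so a slow subscriber on the lite stream has to sustain a fifth of the rate in this example.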