ConfD User Community

ConfD crashes due to process limit reached

Hello all, we are using ConfD 7.6.1.
When ConfD crashes, it reports the following log:

=CRASH REPORT==== 8-Dec-2035::00:21:49.729590 ===
  crasher:
    initial call: confd_ia:'-start_acceptors/1-fun-0-'/0
    pid: <0.92.0>
    registered_name: []
    exception exit: {error,emfile}
      in function  confd_ia:acceptor/5 (confd_ia.erl, line 527)
    ancestors: [confd_ia,confd_sup,<0.48.0>]
    message_queue_len: 0
    messages: []
    links: []
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 1598
    stack_size: 27
    reductions: 28890379
  neighbours:

"Out of file descriptors for accept() - process limit reached\n"
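
For context, emfile corresponds to the POSIX EMFILE error ("too many open files"), meaning the process has reached its per-process file descriptor limit. Below is a minimal Python sketch for watching how close a process is to that limit. It assumes a Linux host with /proc and permission to read the target process's entries; the helper names (parse_max_open_files, fd_usage, near_limit) are hypothetical and not part of ConfD.

```python
import os


def parse_max_open_files(limits_text):
    """Extract the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns an int, or None if the limit is 'unlimited' or the line is missing.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: "Max open files  <soft>  <hard>  files"
            soft = line.split()[3]
            return int(soft) if soft.isdigit() else None
    return None


def fd_usage(pid="self"):
    """Return (open_fd_count, soft_limit) for a process (Linux only).

    Assumption: the caller may read /proc/<pid>/fd, which normally
    requires the same user as the target process, or root.
    """
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as f:
        soft_limit = parse_max_open_files(f.read())
    return open_fds, soft_limit


def near_limit(open_fds, soft_limit, threshold=0.8):
    """True when descriptor usage is at or above the given fraction of the limit."""
    if soft_limit is None:
        return False
    return open_fds >= threshold * soft_limit
```

Calling fd_usage with the pid of the confd.smp process at regular intervals should show whether the descriptor count grows with each failed call home attempt.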

This happens because the call home requests from the server are always dropped by the NETCONF client: the client accepts the call home TCP connection but closes it without completing authentication.
The southbound application triggers the call home through the maapi_netconf_ssh_call_home API, which fails with the error: external error (19) - unknown POSIX error.
We noticed that the sockets on the ConfD side (confd.smp) are sometimes closed and sometimes remain pending (ESTABLISHED) even though they have been closed on the client side. Because we retry the call home frequently (every 10 s), this leaks file descriptors on the ConfD side.

#netstat -tuan | grep 4334
tcp       12      0 192.168.121.158:32837   192.168.121.101:4334    ESTABLISHED
tcp       12      0 192.168.121.158:43897   192.168.121.101:4334    ESTABLISHED
tcp       12      0 192.168.121.158:48109   192.168.121.101:4334    ESTABLISHED

Note that the pending connections always have RECV-Q=12.
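
To track these pending connections over time, the netstat output can be filtered programmatically. A minimal Python sketch, assuming the column layout shown above (Proto, Recv-Q, Send-Q, local address, remote address, state); the function name stuck_connections is hypothetical:

```python
def stuck_connections(netstat_output, port, state="ESTABLISHED"):
    """Return (local, remote, recv_q) for TCP connections to the given
    remote port that are in `state` and have unread bytes in Recv-Q."""
    stuck = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, UDP entries, and anything without a state column.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        _proto, recv_q, _send_q, local, remote, conn_state = fields[:6]
        if conn_state != state:
            continue
        # The port is the part after the last ':' of the remote address.
        if remote.rsplit(":", 1)[-1] != str(port):
            continue
        if int(recv_q) > 0:
            stuck.append((local, remote, int(recv_q)))
    return stuck


# Sample taken from the netstat output in this post.
SAMPLE = """\
tcp       12      0 192.168.121.158:32837   192.168.121.101:4334    ESTABLISHED
tcp       12      0 192.168.121.158:43897   192.168.121.101:4334    ESTABLISHED
tcp       12      0 192.168.121.158:48109   192.168.121.101:4334    ESTABLISHED
"""

for local, remote, recv_q in stuck_connections(SAMPLE, 4334):
    print(local, "->", remote, "Recv-Q:", recv_q)
```

Run periodically against the output of netstat -tuan, a growing result list would indicate that leaked connections are accumulating.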

We’ve set clientAliveInterval=PT10S and tried clientAliveCountMax values of 3, 1, and 0, but ConfD still leaks file descriptors.

Is there any ConfD configuration we can tune to avoid this leak?

Thanks!

Is the SSH connection from the client initiated over the accepted TCP connection? Is the SSH connection never authenticated?

If it is authenticated, see also:
/confdConfig/ssh/idleConnectionTimeout (xs:duration) [PT10M]

The client accepts the TCP connection and attempts SSH authentication (without any credentials), which fails. It then closes the TCP connection.

At the beginning of these operations, the TCP connection is correctly closed on both sides (NETCONF client and ConfD), and netstat always shows a single connection that appears and disappears. But after some time (a few minutes), both the client and ConfD start leaking file descriptors, with connections left in the ESTABLISHED state.

We never authenticate the SSH connection, but I’ll try the configuration you suggest.

Thanks!