Hello all, we are using ConfD 7.6.1.
When ConfD crashes, it reports the following log:
*=CRASH REPORT==== 8-Dec-2035::00:21:49.729590 ===*
* crasher:*
* initial call: confd_ia:'-start_acceptors/1-fun-0-'/0*
* pid: <0.92.0>*
* registered_name: []*
* exception exit: {error,emfile}*
* in function confd_ia:acceptor/5 (confd_ia.erl, line 527)*
* ancestors: [confd_ia,confd_sup,<0.48.0>]*
* message_queue_len: 0*
* messages: []*
* links: []*
* dictionary: []*
* trap_exit: false*
* status: running*
* heap_size: 1598*
* stack_size: 27*
* reductions: 28890379*
* neighbours:*
*"Out of file descriptors for accept() - process limit reached\n"*
This happens because the NETCONF client always drops the call home requests from the server.
The client accepts the call home TCP connection but then closes it without completing authentication.
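For anyone who wants to reproduce the client behaviour described above without the real NETCONF client, here is a minimal sketch of a TCP listener that accepts a connection and closes it straight away, with no SSH handshake. It is only an assumption about what the client effectively does at the TCP level; the function name `serve_accept_and_close` and its parameters are made up for this example.

```python
import socket
import threading


def serve_accept_and_close(host="127.0.0.1", port=0, max_conns=1):
    """Listen, accept up to max_conns connections, and close each one at once.

    Returns (bound_port, thread). With port=0 the OS picks a free port.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(16)
    bound_port = srv.getsockname()[1]

    def loop():
        with srv:
            for _ in range(max_conns):
                conn, _addr = srv.accept()
                # Hypothetical stand-in for the client: no SSH handshake,
                # no authentication, just close the accepted socket.
                conn.close()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return bound_port, thread
```

Calling `serve_accept_and_close("0.0.0.0", 4334, max_conns=100)` on a test host and pointing the call home at it should show whether the sockets pile up on the ConfD side with this simplified peer as well.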
The southbound application triggers the call home through the maapi_netconf_ssh_call_home API, which fails with the error: external error (19) - unknown POSIX error.
We noticed that the sockets on the ConfD side (confd.smp) are sometimes closed and sometimes remain pending (ESTABLISHED) even though they are closed on the client side. This leaks file descriptors on the ConfD side, because we retry the call home frequently (every 10 s).
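To watch the leak grow towards the process limit, something like the following sketch can be run against the confd.smp PID. It assumes Linux with `/proc` mounted and enough permission to read the target process entries; the helper names are made up for this example and are not part of ConfD.

```python
import os


def parse_max_open_files(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns None if the row is missing or the limit is 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Remaining columns: soft limit, hard limit, units
            soft = line[len("Max open files"):].split()[0]
            return None if soft == "unlimited" else int(soft)
    return None


def count_open_fds(pid):
    """Count entries in /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_usage(pid):
    """Return (open_fd_count, soft_limit) for the given PID."""
    with open(f"/proc/{pid}/limits") as f:
        limit = parse_max_open_files(f.read())
    return count_open_fds(pid), limit
```

Logging `fd_usage(pid)` every few seconds next to the call home retries should show whether the count climbs by one per failed attempt.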
#netstat -tuan | grep 4334
tcp 12 0 192.168.121.158:32837 192.168.121.101:4334 ESTABLISHED
tcp 12 0 192.168.121.158:43897 192.168.121.101:4334 ESTABLISHED
tcp 12 0 192.168.121.158:48109 192.168.121.101:4334 ESTABLISHED
Note that the pending connections always have Recv-Q=12.
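To track these pending connections over time, the netstat output can be filtered with a small script. This is a sketch that assumes the usual Linux `netstat -tuan` column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the function name `stuck_connections` is made up for this example.

```python
def stuck_connections(netstat_output, port):
    """Return (local, remote, recv_q) for ESTABLISHED TCP rows whose
    remote port matches `port` and whose Recv-Q is greater than zero."""
    rows = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, UDP rows, and anything without the six expected columns
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        _proto, recv_q, _send_q, local, remote, state = fields[:6]
        if state != "ESTABLISHED":
            continue
        if remote.rsplit(":", 1)[-1] != str(port):
            continue
        if int(recv_q) > 0:
            rows.append((local, remote, int(recv_q)))
    return rows
```

Feeding it the output of `netstat -tuan` with `port=4334` lists only the connections that are still ESTABLISHED with unread data queued.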
We've set clientAliveInterval=PT10S and tried clientAliveCountMax values of 3, 1, and 0, but ConfD still leaks file descriptors.
Is there any confd configuration we can tune to avoid this leak?
Thanks!