Worker socket query timed out

I am running an action callback that downloads a large file (around 11 GB). The callback already calls action_set_timeout with a 20-minute timeout, and no additional maapi connection is created inside the action. Still, after 11 or more minutes the action fails and ConfD restarts. The error reported is "Internal error", and another log shows: Worker socket query timed out
ConfD version: 7.3.2

Please suggest.


Try checking the resource utilization on the affected machine (CPU, memory, disk space, swap) while the callback is running.

See if any specific process (be it ConfD, the app daemon that implements the callback, or something else) has unexpectedly high utilization.
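
The resource check suggested above could be scripted so that the numbers are captured for the whole duration of the callback instead of being read by eye. The following is only a minimal sketch using the Python standard library; the function names (`read_meminfo`, `sample`, `collect`), the sampled fields and the 30-second interval are illustrative assumptions, and `/proc/meminfo` is only available on Linux.

```python
import os
import shutil
import time


def read_meminfo(path="/proc/meminfo"):
    """Return MemAvailable and SwapFree in kB (Linux only; empty dict otherwise)."""
    wanted = {"MemAvailable", "SwapFree"}
    values = {}
    try:
        with open(path) as fh:
            for line in fh:
                key, _, rest = line.partition(":")
                if key in wanted:
                    values[key] = int(rest.split()[0])
    except OSError:
        pass  # not Linux, or /proc not readable
    return values


def sample(disk_path="/"):
    """Take one snapshot of load average, free disk space and memory."""
    snapshot = {
        "time": time.time(),
        "disk_free_bytes": shutil.disk_usage(disk_path).free,
    }
    try:
        snapshot["load_1min"] = os.getloadavg()[0]
    except (AttributeError, OSError):
        snapshot["load_1min"] = None  # load average not available on this platform
    snapshot.update(read_meminfo())
    return snapshot


def collect(count, interval=30.0, disk_path="/"):
    """Collect `count` snapshots, sleeping `interval` seconds between them."""
    samples = []
    for i in range(count):
        samples.append(sample(disk_path))
        if i < count - 1:
            time.sleep(interval)
    return samples


if __name__ == "__main__":
    # 40 samples at 30 s covers roughly a 20-minute action run.
    for snap in collect(40, interval=30.0):
        print(snap)
```

Running this from a separate shell while the action executes keeps the measurement out of the daemon itself. Per-process figures would still need a tool such as top or ps.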

"It errors out and ConfD starts again" sounds a bit vague. Check the logs for specific errors or other details being reported. (I am not saying there must be some, but hopefully the last entries before the restart give at least some hint.)

There is no abnormality in memory or CPU usage. I checked the journal log.
In app logs:
error: Error('item does not exist (1): no cli session',), user_msg: Internal error
EOF: ConfD closed connection

I turned on debug logging in ConfD. Another log shows, at the same time:
confd[8820]: devel-c Control socket request timed out daemon
confd[8820]: devel-c action action() error {external_timeout, ""} for callpoint 'image-load'

The timeout is set to 20 minutes, but the action intermittently times out before the configured timeout. In this case, it timed out after 15 minutes. There is no other error just prior to this one. We have a single control socket and a single worker socket in the daemon, and there is no multithreading in this case.
We have a systemd service which restarts ConfD if an EOF exception is received from ConfD.
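
To pin down exactly how long the callback ran before external_timeout was reported, the action body could be wrapped in a small timer and its output compared with the devel log timestamps. This is only a sketch: `timed_action`, the logger name and the log format are hypothetical and not part of the ConfD API.

```python
import contextlib
import logging
import time

log = logging.getLogger("image_load_timing")  # hypothetical logger name


@contextlib.contextmanager
def timed_action(name, budget_secs, clock=time.monotonic):
    """Log how long the wrapped block ran relative to its configured timeout.

    `budget_secs` must be positive; it is the value passed to the timeout call.
    """
    result = {"name": name, "budget_secs": budget_secs, "elapsed_secs": None}
    start = clock()
    log.info("%s started, timeout budget %d s", name, budget_secs)
    try:
        yield result
    finally:
        # Runs on success and on failure, so an aborted action is timed too.
        result["elapsed_secs"] = clock() - start
        log.info(
            "%s ended after %.1f s (%.0f%% of budget)",
            name,
            result["elapsed_secs"],
            100.0 * result["elapsed_secs"] / budget_secs,
        )
```

In the callback this would sit right after the existing dp.action_set_timeout(uinfo, IMAGE_LOAD_TIMEOUT) call, for example `with timed_action("image-load", IMAGE_LOAD_TIMEOUT): ...` around the download.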

Hmm, what northbound API do you use to invoke the action?
Could it be caused by a timeout triggered on the user's session (e.g. SSH to the CLI) that subsequently tears down the connection to your pending 20-minute action further south in the pipeline?

When the daemon first starts:
daemon_context = dp.init_daemon('isg_cli')
worker_socket = socket.socket()
control_socket = socket.socket()
maapi_socket = socket.socket()

connect(daemon_context, control_socket, worker_socket, maapi_socket)

register_transactions(daemon_context, worker_socket, maapi_socket)
register_validations(daemon_context, worker_socket, control_socket, maapi_socket)
register_actions(daemon_context, worker_socket, control_socket, maapi_socket)

In the image action callback file:
dp.action_set_timeout(uinfo, IMAGE_LOAD_TIMEOUT)
mgr = ImageManager(cli_force, self.maapi_socket, uinfo)

Progress of file download is done using:
maapi.cli_write(self.maapi_socket, uinfo.usid, progress)
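
Since the app log contains "no cli session", one thing that might be worth ruling out is the progress write failing once the CLI session is gone. Whether that error really comes from maapi.cli_write is an assumption here. The sketch below wraps the write in a small helper that throttles messages and stops writing after the first failure; `ProgressReporter` and its parameters are hypothetical names, and the actual write is injected as a callable so the helper does not depend on ConfD.

```python
import time


class ProgressReporter:
    """Throttle progress messages and stop writing once the session is gone."""

    def __init__(self, write, min_interval=5.0, clock=time.monotonic):
        self.write = write                # callable taking one string
        self.min_interval = min_interval  # minimum seconds between messages
        self.clock = clock
        self.last_sent = None
        self.session_alive = True

    def report(self, text, force=False):
        """Write `text` if enough time has passed; return True if it was written."""
        if not self.session_alive:
            return False
        now = self.clock()
        too_soon = (
            self.last_sent is not None
            and now - self.last_sent < self.min_interval
        )
        if too_soon and not force:
            return False
        try:
            self.write(text)
        except Exception:
            # Assumption: a failed write means the CLI session no longer exists,
            # so further progress output is skipped instead of aborting the download.
            self.session_alive = False
            return False
        self.last_sent = now
        return True
```

In the daemon, the injected callable would be something like `lambda msg: maapi.cli_write(self.maapi_socket, uinfo.usid, msg)`, reusing the call shown above.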

I bypassed the SSH CLI and used the serial port instead. In that CLI, the image load also failed with a timeout after 15 minutes.