Hi tail-f support,
We have hit an issue where the control socket between ConfD and its client was closed for an unknown reason. The error code is -2.
Here is the log of ConfD:
217553 <CRIT> 29-Sep-2021::09:02:14.424 welktx08vzwcsmfaat-y-ec-x-002 confd[201]: - Daemon cpa timed out
217554 <INFO> 29-Sep-2021::09:02:15.181 welktx08vzwcsmfaat-y-ec-x-002 confd[201]: audit user: yp_internal_user/3824 terminated session (reason: normal)
217555 <INFO> 29-Sep-2021::09:02:16.541 welktx08vzwcsmfaat-y-ec-x-002 confd[201]: audit user: smf_admin/3826 assigned to groups: application-administrator
217556 <INFO> 29-Sep-2021::09:02:16.543 welktx08vzwcsmfaat-y-ec-x-002 confd[201]: audit user: smf_admin/3826 created new session via cli from 72.17.0.74:51917 with ssh
217557 <INFO> 29-Sep-2021::09:02:16.563 welktx08vzwcsmfaat-y-ec-x-002 confd[201]: audit user: smf_admin/3826 CLI 'autowizard false'
217558 <INFO> 29-Sep-2021::09:02:16.565 welktx08vzwcsmfaat-y-ec-x-002 confd[201]: audit user: smf_admin/3826 CLI done
217559 <INFO> 29-Sep-2021::09:02:16.614 welktx08vzwcsmfaat-y-ec-x-002 confd[201]: audit user: smf_admin/3826 CLI 'screen-width 512'
217560 <INFO> 29-Sep-2021::09:02:16.615 welktx08vzwcsmfaat-y-ec-x-002 confd[201]: audit user: smf_admin/3826 CLI done
217561 <INFO> 29-Sep-2021::09:02:16.661 welktx08vzwcsmfaat-y-ec-x-002 confd[201]: audit user: smf_admin/3826 CLI 'screen-length 0'
217562 <INFO> 29-Sep-2021::09:02:16.662 welktx08vzwcsmfaat-y-ec-x-002 confd[201]: audit user: smf_admin/3826 CLI done
217563 <INFO> 29-Sep-2021::09:02:16.713 welktx08vzwcsmfaat-y-ec-x-002 confd[201]: audit user: smf_admin/3826 CLI 'config'
217564 <INFO> 29-Sep-2021::09:02:16.718 welktx08vzwcsmfaat-y-ec-x-002 confd[201]: audit user: smf_admin/3826 CLI done
217565 <CRIT> 29-Sep-2021::09:02:16.878 welktx08vzwcsmfaat-y-ec-x-002 confd[201]: - Daemon cpaValidator died
217566 <CRIT> 29-Sep-2021::09:02:16.879 welktx08vzwcsmfaat-y-ec-x-002 confd[201]: - Daemon cpa died
217567 <CRIT> 29-Sep-2021::09:02:16.882 welktx08vzwcsmfaat-y-ec-x-002 confd[201]: - no registration found for callpoint cpa/get_elem of type=external
217568 <CRIT> 29-Sep-2021::09:02:16.883 welktx08vzwcsmfaat-y-ec-x-002 confd[201]: - no registration found for callpoint cpa/get_elem of type=external
217569 <CRIT> 29-Sep-2021::09:02:16.883 welktx08vzwcsmfaat-y-ec-x-002 confd[201]: - no registration found for callpoint cpa/exists of type=external
217570 <INFO> 29-Sep-2021::09:02:16.885 welktx08vzwcsmfaat-y-ec-x-002 confd[201]: audit user: smf_admin/3826 CLI 'cc-rule-handling action Keep-policys remove'
217571 <CRIT> 29-Sep-2021::09:02:16.886 welktx08vzwcsmfaat-y-ec-x-002 confd[201]: - no registration found for callpoint cpa/get_elem of type=external
217572 <CRIT> 29-Sep-2021::09:02:16.887 welktx08vzwcsmfaat-y-ec-x-002 confd[201]: - no registration found for callpoint cpa/get_elem of type=external
217573 <CRIT> 29-Sep-2021::09:02:16.887 welktx08vzwcsmfaat-y-ec-x-002 confd[201]: - no registration found for callpoint cpa/exists of type=external
And here is the client log:
19752 {"message":"Socket closed","ret":-2,"service_id":"confd-client","severity":"error","timestamp":"2021-09-29T09:02:14.925Z","version":"0.2.0"}
19753 {"db":"running","fd":31,"message":"Socket closed","ret":-2,"service_id":"confd-client","session_id":3804,"session_type":"cli","severity":"error","timestamp":"2021-09-29T09:02:14.925Z","tx":52662,"tx_mode":"READ_WRITE","version":"0.2.0"}
19754 {"func":"CheckExecutePhasedConfdStartup","message":"Going to start phase 1","service_id":"confd-client","severity":"info","timestamp":"2021-09-29T09:02:14.952Z","version":"0.2.0"}
19755 {"error":"failed to go to start phase 1","func":"CheckExecutePhasedConfdStartup","message":"go to start phase 1 failed, may already be in startphase 2","service_id":"confd-client","severity":"info","timestamp":"2021-09-29T09:02:14.953Z","version":"0.2.0"}
19756 {"func":"CheckExecutePhasedConfdStartup","message":"Clearing NACM cache","service_id":"confd-client","severity":"info","timestamp":"2021-09-29T09:02:14.953Z","version":"0.2.0"}
19757 {"func":"CheckExecutePhasedConfdStartup","message":"Going to start phase 2","service_id":"confd-client","severity":"info","timestamp":"2021-09-29T09:02:15.181Z","version":"0.2.0"}
19758 {"func":"CheckExecutePhasedConfdStartup","message":"Going to start phase 2 done","service_id":"confd-client","severity":"info","timestamp":"2021-09-29T09:02:15.181Z","version":"0.2.0"}
19759 {"category":"CM","facility":"log audit","message":"Session start","service_id":"confd-client","session_id":3826,"session_type":"cli","severity":"info","subject":"smf_admin","timestamp":"2021-09-29T09:02:16.542Z","version":"0.2.0"}
19760 INTERNAL ERROR: No trans found with tid=53216
19761
19762 Error on control socket request: internal error (18): No trans found with tid=53216
19763
19764 Ended with exit code 1.
When the ConfD client starts, it goes through the following steps:
- initialize the ConfD library by calling confd_init
- initialize the daemon by calling confd_init_daemon
- load the schemas by calling confd_load_schemas
- create the control socket and connect it to ConfD
- register the callbacks
- execute the ConfD client phased startup
- poll the control socket to ConfD
ConfD reported that the client daemon cpa timed out (why?), then sent CONFD_EOF to the client over the control socket. The client received it and handled it as follows:
- close the control socket
- release the daemon
- start again from the second step above (confd_init_daemon)
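
To make the flow above easier to discuss, here is a minimal sketch of the startup and recovery sequence as a small Python model. It is not our real client code: FakeConfd and its step names are hypothetical stand-ins that only record the order of calls, and CONFD_EOF = -2 is assumed to be the value that shows up as "ret" in the client log.

```python
CONFD_OK = 0
CONFD_EOF = -2  # assumed to match the "ret": -2 seen in the client log


class FakeConfd:
    """Hypothetical stand-in for the ConfD library: records calls only."""

    def __init__(self, poll_results):
        self.calls = []
        self._poll_results = list(poll_results)

    def step(self, name):
        self.calls.append(name)

    def poll_control_socket(self):
        # Returns the next scripted result instead of reading a real socket.
        self.calls.append("poll")
        return self._poll_results.pop(0)


def start_daemon(lib):
    # Everything after the one-time library init (second step onwards).
    lib.step("init_daemon")
    lib.step("load_schemas")
    lib.step("connect_control_socket")
    lib.step("register_callbacks")
    lib.step("phased_startup")


def run_client(lib, max_polls):
    lib.step("init_library")  # first step, done once per process
    start_daemon(lib)
    restarts = 0
    for _ in range(max_polls):
        if lib.poll_control_socket() == CONFD_EOF:
            # Recovery as described: close, release, then start over.
            lib.step("close_control_socket")
            lib.step("release_daemon")
            start_daemon(lib)
            restarts += 1
    return restarts


# One healthy poll, one EOF, then one healthy poll after recovery.
lib = FakeConfd([CONFD_OK, CONFD_EOF, CONFD_OK])
restarts = run_client(lib, max_polls=3)
print(restarts, lib.calls)
```

The model only covers the control socket steps listed above; anything the real client holds beyond those steps is outside this sketch.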
The handling seems OK: as you can see in line 19759 of the client log, a new session was started. But right after that an INTERNAL ERROR occurred, which caused the client to restart.
My question is: why does the internal error “No trans found with tid=53216” appear? Are there any missing steps in our handling of the control socket problem?
Thank you in advance.
BRs
Michael