Erlang ERROR REPORT "case_clause" when committing an in-service upgrade

Hi,
We encountered a ConfD error during the commit step of an in-service upgrade, shown below. Could you please give us some guidance on the root cause of this error?
According to the ConfD User Guide, section 15.5, Committing the Upgrade:

When maapi_perform_upgrade() has completed successfully, we must call maapi_commit_upgrade() to tell ConfD to make the upgrade permanent.

maapi_commit_upgrade() may fail if the upgraded data does not pass validation, and the errors
returned in this case are the same as for e.g. maapi_apply_trans().

Is the error caused by the upgraded data failing validation?
Error in the ConfD server:

<INFO> 18-Mar-2021::12:07:46.310 com-cm-yang-provider-659f4fdf8d-8phnb confd[152]: - Upgrade performed
=ERROR REPORT==== 18-Mar-2021::12:07:46.562961 ===
** Generic server confd_upgrade terminating
** Last message in was commit_upgrade
** When Server state == {state,upgrade_done,
                               [ha_server,confd_maapi,
                                confd_mmap_schema_server,confd_yang_lib,
                                confd_schema_mount,aaa_server,cs_call_cache,
                                netconf_server,cli_server,cdb_upgrade,
                                pre_kicker,kicker_server],
                               64,<0.1327.0>,undefined,[],[],kill}
** Reason for termination ==
** {{case_clause,
        {cdb_init_sess,upgrade,false,undefined,64,<0.1334.0>,undefined,true,
            [],
            {[{fxs_header,fxs_header,'http://tail-f.com/yang/xsd-types',
                  'http://tail-f.com/yang/xsd-types',cs,all,"xs",1220383398,
                  [],
                  <<238,143,184,179,193,63,237,249,211,145,225,7,184,178,29,52>>,
                  278,undefined,undefined,[],
                  {yang_header,'1',<<"2017-11-20">>,[],[],
                      <<"tailf-xsd-types">>,0,[]},
                  [],
                  {0,0},
                  <<118,179,168,13,211,172,220,117,147,0,244,27,4,230,154,251>>,
                  [],
                  [{<<1>>,<<0,0,5,93>>}],
                  [],'http://tail-f.com/yang/xsd-types',
                  [{{'http://tail-f.com/yang/xsd-types','tailf-xsd-types',
                        <<"2017-11-20">>},
                    #{'http://tail-f.com/yang/xsd-types' => "xs"}}]},
              {fxs_header,fxs_header,'http://tail-f.com/ns/webui',



... ...

                         [],
                         [<<"/opt/confd/etc/confd">>,<<"/opt/adp/fxs">>,
                          <<"/models-from-db/fxs">>],
                         undefined,
                         {5,"7.4.1"},
                         false,undefined}
      in function  cdb_upgrade:abort_upgrade/0 (cdb_upgrade.erl, line 1120)
      in call from lists:foreach/2 (lists.erl, line 1338)
      in call from confd_upgrade:terminate/2 (confd_upgrade.erl, line 355)
      in call from gen_server:try_terminate/3 (gen_server.erl, line 689)
      in call from gen_server:terminate/10 (gen_server.erl, line 874)
      in call from gen_server:handle_msg/6 (gen_server.erl, line 720)
    ancestors: [<0.1327.0>,maapi,confd_second_sup,confd_sup,<0.48.0>]
    message_queue_len: 2
    messages: [{'EXIT',<0.1339.0>,normal},{'EXIT',<0.1334.0>,normal}]
    links: []
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 318187
    stack_size: 27
    reductions: 719784
  neighbours:


=ERROR REPORT==== 18-Mar-2021::12:07:47.879890 ===
** Generic server kicker_server terminating
** Last message in was init_upgrade
** When Server state == {state,
                            {xds_ram,
                                {otts,#Ref<0.556970867.3619553281.98637>,0,
                                    #Ref<0.556970867.3619422209.98638>},
                                140132777671372,0,[],[],[],[],140132777671372,
                                undefined,undefined,undefined},
                            {pre,{undefined,0}},
                            [nonode@nohost],
                            [{data_kicker,undefined,
                                 [[['http://tail-f.com/ns/kicker'|kickers]]],
                                 [[[1780965560|1612823674]]],
                                 undefined,[],undefined,undefined,
                                 {kicker_server,kicker_data_changed,[]},
                                 [],undefined,undefined,once_for_each}],
                            [],[],false,undefined,4,[],4,
                            [{0,<0.167.0>},
                             {1,<0.168.0>},
                             {2,<0.169.0>},
                             {3,<0.170.0>},
                             {4,<0.171.0>},
                             {5,<0.172.0>},
                             {6,<0.173.0>},
                             {7,<0.174.0>},
                             {8,<0.175.0>},
                             {9,<0.176.0>},
                             {10,<0.177.0>},
                             {11,<0.178.0>},
                             {12,<0.179.0>},
                             {13,<0.180.0>},
                             {14,<0.181.0>},
                             {15,<0.182.0>}]}
** Reason for termination ==
** {function_clause,[{cs_trans,stop,
                               [undefined],
                               [{file,"cs_trans.erl"},{line,688}]},
                     {kicker_server,terminate,2,
                                    [{file,"kicker_server.erl"},{line,528}]},
                     {gen_server,try_terminate,3,
                                 [{file,"gen_server.erl"},{line,689}]},
                     {gen_server,terminate,10,
                                 [{file,"gen_server.erl"},{line,874}]},
                     {proc_lib,init_p_do_apply,3,
                               [{file,"proc_lib.erl"},{line,249}]}]}

=CRASH REPORT==== 18-Mar-2021::12:21:32.793283 ===
  crasher:
    initial call: kicker_server:init/1
    pid: <0.1363.0>
    registered_name: kicker_server
    exception error: no function clause matching cs_trans:stop(undefined) (cs_trans.erl, line 688)
      in function  kicker_server:terminate/2 (kicker_server.erl, line 528)
      in call from gen_server:try_terminate/3 (gen_server.erl, line 689)
      in call from gen_server:terminate/10 (gen_server.erl, line 874)
    ancestors: [confd_second_sup,confd_sup,<0.48.0>]
    message_queue_len: 0
    messages: []
    links: []
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 4185
    stack_size: 27
    reductions: 13964
  neighbours:

=SUPERVISOR REPORT==== 18-Mar-2021::12:21:32.793796 ===
    supervisor: {local,confd_second_sup}
    errorContext: shutdown_error
    reason: {function_clause,
                [{cs_trans,stop,
                     [undefined],
                     [{file,"cs_trans.erl"},{line,688}]},
                 {kicker_server,terminate,2,
                     [{file,"kicker_server.erl"},{line,528}]},
                 {gen_server,try_terminate,3,
                     [{file,"gen_server.erl"},{line,689}]},
                 {gen_server,terminate,10,
                     [{file,"gen_server.erl"},{line,874}]},
                 {proc_lib,init_p_do_apply,3,
                     [{file,"proc_lib.erl"},{line,249}]}]}
    offender: [{pid,<0.1363.0>},
               {id,kicker_server},
               {mfargs,{kicker_server,start_link,[[nonode@nohost]]}},
               {restart_type,permanent},
               {shutdown,2000},
               {child_type,worker}]

Error in the ConfD client:

{"db":"running","fd":25,"func":"go_confd_validation_execute","message":"exit","service_id":"com-cm-yang-provider","session_id":64,"session_type":"system","severity":"debug","timestamp":"2021-03-18T12:07:46.484Z","tx":-2,"tx_mode":"READ_WRITE","version":"0.2.0"}
{"db":"running","fd":25,"func":"tx_write_start","message":"enter write_start","service_id":"com-cm-yang-provider","session_id":64,"session_type":"system","severity":"debug","timestamp":"2021-03-18T12:07:46.485Z","tx":-2,"tx_mode":"READ_WRITE","version":"0.2.0"}
{"db":"running","fd":25,"func":"tx_write_start","message":"exit write_start","service_id":"com-cm-yang-provider","session_id":64,"session_type":"system","severity":"debug","timestamp":"2021-03-18T12:07:46.485Z","tx":-2,"tx_mode":"READ_WRITE","version":"0.2.0"}
{"message":"enter-exit go_confd_validation_stop","service_id":"com-cm-yang-provider","severity":"debug","timestamp":"2021-03-18T12:07:46.486Z","version":"0.2.0"}
DEBUG external error - application communication failure
{"error":"application communication failure","message":"Failed to commit upgrade","service_id":"com-cm-yang-provider","severity":"error","timestamp":"2021-03-18T12:07:46.604Z","version":"0.2.0"}
{"error":"failed to commit upgrade","message":"Failed to commit upgrade","service_id":"com-cm-yang-provider","severity":"error","timestamp":"2021-03-18T12:07:46.604Z","version":"0.2.0"}
{"context":"loadYangSchemas","error":"failed to commit upgrade","message":"Failed to notify confd and/or reload models","service_id":"com-cm-yang-provider","severity":"error","timestamp":"2021-03-18T12:07:46.604Z","version":"0.2.0"}
{"error":"failed to commit upgrade","message":"Failed to reload yang schemas","operation":"RestBackendMonitor","service_id":"com-cm-yang-provider","severity":"error","timestamp":"2021-03-18T12:07:46.604Z","version":"0.2.0"}

BRs
Michael

On one hand, ConfD should not crash for reasons like case_clause; on the other hand, the crash was triggered by the upgrade being aborted, and it is entirely possible that the abort was due to a validation failure, since validation does take place during maapi_commit_upgrade. It is difficult to tell for sure, though. Do you have more logs, e.g. the developer log?

Hi mvf,
We are going to reproduce this error once we get the YANG schemas. We can then provide the developer log if the issue is reproduced.

You can see Generic server confd_upgrade terminating at 12:07:46 due to case_clause, and then about 1 s later Generic server kicker_server terminating due to function_clause. Is there any relation between the two servers? And do you mean that ConfD needs to be improved so that it does not crash for reasons like case_clause?

BRs
Michael

I don’t think the two servers are closely related, other than that both are involved in the upgrade process. To me, the function_clause problem appears to be unrelated, and it seems to have been fixed in ConfD releases 7.5.1 and 7.4.2; I am not sure about the case_clause. And yes, ConfD servers should not crash in not-so-rare scenarios, so if you have a reproduction procedure, please report it, ideally with a self-contained example.
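
The timing of the reports in the server log can be checked directly from their headers. The following is a minimal Python sketch, not part of ConfD, assuming the headers have the form seen in this thread (e.g. "=ERROR REPORT==== 18-Mar-2021::12:07:46.562961 ==="). The names HEADER and report_times are made up for illustration.

```python
import re
from datetime import datetime

# Matches report headers as they appear in the log above, e.g.
# "=ERROR REPORT==== 18-Mar-2021::12:07:46.562961 ==="
HEADER = re.compile(r"^=(?P<kind>[A-Z ]+) REPORT==== (?P<ts>\S+) ===$")


def report_times(log_text):
    """Return a list of (report kind, timestamp) for every report header."""
    found = []
    for line in log_text.splitlines():
        match = HEADER.match(line.strip())
        if match is None:
            continue  # not a header line, e.g. part of a state dump
        # %b expects English month abbreviations ("Mar"), so this assumes
        # the script runs in the C or an English locale.
        stamp = datetime.strptime(match.group("ts"), "%d-%b-%Y::%H:%M:%S.%f")
        found.append((match.group("kind"), stamp))
    return found


if __name__ == "__main__":
    sample = (
        "=ERROR REPORT==== 18-Mar-2021::12:07:46.562961 ===\n"
        "** Generic server confd_upgrade terminating\n"
        "=ERROR REPORT==== 18-Mar-2021::12:07:47.879890 ===\n"
        "** Generic server kicker_server terminating\n"
    )
    reports = report_times(sample)
    delta = reports[1][1] - reports[0][1]
    print(delta.total_seconds())
```

For the two ERROR REPORT headers in this thread, the timestamps differ by 1.316929 seconds. That only shows the order of the reports; it does not by itself show that one termination caused the other.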

Hi,

Do you know the meaning of the error below, seen when committing the upgrade?

DEBUG external error - application communication failure

Thanks!

Hi,

From the confd_lib_lib(3) man page:

CONFD_ERR_EXTERNAL (19)
All errors that originate in user code.

This is often due to a data provider (DP API) application that returns CONFD_ERR from one of the callbacks it serves when ConfD calls them, or because the data provider application crashes or has crashed.

Regards
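
To see which client-side entries accompany the "external error" message, the error-severity entries can be pulled out of the client log shown earlier in the thread. This is a minimal Python sketch under stated assumptions: each entry is one intact JSON object per line, possibly preceded by a viewer line number, and plain-text lines are skipped. The names LEADING_NUMBER and error_entries are hypothetical and not part of ConfD or the client application.

```python
import json
import re

# Optional viewer line number in front of a JSON object, e.g. "62193 {...}"
LEADING_NUMBER = re.compile(r"^\d+\s+(?=\{)")


def error_entries(lines):
    """Return the parsed JSON log entries whose severity is "error"."""
    found = []
    for raw in lines:
        line = LEADING_NUMBER.sub("", raw.strip())
        if not line.startswith("{"):
            continue  # plain-text line such as "DEBUG external error - ..."
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # damaged or truncated line; ignore rather than guess
        if entry.get("severity") == "error":
            found.append(entry)
    return found


if __name__ == "__main__":
    sample = [
        "62192 DEBUG external error - application communication failure",
        '62193 {"error":"application communication failure",'
        '"message":"Failed to commit upgrade",'
        '"service_id":"com-cm-yang-provider","severity":"error",'
        '"timestamp":"2021-03-18T12:07:46.604Z","version":"0.2.0"}',
    ]
    for entry in error_entries(sample):
        print(entry["timestamp"], entry["error"], "|", entry["message"])
```

The timestamps of the selected entries can then be compared with the report times in the ConfD server log.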