bugfix: add a control_ping() before each update

If VPP were to disconnect either the Stats Segment or the API endpoint, for example if it crashes and restarts, vpp-snmp-agent will not detect this. In such a situation, it will hold on to the stale stats and no longer receive interface updates.

Before each run, send a control_ping() API request, and if that were to fail (for example with Broken Pipe, or Connection Refused), disconnect both API and Stats (in the vpp.disconnect() call, also invalidate the interface and LCP cache), and then fail the update. The Agent runner will then retry once per second until the connection (and control_ping()) succeeds.

TESTED:
- Start vpp-snmp-agent, it connects and starts up per normal.
- Exit / Kill vpp
- Upon the next update(), the control_ping() call will fail, causing the agent to disconnect
- The agent will now loop:
  [ERROR ] agentx.agent - update : VPP API: [Errno 1] Sendall error: BrokenPipeError(32, 'Broken pipe'), retrying
  [WARNING ] agentx.agent - run : Update failed, last successful update was 1673345631.7658572
  [INFO ] agentx.vppapi - connect : Connecting to VPP
  [ERROR ] agentx.agent - update : VPP API: Not connected, api definitions not available, retrying
- Start VPP again, when its API endpoint is ready:
  [INFO ] agentx.vppapi - connect : Connecting to VPP
  [INFO ] agentx.vppapi - connect : VPP version is 23.02-rc0~199-gcfaf44020
  [INFO ] agentx.vppapi - connect : Enabling VPP API interface events
  [DEBUG ] agentx.agent - update : VPP API: control_ping_reply(_0=24, context=12, retval=0, client_index=0, vpe_pid=705326)
  [INFO ] agentx.vppapi - get_ifaces : Requesting interfaces from VPP API
  [INFO ] agentx.vppapi - get_lcp : Requesting LCPs from VPP API
- The agent resumes where it left off
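
For illustration only, the ping-then-update flow described above could be sketched roughly as follows. This is not the code in this commit: FakeVPP, its attributes and the update() helper are hypothetical stand-ins for the agent's real VPP wrapper, and the failure is simulated with a flag instead of a real socket.

```python
import logging

logger = logging.getLogger("agentx.agent")


class FakeVPP:
    """Hypothetical stand-in for the agent's VPP wrapper (not the real class)."""

    def __init__(self):
        self.vpp_alive = True   # simulates whether the VPP process is running
        self.connected = False
        self.iface_cache = {}   # assumed cache of interface data
        self.lcp_cache = {}     # assumed cache of LCP data

    def connect(self):
        # Connecting only succeeds while the simulated VPP is alive.
        self.connected = self.vpp_alive
        return self.connected

    def disconnect(self):
        # Drop the connection and invalidate both caches, so nothing stale
        # survives a VPP restart.
        self.connected = False
        self.iface_cache = {}
        self.lcp_cache = {}

    def control_ping(self):
        # A dead peer surfaces as an OSError subclass, as a real socket would.
        if not self.vpp_alive:
            raise BrokenPipeError(32, "Broken pipe")
        return {"retval": 0}

    def get_ifaces(self):
        # Refill the cache on first use after a (re)connect.
        if not self.iface_cache:
            self.iface_cache = {"loop0": 1}
        return self.iface_cache


def update(vpp):
    """Return True if the update ran, False if the caller should retry later."""
    if not vpp.connected and not vpp.connect():
        logger.error("VPP API: not connected, retrying")
        return False
    try:
        vpp.control_ping()
    except OSError as e:
        # Covers BrokenPipeError and ConnectionRefusedError alike.
        logger.error("VPP API: %s, retrying", e)
        vpp.disconnect()
        return False
    vpp.get_ifaces()
    return True
```

In this sketch the runner would call update() once per second and treat a False return as "try again", which mirrors the retry loop seen in the log lines above.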