Compare commits

10 Commits

Author SHA1 Message Date
c6dbce8f90 Update README.md
Clarify that this agent is meant to run with Net SNMPd.
2023-08-26 11:58:36 +02:00
aa38c5503f Update README.md
Add a few hints based on previous issues filed in this repo.
Clarify that linux-cp must be used, the API/Stats socket must be accessible, and that no backwards compatibility is given.
2023-08-26 11:55:22 +02:00
684400ff9e Reduce logging on AgentX connections
Previous logging was very noisy when the agent connection to snmpd
drops:

[ERROR   ] agentx.network - run            : Empty PDU, connection closed!
[INFO    ] agentx.network - disconnect     : Disconnecting from localhost:705
[ERROR   ] agentx.agent - run            : An exception occurred: Empty PDU, disconnecting
[ERROR   ] agentx.agent - run            : Reconnecting
[INFO    ] agentx.agent - run            : Opening AgentX connection
[INFO    ] agentx.network - connect        : Connecting to localhost:705
[ERROR   ] agentx.network - connect        : Failed to connect to localhost:705
[ERROR   ] agentx.agent - run            : An exception occurred: Not connected
[ERROR   ] agentx.agent - run            : Reconnecting
[INFO    ] agentx.agent - run            : Opening AgentX connection
[INFO    ] agentx.network - connect        : Connecting to localhost:705
[ERROR   ] agentx.network - connect        : Failed to connect to localhost:705
[ERROR   ] agentx.agent - run            : An exception occurred: Not connected
[ERROR   ] agentx.agent - run            : Reconnecting

Also, reconnects were attempted every 0.1s, but field research shows
that snmpd, if it restarts, takes ~3-5 seconds to come back (note: this
is also due to a systemd delay in restarting it upon failures).
Hammering the connection is not useful.

This change refactors the logging, to avoid redundant messages:
- sleep 1s between attempts (reducing the loop by 10x)
- Either print 'Connected to' or 'Failed to connect to', not both.
- Remove the superfluous 'Reconnecting' message
2023-01-14 11:12:06 +00:00
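
The retry behaviour described in this commit (one log line per attempt, a fixed pause between attempts) can be sketched roughly as follows. This is an illustrative sketch and not the code from `agentx/network.py`: the helper name `connect_with_retry`, its parameters and the logger name are assumptions made for the example.

```python
import logging
import time

logger = logging.getLogger("agentx.sketch")  # hypothetical logger name


def connect_with_retry(connect, address, delay=1.0, max_attempts=None, sleep=time.sleep):
    """Call connect() until it succeeds, pausing `delay` seconds after each failure.

    Returns the number of attempts used, or None if max_attempts ran out.
    """
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        try:
            connect()
        except OSError:
            # Exactly one message per failed attempt, then back off.
            logger.error("Failed to connect to %s", address)
            sleep(delay)
            continue
        # Exactly one message on success.
        logger.info("Connected to %s", address)
        return attempt
    return None
```

Passing `sleep` as a parameter is only there to make the sketch easy to exercise without waiting; with the default `time.sleep` and `delay=1.0`, a restarting snmpd that needs 3-5 seconds would be probed a handful of times rather than dozens.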
43551958f8 Typo fix 2023-01-10 17:13:27 +01:00
31529a2815 improvement: add flag for agentx debugging
agentx/network.py always turned on debugging. It can be useful to have
debugging logs of the main application without the agentx debug logs, as
they are quite noisy.

Now, ./vpp-snmp-agent.py -d will turn on application debugging but NOT
agentx debugging. ./vpp-snmp-agent.py -d -dd will turn on both.

NOTE: ./vpp-snmp-agent.py -dd will do nothing, because the '-d' flag
determines the global logging level.
2023-01-10 15:21:32 +01:00
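
The interaction between the two flags can be illustrated with a small, self-contained sketch. The `-d` and `-dd` options and their `dest` names match the diff further down; the `effective_debug` helper is hypothetical and only models the rule stated in the NOTE, namely that `-dd` has no visible effect unless `-d` is also given.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("-d", dest="debug", action="store_true", help="Enable debug")
    parser.add_argument(
        "-dd", dest="debug_agent", action="store_true", help="Enable agentx debug"
    )
    return parser


def effective_debug(argv):
    """Return (application_debug, agentx_debug) as they would take effect."""
    args = build_parser().parse_args(argv)
    app_debug = args.debug
    # The global logging level is set by -d, so agentx debug output is only
    # visible when both flags are present.
    agentx_debug = args.debug and args.debug_agent
    return app_debug, agentx_debug
```

For example, `effective_debug(["-d"])` yields `(True, False)`, while `effective_debug(["-dd"])` yields `(False, False)`.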
0d7dea37f5 Merge branch 'main' of github.com:pimvanpelt/vpp-snmp-agent 2023-01-10 11:26:21 +01:00
95d96d5e61 bugfix: add a control_ping() before each update
If VPP were to disconnect either the Stats Segment or the API endpoint,
for example if it crashes and restarts, vpp-snmp-agent will not detect
this. In such a situation, it will hold on to the stale stats and no
longer receive interface updates.

Before each run, send a control_ping() API request, and if that were to
fail (for example with Broken Pipe, or Connection Refused), disconnect
both API and Stats (in the vpp.disconnect() call, also invalidate the interface
and LCP cache), and then fail the update. The Agent runner will then retry
once per second until the connection (and control_ping()) succeeds.

TESTED:
- Start vpp-snmp-agent, it connects and starts up per normal.
- Exit / Kill vpp
- Upon the next update(), the control_ping() call will fail, causing the
  agent to disconnect
- The agent will now loop:
[ERROR   ]      agentx.agent - update         : VPP API: [Errno 1] Sendall error: BrokenPipeError(32, 'Broken pipe'), retrying
[WARNING ]      agentx.agent - run            : Update failed, last successful update was 1673345631.7658572
[INFO    ]     agentx.vppapi - connect        : Connecting to VPP
[ERROR   ]      agentx.agent - update         : VPP API: Not connected, api definitions not available, retrying

- Start VPP again, when its API endpoint is ready:
[INFO    ]     agentx.vppapi - connect        : Connecting to VPP
[INFO    ]     agentx.vppapi - connect        : VPP version is 23.02-rc0~199-gcfaf44020
[INFO    ]     agentx.vppapi - connect        : Enabling VPP API interface events
[DEBUG   ]      agentx.agent - update         : VPP API: control_ping_reply(_0=24, context=12, retval=0, client_index=0, vpe_pid=705326)
[INFO    ]     agentx.vppapi - get_ifaces     : Requesting interfaces from VPP API
[INFO    ]     agentx.vppapi - get_lcp        : Requesting LCPs from VPP API

- The agent resumes where it left off
2023-01-10 11:24:44 +01:00
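
The ping-before-update pattern from this commit can be sketched with a stand-in client. `FakeVPP` below is a hypothetical test double, not the `vpp_papi` client or the agent's `VPPApi` class; it only mimics a VPP process that can go away and come back.

```python
class FakeVPP:
    """Hypothetical stand-in for a VPP API wrapper."""

    def __init__(self):
        self.alive = True        # whether the simulated VPP process is running
        self.connected = False
        self.iface_cache = None

    def connect(self):
        if self.connected:
            return
        if not self.alive:
            raise ConnectionRefusedError("VPP is down")
        self.connected = True

    def control_ping(self):
        if not self.alive:
            raise BrokenPipeError(32, "Broken pipe")
        return "pong"

    def disconnect(self):
        # Dropping the connection also invalidates cached state.
        self.connected = False
        self.iface_cache = None


def update(vpp):
    """Return True if VPP answered the ping, False if the caller should retry."""
    try:
        vpp.connect()
        vpp.control_ping()
    except OSError:
        vpp.disconnect()
        return False
    if vpp.iface_cache is None:
        vpp.iface_cache = {"loop0": {}}  # placeholder for a real interface dump
    return True
```

The point of the ping is that a stale connection is detected at the start of `update()`, before any cached or stale statistics are served; the caller is expected to call `update()` again after a pause until it returns True.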
b6864530eb Update README.md 2023-01-08 23:11:55 +01:00
7f4427c4b6 Improvement: Use interface/LCP caching on VPP API
- Set an initial vppapi.iface_dict and lcp_dict to None.
- Set an event watcher API call, with a callback
- When events happen, flush the iface/lcp cache (by setting them to None).
- When get_ifaces / get_lcp sees an empty cache, fetch the data from VPP
  API and put into the cache for subsequent calls.

This way, the VPP API is only used upon startup (when the caches are
empty), and on interface add/del/changes (note: the events fire for
link, and admin up/down, but not for MTU changes).

One small race condition exists: if a new LCP is created, this does not
trigger an interface event. Adding a want_lcp_events() makes sense, but
until then, a few options remain:
0) race exists only if interface was created; THEN the cache was
   refreshed; and THEN the LCP was created.
1) create the lcp and then force a change to any interface (this will
   create an sw_interface event and flush the cache)
2) restart vpp-snmp-agent
2023-01-08 13:57:08 +01:00
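
The caching scheme described in this commit amounts to a cache that is filled lazily and cleared by an event callback. A minimal sketch, assuming a hypothetical `fetch` callable in place of the real VPP API call (the class and attribute names here are illustrative, not taken from `vppapi.py`):

```python
class InterfaceCache:
    """Caches the result of fetch() until an interface event invalidates it."""

    def __init__(self, fetch):
        self._fetch = fetch    # callable that queries the API (hypothetical)
        self._ifaces = None    # None means "cache is empty"
        self.fetch_count = 0   # counts how often the API was actually queried

    def on_event(self, msg_type_name, msg):
        # May run in a background thread, so it only clears the cache;
        # the next get_ifaces() call refreshes it in the caller's thread.
        if msg_type_name == "sw_interface_event":
            self._ifaces = None

    def get_ifaces(self):
        if self._ifaces is None:
            self.fetch_count += 1
            self._ifaces = self._fetch()
        return self._ifaces
```

With this shape, repeated calls to `get_ifaces()` hit the API only once until an `sw_interface_event` arrives. It also makes the race in the commit message visible: an LCP created after the last refresh produces no event, so nothing clears the cache.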
c81a035091 Refactor to use VPPApiJSONFiles 2023-01-08 13:24:54 +01:00
5 changed files with 104 additions and 43 deletions

View File

@@ -43,7 +43,10 @@ optional arguments:
 sudo cp dist/vpp-snmp-agent /usr/sbin/
 ```

-## Configuration file
+## Configuration
+
+This agent requires the `linux-cp` plugin to be enabled in VPP, and it requires read/write access
+to the VPP API and Stats sockets (typically in `/run/vpp/*.sock`).

 This SNMP Agent will read a [vppcfg](https://github.com/pimvanpelt/vppcfg) configuration file,
 which provides a mapping between VPP interface names, Linux Control Plane interface names, and
@@ -85,8 +88,12 @@ the `ifName`. However, if the config file is read, it will change the behavior a

 ## SNMPd config

-First, configure the snmpd to accept agentx connections by adding (at least) the following
-to `snmpd.conf`:
+This agent is meant to run alongside the snmpd shipped in Debian (Bullseye or Bookworm), called
+[Net SNMP](http://net-snmp.sourceforge.net/). The same snmpd is available in Ubuntu (Focal, Jammy) as well,
+which should work.
+
+After installing the snmpd (`apt install snmpd`), configure it to accept agentx connections by adding (at least)
+the following to `snmpd.conf`:
 ```
 master agentx
 agentXSocket tcp:localhost:705,unix:/var/agentx-dataplane/master
@@ -119,15 +126,18 @@ sudo systemctl start vpp-snmp-agent

 # Support

-Limnited support is offered on this codebase. GitHub issues can be files for issues with the design or
-implementation (eg bugs, feature requests), but no _user_ support can be given. Put simply, this repo
-accepts only bugreports with the code, not with its use.
+This software is compatible only with the current production release of VPP, which can be found on its
+[Gerrit](https://gerrit.fd.io/r/q/repo:vpp) service. Maintaining backwards compatibility is not a goal of
+this repository.
+
+Limited support is offered on the codebase: GitHub issues may be filed for issues with the _design or
+implementation_ (eg. bugs, feature requests), but _user_ support can not be given. Put simply, this repo
+accepts only bugreports with the code, not with its use. See the LICENSE for clarity.

 Issues with the codebase that are well researched ([this article](https://marker.io/blog/how-to-write-bug-report)
 gives a good example of the expectation), preferably pointing at the location where the problem occurred,
 and if possible proposing a fix, are most welcome.

-Requests that are not discussing problems with the software itself, notably enduser support requests,
-will not be handled unless they clearly demonstrate a bug and propose workarounds or fixes.
-
-Paid support can be obtained on hourly commission. Reach out \to IPng Networks (sales@ipng.ch) to discuss rates.
+Requests that don't discuss problems with the software itself, notably enduser support requests, will not be
+handled unless they clearly demonstrate a bug and propose workarounds or fixes. Paid support can be obtained
+on hourly commission. Reach out to IPng Networks GmbH (sales@ipng.ch) to discuss rates.

View File

@@ -28,7 +28,11 @@ class Agent(object):
         self._lastupdate = 0
         self._update_period = period  # Seconds

-        self._net = Network(server_address=server_address)
+        try:
+            debug = args.debug_agent
+        except:
+            debug = False
+        self._net = Network(server_address=server_address, debug=debug)

         self._oid_list = []
         self._args = args
@@ -67,10 +71,9 @@ class Agent(object):
             try:
                 self._net.run()
             except Exception as e:
-                self.logger.error("An exception occurred: %s" % e)
-                self.logger.error("Reconnecting")
+                self.logger.error("Disconnecting due to exception: %s" % e)
                 self._net.disconnect()
-                time.sleep(0.1)
+                time.sleep(1)

     def stop(self):
         self.logger.debug("Stopping")

View File

@@ -27,11 +27,11 @@ class NetworkError(Exception):


 class Network:
-    def __init__(self, server_address="/var/agentx/master"):
+    def __init__(self, server_address="/var/agentx/master", debug=False):

         self.session_id = 0
         self.transaction_id = 0
-        self.debug = 1
+        self.debug = debug
         # Data Related Variables
         self.data = {}
         self.data_idx = []
@@ -44,7 +44,6 @@ class Network:
             return

         try:
-            logger.info("Connecting to %s" % self._server_address)
             if self._server_address.startswith("/"):
                 self.socket = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
                 self.socket.connect(self._server_address)
@@ -55,9 +54,10 @@ class Network:
                 self.socket.connect((host, int(port)))
             self.socket.settimeout(self._timeout)
             self._connected = True
+            logger.info("Connected to %s" % self._server_address)
         except socket.error:
-            logger.error("Failed to connect to %s" % self._server_address)
             self._connected = False
+            logger.error("Failed to connect to %s" % self._server_address)

     def disconnect(self):
         if not self._connected:

View File

@@ -96,9 +96,14 @@ class MyAgent(agentx.Agent):

     def update(self):
         try:
+            self.vpp.connect()
+            r = self.vpp.vpp.api.control_ping()
+            self.logger.debug(f"VPP API: {r}")
             self.vppstat.connect()
-        except:
-            self.logger.error("Could not connect to VPPStats segment")
+        except Exception as e:
+            self.logger.error(f"VPP API: {e}, retrying")
+            self.vppstat.disconnect()
+            self.vpp.disconnect()
             return False

         ds = agentx.DataSet()
@@ -108,7 +113,6 @@ class MyAgent(agentx.Agent):
         num_ifaces = len(ifaces)
         num_vppstat = len(self.vppstat["/if/names"])
         num_lcp = len(lcp)
-        self.logger.debug("LCP: %s" % (lcp))
         self.logger.debug(
             "Retrieved Interfaces: vppapi=%d vppstat=%d lcp=%d"
             % (num_ifaces, num_vppstat, num_lcp)
@@ -384,10 +388,16 @@ def main():
     parser.add_argument(
         "-d", dest="debug", action="store_true", help="""Enable debug, default False"""
     )
+    parser.add_argument(
+        "-dd",
+        dest="debug_agent",
+        action="store_true",
+        help="""Enable agentx debug, default False""",
+    )

     args = parser.parse_args()
     if args.debug:
-        print("Arguments:", args)
+        print(f"Arguments: {args}")

     agentx.setup_logging(debug=args.debug)


View File

@@ -3,7 +3,7 @@ The functions in this file interact with the VPP API to retrieve certain
 interface metadata.
 """

-from vpp_papi import VPPApiClient
+from vpp_papi import VPPApiClient, VPPApiJSONFiles
 import os
 import fnmatch
 import logging
@@ -25,24 +25,36 @@ class VPPApi:
         self.connected = False
         self.clientname = clientname
         self.vpp = None
+        self.iface_dict = None
+        self.lcp_dict = None
+
+    def _sw_interface_event(self, event):
+        # NOTE(pim): this callback runs in a background thread, so we just clear the
+        # cached interfaces and LCPs here, subsequent call to get_ifaces() or get_lcp()
+        # will refresh them in the main thread.
+        logger.info(f"Clearing iface and LCP cache due to interface event")
+        self.iface_dict = None
+        self.lcp_dict = None
+
+    def _event_callback(self, msg_type_name, msg_type):
+        logger.debug(f"Received callback: {msg_type_name} => {msg_type}")
+        if msg_type_name == "sw_interface_event":
+            self._sw_interface_event(msg_type)
+        else:
+            logger.warning(f"Ignoring unkonwn event: {msg_type_name} => {msg_type}")

     def connect(self):
         if self.connected:
             return True

-        vpp_json_dir = "/usr/share/vpp/api/"
-
-        # construct a list of all the json api files
-        jsonfiles = []
-        for root, dirnames, filenames in os.walk(vpp_json_dir):
-            for filename in fnmatch.filter(filenames, "*.api.json"):
-                jsonfiles.append(os.path.join(root, filename))
-
-        if not jsonfiles:
+        vpp_json_dir = VPPApiJSONFiles.find_api_dir([])
+        vpp_jsonfiles = VPPApiJSONFiles.find_api_files(api_dir=vpp_json_dir)
+        if not vpp_jsonfiles:
             logger.error("no json api files found")
             return False

-        self.vpp = VPPApiClient(apifiles=jsonfiles, server_address=self.address)
+        self.vpp = VPPApiClient(apifiles=vpp_jsonfiles, server_address=self.address)
+        self.vpp.register_event_callback(self._event_callback)
         try:
             logger.info("Connecting to VPP")
             self.vpp.connect(self.clientname)
@@ -52,6 +64,13 @@ class VPPApi:
         v = self.vpp.api.show_version()
         logger.info("VPP version is %s" % v.version)

+        logger.info("Enabling VPP API interface events")
+        r = self.vpp.api.want_interface_events(enable_disable=True)
+        if r.retval != 0:
+            logger.error("Could not enable VPP API interface events, disconnecting")
+            self.disconnect()
+            return False
+
         self.connected = True
         return True

@@ -59,42 +78,58 @@ class VPPApi:
         if not self.connected:
             return True
         self.vpp.disconnect()
+        self.iface_dict = None
+        self.lcp_dict = None
         self.connected = False
         return True

     def get_ifaces(self):
         ret = {}
-        if not self.connected:
+        if not self.connected and not self.connect():
+            logger.warning("Can't connect to VPP API")
             return ret

+        if type(self.iface_dict) is dict:
+            logger.debug("Returning cached interfaces")
+            return self.iface_dict
+
+        ret = {}
         try:
+            logger.info("Requesting interfaces from VPP API")
             iface_list = self.vpp.api.sw_interface_dump()
         except Exception as e:
-            logger.error("VPP communication error, disconnecting", e)
-            self.vpp.disconnect()
-            self.connected = False
+            logger.error("VPP API communication error, disconnecting", e)
+            self.disconnect()
             return ret

         if not iface_list:
-            logger.error("Can't get interface list")
+            logger.error("Can't get interface list, disconnecting")
+            self.disconnect()
             return ret

         for iface in iface_list:
             ret[iface.interface_name] = iface

-        return ret
+        self.iface_dict = ret
+        logger.debug(f"Caching interfaces: {ret}")
+        return self.iface_dict

     def get_lcp(self):
         ret = {}
-        if not self.connected:
+        if not self.connected and not self.connect():
+            logger.warning("Can't connect to VPP API")
             return ret

+        if type(self.lcp_dict) is dict:
+            logger.debug("Returning cached LCPs")
+            return self.lcp_dict
+
         try:
+            logger.info("Requesting LCPs from VPP API")
             lcp_list = self.vpp.api.lcp_itf_pair_get()
         except Exception as e:
             logger.error("VPP communication error, disconnecting", e)
-            self.vpp.disconnect()
-            self.connected = False
+            self.disconnect()
             return ret

         if not lcp_list:
@@ -103,4 +138,7 @@ class VPPApi:
         for lcp in lcp_list[1]:
             ret[lcp.host_if_name] = lcp

-        return ret
+
+        self.lcp_dict = ret
+        logger.debug(f"Caching LCPs: {ret}")
+        return self.lcp_dict