Compare commits

10 Commits

Author SHA1 Message Date
c6dbce8f90 Update README.md
Clarify that this agent is meant to run with Net SNMPd.
2023-08-26 11:58:36 +02:00
aa38c5503f Update README.md
Add a few hints based on previous issues filed in this repo.
Clarify that linux-cp must be used, the API/Stats socket must be accessible, and that no backwards compatibility is given.
2023-08-26 11:55:22 +02:00
684400ff9e Reduce logging on AgentX connections
Previous logging was very noisy when the agent connection to snmpd
drops:

[ERROR   ] agentx.network - run            : Empty PDU, connection closed!
[INFO    ] agentx.network - disconnect     : Disconnecting from localhost:705
[ERROR   ] agentx.agent - run            : An exception occurred: Empty PDU, disconnecting
[ERROR   ] agentx.agent - run            : Reconnecting
[INFO    ] agentx.agent - run            : Opening AgentX connection
[INFO    ] agentx.network - connect        : Connecting to localhost:705
[ERROR   ] agentx.network - connect        : Failed to connect to localhost:705
[ERROR   ] agentx.agent - run            : An exception occurred: Not connected
[ERROR   ] agentx.agent - run            : Reconnecting
[INFO    ] agentx.agent - run            : Opening AgentX connection
[INFO    ] agentx.network - connect        : Connecting to localhost:705
[ERROR   ] agentx.network - connect        : Failed to connect to localhost:705
[ERROR   ] agentx.agent - run            : An exception occurred: Not connected
[ERROR   ] agentx.agent - run            : Reconnecting

Also, reconnects were attempted every 0.1s, but field research shows
that snmpd, if it restarts, takes ~3-5 seconds to come back (note: this
is also due to a systemd delay in restarting it upon failures).
Hammering the connection is not useful.

This change refactors the logging, to avoid redundant messages:
- sleep 1s between attempts (reducing the loop by 10x)
- Either print 'Connected to' or 'Failed to connect to', not both.
- Remove the superfluous 'Reconnecting' message
2023-01-14 11:12:06 +00:00
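
The retry pacing and the "one message per attempt" rule described in this commit can be sketched roughly as follows. This is a minimal illustration, not the agent's actual code: `connect_with_retry`, its `connect` callable and the injectable `sleep` parameter are hypothetical names introduced here so the behavior can be exercised without a running snmpd.

```python
import logging
import time

logger = logging.getLogger("reconnect-sketch")


def connect_with_retry(connect, address, attempts, delay=1.0, sleep=time.sleep):
    """Try connect() up to `attempts` times, pausing `delay` seconds after each failure.

    Returns the 1-based number of the attempt that succeeded, or None if all failed.
    `connect`, `address` and `sleep` are hypothetical stand-ins for illustration.
    """
    for attempt in range(1, attempts + 1):
        if connect():
            # Exactly one message per attempt: either success ...
            logger.info("Connected to %s", address)
            return attempt
        # ... or failure, never both.
        logger.error("Failed to connect to %s", address)
        sleep(delay)  # 1s by default, instead of hammering every 0.1s
    return None
```

With a `connect` callable that fails twice and then succeeds, the function returns 3 after sleeping twice, which is the pattern one would expect while snmpd takes a few seconds to come back.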
43551958f8 Typo fix 2023-01-10 17:13:27 +01:00
31529a2815 improvement: add flag for agentx debugging
agentx/network.py always turned on debugging. It can be useful to have
debugging logs of the main application without the agentx debug logs, as
they are quite noisy.

Now, ./vpp-snmp-agent.py -d will turn on application debugging but NOT
agentx debugging. ./vpp-snmp-agent.py -d -dd will turn on both.

NOTE: ./vpp-snmp-agent.py -dd will do nothing, because the '-d' flag
determines the global logging level.
2023-01-10 15:21:32 +01:00
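
The interaction between the two flags can be illustrated with a small standalone parser. The `add_argument` calls mirror the ones shown in the diff further down; `build_parser` and `agentx_debug_visible` are hypothetical helpers added only for this sketch, on the assumption (taken from the NOTE above) that agentx debug output is visible only when the global level is also set to debug.

```python
import argparse


def build_parser():
    """Build a parser with the two debug flags described in the commit message."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-d", dest="debug", action="store_true", help="Enable debug, default False"
    )
    parser.add_argument(
        "-dd",
        dest="debug_agent",
        action="store_true",
        help="Enable agentx debug, default False",
    )
    return parser


def agentx_debug_visible(args):
    """Hypothetical helper: agentx debug logs only show up when -d is also given,
    because -d determines the global logging level."""
    return args.debug and args.debug_agent
```

Parsing `["-d", "-dd"]` sets both `debug` and `debug_agent`, while `["-dd"]` alone sets only `debug_agent`, so the helper reports that no agentx debug output would be visible.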
0d7dea37f5 Merge branch 'main' of github.com:pimvanpelt/vpp-snmp-agent 2023-01-10 11:26:21 +01:00
95d96d5e61 bugfix: add a control_ping() before each update
If VPP were to disconnect either the Stats Segment or the API endpoint,
for example if it crashes and restarts, vpp-snmp-agent will not detect
this. In such a situation, it will hold on to the stale stats and no
longer receive interface updates.

Before each run, send a control_ping() API request, and if that were to
fail (for example with Broken Pipe, or Connection Refused), disconnect
both API and Stats (in the vpp.disconnect() call, also invalidate the interface
and LCP cache), and then fail the update. The Agent runner will then retry
once per second until the connection (and control_ping()) succeeds.

TESTED:
- Start vpp-snmp-agent, it connects and starts up as normal.
- Exit / Kill vpp
- Upon the next update(), the control_ping() call will fail, causing the
  agent to disconnect
- The agent will now loop:
[ERROR   ]      agentx.agent - update         : VPP API: [Errno 1] Sendall error: BrokenPipeError(32, 'Broken pipe'), retrying
[WARNING ]      agentx.agent - run            : Update failed, last successful update was 1673345631.7658572
[INFO    ]     agentx.vppapi - connect        : Connecting to VPP
[ERROR   ]      agentx.agent - update         : VPP API: Not connected, api definitions not available, retrying

- Start VPP again, when its API endpoint is ready:
[INFO    ]     agentx.vppapi - connect        : Connecting to VPP
[INFO    ]     agentx.vppapi - connect        : VPP version is 23.02-rc0~199-gcfaf44020
[INFO    ]     agentx.vppapi - connect        : Enabling VPP API interface events
[DEBUG   ]      agentx.agent - update         : VPP API: control_ping_reply(_0=24, context=12, retval=0, client_index=0, vpe_pid=705326)
[INFO    ]     agentx.vppapi - get_ifaces     : Requesting interfaces from VPP API
[INFO    ]     agentx.vppapi - get_lcp        : Requesting LCPs from VPP API

- The agent resumes where it left off
2023-01-10 11:24:44 +01:00
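
The liveness check described in this commit can be sketched against a stand-in client. `FakeVPP` below is a hypothetical test double, not the real `vpp_papi` client or the repository's `VPPApi` class; it only mimics the behavior the commit message describes (a ping that fails once VPP goes away, and a disconnect that also drops the cache).

```python
class FakeVPP:
    """Hypothetical stand-in for the VPP API wrapper, for illustration only."""

    def __init__(self):
        self.alive = True        # simulates whether VPP is running
        self.connected = False
        self.cache = None        # stands in for the interface/LCP cache

    def connect(self):
        if self.connected:
            return True          # an existing connection is reused, even if stale
        if not self.alive:
            raise ConnectionRefusedError("Connection refused")
        self.connected = True
        return True

    def control_ping(self):
        if not self.alive:
            raise BrokenPipeError(32, "Broken pipe")
        return "control_ping_reply(retval=0)"

    def disconnect(self):
        self.connected = False
        self.cache = None        # invalidate cached data on disconnect


def update(vpp):
    """Ping before each update; on any failure, disconnect and report failure."""
    try:
        vpp.connect()
        vpp.control_ping()
    except Exception:
        vpp.disconnect()
        return False
    return True
```

Because `connect()` returns early on an existing connection, only the ping notices that VPP has gone away. The caller is expected to retry `update()` periodically, which succeeds again once the endpoint is back.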
b6864530eb Update README.md 2023-01-08 23:11:55 +01:00
7f4427c4b6 Improvement: Use interface/LCP caching on VPP API
- Set an initial vppapi.iface_dict and lcp_dict to None.
- Set an event watcher API call, with a callback
- When events happen, flush the iface/lcp cache (by setting them to None).
- When get_ifaces / get_lcp sees an empty cache, fetch the data from VPP
  API and put into the cache for subsequent calls.

This way, the VPP API is only used upon startup (when the caches are
empty), and on interface add/del/changes (note: the events fire for
link, and admin up/down, but not for MTU changes).

One small race condition exists: if a new LCP is created, this does not
trigger an interface event. Adding a want_lcp_events() makes sense, but
until then, a few options remain:
0) race exists only if the interface was created; THEN the cache was
   refreshed; and THEN the LCP was created.
1) create the lcp and then force a change to any interface (this will
   create an sw_interface event and flush the cache)
2) restart vpp-snmp-agent
2023-01-08 13:57:08 +01:00
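
The caching scheme in this commit (cache starts as None, events flush it, getters refill it lazily) can be sketched as below. `CachedInterfaces` and its `fetch` callable are hypothetical names for this illustration; in the repository the equivalent logic lives in the `VPPApi` class and talks to the real VPP API.

```python
class CachedInterfaces:
    """Lazy cache that is flushed by interface events (illustrative sketch)."""

    def __init__(self, fetch):
        self._fetch = fetch      # hypothetical callable returning a dict of interfaces
        self.iface_dict = None   # None means "cache is empty"

    def on_event(self, msg_type_name, msg):
        # Only interface events flush the cache; other events are ignored here.
        if msg_type_name == "sw_interface_event":
            self.iface_dict = None

    def get_ifaces(self):
        if self.iface_dict is None:
            # Cache miss: fetch once and keep the result for subsequent calls.
            self.iface_dict = self._fetch()
        return self.iface_dict
```

With this shape, repeated calls to `get_ifaces()` hit the backend only once until an `sw_interface_event` arrives. An event type that never fires, such as LCP creation in the race described above, leaves the stale cache in place.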
c81a035091 Refactor to use VPPApiJSONFiles 2023-01-08 13:24:54 +01:00
5 changed files with 104 additions and 43 deletions

View File

@ -43,7 +43,10 @@ optional arguments:
sudo cp dist/vpp-snmp-agent /usr/sbin/
```
## Configuration file
## Configuration
This agent requires the `linux-cp` plugin to be enabled in VPP, and it requires read/write access
to the VPP API and Stats sockets (typically in `/run/vpp/*.sock`).
This SNMP Agent will read a [vppcfg](https://github.com/pimvanpelt/vppcfg) configuration file,
which provides a mapping between VPP interface names, Linux Control Plane interface names, and
@ -85,8 +88,12 @@ the `ifName`. However, if the config file is read, it will change the behavior a
## SNMPd config
First, configure the snmpd to accept agentx connections by adding (at least) the following
to `snmpd.conf`:
This agent is meant to run alongside the snmpd shipped in Debian (Bullseye or Bookworm), called
[Net SNMP](http://net-snmp.sourceforge.net/). The same snmpd is available in Ubuntu (Focal, Jammy) as well,
which should work.
After installing the snmpd (`apt install snmpd`), configure it to accept agentx connections by adding (at least)
the following to `snmpd.conf`:
```
master agentx
agentXSocket tcp:localhost:705,unix:/var/agentx-dataplane/master
@ -119,15 +126,18 @@ sudo systemctl start vpp-snmp-agent
# Support
Limnited support is offered on this codebase. GitHub issues can be files for issues with the design or
implementation (eg bugs, feature requests), but no _user_ support can be given. Put simply, this repo
accepts only bugreports with the code, not with its use.
This software is compatible only with the current production release of VPP, which can be found on its
[Gerrit](https://gerrit.fd.io/r/q/repo:vpp) service. Maintaining backwards compatibility is not a goal of
this repository.
Limited support is offered on the codebase: GitHub issues may be filed for issues with the _design or
implementation_ (eg. bugs, feature requests), but _user_ support can not be given. Put simply, this repo
accepts only bugreports with the code, not with its use. See the LICENSE for clarity.
Issues with the codebase that are well researched ([this article](https://marker.io/blog/how-to-write-bug-report)
gives a good example of the expectation), preferably pointing at the location where the problem occurred,
and if possible proposing a fix, are most welcome.
Requests that are not discussing problems with the software itself, notably enduser support requests,
will not be handled unless they clearly demonstrate a bug and propose workarounds or fixes.
Paid support can be obtained on hourly commission. Reach out \to IPng Networks (sales@ipng.ch) to discuss rates.
Requests that don't discuss problems with the software itself, notably enduser support requests, will not be
handled unless they clearly demonstrate a bug and propose workarounds or fixes. Paid support can be obtained
on hourly commission. Reach out to IPng Networks GmbH (sales@ipng.ch) to discuss rates.

View File

@ -28,7 +28,11 @@ class Agent(object):
self._lastupdate = 0
self._update_period = period # Seconds
self._net = Network(server_address=server_address)
try:
debug = args.debug_agent
except:
debug = False
self._net = Network(server_address=server_address, debug=debug)
self._oid_list = []
self._args = args
@ -67,10 +71,9 @@ class Agent(object):
try:
self._net.run()
except Exception as e:
self.logger.error("An exception occurred: %s" % e)
self.logger.error("Reconnecting")
self.logger.error("Disconnecting due to exception: %s" % e)
self._net.disconnect()
time.sleep(0.1)
time.sleep(1)
def stop(self):
self.logger.debug("Stopping")

View File

@ -27,11 +27,11 @@ class NetworkError(Exception):
class Network:
def __init__(self, server_address="/var/agentx/master"):
def __init__(self, server_address="/var/agentx/master", debug=False):
self.session_id = 0
self.transaction_id = 0
self.debug = 1
self.debug = debug
# Data Related Variables
self.data = {}
self.data_idx = []
@ -44,7 +44,6 @@ class Network:
return
try:
logger.info("Connecting to %s" % self._server_address)
if self._server_address.startswith("/"):
self.socket = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
self.socket.connect(self._server_address)
@ -55,9 +54,10 @@ class Network:
self.socket.connect((host, int(port)))
self.socket.settimeout(self._timeout)
self._connected = True
logger.info("Connected to %s" % self._server_address)
except socket.error:
logger.error("Failed to connect to %s" % self._server_address)
self._connected = False
logger.error("Failed to connect to %s" % self._server_address)
def disconnect(self):
if not self._connected:

View File

@ -96,9 +96,14 @@ class MyAgent(agentx.Agent):
def update(self):
try:
self.vpp.connect()
r = self.vpp.vpp.api.control_ping()
self.logger.debug(f"VPP API: {r}")
self.vppstat.connect()
except:
self.logger.error("Could not connect to VPPStats segment")
except Exception as e:
self.logger.error(f"VPP API: {e}, retrying")
self.vppstat.disconnect()
self.vpp.disconnect()
return False
ds = agentx.DataSet()
@ -108,7 +113,6 @@ class MyAgent(agentx.Agent):
num_ifaces = len(ifaces)
num_vppstat = len(self.vppstat["/if/names"])
num_lcp = len(lcp)
self.logger.debug("LCP: %s" % (lcp))
self.logger.debug(
"Retrieved Interfaces: vppapi=%d vppstat=%d lcp=%d"
% (num_ifaces, num_vppstat, num_lcp)
@ -384,10 +388,16 @@ def main():
parser.add_argument(
"-d", dest="debug", action="store_true", help="""Enable debug, default False"""
)
parser.add_argument(
"-dd",
dest="debug_agent",
action="store_true",
help="""Enable agentx debug, default False""",
)
args = parser.parse_args()
if args.debug:
print("Arguments:", args)
print(f"Arguments: {args}")
agentx.setup_logging(debug=args.debug)

View File

@ -3,7 +3,7 @@ The functions in this file interact with the VPP API to retrieve certain
interface metadata.
"""
from vpp_papi import VPPApiClient
from vpp_papi import VPPApiClient, VPPApiJSONFiles
import os
import fnmatch
import logging
@ -25,24 +25,36 @@ class VPPApi:
self.connected = False
self.clientname = clientname
self.vpp = None
self.iface_dict = None
self.lcp_dict = None
def _sw_interface_event(self, event):
# NOTE(pim): this callback runs in a background thread, so we just clear the
# cached interfaces and LCPs here, subsequent call to get_ifaces() or get_lcp()
# will refresh them in the main thread.
logger.info(f"Clearing iface and LCP cache due to interface event")
self.iface_dict = None
self.lcp_dict = None
def _event_callback(self, msg_type_name, msg_type):
logger.debug(f"Received callback: {msg_type_name} => {msg_type}")
if msg_type_name == "sw_interface_event":
self._sw_interface_event(msg_type)
else:
logger.warning(f"Ignoring unknown event: {msg_type_name} => {msg_type}")
def connect(self):
if self.connected:
return True
vpp_json_dir = "/usr/share/vpp/api/"
# construct a list of all the json api files
jsonfiles = []
for root, dirnames, filenames in os.walk(vpp_json_dir):
for filename in fnmatch.filter(filenames, "*.api.json"):
jsonfiles.append(os.path.join(root, filename))
if not jsonfiles:
vpp_json_dir = VPPApiJSONFiles.find_api_dir([])
vpp_jsonfiles = VPPApiJSONFiles.find_api_files(api_dir=vpp_json_dir)
if not vpp_jsonfiles:
logger.error("no json api files found")
return False
self.vpp = VPPApiClient(apifiles=jsonfiles, server_address=self.address)
self.vpp = VPPApiClient(apifiles=vpp_jsonfiles, server_address=self.address)
self.vpp.register_event_callback(self._event_callback)
try:
logger.info("Connecting to VPP")
self.vpp.connect(self.clientname)
@ -52,6 +64,13 @@ class VPPApi:
v = self.vpp.api.show_version()
logger.info("VPP version is %s" % v.version)
logger.info("Enabling VPP API interface events")
r = self.vpp.api.want_interface_events(enable_disable=True)
if r.retval != 0:
logger.error("Could not enable VPP API interface events, disconnecting")
self.disconnect()
return False
self.connected = True
return True
@ -59,42 +78,58 @@ class VPPApi:
if not self.connected:
return True
self.vpp.disconnect()
self.iface_dict = None
self.lcp_dict = None
self.connected = False
return True
def get_ifaces(self):
ret = {}
if not self.connected:
if not self.connected and not self.connect():
logger.warning("Can't connect to VPP API")
return ret
if type(self.iface_dict) is dict:
logger.debug("Returning cached interfaces")
return self.iface_dict
ret = {}
try:
logger.info("Requesting interfaces from VPP API")
iface_list = self.vpp.api.sw_interface_dump()
except Exception as e:
logger.error("VPP communication error, disconnecting", e)
self.vpp.disconnect()
self.connected = False
logger.error("VPP API communication error, disconnecting: %s", e)
self.disconnect()
return ret
if not iface_list:
logger.error("Can't get interface list")
logger.error("Can't get interface list, disconnecting")
self.disconnect()
return ret
for iface in iface_list:
ret[iface.interface_name] = iface
return ret
self.iface_dict = ret
logger.debug(f"Caching interfaces: {ret}")
return self.iface_dict
def get_lcp(self):
ret = {}
if not self.connected:
if not self.connected and not self.connect():
logger.warning("Can't connect to VPP API")
return ret
if type(self.lcp_dict) is dict:
logger.debug("Returning cached LCPs")
return self.lcp_dict
try:
logger.info("Requesting LCPs from VPP API")
lcp_list = self.vpp.api.lcp_itf_pair_get()
except Exception as e:
logger.error("VPP communication error, disconnecting: %s", e)
self.vpp.disconnect()
self.connected = False
self.disconnect()
return ret
if not lcp_list:
@ -103,4 +138,7 @@ class VPPApi:
for lcp in lcp_list[1]:
ret[lcp.host_if_name] = lcp
return ret
self.lcp_dict = ret
logger.debug(f"Caching LCPs: {ret}")
return self.lcp_dict