vpp-snmp-agent

Author	SHA1	Message	Date
Pim van Pelt	c6dbce8f90	Update README.md Clarify that this agent is meant to run with Net SNMPd.	2023-08-26 11:58:36 +02:00
Pim van Pelt	aa38c5503f	Update README.md Add a few hints based on previous issues filed in this repo. Clarify that linux-cp must be used, the API/Stats socket must be accessible, and that no backwards compatibility is given.	2023-08-26 11:55:22 +02:00
Pim van Pelt	684400ff9e	Reduce logging on AgentX connections Previous logging was very noisy when the agent connection to snmpd drops: [ERROR ] agentx.network - run : Empty PDU, connection closed! [INFO ] agentx.network - disconnect : Disconnecting from localhost:705 [ERROR ] agentx.agent - run : An exception occurred: Empty PDU, disconnecting [ERROR ] agentx.agent - run : Reconnecting [INFO ] agentx.agent - run : Opening AgentX connection [INFO ] agentx.network - connect : Connecting to localhost:705 [ERROR ] agentx.network - connect : Failed to connect to localhost:705 [ERROR ] agentx.agent - run : An exception occurred: Not connected [ERROR ] agentx.agent - run : Reconnecting [INFO ] agentx.agent - run : Opening AgentX connection [INFO ] agentx.network - connect : Connecting to localhost:705 [ERROR ] agentx.network - connect : Failed to connect to localhost:705 [ERROR ] agentx.agent - run : An exception occurred: Not connected [ERROR ] agentx.agent - run : Reconnecting Also, reconnects were attempted every 0.1s, but field research shows that snmpd, if it restarts, takes ~3-5 seconds to come back (note: this is also due to a systemd delay in restarting it upon failures). Hammering the connection is not useful. This change refactors the logging, to avoid redundant messages: - sleep 1s between attempts (reducing the loop by 10x) - Either print 'Connected to' or 'Failed to connect to', not both. - Remove the 'reconnecting' superfluous message	2023-01-14 11:12:06 +00:00
Pim van Pelt	43551958f8	Typo fix	2023-01-10 17:13:27 +01:00
Pim van Pelt	31529a2815	improvement: add flag for agentx debugging agentx/network.py always turned on debugging. It can be useful to have debugging logs of the main application without the agentx debug logs, as they are quite noisy. Now, ./vpp-snmp-agent.py -d will turn on application debugging but NOT agentx debugging. ./vpp-snmp-agent.py -d -dd will turn on both. NOTE: ./vpp-snmp-agent.py -dd will do nothing, because the '-d' flag determines the global logging level.	2023-01-10 15:21:32 +01:00
Pim van Pelt	0d7dea37f5	Merge branch 'main' of github.com:pimvanpelt/vpp-snmp-agent	2023-01-10 11:26:21 +01:00
Pim van Pelt	95d96d5e61	bugfix: add a control_ping() before each update If VPP were to disconnect either the Stats Segment or the API endpoint, for example if it crashes and restarts, vpp-snmp-agent will not detect this. In such a situation, it will hold on to the stale stats and no longer receive interface updates. Before each run, send a control_ping() API request, and if that were to fail (for example with Broken Pipe, or Connection Refused), disconnect both API and Stats (in the vpp.disconnect() call, also invalidate the interface and LCP cache), and then fail the update. The Agent runner will then retry once per second until the connection (and control_ping()) succeeds. TESTED: - Start vpp-snmp-agent, it connects and starts up per normal. - Exit / Kill vpp - Upon the next update(), the control_ping() call will fail, causing the agent to disconnect - The agent will now loop: [ERROR ] agentx.agent - update : VPP API: [Errno 1] Sendall error: BrokenPipeError(32, 'Broken pipe'), retrying [WARNING ] agentx.agent - run : Update failed, last successful update was 1673345631.7658572 [INFO ] agentx.vppapi - connect : Connecting to VPP [ERROR ] agentx.agent - update : VPP API: Not connected, api definitions not available, retrying - Start VPP again, when its API endpoint is ready: [INFO ] agentx.vppapi - connect : Connecting to VPP [INFO ] agentx.vppapi - connect : VPP version is 23.02-rc0~199-gcfaf44020 [INFO ] agentx.vppapi - connect : Enabling VPP API interface events [DEBUG ] agentx.agent - update : VPP API: control_ping_reply(_0=24, context=12, retval=0, client_index=0, vpe_pid=705326) [INFO ] agentx.vppapi - get_ifaces : Requesting interfaces from VPP API [INFO ] agentx.vppapi - get_lcp : Requesting LCPs from VPP API - The agent resumes where it left off	2023-01-10 11:24:44 +01:00
Pim van Pelt	b6864530eb	Update README.md	2023-01-08 23:11:55 +01:00
Pim van Pelt	7f4427c4b6	Improvement: Use interface/LCP caching on VPP API - Set an initial vppapi.iface_dict and lcp_dict to None. - Set an event watcher API call, with a callback - When events happen, flush the iface/lcp cache (by setting them to None). - When get_ifaces / get_lcp sees an empty cache, fetch the data from VPP API and put into the cache for subsequent calls. This way, the VPP API is only used upon startup (when the caches are empty), and on interface add/del/changes (note: the events fire for link, and admin up/down, but not for MTU changes). One small race condition exists: if a new LCP is created, this does not trigger an interface event. Adding a want_lcp_events() makes sense, but until then, a few options remain: 0) race exists only if inerface was created; THEN the cache was refreshed; and THEN the LCP was created. 1) create the lcp and then force a change to any interface (this will create an sw_interface event and flush the cache) 2) restart vpp-snmp-agent	2023-01-08 13:57:08 +01:00
Pim van Pelt	c81a035091	Refactor to use VPPApiJSONFiles	2023-01-08 13:24:54 +01:00
Pim van Pelt	fe794ed286	Remove all global variables	2023-01-08 13:21:00 +01:00
Pim van Pelt	5e11539b44	Format with black	2023-01-08 13:05:42 +01:00
Pim van Pelt	72e9cf3503	Update README.md	2022-12-23 16:01:02 +01:00
Pim van Pelt	a56840d849	Update README.md	2022-12-23 10:20:11 +01:00
Pim van Pelt	cde5d4df94	Update README.md	2022-12-23 10:09:06 +01:00
Pim van Pelt	16c29e0ce6	Allow vppapi!=vppstats count, continue and use those interfaces that are in the API	2022-07-10 20:49:44 +00:00
Pim van Pelt	b4c819af87	Retrieve description from all interface types, not just ethernets	2022-07-10 11:33:47 +00:00
Pim van Pelt	b024a3e96b	Move the YAML config to be compatible with vppcfg's config file	2022-07-10 09:47:33 +00:00
Pim van Pelt	3be732e6ab	Remove the workaround for endianness in VPP; Remove the --disable-lcp flag. Catch connect exceptions for VPPStats and VPP API	2022-07-09 10:14:15 +00:00
Pim van Pelt	c9233749bc	Pulled in latest vpp_stats.py from upstream after https://gerrit.fd.io/r/c/vpp/+/35640	2022-04-01 13:10:17 +00:00
Pim van Pelt	968c0abe2f	Fail the setup if we can't connect to VPP; exit the daemon with non-zero value to force restart by systemd	2022-03-14 23:14:59 +00:00
Pim van Pelt	86512dd66b	Turn interface mismatch into a warning - it is often recoverable	2022-03-13 12:05:54 +00:00
Pim van Pelt	c112016665	Add a flag to disable lcp lookups, due to pending VAPI issues (https://gerrit.fd.io/r/c/vpp/+/35479 )	2022-03-08 13:25:29 +00:00
Pim van Pelt	a9c9e15828	typo fix	2022-02-27 23:01:19 +00:00
Pim van Pelt	f97f50bf30	Update README	2022-02-27 22:59:55 +00:00
Pim van Pelt	c319ef576d	Add an optional configuration file A simple convenience configfile can provide a mapping between VPP interface names, Linux Control Plane interface names, and descriptions. An example: ``` interfaces: "TenGigabitEthernet6/0/0": description: "Infra: xsw0.chrma0:2" lcp: "xe1-0" "TenGigabitEthernet6/0/0.3102": description: "Infra: QinQ to Solnet for Daedalean" lcp: "xe1-0.3102" "TenGigabitEthernet6/0/0.310211": description: "Cust: Daedalean IP Transit" lcp: "xe1-0.3102.11" ``` This configuration file is completely optional. If the `-c` flag is empty, or it's set but the file does not exist, the Agent will simply enumerate all interfaces, and set the `ifAlias` OID to the same value as the `ifName`. However, if the config file is read, it will change the behavior as follows: * Any `tapNN` interface names from VPP will be matched to their PHY by looking up their Linux Control Plane interface. The `ifName` field will be rewritten to the _LIP_ `host-if`. For example, `tap3` above will become `xe1-0` while `tap3.310211` will become `xe1-0.3102.11`. * The `ifAlias` OID for a PHY will be set to the `description` field. * The `ifAlias` OID for a TAP will be set to the string `LCP: ` followed by its PHY `ifName`. For example, `xe1-0.3102.11` will become `LCP TenGigabitEthernet6/0/0.310211 (tap9)`	2022-02-27 22:58:03 +00:00
Pim van Pelt	80190bf2d0	Merge pull request #1 from amartin-git/patch-1 Set larger receive buffer size for bulk requests	2021-12-06 18:30:29 +01:00
amartin-git	c19df5a77a	Set larger receive buffer size for bulk requests When using SNMP BULK GET requests (from Zabbix in our case), the default value of 1024 truncates the request, resulting in malformed requests reaching the agent. Using an 8K buffer fixes this. A better approach perhaps would be to process the buffer using a loop.	2021-12-06 12:26:22 -05:00
Pim van Pelt	89abebb26b	Merge branch 'main' of github.com:pimvanpelt/vpp-snmp-agent into main	2021-09-15 08:02:26 +00:00
Pim van Pelt	18005bbbc2	Fix memory leak in logging (specifically: do not create a new logger for every SNMP PDU)	2021-09-15 07:58:08 +00:00
Pim van Pelt	a574305fb2	Reconnect faster after errors (0.1s sleep)	2021-09-15 07:57:17 +00:00
Pim van Pelt	09a2b6e9e4	Remove logger from dataset, it's not necessary, as there's only one call location that wants to say something. Turn that into an exception instead	2021-09-15 07:56:50 +00:00
Pim van Pelt	bf9d61b95d	Restart snmpd if it fails	2021-09-13 07:54:13 +02:00
Pim van Pelt	610d03a14b	Refactor README.md	2021-09-12 16:31:13 +00:00
Pim van Pelt	5051ab32ce	Update README with the -h/--help argparse hint	2021-09-12 16:22:22 +00:00
Pim van Pelt	6d0ed88722	Add argparse and a few useful arguments Now that we're explicitly connecting via TCP to localhost:705 (which can be overriden by the -a flag), we no longer need to run as root. Therefore, update vpp-snmp-agent.service to run as user Debian-snmp group vpp, so that /run/vpp/{api,stats}.sock are writable. Be explicit on the commandline arguments in the service definition.	2021-09-12 16:19:33 +00:00
Pim van Pelt	7206d92f40	Move all loggers to be members of the class, not global objects	2021-09-12 16:08:35 +00:00
Pim van Pelt	9265e211e3	Swap oper/admin status (they were the wrong way around)	2021-09-12 14:09:23 +00:00
Pim van Pelt	96f2a3b4b3	Move to /usr/sbin instead of /usr/local/sbin	2021-09-11 12:55:28 +00:00
Pim van Pelt	c72890868c	s/freq/period/ to be more precies; Set default period to 30s; set wait period on reconnect to 10s; Add explicit INFO logline when replacing dataset	2021-09-11 12:45:28 +00:00
Pim van Pelt	8c9c1e2b4a	Replace the pyagentx threaded version with a much simpler, non-threaded version.	2021-09-11 12:19:38 +00:00
Pim van Pelt	842bce9d6e	Add server_address to initializer, allow for unix path (starts with /) or hostname:port address	2021-09-11 08:13:21 +00:00
Pim van Pelt	0c0e4fc14a	A better way to specify netns See docs: https://www.freedesktop.org/software/systemd/man/systemd.exec.html#NetworkNamespacePath=	2021-09-05 21:02:11 +00:00
Pim van Pelt	184d2eceb2	Restart agent on failure	2021-09-05 20:26:47 +00:00
Pim van Pelt	9b39aa61c2	Clamp all COUNTER32 at mod 2^32	2021-09-05 20:15:06 +00:00
Pim van Pelt	7dec1329d2	Turn VPPApi into a threadsafe object It now is tolerant to VPP restarts. Upon initialization, we connect(), blocking all but the first thread from trying. The rest will see self.connected=True and move on. Then, on each/any error, call vpp.disconect() and set connected=False which will make any subsequent AgentX updater run force a reconnect.	2021-09-05 20:02:11 +00:00
Pim van Pelt	e1cddc8c26	Add VPP API support to retrieve mtu/ifspeed/operstatus/adminstatus/mac	2021-09-05 19:39:20 +00:00
Pim van Pelt	238471d25f	Ensure more updates can fit in the queue, allow scaling to 20 variables on 1000 interfaces	2021-09-05 18:23:23 +00:00
Pim van Pelt	2e7aa607e4	Add most of the standard (32bit) ifTable.ifEntry MIB, the 5 that are left will require vpp_papi support, coming next	2021-09-05 18:12:02 +00:00
Pim van Pelt	ac8c323abf	Ensure VPPStat() is connected before each read; if VPP restarts, we'll lose the connection, and this ensures that once VPP comes back up, we'll re-connect to it seemlessly	2021-09-05 16:19:44 +00:00

1 2

57 Commits