Pim van Pelt 95d96d5e61 bugfix: add a control_ping() before each update
If VPP were to disconnect either the Stats Segment or the API endpoint,
for example if it crashes and restarts, vpp-snmp-agent will not detect
this. In such a situation, it will hold on to the stale stats and no
longer receive interface updates.

Before each run, send a control_ping() API request, and if that were to
fail (for example with Broken Pipe, or Connection Refused), disconnect
both API and Stats (in the vpp.disconnect() call, also invalidate the interface
and LCP cache), and then fail the update. The Agent runner will then retry
once per second until the connection (and control_ping()) succeeds.
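
The ping-then-update flow described above can be sketched roughly as follows. This is a minimal illustration, not the agent's actual code: `VppLink`, `update()` and `run()` are hypothetical names, and catching `OSError` is an assumption (the real VPP Python API may raise its own exception types).

```python
import time


class VppLink:
    """Hypothetical stand-in for the agent's VPP connection wrapper."""

    def __init__(self, api):
        self.api = api          # any object exposing control_ping()
        self.connected = False
        self.iface_cache = {}   # cached interface data
        self.lcp_cache = {}     # cached LCP data

    def connect(self):
        self.connected = True
        return True

    def disconnect(self):
        # Drop the connection and invalidate both caches so that
        # stale data is not served after VPP restarts.
        self.connected = False
        self.iface_cache.clear()
        self.lcp_cache.clear()


def update(link):
    """Return True if the update ran, False if VPP was unreachable."""
    if not link.connected and not link.connect():
        return False
    try:
        link.api.control_ping()
    except OSError:
        # OSError covers BrokenPipeError and ConnectionRefusedError.
        link.disconnect()
        return False
    # ... refresh interfaces, LCPs and stats here ...
    return True


def run(link, period=30, retry=1, iterations=None, sleep=time.sleep):
    """Poll every `period` seconds; after a failure, retry after `retry` seconds."""
    count = 0
    while iterations is None or count < iterations:
        ok = update(link)
        sleep(period if ok else retry)
        count += 1
```

The `iterations` and `sleep` parameters exist only so the loop can be exercised without waiting.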

TESTED:
- Start vpp-snmp-agent, it connects and starts up per normal.
- Exit / Kill vpp
- Upon the next update(), the control_ping() call will fail, causing the
  agent to disconnect
- The agent will now loop:
[ERROR   ]      agentx.agent - update         : VPP API: [Errno 1] Sendall error: BrokenPipeError(32, 'Broken pipe'), retrying
[WARNING ]      agentx.agent - run            : Update failed, last successful update was 1673345631.7658572
[INFO    ]     agentx.vppapi - connect        : Connecting to VPP
[ERROR   ]      agentx.agent - update         : VPP API: Not connected, api definitions not available, retrying

- Start VPP again, when its API endpoint is ready:
[INFO    ]     agentx.vppapi - connect        : Connecting to VPP
[INFO    ]     agentx.vppapi - connect        : VPP version is 23.02-rc0~199-gcfaf44020
[INFO    ]     agentx.vppapi - connect        : Enabling VPP API interface events
[DEBUG   ]      agentx.agent - update         : VPP API: control_ping_reply(_0=24, context=12, retval=0, client_index=0, vpe_pid=705326)
[INFO    ]     agentx.vppapi - get_ifaces     : Requesting interfaces from VPP API
[INFO    ]     agentx.vppapi - get_lcp        : Requesting LCPs from VPP API

- The agent resumes where it left off
2023-01-10 11:24:44 +01:00

VPP's Interface AgentX

This is an SNMP agent that implements the AgentX protocol. It connects to VPP's statseg (statistics memory segment) by memory-mapping it, so the user running the agent must have read access to /run/vpp/stats.sock. It also connects to VPP's API endpoint, so the user running the agent must have read/write access to /run/vpp/api.sock. Both are typically accomplished by running the agent with group vpp.
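
As a quick sanity check before starting the agent, the access requirements above can be verified with a few lines of Python. This is only a sketch: `check_vpp_access` is a hypothetical helper that is not part of the agent, and the two paths are the ones named above.

```python
import os

STATS_SOCK = "/run/vpp/stats.sock"
API_SOCK = "/run/vpp/api.sock"


def check_vpp_access(stats_sock=STATS_SOCK, api_sock=API_SOCK):
    """Return a list of problems found; an empty list means access looks fine."""
    problems = []
    # The stats segment only needs to be readable.
    if not os.access(stats_sock, os.R_OK):
        problems.append(f"no read access to {stats_sock}")
    # The API socket needs to be readable and writable.
    if not os.access(api_sock, os.R_OK | os.W_OK):
        problems.append(f"no read/write access to {api_sock}")
    return problems


if __name__ == "__main__":
    for problem in check_vpp_access():
        print(problem)
```

If this prints anything, adding the user to group vpp (or running the agent with that group) is the usual remedy.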

The agent connects to SNMP's AgentX socket, which can be either a TCP socket (by default localhost:705) or a unix domain socket (by default /var/agentx/master), the latter being readable only by root. Because it is preferable to run the agent as an unprivileged user, the TCP socket is preferred (and is the default).
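
To illustrate the two address forms, here is a minimal sketch of how an address string could be told apart. `parse_agentx_address` is a hypothetical helper, and treating a leading slash as the marker for a unix path is an assumption, not necessarily what the agent does.

```python
def parse_agentx_address(address="localhost:705"):
    """Return ("unix", path) or ("tcp", (host, port)) for an address string."""
    # Assumption: anything starting with "/" is a unix domain socket path.
    if address.startswith("/"):
        return ("unix", address)
    host, sep, port = address.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected unix-path or host:port, got {address!r}")
    return ("tcp", (host, int(port)))
```

Called without arguments it yields the TCP default, while `parse_agentx_address("/var/agentx/master")` yields the unix form.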

The agent incorporates a refactored/modified pyagentx. The upstream pyagentx code uses a threadpool and message queue, but it was not very stable. Often, due to lack of proper locking, updaters would overwrite parts of the MIB and as a result, any reads that were ongoing would abruptly be truncated. I refactored the code to be single-threaded, greatly simplifying the design (and eliminating the need for locking).

To respect the original authors, this code is released with the same BSD 2-clause license.

Building

Install pyinstaller to build a binary distribution:

sudo pip install pyinstaller
pyinstaller vpp-snmp-agent.py  --onefile

## Run it on console
dist/vpp-snmp-agent -h
usage: vpp-snmp-agent [-h] [-a ADDRESS] [-p PERIOD] [-c CONFIG] [-d]

optional arguments:
  -h, --help  show this help message and exit
  -a ADDRESS  Location of the SNMPd agent (unix-path or host:port), default localhost:705
  -p PERIOD   Period to poll VPP, default 30 (seconds)
  -c CONFIG   Optional vppcfg YAML configuration file, default empty
  -d          Enable debug, default False

## Install
sudo cp dist/vpp-snmp-agent /usr/sbin/
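
For reference, the flags in the help output above could be declared with Python's argparse roughly as follows. This is a sketch only: that the agent uses argparse, the `dest` names, and `None` as the "empty" default for -c are all assumptions.

```python
import argparse


def build_parser():
    """Build a parser with the four flags shown in the help output."""
    parser = argparse.ArgumentParser(prog="vpp-snmp-agent")
    parser.add_argument(
        "-a", dest="address", default="localhost:705",
        help="Location of the SNMPd agent (unix-path or host:port), "
             "default localhost:705")
    parser.add_argument(
        "-p", dest="period", type=int, default=30,
        help="Period to poll VPP, default 30 (seconds)")
    parser.add_argument(
        "-c", dest="config", default=None,  # assumption: None stands for "empty"
        help="Optional vppcfg YAML configuration file, default empty")
    parser.add_argument(
        "-d", dest="debug", action="store_true",
        help="Enable debug, default False")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```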

Configuration file

This SNMP Agent will read a vppcfg configuration file, which provides a mapping between VPP interface names, Linux Control Plane interface names, and descriptions. From the upstream vppcfg configuration file, it will only consume the interfaces block, and ignore the rest. An example snippet:

interfaces:
  GigabitEthernet3/0/0:
    description: "Infra: Some interface"
    lcp: e0
    mtu: 9000
    sub-interfaces:
      100:
        description: "Cust: Some sub-interface"
      200:
        description: "Cust: Some sub-interface with LCP"
        lcp: e0.200
      20011:
        description: "Cust: Some QinQ sub-interface with LCP"
        encapsulation:
          dot1q: 200
          inner-dot1q: 11
          exact-match: true
        lcp: e0.200.11

This configuration file is completely optional. If the -c flag is not given, or it is given but the file does not exist, the Agent will simply enumerate all interfaces and set the ifAlias OID to the same value as the ifName. If the config file is read, however, the behavior changes as follows:

  • The ifAlias OID for an interface will be set to the description field.
  • Any tapNN interface names from VPP will be matched to their PHY by looking up their Linux Control Plane interface:
    • The ifName field will be rewritten to the LIP host-if, which is specified by the lcp field. For example, tap3 above will become e0 while tap3.20011 will become e0.200.11.
    • The ifAlias OID for a TAP will be set to the string LCP followed by its PHY ifName. For example, e0.200.11 will become LCP GigabitEthernet3/0/0.20011 (tap3)
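
The naming rules above can be sketched as a small lookup. This is an illustration under assumptions, not the agent's code: `flatten_interfaces` and `snmp_names` are hypothetical helpers, the config is passed as an already parsed dict, and `tap_to_phy` stands in for the TAP-to-PHY pairing that the agent would learn from VPP's Linux Control Plane.

```python
def flatten_interfaces(config):
    """Map 'Phy' and 'Phy.subid' names to their vppcfg entries."""
    flat = {}
    for name, iface in config.get("interfaces", {}).items():
        flat[name] = iface
        for sub_id, sub in (iface.get("sub-interfaces") or {}).items():
            flat[f"{name}.{sub_id}"] = sub
    return flat


def snmp_names(vpp_ifname, flat, tap_to_phy):
    """Return (ifName, ifAlias) for one VPP interface name."""
    phy = tap_to_phy.get(vpp_ifname)
    if phy is not None and "lcp" in flat.get(phy, {}):
        # TAP with a known PHY: ifName becomes the LIP host-if,
        # ifAlias becomes "LCP <phy ifName> (<tap name>)".
        return flat[phy]["lcp"], f"LCP {phy} ({vpp_ifname})"
    entry = flat.get(vpp_ifname)
    if entry and "description" in entry:
        # Interface listed in the config: ifAlias is its description.
        return vpp_ifname, entry["description"]
    # No config entry: ifAlias falls back to the ifName.
    return vpp_ifname, vpp_ifname


if __name__ == "__main__":
    config = {"interfaces": {"GigabitEthernet3/0/0": {
        "description": "Infra: Some interface", "lcp": "e0"}}}
    flat = flatten_interfaces(config)
    print(snmp_names("tap3", flat, {"tap3": "GigabitEthernet3/0/0"}))
```

An interface that appears neither in the config nor in the TAP pairing keeps its VPP name for both ifName and ifAlias, matching the behavior without a config file.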

SNMPd config

First, configure the snmpd to accept agentx connections by adding (at least) the following to snmpd.conf:

master  agentx
agentXSocket tcp:localhost:705,unix:/var/agentx-dataplane/master

and restart snmpd to pick up the changes. Then simply run ./vpp-snmp-agent.py: it will connect to the snmpd on localhost:705 and expose the IF-MIB by periodically polling VPP. Observe the console output.

Running in production

The agent is meant to be run on Ubuntu. Copy the *.service files, disable the main snmpd, enable the one that runs in the dataplane network namespace, and start everything up:

sudo cp netns-dataplane.service /usr/lib/systemd/system/
sudo cp snmpd-dataplane.service /usr/lib/systemd/system/
sudo cp vpp-snmp-agent.service /usr/lib/systemd/system/
sudo systemctl daemon-reload
sudo systemctl stop snmpd
sudo systemctl disable snmpd
sudo systemctl enable netns-dataplane
sudo systemctl start netns-dataplane
sudo systemctl enable snmpd-dataplane
sudo systemctl start snmpd-dataplane
sudo systemctl enable vpp-snmp-agent
sudo systemctl start vpp-snmp-agent

Support

Limited support is offered on this codebase. GitHub issues can be filed for issues with the design or implementation (e.g. bugs, feature requests), but no user support can be given. Put simply, this repo accepts only bug reports about the code, not about its use.

Issues with the codebase that are well researched (this article gives a good example of the expectation), preferably pointing at the location where the problem occurred, and if possible proposing a fix, are most welcome.

Requests that do not discuss problems with the software itself, notably end-user support requests, will not be handled unless they clearly demonstrate a bug and propose workarounds or fixes.

Paid support can be obtained on hourly commission. Reach out to IPng Networks (sales@ipng.ch) to discuss rates.

Description
An SNMP Agent for VPP written in Python3