vpp-maglev/tests/01-maglevd/01-healthcheck.robot

*** Settings ***
Library             OperatingSystem
Resource            ../common.robot

Suite Setup         Setup Suite
Suite Teardown      Cleanup Suite


*** Variables ***
${lab-name}         maglevd-test
${lab-file}         maglevd-lab/maglevd.clab.yml
${runtime}          docker
${MAGLEVD_NODE}     clab-maglevd-test-maglevd
${METRICS_URL}      http://172.20.30.2:9091/metrics


*** Test Cases ***
Deploy maglevd-test lab
    [Documentation]    Deploy the containerlab topology. The maglevd node starts
    ...    automatically as PID 1 via start.sh and begins probing the nginx
    ...    backends immediately.
    ${rc}    ${output} =    Run And Return Rc And Output
    ...    ${CLAB_BIN} --runtime ${runtime} deploy -t ${CURDIR}/${lab-file}
    Log    ${output}
    Should Be Equal As Integers    ${rc}    0
    Sleep    3s    Wait for nginx containers and probes to converge

All backends reach up state
    [Template]    Backend Should Be Up
    nginx1
    nginx2
    nginx3

Health checks are reaching all backends
    [Template]    Probe Count Should Be Positive
    nginx1
    nginx2
    nginx3

Pause backend stops probing
    Maglevc    set backend nginx1 pause
    Backend Should Have State    nginx1    paused
    Sleep    1s
    ${before} =    Get Probe Count    nginx1
    Sleep    2s    Wait to confirm no new probes arrive
    ${after} =    Get Probe Count    nginx1
    Should Be True    ${after} == ${before}
    ...    Probe count for nginx1 grew while paused: ${before} → ${after}

Resume backend restarts probing
    Maglevc    set backend nginx1 resume
    ${before} =    Get Probe Count    nginx1
    Sleep    2s    Wait for resumed probes to accumulate
    ${after} =    Get Probe Count    nginx1
    Should Be True    ${after} > ${before}
    ...    Probe count for nginx1 did not grow after resume: ${before} → ${after}
    Wait Until Keyword Succeeds    5s    500ms
    ...    Backend Should Be Up    nginx1

Disable backend stops probing
    Maglevc    set backend nginx2 disable
    Backend Should Have State    nginx2    disabled
    Backend Should Be Disabled    nginx2
    Sleep    1s
    ${before} =    Get Probe Count    nginx2
    Sleep    2s    Wait to confirm probes stopped
    ${after} =    Get Probe Count    nginx2
    Should Be True    ${after} == ${before}
    ...    Probe count for nginx2 grew while disabled: ${before} → ${after}

Enable backend restarts probing
    Maglevc    set backend nginx2 enable
    ${before} =    Get Probe Count    nginx2
    Sleep    2s    Wait for re-enabled probes to accumulate
    ${after} =    Get Probe Count    nginx2
    Should Be True    ${after} > ${before}
    ...    Probe count for nginx2 did not grow after enable: ${before} → ${after}
    Wait Until Keyword Succeeds    5s    500ms
    ...    Backend Should Be Up    nginx2


Prometheus endpoint is reachable
    ${rc}    ${output} =    Run And Return Rc And Output
    ...    curl -sf ${METRICS_URL}
    Log    ${output}
    Should Be Equal As Integers    ${rc}    0
    Should Contain    ${output}    maglev_backend_state

Prometheus reports all backends up
    ${output} =    Scrape Metrics
    # Each backend should expose its state="up" series with value 1.
    Should Contain    ${output}    maglev_backend_state{address="172.20.30.11",backend="nginx1",healthcheck="http-check",state="up"} 1
    Should Contain    ${output}    maglev_backend_state{address="172.20.30.12",backend="nginx2",healthcheck="http-check",state="up"} 1
    Should Contain    ${output}    maglev_backend_state{address="172.20.30.13",backend="nginx3",healthcheck="http-check",state="up"} 1

Prometheus reports probe counters
    ${output} =    Scrape Metrics
    Should Match Regexp    ${output}    maglev_probe_total\\{backend="nginx1".*result="success".*\\}\\s+[1-9]
    Should Match Regexp    ${output}    maglev_probe_total\\{backend="nginx2".*result="success".*\\}\\s+[1-9]
    Should Match Regexp    ${output}    maglev_probe_total\\{backend="nginx3".*result="success".*\\}\\s+[1-9]

Prometheus reports probe duration histogram
    ${output} =    Scrape Metrics
    Should Match Regexp    ${output}    maglev_probe_duration_seconds_count\\{backend="nginx1".*\\}\\s+[1-9]

Prometheus reports pool weights
    ${output} =    Scrape Metrics
    Should Contain    ${output}    maglev_frontend_pool_backend_weight{backend="nginx1",frontend="http-vip",pool="primary"} 100
    Should Contain    ${output}    maglev_frontend_pool_backend_weight{backend="nginx3",frontend="http-vip",pool="fallback"} 100

Prometheus reports transition counters
    ${output} =    Scrape Metrics
    # Every backend transitioned unknown → up during startup; nginx1 is
    # checked here as a representative.
    Should Match Regexp    ${output}    maglev_backend_transitions_total\\{backend="nginx1",from="unknown",to="up"\\}\\s+[1-9]


# ---- pool failover tests ----------------------------------------------------
#
# These tests use the static failover-vip frontend defined in maglev.yaml:
# one backend in the primary pool (static-primary) and one in the fallback
# pool (static-fallback). Neither has a healthcheck, so both go straight to
# state=up and stay there unless administratively disabled. Because the
# effective weight is computed from the pool-failover logic (and not from
# probes), these tests are deterministic and don't depend on probe timing
# or a running VPP.

Failover: primary up, fallback standby
    Wait Until Keyword Succeeds    3s    200ms
    ...    Static Backend Should Be Up    static-primary
    Wait Until Keyword Succeeds    3s    200ms
    ...    Static Backend Should Be Up    static-fallback
    Effective Weight Should Be    failover-vip    static-primary     100
    Effective Weight Should Be    failover-vip    static-fallback    0

Failover: disable primary → fallback takes over
    Maglevc    set backend static-primary disable
    Backend Should Have State    static-primary    disabled
    Effective Weight Should Be    failover-vip    static-primary     0
    Effective Weight Should Be    failover-vip    static-fallback    100

Failover: enable primary → fallback steps back
    Maglevc    set backend static-primary enable
    Wait Until Keyword Succeeds    3s    200ms
    ...    Static Backend Should Be Up    static-primary
    Effective Weight Should Be    failover-vip    static-primary     100
    Effective Weight Should Be    failover-vip    static-fallback    0


*** Keywords ***
Setup Suite
    ${arch} =    Run    go env GOARCH
    Set Suite Variable    ${ARCH}    ${arch}

Cleanup Suite
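    # Make sure the log directory exists; otherwise the redirect fails silently.
    Create Directory    ${EXECDIR}/tests/out
    Run    docker logs ${MAGLEVD_NODE} > ${EXECDIR}/tests/out/maglevd.log 2>&1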
    Run    ${CLAB_BIN} --runtime ${runtime} destroy -t ${CURDIR}/${lab-file} --cleanup

Maglevc
    [Documentation]    Run a maglevc command inside the maglevd container.
    [Arguments]    ${cmd}
    ${rc}    ${output} =    Run And Return Rc And Output
    ...    docker exec ${MAGLEVD_NODE} /opt/maglev/build/${ARCH}/maglevc --color\=false ${cmd}
    Log    ${output}
    Should Be Equal As Integers    ${rc}    0
    RETURN    ${output}

Backend Should Be Up
    [Arguments]    ${name}
    ${output} =    Maglevc    show backends ${name}
    Should Match Regexp    ${output}    state\\s+up

Backend Should Have State
    [Arguments]    ${name}    ${expected_state}
    ${output} =    Maglevc    show backends ${name}
    Should Match Regexp    ${output}    state\\s+${expected_state}

Backend Should Be Disabled
    [Arguments]    ${name}
    ${output} =    Maglevc    show backends ${name}
    Should Match Regexp    ${output}    enabled\\s+false

Get Probe Count
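    [Documentation]    Return the number of GET request lines in a backend's nginx
    ...    log. The tests treat every such line as an HTTP health-check probe.
    [Arguments]    ${name}
    # grep -c already prints 0 (and exits 1) when nothing matches, so only the
    # exit status is masked here; echoing another 0 would yield "0\n0" and
    # break the integer conversion below.
    ${output} =    Run    docker logs clab-${lab-name}-${name} 2>/dev/null | grep -c "GET /" || true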
    ${count} =    Convert To Integer    ${output.strip()}
    RETURN    ${count}

Probe Count Should Be Positive
    [Arguments]    ${name}
    ${count} =    Get Probe Count    ${name}
    Should Be True    ${count} > 0
    ...    No health-check requests found in nginx logs for ${name}
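
# A hedged sketch, not used by the tests above: the probing tests wait a fixed
# Sleep before comparing probe counts, which may be tight on a slow host.
# Assuming probe counts only ever grow, polling with Wait Until Keyword
# Succeeds is one alternative for the "count must grow" checks. Both keyword
# names and the 5s / 500ms defaults are illustrative assumptions.
Probe Count Should Exceed
    [Arguments]    ${name}    ${baseline}
    ${count} =    Get Probe Count    ${name}
    Should Be True    ${count} > ${baseline}
    ...    Probe count for ${name} is still ${count} (baseline ${baseline})

Wait Until Probe Count Grows
    [Arguments]    ${name}    ${baseline}    ${timeout}=5s    ${interval}=500ms
    Wait Until Keyword Succeeds    ${timeout}    ${interval}
    ...    Probe Count Should Exceed    ${name}    ${baseline}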

Scrape Metrics
    [Documentation]    Fetch the Prometheus /metrics endpoint from the maglevd container.
    ${rc}    ${output} =    Run And Return Rc And Output
    ...    curl -sf ${METRICS_URL}
    Should Be Equal As Integers    ${rc}    0
    RETURN    ${output}

Static Backend Should Be Up
    [Documentation]    Like Backend Should Be Up, but for backends without a
    ...    healthcheck, which go straight to up via the synthetic-pass path.
    [Arguments]    ${name}
    ${output} =    Maglevc    show backends ${name}
    Should Match Regexp    ${output}    state\\s+up

Effective Weight Should Be
    [Documentation]    Parse 'show frontends <fe>' output for the named
    ...    backend and assert its effective weight matches the expected value.
    ...    Backend rows have the form:
    ...        [backends  ]<name>  weight <cfg>  effective <eff>
    ...    so we match on <name> followed by 'weight N effective E' anywhere
    ...    on a single line.
    [Arguments]    ${frontend}    ${backend}    ${expected}
    ${output} =    Maglevc    show frontends ${frontend}
    Should Match Regexp    ${output}
    ...    ${backend}\\s+weight\\s+\\d+\\s+effective\\s+${expected}\\b
    ...    backend ${backend}: expected effective weight ${expected} in:\n${output}