

Analyzing BGP State Mismatch Inside a Containerlab Docker Fabric

Introduction to Containerlab and FRR

Containerlab is a network emulation platform that builds multi-node topologies out of containers. In this scenario, Containerlab creates a network fabric of FRR (FRRouting) instances. Each FRR instance runs in its own Docker container, and the containers are interconnected by the point-to-point links that Containerlab wires between them.

Understanding BGP State Mismatch

A BGP state mismatch is a discrepancy between the session states that two or more FRR instances report for the same peering. In cloned labs, a frequent cause is a stale router-id copied from another node. The symptoms are sessions that never reach Established, missing or duplicate routes, and log messages that point to a router-id conflict.

BGP Session States

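A BGP session moves through the states Idle, Connect, Active, OpenSent, OpenConfirm, and Established. A session that stays in any state other than Established indicates a problem. A router-id conflict can surface after the TCP connection succeeds, when the peers exchange OPEN messages and one side rejects the other's BGP identifier.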
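As a rough illustration of reading these states in practice, the sketch below classifies one cell of the State/PfxRcd column from `show ip bgp summary`. It assumes the usual FRR convention that an established peer shows a prefix count there, while any other peer shows the state name. The function name `classify_state` is hypothetical.

```shell
#!/bin/sh
# classify_state: map one State/PfxRcd cell to a coarse session status.
# Assumption: a purely numeric cell is a received-prefix count (Established);
# anything else is an FSM state name such as Active or "Idle (Admin)".
classify_state() {
  case "$1" in
    '')
      echo "unknown" ;;
    *[!0-9]*)
      # Not purely numeric: the cell holds a state name.
      case "$1" in
        Idle*|Connect|Active|OpenSent|OpenConfirm) echo "down:$1" ;;
        *) echo "unknown" ;;
      esac ;;
    *)
      echo "established" ;;
  esac
}
```

For example, `classify_state Active` prints `down:Active`, and `classify_state 12` prints `established`.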

Missing or Duplicate Routes

If the BGP session is not established, routes may not be advertised or received correctly, resulting in missing or duplicate routes in the routing tables.

Log Messages Indicating Router-ID Conflicts

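FRR logs can indicate router-id conflicts, which can cause BGP sessions to fail. Where FRR runs under systemd, check them with journalctl -u frr. Inside a Containerlab node there is usually no systemd, so inspect the container logs (docker logs) or the log file configured in frr.conf instead.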

Packet Captures Showing Unexpected Keepalive/UPDATE Exchanges

Capturing BGP traffic with tcpdump, either inside the containers or on the host-side interface that carries the session, can reveal unexpected keepalive or UPDATE exchanges, or OPEN messages that are answered with a NOTIFICATION instead of a keepalive.

Common Causes of Stale Router-IDs in Cloned FRR Instances

Inheritance of Router-ID from Startup Config on Container Clone

When a new container is cloned from an existing one, it inherits the startup config, including the router-id. This can cause conflicts if the new container is not properly configured.

Manual or Automated Config Cloning without Router-ID Reset

If the config is cloned manually or automatically without resetting the router-id, it can cause conflicts between the containers.

Use of Static Router-ID Statements vs. Automatic Selection

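A static router-id statement causes a conflict whenever the same value ends up in more than one container. When no router-id is configured, FRR selects one automatically from the node's interface addresses, which avoids inherited values as long as those addresses are unique.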

Docker Image Layering Preserving /etc/frr/frr.conf

If /etc/frr/frr.conf is baked into an image layer, every container started from that image begins with the same file, including the same router-id, unless a per-node config is mounted over it.

Environment Variable Overrides Not Applied After Clone

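FRR does not read its router-id from environment variables on its own. If a lab relies on an entrypoint script to turn an environment variable into a config line, and that script does not run again after cloning, the container keeps the inherited router-id.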

Diagnostic Procedure

Verifying FRR Router-ID in Each Container

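To verify the FRR router-id in each container, run the following commands inside it. The summary prints the local BGP router identifier in its header, and the neighbor output shows the local and remote router ID for each peer: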

vtysh -c "show ip bgp summary"
vtysh -c "show ip bgp neighbors"

Check the /etc/frr/frr.conf file for router-id statements (both the global router-id and bgp router-id forms match):

grep -n "router-id" /etc/frr/frr.conf
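To compare values across nodes, it helps to reduce the summary output to the bare identifier. The following is a minimal sketch, assuming the summary header contains a line of the form `BGP router identifier 1.1.1.1, local AS number 65001 ...`. The function name `extract_router_id` is hypothetical.

```shell
#!/bin/sh
# extract_router_id: read "show ip bgp summary" text on stdin and print the
# local router identifier. Assumes the header line starts with
# "BGP router identifier <id>," (fourth whitespace-separated field).
extract_router_id() {
  awk '/BGP router identifier/ {
         id = $4
         sub(/,$/, "", id)   # drop the trailing comma
         print id
         exit
       }'
}
```

A possible use inside a node is `vtysh -c "show ip bgp summary" | extract_router_id`.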

Inspecting Containerlab Node Definitions

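Review the clab.yml file for the FRR node definitions and their config mounts. FRR nodes typically use kind: linux with an FRR image, and the config reaches the container through binds:. The image tag and host path below are examples:

topology:
  nodes:
    node1:
      kind: linux
      image: frrouting/frr:latest
      binds:
        - node1/frr.conf:/etc/frr/frr.conf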

Check for binds: entries that mount the same host file into more than one node, which gives every node the same router-id:

topology:
  nodes:
    node1:
      kind: linux
      binds:
        - frr.conf:/etc/frr/frr.conf
    node2:
      kind: linux
      binds:
        - frr.conf:/etc/frr/frr.conf

Comparing Router-ID Values Across Peers

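Use a script to collect the router-id from every node so the values can be compared. Run it on the Containerlab host; it must execute vtysh inside each container rather than locally:

#!/bin/bash
# LAB is a placeholder for the lab name in clab.yml.
# Containerlab names containers clab-<lab>-<node> by default.
LAB=mylab
for node in node1 node2 node3; do
  rid=$(docker exec "clab-${LAB}-${node}" vtysh -c "show ip bgp summary" | grep -i "router identifier")
  echo "${node}: ${rid}"
done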
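Once the values are collected as "node router-id" pairs, one per line, spotting duplicates can be automated. This is a sketch under that input-format assumption; `find_duplicate_ids` is a hypothetical helper name.

```shell
#!/bin/sh
# find_duplicate_ids: read "<node> <router-id>" pairs on stdin and print
# every router-id used by more than one node, followed by those nodes.
find_duplicate_ids() {
  awk '{ count[$2]++; nodes[$2] = nodes[$2] " " $1 }
       END {
         for (id in count)
           if (count[id] > 1) print id ":" nodes[id]
       }' | sort
}
```

Feeding it the three lines `node1 1.1.1.1`, `node2 2.2.2.2`, and `node3 1.1.1.1` prints `1.1.1.1: node1 node3`. Empty output means all collected IDs are unique.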

Analyzing BGP Logs for Router-ID Mismatch Warnings

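Check the FRR logs for messages about the router-id. The exact wording varies between FRR versions, so match loosely. A rejected OPEN is typically reported as a NOTIFICATION with the "Bad BGP Identifier" subcode. In a container without systemd, pipe docker logs or the FRR log file through the same grep:

journalctl -u frr | grep -iE "router-id|bad bgp identifier"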

Packet Capture Analysis

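Use tcpdump to capture BGP traffic (TCP port 179). The interface must be the one that actually carries the session. docker0 is only correct when the peers talk across the default Docker bridge; for Containerlab point-to-point links, capture on the node's data interface from inside the container instead. The -W option is dropped because it only applies when writing rotated capture files:

tcpdump -i docker0 -n -vv -s 0 -c 100 port 179

With -vv, tcpdump decodes OPEN messages and prints the sender's BGP identifier. Add -l so the output is line-buffered when piped, and match case-insensitively:

tcpdump -l -i docker0 -n -vv -s 0 -c 100 port 179 | grep -i -A 3 "open message"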
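To pull just the AS number and identifier out of a verbose capture, something like the following could be used. It assumes tcpdump's BGP decoder prints OPEN details on a line of the form `Version 4, my AS 65001, Holdtime 180s, ID 1.1.1.1`. That format is an assumption to check against your tcpdump version, and `open_ids` is a hypothetical name.

```shell
#!/bin/sh
# open_ids: read "tcpdump -vv" text on stdin and print "<AS> <router-id>"
# for every decoded BGP OPEN. Assumes the decoder prints a line containing
# "my AS <asn>, Holdtime <n>s, ID <id>".
open_ids() {
  sed -n 's/.*my AS \([0-9.]*\), Holdtime [0-9]*s, ID \([0-9.]*\).*/\1 \2/p'
}
```

Piping the result through `sort | uniq -c` shows how often each AS and identifier pair appears. The same ID paired with different speakers is the pattern to look for.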

Remediation Steps

Resetting Router-ID on Affected FRR Instances

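Remove the static router-id line and restart FRR. The pattern is anchored so it only deletes router-id statements, not comments or descriptions that mention the word. On a host install, restart the service with systemctl; in a Containerlab node without systemd, restart the container instead:

sed -i -E '/^[[:space:]]*(bgp |ip )?router-id /d' /etc/frr/frr.conf
systemctl restart frr

Automatic router-id selection needs no configuration: once no router-id statement remains, FRR chooses the ID from the node's interface addresses. Do not add router-id 0.0.0.0, because zero is not a valid BGP identifier. Confirm the selected value after the restart:

vtysh -c "show ip bgp summary" | grep -i "router identifier"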

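Alternatively, force a new router-id with bgp router-id <new-id> and then clear ip bgp *. The command is only valid in configuration mode under router bgp. The AS number 65001 is an example; use the node's own:

vtysh -c "configure terminal" -c "router bgp 65001" -c "bgp router-id 2.2.2.2"
vtysh -c "clear ip bgp *"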
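Before pushing a new ID to a node, it can be worth validating it. The sketch below checks that a candidate is a dotted quad with octets in 0..255 and is not 0.0.0.0, since RFC 6286 requires a non-zero BGP Identifier. `valid_router_id` is a hypothetical helper, not an FRR command.

```shell
#!/bin/sh
# valid_router_id: return 0 if $1 is a usable router-id, 1 otherwise.
valid_router_id() {
  id=$1
  # Reject empty strings, stray characters, and misplaced dots.
  case "$id" in
    ''|*[!0-9.]*|.*|*.|*..*) return 1 ;;
  esac
  # Split on dots into positional parameters.
  old_ifs=$IFS
  IFS=.
  set -- $id
  IFS=$old_ifs
  [ "$#" -eq 4 ] || return 1
  for octet in "$@"; do
    [ "${#octet}" -le 3 ] || return 1
    [ "$octet" -le 255 ] || return 1
  done
  # An all-zero identifier is not allowed.
  [ "$id" != "0.0.0.0" ]
}
```

A wrapper script could call `valid_router_id "$new_id" || exit 1` before running the vtysh commands.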

Updating Containerlab Topology to Avoid Config Cloning

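Render each node's frr.conf from a shared template with per-node variables before deploying, rather than copying a finished config between nodes. The placeholder names below are illustrative and depend on the templating tool in use:

router bgp {{ asn }}
 bgp router-id {{ router_id }}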

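Use env: to pass a unique router-id value to each node. FRR does not read this variable itself, so the image's entrypoint or a startup command has to write it into the config:

topology:
  nodes:
    node1:
      kind: linux
      env:
        ROUTER_ID: 1.1.1.1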

Apply binds: mounts that point to node-specific config files:

topology:
  nodes:
    node1:
      kind: linux
      binds:
        - /etc/frr/node1.conf:/etc/frr/frr.conf
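The node-specific files can be generated rather than hand-edited. As a minimal sketch, the function below prints a config fragment for a given AS number and router-id. The name `render_frr_conf` and the two-argument interface are assumptions for illustration, and a real config would carry neighbors and address families as well.

```shell
#!/bin/sh
# render_frr_conf: print a minimal BGP stanza with a node-specific router-id.
# $1 = AS number, $2 = router-id (both supplied by the caller).
render_frr_conf() {
  asn=$1
  router_id=$2
  cat <<EOF
router bgp ${asn}
 bgp router-id ${router_id}
!
EOF
}
```

For example, `render_frr_conf 65001 1.1.1.1 > node1.conf` writes a file that can then be bind-mounted as that node's /etc/frr/frr.conf.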

Restarting BGP Sessions Cleanly

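Use a hard reset so the sessions are torn down and renegotiated with the new router-id. clear ip bgp * soft only re-applies policy to existing sessions and does not send a new OPEN, and reset ip bgp * is not an FRR command:

vtysh -c "clear ip bgp *"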

Verify session reestablishment and route convergence:

vtysh -c "show ip bgp summary"
vtysh -c "show ip route"

Validation and Verification

Confirming Unique Router-IDs Across All FRR Peers

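Repeat the collection step from the diagnostic procedure on the Containerlab host and confirm that no two nodes report the same identifier. LAB is a placeholder for the lab name:

LAB=mylab
for node in node1 node2 node3; do
  echo "${node}: $(docker exec "clab-${LAB}-${node}" vtysh -c "show ip bgp summary" | grep -i "router identifier")"
done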

Checking BGP State Transitions to Established

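Verify that every session has reached Established. In the summary, an established neighbor shows a prefix count in the State/PfxRcd column, while any other neighbor shows a state name such as Active or Idle: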

vtysh -c "show ip bgp summary"

Verifying Route Symmetry and Absence of Duplicate Advertisements

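Verify route symmetry and the absence of duplicate advertisements by comparing the BGP table and the installed BGP routes on both ends of each session:

vtysh -c "show ip bgp"
vtysh -c "show ip route bgp"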

Monitoring for Recurring Router-ID Conflict Logs Over Time

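Monitor the FRR logs for recurring router-id conflict messages. Match loosely, because the wording differs between FRR versions; in a container without systemd, run the same grep over docker logs or the FRR log file:

journalctl -u frr | grep -iE "router-id|bad bgp identifier"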

Performing Traffic Flow Tests to Ensure Proper Forwarding

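Perform traffic flow tests to ensure proper forwarding. Capturing port 179 only shows the BGP control traffic, so send data-plane traffic instead, for example by pinging an address in a prefix learned over BGP from another node. The target below is a placeholder:

ping -c 3 <address-in-remote-prefix>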

Preventive Measures and Best Practices

Centralized FRR Config Management with Templating

Generate every node's frr.conf from one central template and a set of per-node values, so that no config is ever produced by copying another node's file.

Automating Router-ID Generation Based on Container Identifiers

Derive each router-id automatically from something that is already unique to the container, such as the node's name or index in the topology, instead of typing it by hand.
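One way to do this, sketched below, is to map the numeric suffix of a node name onto an address block reserved for router-ids. The 10.255.0.0/16 range, the trailing-digits naming convention, and the helper names `node_index` and `router_id_from_index` are all assumptions chosen for illustration.

```shell
#!/bin/sh
# node_index: print the trailing digits of a node name (node12 -> 12).
node_index() {
  echo "${1##*[!0-9]}"
}

# router_id_from_index: map an index 1..65535 to 10.255.<hi>.<lo>.
# Returns 1 for missing, non-numeric, zero-prefixed, or out-of-range input.
router_id_from_index() {
  idx=$1
  case "$idx" in
    ''|*[!0-9]*|0*) return 1 ;;
  esac
  [ "${#idx}" -le 5 ] && [ "$idx" -le 65535 ] || return 1
  echo "10.255.$((idx / 256)).$((idx % 256))"
}
```

With these definitions, `router_id_from_index "$(node_index node1)"` prints `10.255.0.1`. Because the mapping is one-to-one within the range, distinct node indexes cannot collide.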

Incorporating Router-ID Checks into CI/CD Pipeline for Lab Builds

Add a step to the CI/CD pipeline that fails the lab build when two node configs carry the same router-id, so the conflict is caught before the lab is deployed.
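A pipeline step of this kind could look like the sketch below. It assumes one frr.conf-style file per node is passed as an argument and that router-ids appear as `router-id <id>` or `bgp router-id <id>` lines. `check_unique_router_ids` is a hypothetical name.

```shell
#!/bin/sh
# check_unique_router_ids: given one config file per node, return 1 (and
# report on stderr) if any router-id value appears in more than one file.
check_unique_router_ids() {
  dupes=$(
    for f in "$@"; do
      # Print each distinct router-id once per file.
      awk '$1 == "bgp" && $2 == "router-id" { print $3 }
           $1 == "router-id"                 { print $2 }' "$f" | sort -u
    done | sort | uniq -d
  )
  if [ -n "$dupes" ]; then
    echo "duplicate router-id(s): $dupes" >&2
    return 1
  fi
}
```

In a pipeline this might run as `check_unique_router_ids configs/*.conf`, where the `configs/` directory is likewise an assumed layout. A non-zero exit status fails the job.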

Documenting Cloning Procedures to Exclude Router-ID Inheritance

Document the cloning procedure, including the step that removes or regenerates the router-id, so that a cloned node never starts with an inherited value.

Using Containerlab’s vars: Feature to Assign Unique Identifiers Per Node

Use Containerlab’s vars: feature to assign unique identifiers per node and ensure unique router-ids.

Regular Auditing of FRR Configurations in Running Labs

Audit the FRR configurations of running labs on a schedule, comparing router-ids across nodes, so that a conflict introduced by a later change is found before it causes a session failure.

Enabling FRR BGP Debug (debug bgp events) Only During Troubleshooting

Enable BGP debugging only while troubleshooting and turn it off afterwards. Left on, it floods the log and buries the messages that matter.

Troubleshooting Checklist

