
When a hidden bridge loop outruns storm control

Introduction to Layer 2 Loops and Broadcast Saturation

A Layer 2 loop occurs when two devices in the same broadcast domain are joined by more than one active path, for example through redundant connections between switches or bridges. Ethernet frames carry no time-to-live field, so broadcast, multicast, and unknown-unicast frames that are flooded into the loop can circulate indefinitely.
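To make the flooding behaviour concrete, here is a minimal Python sketch that simulates one broadcast frame under the usual rule of forwarding out of every port except the one the frame arrived on. The `flood` function, the switch names, and the topologies are hypothetical and only illustrate the idea; real bridges also learn MAC addresses, which this model leaves out.

```python
def flood(links, source, max_hops):
    """Count the copies of one broadcast frame in flight after each hop.

    links  -- list of (switch_a, switch_b) pairs; parallel links are allowed
    source -- switch that originates the broadcast
    """
    # Map every switch to its ports, each port being (link_id, peer_switch).
    ports = {}
    for link_id, (a, b) in enumerate(links):
        ports.setdefault(a, []).append((link_id, b))
        ports.setdefault(b, []).append((link_id, a))

    # Each in-flight copy is (switch_holding_it, link_it_arrived_on).
    in_flight = [(source, None)]
    counts = []
    for _ in range(max_hops):
        next_hop = []
        for switch, ingress in in_flight:
            for link_id, peer in ports.get(switch, []):
                if link_id != ingress:  # flood out of every port except ingress
                    next_hop.append((peer, link_id))
        in_flight = next_hop
        counts.append(len(in_flight))
    return counts


tree = [("A", "B"), ("B", "C")]
ring = [("A", "B"), ("B", "C"), ("C", "A")]
print(flood(tree, "A", 4))  # [1, 1, 0, 0]: the frame dies out at the edge
print(flood(ring, "A", 4))  # [2, 2, 2, 2]: the copies never leave the loop
```

In the loop-free topology the frame disappears once it reaches the last switch, while in the ring two copies keep circulating for as long as the simulation runs.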

Definition of Layer 2 Loops

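Layer 2 loops are usually caused by cabling mistakes or misconfigured devices, such as a redundant link brought up without a loop-prevention protocol. They can also be created deliberately through malicious activity.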

Causes of Broadcast Saturation

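Broadcast saturation occurs when broadcast traffic consumes most of the available bandwidth and processing capacity, causing congestion and degraded performance. A loop is a common cause, because every flooded frame is forwarded again each time it comes around.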

Impact on Network Performance

Layer 2 loops and broadcast saturation can have a significant impact on network performance, causing packet loss, latency, and jitter. In severe cases, they can cause network downtime and outages.

Network Architecture Overview

Linux Bridge Configuration

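In our example, we are using Linux bridges to connect multiple network interfaces together. The bridges are configured using the brctl command (from the bridge-utils package, now deprecated in favor of the ip and bridge commands from iproute2):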

# Create a new bridge
brctl addbr br0
# Add an interface to the bridge
brctl addif br0 eth0
# Show bridge information
brctl show

Hypervisor vSwitch Setup

We are also using a hypervisor vSwitch to connect multiple virtual machines together. The vSwitch is configured using the ovs-vsctl command:

# Create a new vSwitch
ovs-vsctl add-br vSwitch0
# Add an interface to the vSwitch
ovs-vsctl add-port vSwitch0 eth0
# Show vSwitch information
ovs-vsctl show

Interconnection of Bridges and vSwitch

The Linux bridges and hypervisor vSwitch are interconnected using network cables and trunk ports:

# Attach the trunk interface to the bridge
brctl addif br0 trunk0
# Attach the trunk interface to the vSwitch
ovs-vsctl add-port vSwitch0 trunk0
# Bring the trunk interface up
ip link set trunk0 up

The Accidental Layer 2 Loop

Initial Wiring Mistake

The accidental Layer 2 loop occurred when a network administrator mistakenly connected a second trunk between the same devices and brought both trunk ports up, creating a redundant path for traffic to travel:

# Both trunk interfaces are brought up, so the redundant path becomes active
ip link set trunk0 up
ip link set trunk1 up

Propagation of the Loop Across Bridges and vSwitch

The Layer 2 loop propagated across the bridges and vSwitch, causing traffic to loop back and forth indefinitely:

# Show bridge and vSwitch information
brctl show
ovs-vsctl show
# Observe the looping traffic (-e prints MAC addresses, -n skips name resolution)
tcpdump -i eth0 -e -n

Resulting Broadcast Saturation

The Layer 2 loop caused broadcast saturation, overwhelming the network with broadcast traffic and degrading network performance:

# Capture 100 broadcast frames (-e prints MAC addresses)
tcpdump -i eth0 -n -e -vv -s 0 -c 100 ether broadcast
# Show network performance metrics
iftop -i eth0
nload eth0
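To put a number on the share of broadcast traffic in a capture, the following Python sketch counts frames whose destination MAC is the broadcast address. It assumes the capture text was produced with tcpdump's `-e` option, which prints the source and destination MAC on each frame line; the `broadcast_share` function and the sample lines are illustrative, not output from the incident.

```python
import re

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"
MAC = r"[0-9a-f]{2}(?::[0-9a-f]{2}){5}"
# With -e, each frame line contains "<source MAC> > <destination MAC>".
FRAME = re.compile(rf"({MAC}) > ({MAC})", re.IGNORECASE)


def broadcast_share(lines):
    """Return the fraction of captured frames sent to the broadcast MAC."""
    total = 0
    broadcast = 0
    for line in lines:
        match = FRAME.search(line)
        if match is None:
            continue  # continuation line or a line without a link-level header
        total += 1
        if match.group(2).lower() == BROADCAST_MAC:
            broadcast += 1
    return broadcast / total if total else 0.0


# Illustrative lines in the shape tcpdump -e prints (not real capture data).
sample = [
    "12:00:00.000001 52:54:00:aa:bb:01 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.0.0.1 tell 10.0.0.2, length 28",
    "12:00:00.000002 52:54:00:aa:bb:01 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.0.0.1 tell 10.0.0.2, length 28",
    "12:00:00.000003 52:54:00:aa:bb:02 > 52:54:00:aa:bb:01, ethertype IPv4 (0x0800), length 98: 10.0.0.1 > 10.0.0.2: ICMP echo reply",
]
print(f"{broadcast_share(sample):.0%} of frames are broadcast")
```

A share that stays close to 100% on a busy link, with the same frames repeating, is consistent with a loop rather than ordinary ARP or DHCP chatter.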

Identifying the Issue

Symptoms of Layer 2 Loops

The symptoms of a Layer 2 loop include high CPU and memory utilization on network devices, MAC addresses that flap between ports, packet loss and latency, and interface throughput held near line rate by broadcast traffic.
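One symptom that is easy to check programmatically is a MAC address that keeps moving between ports, because looped frames make the same source address arrive on different interfaces. The sketch below assumes you can collect (MAC, port) observations in time order, for example from repeated forwarding-table snapshots; the `count_mac_moves` function and the sample data are hypothetical.

```python
def count_mac_moves(observations):
    """Count how often each MAC address changes port.

    observations -- iterable of (mac, port) pairs in the order they were seen
    Returns a dict of {mac: number_of_moves} for MACs that moved at least once.
    """
    last_port = {}
    moves = {}
    for mac, port in observations:
        if mac in last_port and last_port[mac] != port:
            moves[mac] = moves.get(mac, 0) + 1
        last_port[mac] = port
    return moves


# Hypothetical observations: one host flaps between eth0 and trunk0.
seen = [
    ("52:54:00:aa:bb:01", "eth0"),
    ("52:54:00:aa:bb:01", "trunk0"),
    ("52:54:00:aa:bb:01", "eth0"),
    ("52:54:00:aa:bb:02", "eth1"),
    ("52:54:00:aa:bb:02", "eth1"),
]
print(count_mac_moves(seen))  # {'52:54:00:aa:bb:01': 2}
```

A host that legitimately migrates moves once; an address that moves many times within a few seconds is a strong hint that frames are looping.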

Network Monitoring and Analysis Tools

We used network monitoring and analysis tools such as tcpdump, iftop, and nload to identify the issue:

# Use tcpdump to capture 100 broadcast frames (-e prints MAC addresses)
tcpdump -i eth0 -n -e -vv -s 0 -c 100 ether broadcast
# Use iftop to show network traffic
iftop -i eth0
# Use nload to show network performance metrics
nload eth0

CLI Commands for Troubleshooting

We used CLI commands such as brctl, ovs-vsctl, and ip link to troubleshoot the issue:

# Show bridge information
brctl show
# Show vSwitch information
ovs-vsctl show
# Show interface information
ip link show

Containment and Fix

Isolating the Affected Segment

We isolated the affected segment by shutting down the trunk ports and disconnecting the redundant path:

# Shut down the trunk ports
ip link set trunk0 down
ip link set trunk1 down
# Disconnect the redundant path
brctl delif br0 trunk0
ovs-vsctl del-port vSwitch0 trunk0

Breaking the Layer 2 Loop

We broke the Layer 2 loop by removing the redundant path and reconnecting the trunk ports correctly:

# Remove the redundant path
brctl delif br0 trunk1
ovs-vsctl del-port vSwitch0 trunk1
# Reconnect the trunk ports correctly
brctl addif br0 trunk0
ovs-vsctl add-port vSwitch0 trunk0

Verification of Containment

Monitoring Network Traffic

We monitored network traffic using tools such as iftop and nload to verify that the containment fix was effective:

# Use iftop to show network traffic
iftop -i eth0
# Use nload to show network performance metrics
nload eth0
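The same check can be made repeatable by sampling the kernel's interface counters instead of watching an interactive display. This is a minimal Python sketch that assumes a Linux host exposing `/sys/class/net/<interface>/statistics/rx_packets`; the function names, the baseline figure, and the tolerance factor are illustrative choices, not values from the incident.

```python
from pathlib import Path


def read_rx_packets(interface, sysfs_root="/sys/class/net"):
    """Read the cumulative received-packet counter for an interface."""
    counter = Path(sysfs_root) / interface / "statistics" / "rx_packets"
    return int(counter.read_text())


def packets_per_second(first, second, interval_s):
    """Convert two counter samples taken interval_s seconds apart into a rate."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (second - first) / interval_s


def storm_cleared(pps, baseline_pps, tolerance=2.0):
    """Treat the storm as cleared once the rate is within tolerance x baseline."""
    return pps <= baseline_pps * tolerance
```

For example, call `read_rx_packets("eth0")` twice with `time.sleep(5)` in between, pass both samples to `packets_per_second`, and compare the result with a rate recorded during normal operation.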

Analyzing Bridge and vSwitch Logs

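We inspected the bridge and vSwitch state to verify that the Layer 2 loop was broken and that traffic was no longer looping:

# Show bridge port membership
brctl show
# Show the MAC addresses learned by the bridge
brctl showmacs br0
# Show vSwitch configuration
ovs-vsctl show
# Show the MAC addresses learned by the vSwitch
ovs-appctl fdb/show vSwitch0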

Scaling Limitations and Considerations

Maximum Bridge and vSwitch Capacity

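Linux bridges and software vSwitches typically forward frames on the host CPU and keep a finite MAC forwarding table, so port count, table size, and CPU headroom limit how far a single Layer 2 domain can scale.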

Impact of Large-Scale Layer 2 Loops

The larger the broadcast domain, the more ports each looping frame is flooded to, so a loop in a large Layer 2 network saturates more links and affects more hosts.

Strategies for Preventing Similar Issues in Large Networks

To prevent similar issues in large networks, it is essential to implement redundancy and failover in a loop-safe way (for example, with STP enabled on redundant links), use network monitoring and analysis tools to watch broadcast rates, limit the size of each broadcast domain, and train network administrators.

Preventative Measures and Best Practices

Regular Network Audits and Inspections

Regular network audits and inspections, which compare the documented cabling and port assignments with the live configuration, can detect redundant links before they cause Layer 2 loops and broadcast saturation.
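An audit of this kind can be as simple as comparing the intended port membership of each bridge with what is actually configured. The Python sketch below assumes both views are available as dictionaries mapping a bridge name to its ports; how they are collected is left open, and the `audit_membership` function and the data shown are hypothetical.

```python
def audit_membership(expected, actual):
    """Report ports that are attached but undocumented, or documented but missing.

    expected, actual -- dicts mapping a bridge name to an iterable of port names
    Returns a sorted list of human-readable findings.
    """
    findings = []
    for bridge in sorted(set(expected) | set(actual)):
        want = set(expected.get(bridge, ()))
        have = set(actual.get(bridge, ()))
        for port in sorted(have - want):
            findings.append(f"{bridge}: unexpected port {port}")
        for port in sorted(want - have):
            findings.append(f"{bridge}: missing port {port}")
    return findings


documented = {"br0": ["eth0", "trunk0"], "vSwitch0": ["eth0", "trunk0"]}
live = {"br0": ["eth0", "trunk0", "trunk1"], "vSwitch0": ["eth0", "trunk0", "trunk1"]}
for finding in audit_membership(documented, live):
    print(finding)
```

With the sample data, the audit flags trunk1 on both devices, which is the kind of extra link that forms a redundant path.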

Implementing Redundancy and Failover Mechanisms

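Redundancy and failover mechanisms help prevent network downtime and outages, but redundant Layer 2 links must be protected by a loop-prevention protocol such as STP or combined into a bond. Otherwise the redundancy itself creates the loop.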

Training and Awareness for Network Administrators

Training network administrators helps prevent wiring and configuration mistakes and ensures that they can handle issues quickly and effectively.

Advanced Troubleshooting Techniques

Using Spanning Tree Protocol (STP) to Detect Loops

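STP detects redundant paths by exchanging BPDUs between bridges and blocks the redundant ports, which breaks Layer 2 loops and prevents the resulting broadcast saturation. It is disabled by default on Linux bridges and can be enabled with brctl stp br0 on.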
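Before relying on STP, it helps to confirm that it is actually running. The following Python sketch reads the Linux bridge STP state from sysfs, where `stp_state` is 0 when STP is disabled, 1 for kernel STP, and 2 for userspace STP. It covers Linux bridges only (an Open vSwitch bridge does not expose this file), and the function names are illustrative.

```python
from pathlib import Path

STP_STATES = {0: "disabled", 1: "kernel STP", 2: "userspace STP"}


def bridge_stp_state(bridge, sysfs_root="/sys/class/net"):
    """Return the STP mode of a Linux bridge as a readable string."""
    raw = (Path(sysfs_root) / bridge / "bridge" / "stp_state").read_text()
    return STP_STATES.get(int(raw), "unknown")


def bridges_without_stp(sysfs_root="/sys/class/net"):
    """List every Linux bridge under sysfs_root that has STP disabled."""
    root = Path(sysfs_root)
    return sorted(
        dev.name
        for dev in root.iterdir()
        # Only Linux bridge devices have a bridge/stp_state attribute.
        if (dev / "bridge" / "stp_state").is_file()
        and bridge_stp_state(dev.name, sysfs_root) == "disabled"
    )
```

Running `bridges_without_stp()` on a host gives a quick list of bridges where a second link would loop unchecked.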

Analyzing Network Topology with Graphical Tools

Graphical tools such as Graphviz can be used to draw the network topology, which makes cycles such as Layer 2 loops easier to spot.
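As a rough illustration, the Python sketch below finds links that close a cycle in a list of device-to-device connections using a union-find structure, then emits Graphviz DOT text with those links drawn in red. The link list is assumed to come from your own inventory; the function names and the sample topology are hypothetical.

```python
def find_redundant_links(links):
    """Return the indices of links that close a cycle (union-find)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    redundant = []
    for index, (a, b) in enumerate(links):
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            redundant.append(index)  # both ends are already connected
        else:
            parent[root_a] = root_b
    return redundant


def to_dot(links, redundant):
    """Render the topology as an undirected Graphviz graph."""
    lines = ["graph l2 {"]
    for index, (a, b) in enumerate(links):
        attrs = " [color=red]" if index in redundant else ""
        lines.append(f'  "{a}" -- "{b}"{attrs};')
    lines.append("}")
    return "\n".join(lines)


topology = [("br0", "vSwitch0"), ("vSwitch0", "vm1"), ("br0", "vSwitch0")]
loops = find_redundant_links(topology)
print(loops)  # [2]: the second br0-vSwitch0 link closes a loop
print(to_dot(topology, loops))
```

The DOT text can be saved to a file and rendered with the `dot` command from Graphviz.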

Example Code for Custom Network Monitoring Scripts

The following example code can be used to create a custom network monitoring script:

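import subprocess

INTERFACE = "eth0"

# Define a function to monitor network traffic
def monitor_traffic():
    # Run iftop in text mode for 10 seconds and exit (needs iftop 1.0 or later;
    # without -t, iftop is interactive and blocks the script)
    subprocess.run(["iftop", "-t", "-s", "10", "-i", INTERFACE], check=False)
    # Show packet and byte counters (nload has no non-interactive mode)
    subprocess.run(["ip", "-s", "link", "show", INTERFACE], check=False)

# Define a function to inspect bridge and vSwitch state
def inspect_state():
    # Show bridge port membership
    subprocess.run(["brctl", "show"], check=False)
    # Show vSwitch configuration
    subprocess.run(["ovs-vsctl", "show"], check=False)

# Call the functions when run as a script
if __name__ == "__main__":
    monitor_traffic()
    inspect_state()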
