LinkState

A control-plane workbench for stuck-state triage

Introduction to Operator Workbench

The operator workbench gives network engineers a single place to monitor, troubleshoot, and tune the network. For the OSPF (Open Shortest Path First) and IS-IS (Intermediate System to Intermediate System) routing protocols, its most important job is diagnosing stuck-state issues, which can quietly degrade network availability and reliability.

Overview of OSPF and IS-IS Stuck-State Diagnosis

OSPF and IS-IS are link-state routing protocols. Each router floods descriptions of its links (LSAs in OSPF, LSPs in IS-IS) so that every router in an area or level builds the same link-state database and computes its routes from it. A protocol gets stuck when this process stalls. Two examples are an OSPF neighbor that never leaves ExStart because of an MTU mismatch, and an IS-IS adjacency that stays in Initializing because one side never sees itself listed in the other's Hellos. The result can be missing routes, routing loops, or black holes. Diagnosing these cases means correlating protocol-specific evidence: neighbor finite state machine (FSM) state, database summaries, packet evidence, and recent configuration changes.

Requirements for Operator Workbench

To diagnose OSPF and IS-IS stuck-state issues, the operator workbench must collect and display four kinds of information: neighbor FSM state, database summaries, packet evidence, and recent configuration changes (config deltas).

Designing the Operator Workbench

The first part of the design covers how the workbench collects each type of evidence for OSPF and for IS-IS: neighbor FSM state, database summaries, packet captures, and recent configuration changes.

Collecting Neighbor FSM State

OSPF Neighbor FSM State Collection

To collect OSPF neighbor FSM state, the operator workbench can run the router's operational commands, such as show ip ospf neighbor (Cisco IOS syntax).

show ip ospf neighbor

The operator workbench can parse this output to extract, for each neighbor, the router ID, the FSM state and DR/BDR role (for example FULL/DR or EXSTART/BDR), the remaining dead time, the neighbor's interface address, and the local interface.
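A minimal parsing sketch, assuming the Cisco IOS column layout. The sample output is hypothetical, and other platforms and releases format the table differently. It also flags neighbors that are neither FULL nor 2WAY, since those are still partway through the FSM:

```python
import re
from dataclasses import dataclass

# Hypothetical sample in the Cisco IOS "show ip ospf neighbor" layout.
SAMPLE = """\
Neighbor ID     Pri   State           Dead Time   Address         Interface
10.0.0.2          1   FULL/DR         00:00:36    192.168.1.2     GigabitEthernet0/0
10.0.0.3          1   EXSTART/BDR     00:00:33    192.168.1.3     GigabitEthernet0/0
10.0.0.4          0   FULL/  -        00:00:39    192.168.2.4     GigabitEthernet0/1
"""

# The State column can contain spaces ("FULL/  -" on point-to-point links),
# so rows are matched with a regex rather than a plain split().
ROW = re.compile(
    r"^(?P<neighbor_id>\d+(?:\.\d+){3})\s+(?P<pri>\d+)\s+"
    r"(?P<state>\S+/\s*\S+)\s+(?P<dead>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<address>\d+(?:\.\d+){3})\s+(?P<interface>\S+)\s*$"
)


@dataclass
class OspfNeighbor:
    neighbor_id: str
    state: str         # FSM state, e.g. "FULL" or "EXSTART"
    role: str          # "DR", "BDR", "DROTHER" or "-"
    dead_seconds: int  # time left before the neighbor is declared down
    address: str
    interface: str


def parse_ospf_neighbors(text: str) -> list[OspfNeighbor]:
    """Turn the table into records, skipping the header and unknown lines."""
    neighbors = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue
        state, _, role = m["state"].partition("/")
        hours, minutes, seconds = (int(x) for x in m["dead"].split(":"))
        neighbors.append(OspfNeighbor(
            neighbor_id=m["neighbor_id"],
            state=state,
            role=role.strip(),
            dead_seconds=hours * 3600 + minutes * 60 + seconds,
            address=m["address"],
            interface=m["interface"],
        ))
    return neighbors


def not_converged(neighbors: list[OspfNeighbor]) -> list[OspfNeighbor]:
    # FULL is the normal end state; 2WAY is also normal between two
    # DROTHER routers on a broadcast segment.
    return [n for n in neighbors if n.state not in ("FULL", "2WAY")]
```

A single snapshot cannot tell a slow adjacency from a stuck one. The flag only becomes meaningful when the same neighbor shows up unconverged in repeated samples.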

IS-IS Neighbor FSM State Collection

Similarly, to collect IS-IS neighbor FSM state, the operator workbench can run show isis adjacency (Junos and IOS XR syntax).

show isis adjacency

The operator workbench can parse this output to extract, for each adjacency, the neighbor's system ID or hostname, the interface, the level, the adjacency state (Up, Initializing, or Down), and the remaining hold time.
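The same idea for IS-IS, sketched against a hypothetical sample laid out like Junos show isis adjacency output. None of these fields contain spaces, so a whitespace split is enough. The column order is an assumption to check against your platform:

```python
from dataclasses import dataclass

# Hypothetical sample modeled on Junos "show isis adjacency" output.
SAMPLE = """\
Interface             System         L State        Hold (secs) SNPA
ge-0/0/0.0            R2             2  Up                    23  0:5:85:8f:c8:bd
ge-0/0/1.0            R3             2  Initializing          25  0:5:85:8f:c8:be
"""


@dataclass
class IsisAdjacency:
    interface: str
    system: str
    level: int         # 1, 2, or 3 (Level 1 and Level 2)
    state: str         # "Up", "Initializing", or "Down"
    hold_seconds: int


def parse_isis_adjacencies(text: str) -> list[IsisAdjacency]:
    adjacencies = []
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # blank or unrecognised line
        interface, system, level, state, hold = fields[:5]  # SNPA is optional
        adjacencies.append(
            IsisAdjacency(interface, system, int(level), state, int(hold))
        )
    return adjacencies


def not_up(adjacencies: list[IsisAdjacency]) -> list[IsisAdjacency]:
    return [a for a in adjacencies if a.state != "Up"]
```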

Database Summaries

OSPF Database Summaries

To collect OSPF database summaries, the operator workbench can run show ip ospf database, which lists the LSAs held in each area's link-state database.

show ip ospf database

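The operator workbench can parse this output to count LSAs by type and area, and to record each LSA's advertising router, age, and sequence number. Every router in an area should hold an identical database, so comparing these summaries across routers exposes LSAs that failed to flood.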
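Routers in the same OSPF area should converge on identical link-state databases, and an LSA instance is identified by its type, Link State ID, and advertising router (RFC 2328). The sketch below assumes the database output has already been parsed into tuples; the values are made up. It compares two routers' databases and reports LSAs present on only one side, or present on both with a different sequence number or checksum:

```python
from collections import Counter

# Hypothetical LSA headers as (type, link_state_id, adv_router, seq, checksum),
# e.g. parsed from "show ip ospf database" on two routers in the same area.
R1_LSDB = [
    ("router", "10.0.0.1", "10.0.0.1", 0x80000005, 0x1A2B),
    ("router", "10.0.0.2", "10.0.0.2", 0x80000003, 0x3C4D),
    ("network", "192.168.1.2", "10.0.0.2", 0x80000001, 0x5E6F),
]
R2_LSDB = [
    ("router", "10.0.0.1", "10.0.0.1", 0x80000004, 0x0A0B),
    ("router", "10.0.0.2", "10.0.0.2", 0x80000003, 0x3C4D),
]


def summarize(lsdb):
    """LSA count per type: the coarse 'database summary' view."""
    return Counter(lsa[0] for lsa in lsdb)


def diff_lsdb(a, b):
    """Compare two LSDBs keyed by (type, link_state_id, adv_router)."""
    ka = {lsa[:3]: lsa[3:] for lsa in a}  # key -> (seq, checksum)
    kb = {lsa[:3]: lsa[3:] for lsa in b}
    only_a = sorted(ka.keys() - kb.keys())
    only_b = sorted(kb.keys() - ka.keys())
    mismatched = sorted(k for k in ka.keys() & kb.keys() if ka[k] != kb[k])
    return only_a, only_b, mismatched
```

In this made-up data, R2 holds an older instance of 10.0.0.1's router LSA and lacks the network LSA entirely. Differences are normal for a moment while an update floods; a difference that persists across several collections is the one worth investigating.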

IS-IS Database Summaries

Similarly, to collect IS-IS database summaries, the operator workbench can run show isis database, which lists the LSPs held for each level.

show isis database

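The operator workbench can parse this output to count LSPs per level and to record each LSP's ID, sequence number, checksum, and remaining lifetime. As with OSPF, routers at the same level should agree on each LSP's sequence number and checksum, so a persistent mismatch points at a flooding problem.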
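IS-IS LSP IDs are displayed as the originating system ID (or its dynamic hostname), a pseudonode ID, and a fragment number, as in R1.00-00. This sketch splits that ID and flags LSPs whose remaining lifetime has dropped unusually low. A low value suggests the originator is not refreshing the LSP, or the refresh is not reaching this router. The 300-second threshold is an assumption derived from common default timers:

```python
import re

# Displayed LSP ID: <system ID or hostname>.<pseudonode ID>-<fragment>,
# for example "R1.00-00" or "1921.6800.1001.02-01".
LSP_ID = re.compile(
    r"^(?P<system>.+)\.(?P<pseudonode>[0-9A-Fa-f]{2})-(?P<fragment>[0-9A-Fa-f]{2})$"
)


def parse_lsp_id(lsp_id: str) -> tuple[str, int, int]:
    m = LSP_ID.match(lsp_id)
    if m is None:
        raise ValueError(f"unrecognised LSP ID: {lsp_id!r}")
    return m["system"], int(m["pseudonode"], 16), int(m["fragment"], 16)


def low_lifetime(entries, min_remaining=300):
    """entries: (lsp_id, sequence, checksum, remaining_lifetime_s) tuples.

    min_remaining=300 assumes default timers (1200 s lifetime, 900 s refresh),
    so a refreshed LSP should rarely fall below it. Tune it to the lifetime
    and refresh interval actually configured."""
    return [lsp_id for lsp_id, _seq, _csum, remaining in entries
            if remaining < min_remaining]
```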

Packet Evidence Collection

OSPF Packet Capture

To collect OSPF packet evidence, the operator workbench can capture traffic with tools such as tcpdump and inspect the result in Wireshark. OSPF runs directly over IP as protocol number 89, so filtering on that number isolates it (use ip6 proto 89 for OSPFv3):

tcpdump -i any -n -s 0 -c 100 -w ospf_capture.pcap 'ip proto 89'

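The operator workbench can then decode the captured packets to extract the packet type (Hello, Database Description, LS Request, LS Update, or LS Acknowledgment), the source and destination addresses, and, for Database Description (DD) packets, the DD sequence number and flag bits. A neighbor that keeps retransmitting the same DD packet, for example, is the classic signature of an MTU mismatch holding an adjacency in ExStart or Exchange.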
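A decoding sketch for the fields that matter in an ExStart/Exchange stall, following the OSPFv2 packet formats in RFC 2328 (Appendix A). It assumes something else, such as a pcap-reading library, has already extracted the IP payload. Packet type, router ID, and area come from the common header; DD packets also carry the interface MTU, the I/M/MS flag bits, and the DD sequence number:

```python
import socket
import struct

OSPF_TYPES = {1: "Hello", 2: "DD", 3: "LS Request", 4: "LS Update", 5: "LS Ack"}


def parse_ospfv2(payload: bytes) -> dict:
    """Decode the 24-byte OSPFv2 common header and, for Database Description
    packets, the fixed DD fields that follow it. `payload` starts at the
    OSPF header (the IP header has already been removed)."""
    version, ptype, length, router_id, area_id = struct.unpack("!BBH4s4s", payload[:12])
    pkt = {
        "version": version,
        "type": OSPF_TYPES.get(ptype, f"unknown({ptype})"),
        "length": length,
        "router_id": socket.inet_ntoa(router_id),
        "area_id": socket.inet_ntoa(area_id),
    }
    if ptype == 2 and len(payload) >= 32:
        # Interface MTU, options, flags (I/M/MS bits), DD sequence number.
        mtu, _options, flags, dd_seq = struct.unpack("!HBBI", payload[24:32])
        pkt.update(
            mtu=mtu,
            init=bool(flags & 0x04),
            more=bool(flags & 0x02),
            master=bool(flags & 0x01),
            dd_seq=dd_seq,
        )
    return pkt
```

Comparing the mtu values seen from each side of a link, and watching for the same dd_seq being resent, covers the classic MTU-mismatch case.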

IS-IS Packet Capture

IS-IS packet evidence is collected the same way, with one difference: IS-IS runs directly over the data link layer rather than IP. Capture on the interface that faces the neighbor (eth0 below is a placeholder) and use tcpdump's isis filter keyword:

tcpdump -i eth0 -n -s 0 -c 100 -w isis_capture.pcap isis

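The operator workbench can then decode the captured PDUs to extract the PDU type (LAN or point-to-point Hello, LSP, CSNP, or PSNP), the source system ID, and, for LSPs, the LSP ID and sequence number. On point-to-point links, the three-way adjacency TLV defined in RFC 5303 shows in each Hello which side has not yet seen the other.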
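A similar sketch for IS-IS, assuming the Ethernet and LLC headers have already been stripped so the bytes start at the 0x83 protocol discriminator. It decodes the common header (ISO 10589) and, for point-to-point Hellos, walks the TLVs to read the three-way adjacency state from TLV 240 (RFC 5303). Other PDU types return only their type in this sketch:

```python
# ISO 10589 PDU type codes (low 5 bits of the PDU type octet).
PDU_TYPES = {
    15: "L1 LAN IIH", 16: "L2 LAN IIH", 17: "P2P IIH",
    18: "L1 LSP", 20: "L2 LSP",
    24: "L1 CSNP", 25: "L2 CSNP", 26: "L1 PSNP", 27: "L2 PSNP",
}
# RFC 5303 Point-to-Point Three-Way Adjacency TLV (type 240) states.
THREE_WAY_STATES = {0: "Up", 1: "Initializing", 2: "Down"}


def parse_isis(pdu: bytes) -> dict:
    """Decode the common header and, for point-to-point Hellos, the source
    ID and three-way adjacency state."""
    if len(pdu) < 8 or pdu[0] != 0x83:
        raise ValueError("not an IS-IS PDU")
    header_len = pdu[1]    # Length Indicator: size of the fixed header
    id_len = pdu[3] or 6   # 0 means the default 6-octet system ID
    pdu_type = pdu[4] & 0x1F
    info = {"type": PDU_TYPES.get(pdu_type, f"unknown({pdu_type})")}
    if pdu_type == 17:
        # P2P IIH fixed part after the 8-octet common header: circuit type,
        # source ID, holding time, PDU length, local circuit ID.
        info["source_id"] = pdu[9:9 + id_len].hex()
        length_at = 9 + id_len + 2
        pdu_len = int.from_bytes(pdu[length_at:length_at + 2], "big")
        offset, end = header_len, min(pdu_len, len(pdu))
        while offset + 2 <= end:  # walk the TLVs
            code, length = pdu[offset], pdu[offset + 1]
            if code == 240 and length >= 1 and offset + 2 < end:
                state = pdu[offset + 2]
                info["three_way_state"] = THREE_WAY_STATES.get(state, state)
            offset += 2 + length
    return info
```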

Recent Config Deltas Collection

OSPF Config Deltas

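To collect OSPF config deltas, configuration management tools such as Ansible or Puppet can snapshot each router's OSPF configuration on a schedule; diffing successive snapshots produces the deltas. With Ansible and the cisco.ios collection, for example, an ad-hoc run can pull the OSPF stanza from every router in an inventory group. Here routers is a placeholder group name, and the inventory is assumed to set the network_cli connection:

ansible routers -i inventory -m cisco.ios.ios_command -a "commands='show running-config | section router ospf'"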

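The operator workbench can then diff consecutive snapshots to find which configuration lines changed. A snapshot diff only bounds when a change happened (somewhere between the two snapshot times). Correlating that window with the times neighbors left Full or Up is often the fastest way to tie a stuck adjacency to its cause.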
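A minimal sketch of the diffing step using Python's difflib. It assumes each snapshot is the text of one protocol stanza and that the snapshot times were recorded when the snapshots were taken:

```python
import difflib
from datetime import datetime


def config_delta(before: str, after: str,
                 before_ts: datetime, after_ts: datetime) -> list[str]:
    """Unified diff between two snapshots of one protocol stanza. The change
    itself happened at some point between before_ts and after_ts."""
    return list(difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile=f"snapshot@{before_ts.isoformat()}",
        tofile=f"snapshot@{after_ts.isoformat()}",
        lineterm="",
    ))


def changed_lines(diff: list[str]) -> list[str]:
    """Only the added/removed config lines, without the ---/+++ file headers."""
    return [line for line in diff
            if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))]
```

Keeping the snapshot times in the diff headers means the change window travels with the delta when it is stored or displayed.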

IS-IS Config Deltas

IS-IS config deltas are collected the same way, pulling the IS-IS stanza instead (routers is again a placeholder inventory group):

ansible routers -i inventory -m cisco.ios.ios_command -a "commands='show running-config | section router isis'"

The same diffing applies. For IS-IS, the changes worth flagging first are those that can break adjacencies outright: a new NET that changes the area address (Level 1 adjacencies need a shared area), a changed is-type, or authentication configured on only one side.
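Given unified-diff lines for an IS-IS stanza from any diff tool, a keyword filter can surface adjacency-sensitive changes first. The keyword list is a hypothetical, IOS-flavoured starting point, not a complete list:

```python
# Hypothetical, IOS-flavoured keywords whose changes commonly break or
# reshape IS-IS adjacencies; extend for your platform.
ADJACENCY_SENSITIVE = (
    "net ", "is-type", "authentication", "metric-style",
    "isis network point-to-point",
)


def flag_isis_changes(diff_lines: list[str]) -> list[str]:
    """Added/removed lines of a unified diff that touch a sensitive keyword."""
    changed = [line for line in diff_lines
               if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))]
    return [line for line in changed if any(k in line for k in ADJACENCY_SENSITIVE)]
```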

Implementation Details

Beyond collection, the workbench needs storage for the collected evidence, a way to retrieve it, and a user interface for reviewing it.

Data Collection Mechanisms

The operator workbench can utilize APIs to collect data from various sources, such as network devices, configuration management tools, and packet capture tools.

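import requests

# example.com is a placeholder for the collection endpoint.
url = "https://example.com/api/ospf/neighbor-fsm-state"
response = requests.get(url, timeout=10)  # never wait forever on a dead endpoint
response.raise_for_status()               # fail loudly on HTTP errors
data = response.json()
print(data)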

The workbench can also expose the collected data through its own command-line client, for example:

operator-workbench --view neighbor-fsm-state

Data Storage and Retrieval

The operator workbench requires a database to store the collected data, such as neighbor FSM state, database summaries, packet evidence, and recent config deltas.

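CREATE TABLE neighbor_fsm_state (
    id           SERIAL PRIMARY KEY,
    device       VARCHAR(255) NOT NULL,  -- router the sample was taken on
    protocol     VARCHAR(8)   NOT NULL,  -- 'ospf' or 'isis'
    neighbor_id  VARCHAR(64)  NOT NULL,  -- OSPF router ID or IS-IS system ID
    neighbor_ip  INET,                   -- PostgreSQL address type; NULL for IS-IS
    state        VARCHAR(32)  NOT NULL,
    dead_timer   INTEGER,                -- seconds left on the dead/hold timer
    collected_at TIMESTAMPTZ  NOT NULL DEFAULT now()
);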

The operator workbench should provide data retrieval mechanisms, such as APIs or CLI commands, to retrieve the stored data.

operator-workbench --view neighbor-fsm-state --id 1
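Stored samples become useful for triage when they are read back as a time series, because a neighbor is stuck only if it stays in a transitional state across samples. This sketch assumes each sample carries the device, neighbor, state, and a collection timestamp. The two-minute threshold and the set of stable states are assumptions to tune:

```python
from datetime import datetime, timedelta

# States it is normal to stay in indefinitely, compared case-insensitively.
# 2WAY is normal between two DROTHER routers on an OSPF broadcast segment;
# "UP" covers IS-IS adjacencies.
STABLE_STATES = {"FULL", "2WAY", "UP"}


def stuck_neighbors(samples, now: datetime, threshold=timedelta(minutes=2)):
    """samples: (device, neighbor, state, collected_at) tuples.

    Returns {(device, neighbor): (state, time_in_state)} for neighbors whose
    latest state is not stable and has not changed for at least `threshold`."""
    since = {}
    for device, neighbor, state, collected_at in sorted(samples, key=lambda s: s[3]):
        key = (device, neighbor)
        if key not in since or since[key][0] != state:
            since[key] = (state, collected_at)  # state changed: restart the clock
    return {
        key: (state, now - first_seen)
        for key, (state, first_seen) in since.items()
        if state.upper() not in STABLE_STATES and now - first_seen >= threshold
    }
```

Time in state is measured from the first sample that observed the state, so it understates the real duration by up to one polling interval.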

User Interface Design

The operator workbench should provide a web-based interface for network engineers to review and analyze the collected data.

<html>
    <body>
        <h1>Operator Workbench</h1>
        <table>
            <tr>
                <th>Neighbor IP</th>
                <th>State</th>
                <th>Dead Timer</th>
            </tr>
            <tr>
                <td>10.0.0.1</td>
                <td>Full</td>
                <td>40</td>
            </tr>
        </table>
    </body>
</html>

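The interface should support filtering (for example, hiding neighbors that are Full or 2-Way so only unconverged ones remain), sorting (for example, by time in the current state), and searching by device or neighbor.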

Troubleshooting and Debugging

The operator workbench should provide troubleshooting and debugging mechanisms to help network engineers identify and resolve issues.

Common Issues with Operator Workbench

Data collection can fail because a device is unreachable, an API returns an error, or a software upgrade changes the command output a parser expects. The workbench should record each failure with its cause and mark the affected view as stale, rather than silently showing old data as current.
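One mechanism for detecting and riding out transient collection failures is a retry wrapper with exponential backoff. It returns the last error instead of raising, so the caller can record the cause and mark the affected view as stale. The fetch callable, attempt count, and delays are assumptions, and the sleep parameter exists so the sketch can be tested without waiting:

```python
import time


def collect_with_retry(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() up to `attempts` times, doubling the delay after each
    failure. Returns (data, None) on success or (None, last_error) if every
    attempt failed."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(), None
        except Exception as exc:  # narrow to the client library's error types
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return None, last_error
```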

Troubleshooting Tools and Techniques

The operator workbench should log every collection attempt (device, command or endpoint, and result) so engineers can trace failures, with a debug level for detailed output:

operator-workbench --log --level debug

The operator workbench should also provide a debug mode that exposes the raw API exchanges, which helps when a parser rejects unexpected device output:

operator-workbench --debug --api

Example Troubleshooting Scenarios

To troubleshoot OSPF neighbor FSM state collection issues, the engineer can first run the same command on the router and compare its output with what the workbench stored:

show ip ospf neighbor

The engineer can also use the operator workbench’s log analysis tools to identify any errors or issues related to OSPF neighbor FSM state collection.

Scaling and Limitations

The operator workbench should be designed to scale horizontally and vertically to handle increasing amounts of data and traffic.

Scaling Operator Workbench

The operator workbench can scale horizontally by splitting devices among several collector nodes, each polling a fixed subset. All nodes write to shared storage, so the interface still presents a single view of the network.
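To decide which node handles which devices, one option (an assumption, not part of the original design) is rendezvous hashing. Every node can compute the assignment independently, and removing a node only moves the devices it owned. Node and device names are placeholders:

```python
import hashlib


def _score(node: str, device: str) -> int:
    # A stable hash; Python's built-in hash() is randomised per process.
    digest = hashlib.sha256(f"{node}|{device}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def assign_collector(device: str, nodes: list[str]) -> str:
    """Rendezvous (highest-random-weight) hashing: each device goes to the
    node with the highest score for that device."""
    return max(nodes, key=lambda node: _score(node, device))
```

Plain hash-modulo-N assignment would also work, but it reshuffles most devices whenever the node count changes.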

Limitations of Operator Workbench

The main limits are in data collection. The polling interval bounds how quickly a stuck state can be detected, devices may limit concurrent CLI or API sessions, and full packet captures are large, so they are usually taken on demand rather than continuously.

Mitigating Scaling Limitations

The operator workbench can implement caching mechanisms to reduce the load on the database and improve performance.

import redis
redis_client = redis.Redis(host='localhost', port=6379, db=0)
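A read-through cache sketch for the Redis client above: serve a view from the cache when present, otherwise build it and store it with an expiry. The function only needs an object with redis-py's get and setex methods, which also makes it testable without a Redis server:

```python
import json


def cached_view(client, key, ttl_seconds, build):
    """Read-through cache. `client` is expected to behave like redis.Redis:
    get() returns bytes or None, setex(name, time, value) stores with expiry."""
    cached = client.get(key)
    if cached is not None:
        return json.loads(cached)
    value = build()
    client.setex(key, ttl_seconds, json.dumps(value))
    return value
```

With the client above this might be called as cached_view(redis_client, "view:neighbor-fsm-state", 30, load_view), where the key name and 30-second TTL are illustrative and load_view is a hypothetical function that queries the database.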

The operator workbench can also reduce storage load by collecting less (for example, storing neighbor samples only when a state changes), compressing packet captures, and expiring old evidence.
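One way to reduce the amount of data collected, sketched here as an assumption, is to store a neighbor sample only when its state differs from the previous sample for the same neighbor:

```python
def transitions_only(samples):
    """Keep a (device, neighbor, state, collected_at) sample only when the
    state differs from the last one kept for the same device and neighbor.
    Samples are assumed to arrive in collection order."""
    last_state = {}
    kept = []
    for device, neighbor, state, collected_at in samples:
        key = (device, neighbor)
        if last_state.get(key) != state:
            kept.append((device, neighbor, state, collected_at))
            last_state[key] = state
    return kept
```

With transitions only, the absence of new rows no longer proves the collector is still running, so a per-device last-polled timestamp should be kept alongside them.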

Security Considerations

The operator workbench should be designed with security in mind to protect the data and prevent unauthorized access.

Authentication and Authorization

The operator workbench should implement role-based access control to restrict access to authorized users and roles.

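# Illustrative roles and views: deny anything not explicitly granted.
ROLE_PERMISSIONS = {"viewer": {"neighbor-fsm-state"}, "admin": {"neighbor-fsm-state", "packet-capture"}}
def is_allowed(role, view):
    return view in ROLE_PERMISSIONS.get(role, set())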

The operator workbench should also authenticate users, for example with usernames and passwords checked against an existing identity source such as single sign-on or the TACACS+ or RADIUS servers that already control router logins.

Data Encryption and Protection

The operator workbench should encrypt data in transit using secure protocols, such as HTTPS or SSH.

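operator-workbench --encrypt-data --key-file /etc/operator-workbench/data.key

The operator workbench should also encrypt data at rest using secure mechanisms, such as disk encryption or file-level encryption. Keys belong in a file or secret store rather than on the command line, where they end up in shell history and process listings.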

Example Security Configurations

In short, the security configuration covers authentication and authorization, encryption of data in transit and at rest, and secure protocols for every connection to devices and users.
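As one concrete example, here is a sketch of TLS settings for the workbench's HTTPS listener and for its own calls to device APIs, using Python's standard ssl module. The TLS 1.2 floor and the optional client-certificate requirement are illustrative choices, and certificate paths are left to the deployment:

```python
import ssl


def make_server_context(certfile=None, keyfile=None, require_client_cert=False):
    """TLS context for the workbench's HTTPS listener."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse TLS 1.0 and 1.1
    if require_client_cert:
        ctx.verify_mode = ssl.CERT_REQUIRED       # mutual TLS
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)
    return ctx


def make_device_context(cafile=None):
    """TLS context for the workbench's own calls to device APIs.
    create_default_context() already verifies certificates and hostnames."""
    ctx = ssl.create_default_context(cafile=cafile)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```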

