NAT expiry that looks like chronic packet loss

Introduction to Retransmission Storms

Definition and Causes

A retransmission storm occurs when an endpoint keeps retransmitting segments that are dropped because an intermediate stateful device (NAT, firewall, load‑balancer) has removed its translation or connection‑tracking entry while the endpoints still consider the connection open. The sender interprets the missing ACKs as loss and fires its retransmission timer; while the state stays absent, every retry is dropped as well, and the sender backs off exponentially until it gives up. Multiplied across many flows that expire together, these retries consume bandwidth and CPU and look like chronic packet loss.

Typical causes

Idle‑timeout expiration of NAT or conntrack entries while the application holds a TCP socket open (e.g., long‑lived DB connections, SSH tunnels, SIP media).
Asymmetric routing that takes the return path around the stateful device, so the device sees only one direction of the flow and its entry is never confirmed or refreshed by replies.
Misaligned timeout values between endpoints and middleboxes (e.g., application keep‑alive interval longer than NAT TCP timeout).
Stateful device overload causing premature entry eviction (hash‑table limits, memory pressure).

Impact on Network Performance

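Excess retransmits inflate link utilization; a single stalled TCP flow backs off exponentially and sends only a handful of retries, but hundreds of flows that lose state at the same moment retransmit in synchronized bursts.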
Increased latency for all traffic sharing the same queue due to bufferbloat from retransmit bursts.
CPU spikes on the stateful device as it processes repeated out‑of‑state segments (data, ACK, or RST) that match no entry.
Application‑level timeouts and failed transactions, often mistaken for application bugs.
Potential denial‑of‑service if the storm saturates the uplink or overwhelms the stateful device’s forwarding path.

Understanding Translation State Expiration

Translation State Overview

Translation state is the data structure a middlebox creates to map an internal address:port to an external address:port (NAT) or to track a connection’s lifecycle (conntrack, stateful firewall). For TCP, the entry typically contains:

5‑tuple (src IP, dst IP, src port, dst port, protocol)
Sequence‑number window
Timestamp of last seen packet
Timeout value specific to TCP state (e.g., ESTABLISHED, FIN_WAIT)

Expiration Mechanisms and Timers

Most stateful implementations use idle timers that reset on each packet seen in either direction. When the timer expires, the entry is removed and any subsequent packet is treated as new or invalid.
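The idle-timer behavior described above can be illustrated with a toy model. This is a minimal sketch, not any vendor's implementation: the `StateTable` class and its `packet` method are hypothetical names, and the model assumes a strict device that forwards a mid-stream packet only when an entry exists and creates entries only for a SYN.

```python
from dataclasses import dataclass, field


@dataclass
class StateTable:
    """Toy model of a middlebox state table with a per-entry idle timer."""
    idle_timeout: float                             # seconds an entry may stay idle
    last_seen: dict = field(default_factory=dict)   # flow tuple -> time of last packet

    def packet(self, flow, now, syn=False):
        """Return True if the packet is forwarded, False if it is out of state."""
        seen = self.last_seen.get(flow)
        if seen is not None and now - seen > self.idle_timeout:
            del self.last_seen[flow]                # idle timer fired: entry removed
            seen = None
        if seen is None and not syn:
            return False                            # mid-stream packet with no state
        self.last_seen[flow] = now                  # new entry, or idle timer reset
        return True


flow = ("10.0.0.5", 54321, "203.0.113.10", 22, "tcp")
table = StateTable(idle_timeout=3600)
print(table.packet(flow, now=0, syn=True))     # handshake creates the entry
print(table.packet(flow, now=1800))            # traffic inside the timeout resets the timer
print(table.packet(flow, now=1800 + 3601))     # first packet after a long idle gap
```

The third call is the interesting one: the endpoint still believes the connection is open, but the entry is gone, so the packet is refused until a new SYN arrives.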

Linux netfilter/conntrack

/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_established – default 432000 s (5 days) but can be lowered by administrators or container runtimes.
Separate timers for SYN_SENT, SYN_RECV, FIN_WAIT, TIME_WAIT, etc.
UDP has a much shorter default (nf_conntrack_udp_timeout = 30 s).
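When an entry has expired, conntrack itself does not drop the next packet of that flow. With the default nf_conntrack_tcp_loose=1, a mid‑stream segment is picked up as a new connection; with nf_conntrack_tcp_loose=0, it is marked INVALID and is dropped only if the ruleset drops INVALID packets. Either way the original NAT mapping is gone, so a picked‑up flow can leave with a different translated address or port that the remote peer does not recognize.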

Cisco ASA/FTD

timeout conn (default 1 h) for TCP, timeout udp (default 2 min).
timeout half-closed for TCP half‑close state.

Palo Alto

Session timeouts for TCP (default 3600 s) and UDP (default 30 s).
Traffic logs record the session end reason aged-out when the idle timeout triggers.

Juniper SRX

Per‑application inactivity timeout: default 1800 s for TCP sessions and 60 s for UDP sessions.

If no packet matches the tuple before the timer fires, the state is cleared. Subsequent packets from the endpoint that still believes the connection is alive are treated as out‑of‑state and are either dropped or, in the case of NAT, cause address translation failure leading to ICMP unreachable or TCP RST.

Effects of Idle Periods on Translation State

During an idle period, no packets reset the timer. If the idle interval exceeds the configured timeout, the translation/conntrack entry disappears. When the application resumes sending data:

The first packet may be a data segment (not SYN).
The stateful device sees an unknown tuple → drops the packet (or sends ICMP port‑unreachable).
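The sender’s TCP stack, missing an ACK, retransmits when the RTO expires (derived from the measured RTT, with a 200 ms floor on Linux) and doubles the timer after each attempt.
Because the state is still missing, each retransmission meets the same fate, creating a storm until either:
- The sender's stack gives up (after net.ipv4.tcp_retries2 attempts on Linux, 15 by default) or the application aborts the connection, or
- The application opens a new connection, whose SYN creates a fresh state entry. A keep‑alive sent after expiry does not help on a strict device: it is a mid‑stream segment and is dropped like any other.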
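The back-off sequence above can be worked through numerically. The sketch below assumes Linux defaults (a 200 ms minimum RTO, a 120 s maximum RTO, and `tcp_retries2 = 15`) and a timer that doubles after every expiry; the function name `retransmit_schedule` is made up for illustration, and a real stack starts from an RTT-derived RTO rather than exactly the minimum.

```python
def retransmit_schedule(rto=0.2, rto_max=120.0, retries=15):
    """Return (retransmit_times, give_up_time) in seconds after the first send.

    Assumes the timer doubles after every expiry, is capped at rto_max,
    and the connection is abandoned when the timer expires after the last retry.
    """
    times, elapsed, timer = [], 0.0, rto
    for _ in range(retries):
        elapsed += timer              # timer expires: one more retransmission
        times.append(elapsed)
        timer = min(timer * 2, rto_max)
    return times, elapsed + timer     # final expiry with no retries left


times, give_up = retransmit_schedule()
print(len(times), "retransmissions, the first after", times[0], "s")
print("connection abandoned after about", round(give_up, 1), "s")
```

Under these assumptions a single flow sends 15 retransmissions spread over roughly a quarter of an hour, so the load on the middlebox comes from the number of flows that expire together rather than from any one of them.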

Identifying Retransmission Storms

Symptoms and Indicators

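Timeout‑driven retransmissions at doubling intervals, with no ACKs coming back, visible in packet captures; the duplicate ACKs and fast retransmits typical of ordinary loss are largely absent.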
Retransmission rate > 10 % of total TCP traffic on a flow (measured via ss -i or netstat -s).
ICMP destination unreachable (port unreachable) spikes from the stateful device.
CPU utilization on NAT/firewall spikes correlating with bursty traffic.
Application logs showing “connection reset by peer” or “operation timed out” after periods of inactivity.
Retransmission‑anomaly alerts from an IDS/IPS, where such rules are deployed (e.g., Snort rule ET POLICY TCP Retransmission Storm).
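The host-wide retransmission share mentioned above can be estimated from the `Tcp:` counters in `/proc/net/snmp`, which is where `netstat -s` gets its figures. The following is a rough sketch: the helper names are hypothetical, the sample text is abridged to a few columns, and the counts in it are invented for the demonstration.

```python
def tcp_counters(snmp_text):
    """Parse the two 'Tcp:' lines (header, values) of /proc/net/snmp into a dict."""
    rows = [line.split()[1:] for line in snmp_text.splitlines()
            if line.startswith("Tcp:")]
    names, values = rows[0], rows[1]
    return {name: int(value) for name, value in zip(names, values)}


def retransmit_ratio(before, after):
    """Retransmitted segments per segment sent between two samples."""
    sent = after["OutSegs"] - before["OutSegs"]
    retrans = after["RetransSegs"] - before["RetransSegs"]
    return retrans / sent if sent > 0 else 0.0


# Abridged, made-up samples; a real file has more columns on both lines.
HEADER = "Tcp: RtoAlgorithm RtoMin RtoMax MaxConn InSegs OutSegs RetransSegs\n"
before = tcp_counters(HEADER + "Tcp: 1 200 120000 -1 100000 90000 450\n")
after = tcp_counters(HEADER + "Tcp: 1 200 120000 -1 100900 91000 600\n")
print(f"retransmit ratio: {retransmit_ratio(before, after):.1%}")
```

On a live host, read `/proc/net/snmp` twice a few seconds apart and pass both samples to `retransmit_ratio`. The counters cover all sockets on the host, so `ss -ti` is still needed to tie a high ratio to a particular flow.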

Diagnostic Tools and Techniques

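Tool	Usage	What to Look For
`tcpdump -i any -nn -s0 -w /tmp/storm.pcap 'tcp[tcpflags] & (tcp-syn|tcp-ack|tcp-rst) != 0'`	Capture segments with SYN, ACK, or RST set	The same sequence number resent at growing intervals after an idle gap, with no ACK in reply.
`ss -ti state established '( dport = :22 or sport = :22 )'`	Show per‑socket TCP info	`retrans:` and `backoff:` values rising, `rto:` growing.
`conntrack -L -p tcp --dport 22`	List conntrack entries for SSH	Entries disappearing after idle period; remaining‑timeout field near zero.
`iptables -L -v -n -t nat`	View NAT counters	`pkts` on the `MASQUERADE` rule counts only the first packet of each tracked flow; growth without new application connections suggests flows being re‑tracked after expiry.
`nft list ruleset`	nftables equivalent	Same as above, for rules that include a `counter` statement.
`tcpick -C -yP -r /tmp/storm.pcap`	Re‑assemble streams	Application data missing after idle gap.
`ethtool -S eth0`	NIC stats	Drop and error counters (names vary by driver); flat counters during a storm point away from physical loss. TCP retransmits are not counted here.
Prometheus `node_exporter` (netstat collector) alerts	Long‑term monitoring	Alert when the rate of `node_netstat_Tcp_RetransSegs` exceeds a threshold.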
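Watching `conntrack -L` by eye gets tedious, so the remaining-timeout check can be scripted. The sketch below parses the usual `conntrack -L` line layout (protocol name, protocol number, remaining seconds, optional TCP state, then `key=value` pairs) and flags entries that are about to expire; the function names and the sample addresses are illustrative, and the exact output may differ between conntrack-tools versions.

```python
import re

# protocol name, protocol number, remaining seconds, then the rest of the line
_LINE = re.compile(r"^(\S+)\s+\d+\s+(\d+)\s+(.*)$")


def parse_conntrack(text):
    """Turn `conntrack -L` output into a list of dicts (original direction only)."""
    entries = []
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if not match:
            continue
        proto, remaining, rest = match.groups()
        entry = {"proto": proto, "remaining": int(remaining), "state": None}
        for token in rest.split():
            if "=" in token:
                key, value = token.split("=", 1)
                entry.setdefault(key, value)     # first occurrence = original direction
            elif not token.startswith("[") and entry["state"] is None:
                entry["state"] = token           # e.g. ESTABLISHED
        entries.append(entry)
    return entries


def expiring_soon(entries, threshold):
    """Entries whose idle timer will fire within `threshold` seconds."""
    return [e for e in entries if e["remaining"] <= threshold]


SAMPLE = """\
tcp      6 431999 ESTABLISHED src=10.0.0.5 dst=203.0.113.10 sport=54321 dport=22 src=203.0.113.10 dst=198.51.100.7 sport=22 dport=54321 [ASSURED] mark=0 use=1
tcp      6 12 ESTABLISHED src=10.0.0.6 dst=203.0.113.10 sport=40000 dport=22 src=203.0.113.10 dst=198.51.100.7 sport=22 dport=40000 [ASSURED] mark=0 use=1
"""
for e in expiring_soon(parse_conntrack(SAMPLE), threshold=60):
    print(e["src"], e["sport"], "->", e["dst"], e["dport"], "expires in", e["remaining"], "s")
```

In practice the text would come from running `conntrack -L -p tcp --dport 22` via `subprocess`, which requires root.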

Log Analysis and Error Messages

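Linux kernel (dmesg or /var/log/kern.log), when the ruleset logs INVALID packets, for example with -m conntrack --ctstate INVALID -j LOG --log-prefix "ct-invalid: " (abridged; the exact fields vary by kernel version):
ct-invalid: IN=eth1 OUT=eth0 SRC=10.0.0.5 DST=203.0.113.10 LEN=52 TTL=63 PROTO=TCP SPT=54321 DPT=22 ACK
Indicates a mid‑stream packet that matched no conntrack entry.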
Cisco ASA:
%ASA-6-106015: Deny TCP (no connection) from 10.0.0.5/54321 to 203.0.113.10/22 flags ACK on interface outside
Shows an ACK arriving for a connection the ASA no longer tracks.
Palo Alto:
Session End Reason: aged-out in the traffic log, followed shortly by new sessions or denies for the same address pair
Correlates the idle timeout with the retransmits that follow it.
Juniper SRX:
RT_FLOW_SESSION_CLOSE entries whose close reason is idle Timeout
Look for bursts of these messages coinciding with retransmission spikes.

Troubleshooting Retransmission Storms

Step‑by‑Step Troubleshooting Process

Confirm the symptom – Capture traffic on both sides of the stateful device; verify retransmits occur only after an idle gap.
Locate the stateful device – Identify where NAT/conntrack/firewall sits (traceroute, ip route get, or ACL logs).
Check timeout values – Retrieve the relevant idle timers (see Expiration Mechanisms and Timers). Compare with observed idle period.
Correlate logs – Match timestamp of state expiration log with first retransmit.
Validate symmetry – Ensure forward and reverse paths traverse the same stateful node (check for asymmetric routing, ECMP, or policy‑based routing).
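Test with a keep‑alive – Send a minimal packet (e.g., a TCP keep‑alive probe or an application‑level keep‑alive) at an interval shorter than the timeout; observe whether the storm disappears.
Adjust or workaround – Increase the timeout, enable TCP keep‑alives on hosts, or, for flows that are filtered but not translated, bypass connection tracking (e.g., iptables -t raw -I PREROUTING -p tcp --dport 22 -j CT --notrack, with a matching rule for the reply direction). Untracked packets cannot be NATed, so this bypass does not apply to translated flows.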
Verify – Repeat capture; retransmits should drop to baseline (< 1 %).
Document – Record original and new timeout values, reason for change, and any side effects.
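The first step, confirming that retransmits appear only after an idle gap, can be checked mechanically on a capture. The sketch below is one possible approach rather than a standard tool: it takes `(time, seq, length)` tuples for one direction of one flow (for instance exported with `tshark -T fields -e frame.time_epoch -e tcp.seq -e tcp.len`), treats a repeated sequence number on a data segment as a retransmission, and reports the segments that were first sent after a long silence. The function name and the sample numbers are made up.

```python
def retransmits_after_idle(segments, idle_threshold):
    """Find data segments that followed an idle gap and were then retransmitted.

    segments: (time, seq, length) tuples for one direction of one TCP flow,
              in capture order.
    Returns a list of (first_sent_time, idle_gap_before, retransmit_count).
    """
    first_sent = {}      # seq -> (time first sent, idle gap before that send)
    retries = {}         # seq -> number of times the same seq was seen again
    prev_time = None
    for time, seq, length in segments:
        gap = 0.0 if prev_time is None else time - prev_time
        prev_time = time
        if length == 0:
            continue                     # pure ACKs carry no data to retransmit
        if seq in first_sent:
            retries[seq] += 1
        else:
            first_sent[seq] = (time, gap)
            retries[seq] = 0
    return [(first_sent[seq][0], first_sent[seq][1], count)
            for seq, count in retries.items()
            if count > 0 and first_sent[seq][1] >= idle_threshold]


capture = [
    (0.0, 1000, 100), (0.1, 1100, 100),                      # normal traffic
    (4000.0, 1200, 50), (4000.2, 1200, 50), (4000.6, 1200, 50),  # after ~67 min idle
]
for sent, gap, count in retransmits_after_idle(capture, idle_threshold=3600):
    print(f"segment sent at t={sent}s after {gap:.0f}s idle, retransmitted {count}x")
```

If every reported idle gap sits just above a configured timeout, that timeout is the likely culprit; retransmits with short preceding gaps point to ordinary loss instead.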

Common Causes and Solutions

Cause	Symptom	Fix
NAT TCP timeout too short (e.g., 30 s)	Storm after ~30 s idle	Raise `net.netfilter.nf_conntrack_tcp_timeout_established` above the application keep‑alive interval, or to ≥ 1 h.
Application lacks keep‑alive	Storm after any idle > timeout	Enable `SO_KEEPALIVE` on the socket and tune the host defaults (`net.ipv4.tcp_keepalive_time=60`, `net.ipv4.tcp_keepalive_intvl=10`, `net.ipv4.tcp_keepalive_probes=6`), or use application‑level keep‑alive. The sysctls affect only sockets that have keep‑alive enabled.
Asymmetric routing causing state loss on return path	Only outbound retransmits, inbound ACKs arrive	Symmetrize routing (static routes, policy‑based routing, or disable ECMP for affected flows).
Stateful device overload dropping entries early	Storm under high connection count	Raise the table limit (`net.netfilter.nf_conntrack_max`) together with the hash size (`hashsize` parameter of the `nf_conntrack` module), or upgrade hardware.
Mis‑matched UDP timeout (e.g., 5 s) for media streams	Storm on RTP silence periods	Raise the UDP timeouts (`nf_conntrack_udp_timeout`, `nf_conntrack_udp_timeout_stream`) or lengthen the firewall's UDP timeout for media ports.
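For the keep-alive fix in the table, the kernel sysctls only set defaults; each socket still has to opt in. A minimal sketch of the per-socket equivalent follows, using the same 60/10/6 values as the table. The helper names are hypothetical, and `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` are the Linux option names, so the code skips any that the platform's `socket` module does not expose.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, probes=6):
    """Turn on TCP keep-alive for one socket and override the host defaults."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for name, value in (("TCP_KEEPIDLE", idle),       # idle seconds before first probe
                        ("TCP_KEEPINTVL", interval),  # seconds between probes
                        ("TCP_KEEPCNT", probes)):     # failed probes before giving up
        option = getattr(socket, name, None)
        if option is not None:                        # not every platform has all three
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


def keepalive_beats_timeout(idle, middlebox_timeout):
    """True if the first probe is sent before the middlebox entry can expire."""
    return idle < middlebox_timeout


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
print("keep-alive enabled:", bool(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)))
print("60 s idle vs 3600 s timeout:", keepalive_beats_timeout(60, 3600))
sock.close()
```

Keep-alive probes refresh the middlebox entry only while it still exists, so the idle value has to sit below the shortest timeout on the path with some margin.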

Advanced Troubleshooting Techniques

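eBPF tracing – Use bpftrace to watch conntrack entries being removed. Mainline kernels do not appear to ship a conntrack expiry tracepoint, so a kprobe on nf_ct_delete is a common substitute (confirm it is probeable on your kernel with bpftrace -l 'kprobe:nf_ct_*'). Running conntrack -E -e DESTROY prints the same events with full tuples.

bpftrace -e 'kprobe:nf_ct_delete { @deleted[comm] = count(); printf("%s pid=%d: conntrack entry deleted\n", comm, pid); }'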

TCP_INFO – Read per‑socket TCP_INFO with getsockopt(TCP_INFO) from inside the owning process, or with ss -ti from outside it, and check whether tcpi_retransmits and tcpi_backoff climb while tcpi_state stays TCP_ESTABLISHED.
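Packet generator – Use hping3 or nemesis to probe the device at a controlled rate. SYN probes show whether new state can still be created; to find the idle threshold itself, hold an established connection silent for increasing intervals and note when the first segment is dropped:
```
hping3 -S -p 22 -i u1000000 203.0.113.10   # one SYN per second (-i u1000000 = 1,000,000 µs), no payload
```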
Conntrack expectations – For FTP/SIP, ensure helper expectations are still being created and that the expectation table is not full (its size is capped by nf_conntrack_expect_max).
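The TCP_INFO check in the list above can be done from inside the application with a few lines of code. The sketch below reads only the leading fields of Linux's `struct tcp_info` (six one-byte fields, two bytes of window-scale and flag bits, then `tcpi_rto` in microseconds); the helper names are hypothetical, the offsets are taken from `linux/tcp.h` and should be re-checked against your kernel headers, and `socket.TCP_INFO` is Linux-specific.

```python
import socket
import struct

# tcpi_state, tcpi_ca_state, tcpi_retransmits, tcpi_probes, tcpi_backoff,
# tcpi_options, 2 bytes of bitfields (skipped), tcpi_rto (microseconds)
_TCP_INFO_HEAD = struct.Struct("=6B2xI")
TCP_ESTABLISHED = 1


def parse_tcp_info(raw):
    """Decode the first 12 bytes of a struct tcp_info buffer."""
    state, _ca, retransmits, _probes, backoff, _opts, rto_us = \
        _TCP_INFO_HEAD.unpack_from(raw)
    return {"state": state, "retransmits": retransmits,
            "backoff": backoff, "rto_s": rto_us / 1_000_000}


def looks_stalled(info):
    """Established on paper, but the stack is retrying with a backed-off timer."""
    return info["state"] == TCP_ESTABLISHED and info["retransmits"] > 0


def socket_tcp_info(sock):
    """Query a connected TCP socket (Linux only)."""
    raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 104)
    return parse_tcp_info(raw)
```

Calling `looks_stalled(socket_tcp_info(conn))` periodically on a long-lived connection gives the application an early signal that its segments are going unanswered, well before the stack finally reports a timeout.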

Configuring Translation State Timeout Values

Overview of Timeout Values and Settings

Timeout values dictate how long a state entry survives without seeing a packet. They are protocol‑ and state‑specific. Adjusting them prevents premature expiration while balancing memory usage.

Key knobs (Linux)

net.netfilter.nf_conntrack_tcp_timeout_established – ESTABLISHED state.
net.netfilter.nf_conntrack_tcp_timeout_time_wait – TIME_WAIT.
net.netfilter.nf_conntrack_udp_timeout – UDP.
net.netfilter.nf_conntrack_udp_timeout_stream – UDP flows that have seen traffic in both directions (e.g., SIP signalling or RTP).
net.netfilter.nf_conntrack_generic_timeout – fallback for unknown protocols.

On firewalls, similar timers exist under timeout commands (ASA, PAN‑OS, SRX).
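On Linux, the knobs listed above are plain files under `/proc/sys/net/netfilter`, so an audit script can compare them with an application's longest expected idle period. This is a minimal sketch with hypothetical function names; it assumes the `nf_conntrack` module is loaded (otherwise the directory is empty or missing and the result is simply an empty dict).

```python
from pathlib import Path


def conntrack_timeouts(base="/proc/sys/net/netfilter"):
    """Return {knob_name: seconds} for every nf_conntrack_*timeout* file under base."""
    result = {}
    for path in sorted(Path(base).glob("nf_conntrack_*timeout*")):
        try:
            result[path.name] = int(path.read_text().strip())
        except (OSError, ValueError):
            continue                      # unreadable or non-numeric: skip it
    return result


def knobs_below(timeouts, idle_seconds):
    """Knobs whose timeout is shorter than the idle period the application needs."""
    return {name: value for name, value in timeouts.items() if value < idle_seconds}


if __name__ == "__main__":
    current = conntrack_timeouts()
    for name, value in knobs_below(current, idle_seconds=7200).items():
        print(f"{name} = {value} s is shorter than a 2 h idle period")
```

Many of the short timers (SYN_SENT, TIME_WAIT, close) are meant to be short, so only the knob that matches the flow's protocol and state matters when reading the output.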

CLI Examples for Configuring Timeout Values

Linux (sysctl)

# View current TCP established timeout (seconds)
sysctl net.netfilter.nf_conntrack_tcp_timeout_established
# Set to 2 hours (7200s) – persists until reboot
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=7200
# Make permanent
echo "net.netfilter.nf_conntrack_tcp_timeout_established=7200" >> /etc/sysctl.d/99-conntrack.conf
sysctl -p /etc/sysctl.d/99-conntrack.conf

Linux (nftables)

# nftables attaches per-flow timeouts through a named ct timeout object
# (needs a kernel and nft build with ct timeout support)
nft -f - <<'EOF'
table ip filter {
  ct timeout ssh-long { protocol tcp; l3proto ip; policy = { established: 7200 }; }
  chain input {
    type filter hook input priority 0;
    tcp dport 22 ct timeout set "ssh-long"
  }
}
EOF

Cisco ASA

! Show current timeouts
show running-config all | include timeout
! Change the TCP connection idle timeout to 2 hours (hh:mm:ss, global configuration mode)
timeout conn 2:00:00

Palo Alto (PAN‑OS)

# Configure via CLI
configure
set deviceconfig setting session timeout-tcp 7200
commit

Juniper SRX

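# Set a 2-hour inactivity timeout on a custom application (the name ssh-long is an example),
# then reference that application in the security policy
set applications application ssh-long protocol tcp destination-port 22 inactivity-timeout 7200
commit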