Engineering Practices to Reduce VPN Latency: From Protocol Selection to Kernel Tuning
1. Protocol Selection: WireGuard vs OpenVPN
VPN protocol choice significantly impacts latency. WireGuard runs over UDP, uses a fixed set of modern cryptographic primitives (ChaCha20, Poly1305, Curve25519), and on Linux runs in kernel space, which avoids the user/kernel context switches and packet copies of a user-space daemon. Benchmarks show WireGuard reduces latency by 30%-50% compared to OpenVPN (TLS mode) under identical network conditions. OpenVPN offers more flexibility, but its user-space packet processing, TLS handshake, and cipher negotiation add delay. For latency-sensitive applications, WireGuard is the preferred choice.
2. Transport Optimization: TCP BBR and MTU Tuning
2.1 Congestion Control Algorithm
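The default CUBIC algorithm is loss-based and tends to fill buffers on high-latency links before it backs off. BBR (Bottleneck Bandwidth and Round-trip propagation time) models the path's bottleneck bandwidth and RTT, which limits bufferbloat and significantly reduces queuing delay. The setting is system-wide rather than per-interface, and it applies only to TCP connections that originate or terminate on this host (for example a TCP-based tunnel or a proxy on the gateway); forwarded traffic keeps the congestion control of its endpoints. Enable BBR:
# Enable BBR system-wide (requires Linux 4.9 or later)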
echo "net.core.default_qdisc=fq" >> /etc/sysctl.conf
echo "net.ipv4.tcp_congestion_control=bbr" >> /etc/sysctl.conf
sysctl -p
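Before relying on the settings above, it can help to confirm that the running kernel actually offers BBR. The following is a minimal sketch: has_bbr is a hypothetical helper name, and the check assumes a Linux host that exposes /proc/sys/net/ipv4/tcp_available_congestion_control.

```shell
#!/bin/sh
# has_bbr (hypothetical helper): succeeds if "bbr" appears as a whole word
# in a space-separated list of congestion control algorithms.
has_bbr() {
    case " $1 " in
        *" bbr "*) return 0 ;;
        *) return 1 ;;
    esac
}

avail=/proc/sys/net/ipv4/tcp_available_congestion_control
if [ -r "$avail" ]; then
    if has_bbr "$(cat "$avail")"; then
        echo "bbr is available"
    else
        # The module is usually named tcp_bbr; loading it needs root.
        echo "bbr not listed; try: modprobe tcp_bbr"
    fi
fi
```

After sysctl -p has run, sysctl net.ipv4.tcp_congestion_control should report bbr.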
2.2 MTU and MSS Clamping
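VPN encapsulation adds overhead (WireGuard 60 bytes over IPv4 and 80 bytes over IPv6; OpenVPN ~80 bytes, depending on cipher and transport). If the physical link MTU is 1500, set the VPN tunnel MTU to 1420-1440; 1420 is safe whether the outer transport is IPv4 or IPv6. An oversized MTU causes IP fragmentation or dropped packets, increasing retransmissions and latency. Adjust with ip link set dev wg0 mtu 1420. Additionally, clamp the MSS of forwarded TCP connections in the firewall: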
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
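The MTU figures above follow from simple arithmetic, sketched here. The helper names tunnel_mtu and tcp_mss are hypothetical, and the MSS calculation assumes IPv4 inner traffic with no TCP options.

```shell
#!/bin/sh
# tunnel_mtu (hypothetical helper): link MTU minus encapsulation overhead.
tunnel_mtu() {
    echo $(( $1 - $2 ))
}

# tcp_mss (hypothetical helper): largest TCP payload for a given MTU,
# assuming a 20-byte IPv4 header and a 20-byte TCP header without options.
tcp_mss() {
    echo $(( $1 - 40 ))
}

# WireGuard overhead: outer IP header + 8 (UDP) + 32 (WireGuard header and auth tag)
WG_OVERHEAD_V4=$(( 20 + 8 + 32 ))   # 60 bytes over IPv4
WG_OVERHEAD_V6=$(( 40 + 8 + 32 ))   # 80 bytes over IPv6

echo "wg over IPv4: mtu $(tunnel_mtu 1500 "$WG_OVERHEAD_V4")"
echo "wg over IPv6: mtu $(tunnel_mtu 1500 "$WG_OVERHEAD_V6")"
echo "inner TCP MSS at mtu 1420: $(tcp_mss 1420)"
```

One way to check the result on Linux is ping -M do -s 1392 <VPN_GATEWAY> through the tunnel: 1392 bytes of payload plus 8 bytes of ICMP header and 20 bytes of IPv4 header is exactly 1420, so the probe should pass without fragmentation while a larger size should fail.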
3. Kernel Tuning: RPS, XPS, and Interrupt Affinity
3.1 Receive Packet Steering (RPS)
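RPS distributes receive-side softirq processing across multiple CPU cores in software, avoiding single-core bottlenecks. It helps most when the NIC has fewer hardware receive queues than there are cores. The RPS CPU mask is a hexadecimal bitmask and is set per receive queue: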
echo "f" > /sys/class/net/eth0/queues/rx-0/rps_cpus # Use first 4 cores
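The mask is a hexadecimal bitmap with one bit per CPU, so "f" (binary 1111) selects CPUs 0-3. The sketch below derives the mask from a core count and applies it to every receive queue of a device; cpu_mask and set_rps are hypothetical helper names, and the mask calculation assumes at most 32 CPUs.

```shell
#!/bin/sh
# cpu_mask (hypothetical helper): hex bitmask selecting the first N CPUs.
# Assumes 1 <= N <= 32; larger systems need comma-separated 32-bit groups.
cpu_mask() {
    printf '%x\n' $(( (1 << $1) - 1 ))
}

# set_rps (hypothetical helper): write the mask to every rx queue of a device.
# Needs root. $1 = interface name, $2 = number of CPUs to use.
set_rps() {
    mask=$(cpu_mask "$2")
    for q in /sys/class/net/"$1"/queues/rx-*; do
        if [ -w "$q/rps_cpus" ]; then
            echo "$mask" > "$q/rps_cpus"
        fi
    done
}

# Example (as root): set_rps eth0 4
```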
3.2 Transmit Packet Steering (XPS)
XPS maps each transmit queue to the CPUs that are allowed to send on it, improving cache locality and reducing lock contention between cores. The mask format is the same as for RPS:
echo "f" > /sys/class/net/eth0/queues/tx-0/xps_cpus
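On a multi-queue NIC, a common arrangement is to give each transmit queue its own CPU rather than one shared mask. The following sketch assigns queue N to CPU N modulo the core count; queue_cpu_mask and set_xps are hypothetical helper names, and the layout is an assumption that may not suit every workload.

```shell
#!/bin/sh
# queue_cpu_mask (hypothetical helper): hex mask with a single bit set,
# pairing queue index $1 with CPU ($1 mod $2), where $2 = number of CPUs.
queue_cpu_mask() {
    printf '%x\n' $(( 1 << ($1 % $2) ))
}

# set_xps (hypothetical helper): give each tx queue of a device its own CPU.
# Needs root. $1 = interface name, $2 = number of CPUs.
set_xps() {
    for q in /sys/class/net/"$1"/queues/tx-*; do
        [ -w "$q/xps_cpus" ] || continue
        idx=${q##*-}                      # "tx-3" -> "3"
        queue_cpu_mask "$idx" "$2" > "$q/xps_cpus"
    done
}

# Example (as root): set_xps eth0 4
```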
3.3 Interrupt Affinity
Bind NIC interrupts to dedicated CPUs to reduce interference with other tasks:
# List the NIC's interrupt numbers
grep eth0 /proc/interrupts
# Bind interrupt to CPU0 (hex mask 1); irqbalance, if running, may overwrite this
echo "1" > /proc/irq/<IRQ_NUM>/smp_affinity
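A multi-queue NIC usually has one interrupt per queue, so the binding has to be repeated for each IRQ number. The sketch below extracts those numbers from /proc/interrupts and applies one mask to all of them. The helper names irqs_for and pin_irqs are hypothetical, and the parsing assumes the driver labels its interrupts with the interface name in the last column (for example eth0-TxRx-0), which is not true of every driver.

```shell
#!/bin/sh
# irqs_for (hypothetical helper): read /proc/interrupts-style text on stdin
# and print the IRQ numbers whose last column starts with the device name.
irqs_for() {
    awk -v dev="$1" 'index($NF, dev) == 1 { sub(/:$/, "", $1); print $1 }'
}

# pin_irqs (hypothetical helper): bind every IRQ of a device to one CPU mask.
# Needs root. $1 = device name, $2 = hex CPU mask (1 = CPU0, 2 = CPU1, 4 = CPU2).
pin_irqs() {
    for irq in $(irqs_for "$1" < /proc/interrupts); do
        echo "$2" > "/proc/irq/$irq/smp_affinity"
    done
}

# Example (as root): pin_irqs eth0 1
```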
4. Additional Engineering Practices
- Enable GRO/GSO: Reduce small-packet processing overhead:
ethtool -K eth0 gro on gso on
- Use UDP Tunnels: Avoid TCP-over-TCP performance collapse. WireGuard uses UDP natively; OpenVPN can be configured in UDP mode.
- Hardware Acceleration: CPUs with AES-NI accelerate AES-based ciphers such as those used by OpenVPN; WireGuard's ChaCha20 does not use AES-NI but benefits from SIMD instructions such as AVX2.
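To see which of these CPU features a host reports, the flag list in /proc/cpuinfo can be inspected. This is a minimal sketch for x86 Linux, where AES-NI appears as the flag aes and AVX2 as avx2; has_cpu_flag is a hypothetical helper name, and other architectures expose their features under different names.

```shell
#!/bin/sh
# has_cpu_flag (hypothetical helper): succeeds if flag $1 is present as a
# whole word in the space-separated flag list $2.
has_cpu_flag() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# On x86 Linux the flag list is the first "flags" line of /proc/cpuinfo.
if [ -r /proc/cpuinfo ]; then
    flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
    for f in aes avx2; do
        if has_cpu_flag "$f" "$flags"; then
            echo "$f: supported"
        else
            echo "$f: not reported"
        fi
    done
fi
```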
5. Performance Validation
Use ping to measure latency and iperf3 to measure throughput, before and after optimization:
# Test latency (intervals below 0.2 s may require root on some systems)
ping -c 100 -i 0.1 <VPN_GATEWAY>
# Test throughput (requires iperf3 -s running on the gateway)
iperf3 -c <VPN_GATEWAY> -t 30 -P 4
Compare RTT and throughput to confirm tuning effectiveness.
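The before/after comparison can be scripted so that the same numbers are extracted each time. The sketch below pulls the average RTT from the ping summary line and reports the relative change; avg_rtt and pct_change are hypothetical helper names, and the parsing assumes the usual "rtt min/avg/max/mdev = ..." (or "round-trip ...") summary format.

```shell
#!/bin/sh
# avg_rtt (hypothetical helper): read ping output on stdin and print the
# average RTT in ms, taken from the "min/avg/max/..." summary line.
avg_rtt() {
    awk -F/ '/^(rtt|round-trip) / { print $5 }'
}

# pct_change (hypothetical helper): percentage change from $1 (before)
# to $2 (after); a negative result means the value went down.
pct_change() {
    awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f\n", (a - b) / b * 100 }'
}

# Example:
#   before=$(ping -c 100 -i 0.2 <VPN_GATEWAY> | avg_rtt)   # run before tuning
#   after=$(ping -c 100 -i 0.2 <VPN_GATEWAY> | avg_rtt)    # run after tuning
#   pct_change "$before" "$after"
```

Running each measurement several times and at different times of day gives a fairer comparison, since a single run can be skewed by transient load on the path.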