PMTUD - importance and considerations (2024)

Introduction

Customers sometimes report that TCP connections from On-premises to OCI hang for no apparent reason. This article covers an important detail that is often overlooked when troubleshooting a hanging connection.

The case discussed here is one where the TCP three-way handshake completes successfully, but the connection hangs as soon as data starts flowing. Why? The handshake packets are small, so they pass; only the larger data packets exceed the lowest MTU on the path. Modern TCP and UDP stack implementations set the "Don't Fragment" (DF) bit in the IP header when encapsulating a TCP segment or a UDP datagram. The DF bit is used to implement PMTUD, which automatically discovers the lowest MTU on the path and avoids IP packet fragmentation between the sending and receiving hosts.

PMTUD stands for Path MTU Discovery, an automatic mechanism for discovering the lowest MTU between two endpoints. PMTUD relies on ICMP Type 3 Code 4 messages sent by routers along the path. When a router has to forward a packet that exceeds the MTU of the outgoing link but is not allowed to fragment it (because the "Don't Fragment" bit is set), it drops the packet and sends the sender host an ICMP message announcing the MTU that must be used to avoid IP packet fragmentation. The sender host stores this value in the routing entry associated with the destination host for a period of time and uses it to avoid fragmentation, which can impact performance.
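As an illustration of what the sender host reads from such a message, the following minimal sketch parses the 8-byte ICMP header layout described in RFC 1191 (type, code, checksum, 16 unused bits, 16-bit next-hop MTU). The function name `next_hop_mtu` and the sample message are hypothetical and exist only for this example; a real stack does this work inside the kernel.

```python
import struct

ICMP_DEST_UNREACHABLE = 3  # ICMP Type 3
ICMP_FRAG_NEEDED = 4       # Code 4: Fragmentation Needed and Don't Fragment was Set


def next_hop_mtu(icmp_message: bytes):
    """Return the next-hop MTU carried by an ICMP Type 3 Code 4 message, else None."""
    if len(icmp_message) < 8:
        return None
    # RFC 1191 layout: type (8 bits), code (8 bits), checksum (16 bits),
    # unused (16 bits), next-hop MTU (16 bits), all in network byte order.
    icmp_type, code, _checksum, _unused, mtu = struct.unpack("!BBHHH", icmp_message[:8])
    if icmp_type != ICMP_DEST_UNREACHABLE or code != ICMP_FRAG_NEEDED:
        return None
    return mtu


# Hypothetical message: checksum left at 0 for brevity, 1420 used as an example MTU.
sample = struct.pack("!BBHHH", 3, 4, 0, 0, 1420)
print(next_hop_mtu(sample))  # 1420
```

The sketch ignores the checksum and the copy of the original IP header that follows these 8 bytes in a real message.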

More details about PMTUD can be found in RFC 1191: https://www.ietf.org/rfc/rfc1191.txt

The full list of ICMP Types and Codes is available at https://www.iana.org/assignments/icmp-parameters/icmp-parameters.xhtml. We will focus on ICMP Type 3 (Destination Unreachable), Code 4 (Fragmentation Needed and Don't Fragment was Set).

Note: On the OCI side, PMTUD works automatically when stateful security rules are used: ICMP Type 3 Code 4 is allowed implicitly, so you do not need a security rule that allows it. If stateless security rules are used, ICMP Type 3 Code 4 must be allowed explicitly for PMTUD to work. More details can be found in the OCI public documentation: https://docs.cloud.oracle.com/iaas/Content/Network/Troubleshoot/connectionhang.htm

Topology Architecture Diagram

PMTUD - importance and considerations (1)

Traffic flows from On-premises to OCI from two subnets: Subnet1, where PMTUD is active (ICMP Type 3 Code 4 is allowed), and Subnet2, where PMTUD is inactive (ICMP Type 3 Code 4 is blocked and cannot reach 172.30.1.2). The On-premises gateway (a Linux machine running LibreSwan for IPSec) has an MTU of 1500 bytes. The tunnel MTU on the OCI side is 1420 bytes, and the MTU on all hosts is set to 9000 bytes. The host MTU is intentionally set to 9000 so that packets larger than 1500 bytes are sent and the PMTUD mechanism is triggered.

Test1: TCP iperf3 traffic from Subnet1 to 192.168.12.242.

1. Starting the iperf3 TCP traffic:

PMTUD - importance and considerations (2)


2. The LibreSwan VM sends an ICMP Type 3 Code 4 message back to the originating host, announcing that the IP packet is too big and needs to be fragmented but has the DF bit set. The message also includes the MTU that should be used:

PMTUD - importance and considerations (3)


3. The sender VM updates its route table with the MTU for this particular destination and uses it for about 600 seconds (Linux). The connection works fine. The same happens with ssh over the IPSec tunnel: the packets sent by the sending host do not exceed the advertised MTU:

PMTUD - importance and considerations (4)

Test2: TCP ssh traffic from Subnet2 to 192.168.12.242.

1. We try to connect via ssh over the IPSec tunnel to 192.168.12.242 (the connection hangs):

PMTUD - importance and considerations (5)

2. The next hop (LibreSwan VM) sends the ICMP Type 3 Code 4 message back to 172.30.1.2:

PMTUD - importance and considerations (6)


3. ICMP Type 3 Code 4 is filtered in this subnet. The tcpdump on the sending host confirms that the ICMP Type 3 Code 4 message is never received, so the host cannot learn the correct MTU value. Because the DF bit is set in the IP header, the next router keeps dropping the oversized packets and the connection simply hangs:

PMTUD - importance and considerations (7)
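The difference between Test1 and Test2 can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the function `send_with_pmtud`, the retry limit, and the representation of the path as a list of hop MTUs are all hypothetical, and the model ignores timers, retransmission back-off, and tunnel overhead. The values 9000, 1500, and 1420 are the MTUs from the topology described earlier.

```python
def send_with_pmtud(packet_size, hop_mtus, icmp_delivered, max_attempts=5):
    """Simulate a sender that sets the DF bit.

    Returns the packet size that finally crosses the path, or None if the
    sender never learns a usable size (the connection "hangs").
    """
    size = packet_size
    for _ in range(max_attempts):
        # The first hop whose MTU is smaller than the packet drops it.
        blocking = next((mtu for mtu in hop_mtus if mtu < size), None)
        if blocking is None:
            return size  # the packet fits every hop on the path
        if not icmp_delivered:
            continue  # ICMP Type 3 Code 4 is filtered: retransmit the same size
        size = blocking  # the sender lowers its path MTU to the advertised value
    return None


path = [1500, 1420]  # gateway MTU and tunnel MTU from the topology
print(send_with_pmtud(9000, path, icmp_delivered=True))   # 1420
print(send_with_pmtud(9000, path, icmp_delivered=False))  # None
```

With ICMP delivered, the sender converges on the lowest MTU in two steps. With ICMP filtered, it keeps retransmitting 9000-byte packets that can never pass, which is the hang observed from Subnet2.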

One solution is to manually set the correct MTU value on every host, but this is very time-consuming when hundreds of hosts are involved. A better option is to allow ICMP Type 3 Code 4 through the On-premises firewalls so that it reaches the sending hosts and PMTUD can do the MTU signalling. Alternatively, configure TCP MSS adjustment on the firewall or router where the IPSec tunnel is configured.

PMTUD - importance and considerations (8)
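For the TCP MSS adjustment option, the MSS value that corresponds to a given MTU can be worked out with simple arithmetic. The sketch below assumes IPv4 with a 20-byte IP header and a 20-byte TCP header, that is, no IP or TCP options; the helper name `mss_for_mtu` is hypothetical. The exact command used to apply the value depends on the firewall or router vendor and is not shown here.

```python
IPV4_HEADER_BYTES = 20  # assumption: no IP options
TCP_HEADER_BYTES = 20   # assumption: no TCP options


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one IPv4 packet of the given MTU."""
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


print(mss_for_mtu(1420))  # 1380, for the 1420-byte tunnel MTU
print(mss_for_mtu(9000))  # 8960, what a 9000-byte host would otherwise advertise
```

If TCP options such as timestamps are in use, the usable payload per segment is smaller than this figure, so a slightly lower value may be needed in practice.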

Andrei Stoian

Master Principal Cloud Architect | North America Cloud Engineering