Pilot VPC and Advanced NAT: Securely Connect Overlapping Networks to AWS VPC
In today's dynamic business environment, cloud computing has become a crucial enabler, offering enterprises unmatched scalability, flexibility, and cost-efficiency. Amazon Web Services (AWS), a leading cloud service provider, has transformed how organizations manage their IT infrastructures and applications. With AWS Virtual Private Clouds (VPCs), businesses can establish secure, isolated environments within the cloud, replicating the capabilities of traditional on-premises networks. However, despite the clear benefits of cloud adoption, bridging the gap between on-premises networks and AWS VPCs can be challenging, particularly when dealing with overlapping IP addresses. Situations often arise where on-premises networks and AWS cloud environments unintentionally use the same private IP addresses, obstructing communication and data exchange across the VPN tunnel.
This article addresses this specific issue and explores an innovative solution for establishing secure connectivity between overlapping on-premises networks and AWS VPCs. AWS Site-to-Site VPN is the traditional way to connect on-premises environments to VPCs, but enterprises that rely on it frequently struggle with conflicting IP addresses. Mergers, acquisitions, and other networking complexities further complicate the situation, making straightforward resolution difficult.
Pilot VPC and Advanced Twice NAT Technology
To address these challenges, this article introduces the concept of a Pilot VPC and Advanced Twice NAT technology. By strategically implementing virtual routers and twice network address translation (NAT), which rewrites both the source and the destination address of a packet, enterprises can work around overlapping IP address ranges and ensure seamless communication between on-premises networks and AWS VPCs.
The use of these advanced technologies not only resolves the overlapping IP address problem but also enhances network security, data privacy, and overall operational efficiency. In the following sections, we will explore the technical details of Pilot VPC and Advanced Twice NAT technology, highlighting their role in creating a robust and secure network infrastructure. By providing practical insights and real-world examples, this article aims to equip businesses with the knowledge and tools needed to overcome the complexities of integrating on-premises networks and AWS VPCs, ultimately fostering a seamless, hybrid cloud environment.
VPN Overlapping Network Issues
In the above diagram, we can see two distinct sites: the on-premises network and the AWS VPC, interconnected via a VPN (Virtual Private Network) tunnel. This connection is established through the customer gateway and the AWS VPN gateway. Notably, both the on-premises network and the AWS VPC use the IP address range 10.0.1.0/24 for their internal networks.
Consider a scenario where an on-premises host (IP address 10.0.1.7) needs to communicate with an EC2 instance (IP address 10.0.1.8) in the AWS VPC. The on-premises host assumes that every address in the 10.0.1.0/24 range belongs to its local network, so when it sends a packet to 10.0.1.8, it tries to deliver the packet directly on its local subnet instead of forwarding it to the customer gateway device.
The EC2 instance faces the same problem in reverse. With the address 10.0.1.8, it also treats 10.0.1.0/24 as part of its local network inside the AWS VPC, so a packet addressed to 10.0.1.7 is routed locally rather than to the VPN gateway. Because neither side's packets ever reach a VPN gateway, the gateways cannot forward them through the VPN tunnel to the opposite side. As a result, the overlapping networks prevent the on-premises host and the EC2 instance from connecting to each other.
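The host's forwarding decision described above can be sketched with Python's standard `ipaddress` module. This is a minimal model, assuming each host only checks whether the destination falls inside its own subnet and otherwise uses a single default gateway; `next_hop` and the gateway labels `customer-gateway` and `vpn-gateway` are hypothetical.

```python
import ipaddress


def next_hop(host_interface: str, destination: str, default_gateway: str) -> str:
    """Model a host's forwarding decision (simplified: one subnet plus a default gateway).

    host_interface is the host's address with its prefix length, e.g. "10.0.1.7/24".
    """
    iface = ipaddress.ip_interface(host_interface)
    dst = ipaddress.ip_address(destination)
    if dst in iface.network:
        # The destination falls inside the host's own subnet, so the packet is
        # delivered locally and is never handed to the gateway.
        return "on-link"
    # Anything outside the local subnet goes to the gateway (hypothetical label).
    return default_gateway


# On-premises host 10.0.1.7/24 sending to the EC2 instance 10.0.1.8
print(next_hop("10.0.1.7/24", "10.0.1.8", "customer-gateway"))  # on-link
# EC2 instance 10.0.1.8/24 sending to the on-premises host 10.0.1.7
print(next_hop("10.0.1.8/24", "10.0.1.7", "vpn-gateway"))  # on-link
```

Both calls return `on-link`, which is why neither packet reaches a VPN gateway. A destination outside 10.0.1.0/24 would return the gateway label instead.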
The Proposed Solution for VPN Overlapping Networks
A transit gateway (TGW) is a central cloud router that connects your virtual private clouds (VPCs) and on-premises networks in a hub-and-spoke model. A TGW can hold several route tables, and each attachment (VPC, VPN, and so on) is associated with exactly one of them, so traffic entering from different attachments can be routed differently, similar to VRF-style routing. The solution to the overlap problem is to make each host believe that the other host is on a separate network. That way, a packet destined for the other side is sent to the router first, which can then forward it over the VPN tunnel.
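The per-attachment, VRF-style routing described above can be sketched as follows. This is a simplified model, not the AWS API: the route table and attachment names are hypothetical, and each lookup is assumed to be a longest-prefix match in the table associated with the attachment the packet arrived on.

```python
import ipaddress
from typing import Optional

# Hypothetical TGW: two route tables that both contain 10.0.0.0/16 but point it
# at different attachments (VRF-style separation).
ROUTE_TABLES = {
    "rtb-a": {"10.0.0.0/16": "attach-x", "0.0.0.0/0": "attach-egress"},
    "rtb-b": {"10.0.0.0/16": "attach-y"},
}

# Each attachment is associated with exactly one route table.
ASSOCIATIONS = {
    "attach-vpn": "rtb-a",
    "attach-x": "rtb-b",
}


def tgw_lookup(ingress_attachment: str, destination: str) -> Optional[str]:
    """Return the target attachment, or None if the associated table has no matching route."""
    table = ROUTE_TABLES[ASSOCIATIONS[ingress_attachment]]
    dst = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in table.items()
        if dst in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    # Longest-prefix match: the most specific route wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# The same destination resolves differently depending on where the packet entered.
print(tgw_lookup("attach-vpn", "10.0.1.8"))  # attach-x
print(tgw_lookup("attach-x", "10.0.1.8"))    # attach-y
```

Because the lookup starts from the ingress attachment's table, the same 10.0.0.0/16 prefix can lead to two different next hops, which is the property Solution #1 relies on to separate its ingress and egress paths.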
Configure Your Network Using a Transit Gateway
There are two ways to configure your network around a transit gateway to solve this problem:
- The first solution uses source and destination NAT on appliances or Linux instances in Pilot VPCs. (Use this solution if a single VPC overlaps with the on-premises network.)
- The second solution uses Policy NAT on both the on-premises customer gateway device and a virtual router in the AWS VPC. (Use this solution if multiple VPCs overlap with the on-premises network.)
Solution #1: Source and Destination NAT in Pilot VPCs
The solution requires two pilot VPCs. In each pilot VPC, we launch two virtual appliances (or Linux instances) in two different Availability Zones (AZs) for high availability. Both pilot VPCs have exactly the same configuration, including the IP CIDR. The idea is to use two different TGW attachments and two TGW route tables to provide separate ingress and egress paths between the on-premises network and the actual customer VPC. On the appliances, we use the nat action of Linux Traffic Control (tc) to perform SNAT and DNAT. This is a stateless NAT configuration: the tc nat action translates addresses without the overhead of connection tracking (conntrack), which makes it practical to NAT a large number of flows and addresses.
We configure the NAT policy on the appliances so that the ingress filter translates destination addresses (DNAT) and the egress filter translates source addresses (SNAT). Combining the nat action with tc filters allows efficient lookups across a large number of stateless NAT rules.
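The stateless translation described above can be modeled in a few lines. This sketch is not tc itself: it assumes IPv4 and that the old and new prefixes have the same length, and `stateless_nat`, `nat_ingress`, and `nat_egress` are hypothetical helpers that mirror the ingress (DNAT) and egress (SNAT) directions. The range 172.16.0.0/16 is a hypothetical stand-in for the VPC's masked range, which is not specified here.

```python
import ipaddress


def stateless_nat(addr: str, old: str, new: str) -> str:
    """Rewrite addr's network bits from prefix `old` to prefix `new`, keeping its host bits.

    Addresses outside `old` pass through unchanged. No per-flow state is stored,
    so each packet is translated independently (IPv4 only in this sketch).
    """
    old_net = ipaddress.ip_network(old)
    new_net = ipaddress.ip_network(new)
    if old_net.prefixlen != new_net.prefixlen:
        raise ValueError("old and new prefixes must have the same length")
    a = int(ipaddress.ip_address(addr))
    mask = int(old_net.netmask)
    if (a & mask) != int(old_net.network_address):
        return addr  # not covered by this rule
    host_bits = a & ~mask
    return str(ipaddress.ip_address(int(new_net.network_address) | host_bits))


def nat_ingress(packet: dict, old: str, new: str) -> dict:
    """Ingress direction: translate the destination address (DNAT)."""
    return {**packet, "dst": stateless_nat(packet["dst"], old, new)}


def nat_egress(packet: dict, old: str, new: str) -> dict:
    """Egress direction: translate the source address (SNAT)."""
    return {**packet, "src": stateless_nat(packet["src"], old, new)}


pkt = {"src": "10.0.1.7", "dst": "172.16.1.8"}  # hypothetical masked VPC destination
pkt = nat_ingress(pkt, "172.16.0.0/16", "10.0.0.0/16")
pkt = nat_egress(pkt, "10.0.0.0/16", "192.168.0.0/16")
print(pkt)  # {'src': '192.168.1.7', 'dst': '10.0.1.8'}
```

Because the result depends only on the address and the rule, no conntrack table is needed, which is the property that lets this approach scale to many flows.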
Forward Traffic Flow
On-premise 10.0.0.0/16 → tgw-rtb-vpn → tgw-attach-Pilot#1 → NAT instance in Pilot VPC#1 [NAT translation SNAT and DNAT] src 192.168.0.0/16 dst 10.0.0.0/16 → tgw-rtb-Pilot#1 → tgw-att-Dest VPC → VPC Blue 10.0.0.0/16
Return Traffic Flow
Dest VPC 10.0.0.0/16 → tgw-rtb-dest vpc → tgw-attach-pilot#2 → NAT instance in pilot VPC#2 [NAT translation SNAT and DNAT] src 192.168.0.0/16 dst 10.0.0.0/16 → tgw-rtb-pilot vpc#2 → tgw-att-vpn → On-premise 10.0.0.0/16
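To tie the route tables and the NAT steps together, the following sketch walks a packet along both paths. It is a model under stated assumptions, not AWS configuration: the route-table contents and NAT rules are inferred from the flows above, the attachment and route-table names are reused from them, and 172.16.0.0/16 is a hypothetical masked range that on-premises hosts use to reach VPC Blue.

```python
import ipaddress

REAL = "10.0.0.0/16"              # used by both on-premises and VPC Blue
MASKED_ONPREM = "192.168.0.0/16"  # how VPC Blue sees on-premises hosts
MASKED_VPC = "172.16.0.0/16"      # hypothetical: how on-premises sees VPC Blue

# TGW route tables keyed by name; each maps a prefix to a target attachment.
ROUTE_TABLES = {
    "tgw-rtb-vpn": {MASKED_VPC: "tgw-attach-Pilot#1"},
    "tgw-rtb-Pilot#1": {REAL: "tgw-att-Dest VPC"},
    "tgw-rtb-dest vpc": {MASKED_ONPREM: "tgw-attach-pilot#2"},
    "tgw-rtb-pilot vpc#2": {REAL: "tgw-att-vpn"},
}
# Attachment -> associated route table (one table per attachment).
ASSOCIATIONS = {
    "tgw-att-vpn": "tgw-rtb-vpn",
    "tgw-attach-Pilot#1": "tgw-rtb-Pilot#1",
    "tgw-att-Dest VPC": "tgw-rtb-dest vpc",
    "tgw-attach-pilot#2": "tgw-rtb-pilot vpc#2",
}
# Stateless NAT on each pilot appliance: ((DNAT old, new), (SNAT old, new)).
PILOT_NAT = {
    "tgw-attach-Pilot#1": ((MASKED_VPC, REAL), (REAL, MASKED_ONPREM)),
    "tgw-attach-pilot#2": ((MASKED_ONPREM, REAL), (REAL, MASKED_VPC)),
}


def swap_prefix(addr, old, new):
    """Keep host bits, replace network bits (IPv4, equal prefix lengths)."""
    old_net, new_net = ipaddress.ip_network(old), ipaddress.ip_network(new)
    a, mask = int(ipaddress.ip_address(addr)), int(old_net.netmask)
    if (a & mask) != int(old_net.network_address):
        return addr
    return str(ipaddress.ip_address(int(new_net.network_address) | (a & ~mask)))


def lookup(table, dst):
    """Longest-prefix match in one TGW route table."""
    matches = [(ipaddress.ip_network(c), t) for c, t in ROUTE_TABLES[table].items()
               if ipaddress.ip_address(dst) in ipaddress.ip_network(c)]
    return max(matches, key=lambda m: m[0].prefixlen)[1] if matches else None


def walk(src, dst, ingress, max_hops=8):
    """Follow a packet through the TGW, detouring through a pilot appliance when routed to one."""
    hops = []
    for _ in range(max_hops):
        target = lookup(ASSOCIATIONS[ingress], dst)
        hops.append(target)
        if target not in PILOT_NAT:
            return src, dst, hops  # delivered to a VPC or VPN attachment (or no route)
        (d_old, d_new), (s_old, s_new) = PILOT_NAT[target]
        dst = swap_prefix(dst, d_old, d_new)  # ingress filter: DNAT
        src = swap_prefix(src, s_old, s_new)  # egress filter: SNAT
        ingress = target  # translated packet re-enters the TGW via the pilot attachment
    raise RuntimeError("routing loop")


print(walk("10.0.1.7", "172.16.1.8", "tgw-att-vpn"))
# ('192.168.1.7', '10.0.1.8', ['tgw-attach-Pilot#1', 'tgw-att-Dest VPC'])
print(walk("10.0.1.8", "192.168.1.7", "tgw-att-Dest VPC"))
# ('172.16.1.8', '10.0.1.7', ['tgw-attach-pilot#2', 'tgw-att-vpn'])
```

The forward and return packets traverse different pilot VPCs, yet the reply comes back from 172.16.1.8, the same address the on-premises host originally targeted. That is why static, stateless rules on two separate appliances are enough.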
The bi-directional traffic flow can be seen in the following figure:
Solution #2: Policy NAT on Both Sides
This solution uses Policy NAT on both the on-premises customer gateway and a virtual router in the AWS VPC, so that each side maps its internal network to a masked network when connecting to the remote network. The on-premises customer gateway's Policy NAT configuration matches packets with a source IP in 10.0.1.0/24 (the on-premises actual network) and a destination IP in 20.0.1.0/24 (the VPC's masked network), and translates the source IP into the 40.0.1.0/24 network (the on-premises masked network). The virtual router's Policy NAT configuration matches packets with a source IP in 10.0.1.0/24 (the VPC's actual network) and a destination IP in 40.0.1.0/24 (the on-premises masked network), and translates the source IP into the 20.0.1.0/24 network (the VPC's masked network).
In this way, the on-premises CGW masks the on-premises 10.0.1.0/24 network as 40.0.1.0/24, and the virtual router masks the VPC 10.0.1.0/24 network as 20.0.1.0/24.
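The two Policy NAT rules can be modeled as a static one-to-one mapping on each side. This is a minimal sketch, assuming a simple static mapping rather than any particular vendor's Policy NAT syntax; the `PolicyNat` class and its `outbound`/`inbound` methods are hypothetical. It uses the ranges from the configuration above.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyNat:
    """One side's Policy NAT: present `real` as `masked`, but only toward `remote_masked`."""
    real: str           # actual local network
    masked: str         # what the other side sees
    remote_masked: str  # the other side's masked network

    @staticmethod
    def _swap(addr: str, old: str, new: str) -> str:
        """Map addr to the same host offset inside `new` (addr must lie inside `old`)."""
        offset = int(ipaddress.ip_address(addr)) - int(ipaddress.ip_network(old).network_address)
        return str(ipaddress.ip_network(new).network_address + offset)

    def outbound(self, src: str, dst: str) -> tuple:
        """Translate the source only when BOTH source and destination match the policy."""
        if (ipaddress.ip_address(src) in ipaddress.ip_network(self.real)
                and ipaddress.ip_address(dst) in ipaddress.ip_network(self.remote_masked)):
            src = self._swap(src, self.real, self.masked)
        return src, dst

    def inbound(self, src: str, dst: str) -> tuple:
        """Un-translate the destination of packets addressed to our masked network."""
        if ipaddress.ip_address(dst) in ipaddress.ip_network(self.masked):
            dst = self._swap(dst, self.masked, self.real)
        return src, dst


cgw = PolicyNat(real="10.0.1.0/24", masked="40.0.1.0/24", remote_masked="20.0.1.0/24")
vrouter = PolicyNat(real="10.0.1.0/24", masked="20.0.1.0/24", remote_masked="40.0.1.0/24")

# On-premises host 10.0.1.7 -> EC2 instance (reached as 20.0.1.8)
in_tunnel = cgw.outbound("10.0.1.7", "20.0.1.8")
print(in_tunnel)                    # ('40.0.1.7', '20.0.1.8')
print(vrouter.inbound(*in_tunnel))  # ('40.0.1.7', '10.0.1.8')

# EC2 instance 10.0.1.8 -> on-premises host (reached as 40.0.1.7)
in_tunnel = vrouter.outbound("10.0.1.8", "40.0.1.7")
print(in_tunnel)                    # ('20.0.1.8', '40.0.1.7')
print(cgw.inbound(*in_tunnel))      # ('20.0.1.8', '10.0.1.7')
```

Neither gateway needs to know the other side's real addresses: inside the tunnel only the 40.0.1.0/24 and 20.0.1.0/24 ranges appear, and each gateway restores its own real range on the way in.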
On-premise 10.0.0.0/16 → tgw-rtb-vpn → appliance-vpc-attach → gwlb endpoint/appliance [NAT translation SNAT and DNAT] src 192.168.0.0/16 dst 10.0.0.0/16 → tgw-rtb-Pilot#1 → tgw-att-Dest VPC → VPC Blue 10.0.0.0/16
Similarly, return traffic flow:
Dest VPC 10.0.0.0/16 → tgw-rtb-dest vpc → tgw-attach-pilot#2 → NAT instance in pilot VPC#2 [NAT translation SNAT and DNAT] src 192.168.0.0/16 dst 10.0.0.0/16 → tgw-rtb-pilot vpc#2 → tgw-att-vpn → On-premise 10.0.0.0/16
When the on-premises host connects to the EC2 instance in the VPC, the source IP is 10.0.1.7 and the destination IP is 20.0.1.8. Because 20.0.1.8 lies outside its local 10.0.1.0/24 subnet, the on-premises host sends this packet to the on-premises CGW, which changes the source IP to 40.0.1.7 according to its Policy NAT configuration. The same happens when the EC2 instance connects to the on-premises host: the source IP is 10.0.1.8 and the destination IP is 40.0.1.7. This packet is sent to the virtual router, which translates the source IP to 20.0.1.8. Inside the VPN tunnel, the packets appear as if the 40.0.1.0/24 network is talking to the 20.0.1.0/24 network.
When a packet arrives at the router on the other side, that router translates the destination address back to the corresponding real IP address. The virtual router un-translates the destination IP of a received packet from 20.0.1.8 to 10.0.1.8 and sends the packet to the EC2 instance. Similarly, the on-premises customer gateway un-translates the destination IP from 40.0.1.7 to 10.0.1.7 and sends the packet to the on-premises host. This allows on-premises hosts and EC2 instances in the AWS VPC to connect to each other, even though both sites use IP addresses in overlapping networks.
The bi-directional traffic flow can be seen in the following figure:
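- Important note: To implement this solution successfully, configure the TGW and VPC route tables so that traffic destined for the masked ranges (20.0.1.0/24 and 40.0.1.0/24) is steered through the customer gateway and the virtual router.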
Conclusion
In conclusion, by employing a Pilot VPC and Advanced Twice NAT over site-to-site VPN, organizations can overcome overlapping IP address challenges and ensure secure connectivity and seamless communication between on-premises networks and AWS VPCs. This approach helps enterprises optimize their hybrid cloud infrastructure while preserving data privacy and network security as they embrace the transformative potential of cloud computing.