AWS NAT Gateway and High-Availability NAT Instances with Auto-Scaling
In this blog, we cover the basics of AWS Virtual Private Cloud (VPC), NAT Gateways and NAT Instances, and explain how a High Availability NAT instance deployment works.
What is Virtual Private Cloud or VPC?
A VPC is a virtual network on AWS that resembles an on-premises network. It provides the same level of control, security and usability while abstracting away the complexity of setting up an on-premises network.
We can set up the network configuration to suit our requirements by defining IP address spaces, route tables and subnets. This gives us control over whether our servers are internet facing or stay isolated and secure within the VPC, and therefore over all Ingress (incoming) and Egress (outgoing) traffic. Other AWS resources such as EC2 instances and databases are deployed within VPCs to secure them and to control how they interact with the internet and with our other deployed services.
What are the features of AWS VPC?
The features and structural components of AWS VPC are:
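Subnets: Segments of the VPC's IP address range. Each subnet resides in a single Availability Zone, so creating subnets in different zones spreads the VPC across multiple Availability Zones.
Routing Tables: Sets of routes that determine where network traffic leaving a subnet is directed.
Internet Gateway (IGW): The connection point between the VPC and the internet.
Availability Zone Management: A VPC spans the Availability Zones of its region, and we choose the zone in which each subnet is placed.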
NAT Gateway: Enables resources within a private subnet to access the internet.
Network Access Control Lists (NACL): A stateless component that controls the traffic allowed in and out of each subnet within the VPC.
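To make the idea of defining IP address spaces and subnets concrete, here is a minimal sketch using Python's standard ipaddress module. The 10.0.0.0/16 range, the /24 subnet size and the subnet names are assumptions chosen for illustration, not values taken from any real deployment.

```python
import ipaddress

# Hypothetical address plan: a /16 VPC range carved into /24 subnets.
vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = list(vpc.subnets(new_prefix=24))

# Illustrative names only: one public and one private subnet per zone.
plan = {
    "public-a": subnets[0],
    "private-a": subnets[1],
    "public-b": subnets[2],
    "private-b": subnets[3],
}

for name, cidr in plan.items():
    # AWS reserves five addresses in every subnet, so fewer are usable.
    usable = cidr.num_addresses - 5
    print(f"{name}: {cidr} ({usable} usable addresses)")
```

Each subnet in such a plan would then be created in one Availability Zone and associated with a route table.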
What is NAT Gateway and how does it work?
A NAT gateway enables instances within a private network to connect to the internet while preventing the internet from initiating connections to them. It therefore allows Egress connections and their return traffic, but blocks unsolicited Ingress connections.
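The behaviour described above can be sketched as a toy model of source NAT. This is a simplified illustration, not how AWS implements its NAT Gateway: the NatModel class, its port allocation scheme and the IP addresses are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class NatModel:
    """Toy source-NAT model: remembers flows opened from the private side."""
    public_ip: str
    next_port: int = 1024
    # Maps a public-side port to the private (ip, port) that opened the flow.
    flows: dict = field(default_factory=dict)

    def egress(self, private_ip, private_port):
        """A private instance opens a connection; return the translated source."""
        port = self.next_port
        self.next_port += 1
        self.flows[port] = (private_ip, private_port)
        return (self.public_ip, port)

    def ingress(self, public_port):
        """A packet arrives from the internet; deliver it only if a flow exists."""
        return self.flows.get(public_port)  # None means the packet is dropped


nat = NatModel(public_ip="203.0.113.10")
src = nat.egress("10.0.1.25", 51000)
print(src)                  # ('203.0.113.10', 1024)
print(nat.ingress(src[1]))  # ('10.0.1.25', 51000): return traffic is delivered
print(nat.ingress(4444))    # None: unsolicited traffic has no matching flow
```

The point of the sketch is that inbound packets are only accepted when they match a flow that a private instance opened first.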
Image Source: AWS Docs
Consider a scenario in which a VPC contains a Public Subnet and a Private Subnet. The Public Subnet, as its name suggests, has access to the internet via the Internet Gateway (IGW) and contains the applications and servers that need to be internet facing, such as web servers, web applications and public-facing API servers. The Private Subnet contains internal EC2 instances and other resources that need to be secured and are used internally alongside other resources, such as databases and data pipeline servers. Each subnet is associated with a Route Table that holds the Destination-Target mappings used to route its traffic.
The NAT Gateway is created inside the Public Subnet. It is a managed AWS service, so there is no EC2 instance for us to run or maintain. To send traffic through it, the Route Table of the Private Subnet, which already contains the local route, is updated with a route that points to the NAT Gateway (0.0.0.0/0 -> nat-gateway-id).
An Elastic IP Address is associated with the NAT Gateway when it is created. The Public Subnet's Route Table already has a route to the Internet Gateway (0.0.0.0/0 -> igw-id), so the NAT Gateway reaches the internet through the IGW. Any request for internet access that originates from a resource or EC2 instance inside the Private Subnet is routed to the NAT Gateway in the Public Subnet, which in turn makes the request to the internet via the IGW. This creates a secure layer of abstraction between our private resources and the internet.
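The two route tables in this scenario can be sketched as simple Destination-Target mappings, with the most specific matching route winning. The lookup helper and the 10.0.0.0/16 VPC range are assumptions for illustration. The targets nat-gateway-id and igw-id are the placeholders used in the text above.

```python
import ipaddress


def lookup(route_table, destination):
    """Return the target of the most specific route that matches destination."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if dest in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    # The longest prefix is the most specific route.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


# Assumed VPC range of 10.0.0.0/16 for the local route.
private_routes = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-gateway-id"}
public_routes = {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-id"}

print(lookup(private_routes, "10.0.0.12"))     # local
print(lookup(private_routes, "198.51.100.7"))  # nat-gateway-id
print(lookup(public_routes, "198.51.100.7"))   # igw-id
```

Traffic between the subnets stays on the local route, while everything else from the Private Subnet goes to the NAT Gateway and everything else from the Public Subnet goes to the IGW.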
Note: A NAT Gateway is created in a single Availability Zone. If we have subnets in different Availability Zones, AWS will not automatically set up a NAT Gateway in each of them, so we need to create a separate NAT Gateway for each Availability Zone ourselves.
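As a rough sketch of the note above, the function below gives each private subnet a default route to the NAT Gateway in its own zone and fails if a zone has none. The zone names, subnet names and gateway identifiers are hypothetical placeholders.

```python
def default_routes(private_subnets, nat_by_zone):
    """Map each private subnet to a default route via the NAT gateway in its zone."""
    routes = {}
    for subnet_id, zone in private_subnets.items():
        if zone not in nat_by_zone:
            raise ValueError(f"no NAT gateway in {zone} for {subnet_id}")
        routes[subnet_id] = {"0.0.0.0/0": nat_by_zone[zone]}
    return routes


# Hypothetical identifiers, for illustration only.
nat_by_zone = {"zone-a": "nat-gateway-a", "zone-b": "nat-gateway-b"}
private_subnets = {"private-1": "zone-a", "private-2": "zone-b"}

for subnet_id, routes in default_routes(private_subnets, nat_by_zone).items():
    print(subnet_id, routes)
```

Keeping each subnet's default route inside its own zone means that the loss of one zone does not cut off internet access for the others.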
What are NAT Instances?
Apart from using the AWS NAT Gateway, we can create our own NAT AMI and run it on an EC2 instance in a Public Subnet of the VPC. This enables the Private Subnet to initiate Egress traffic to the internet while keeping it protected from Ingress connections initiated from the internet.
Image Source: AWS Documentation
An Egress request from the Private Subnet to the internet follows much the same path as it does with a NAT Gateway:
Private Subnet (EC2 Instance) -> Public Subnet (NAT Instance) -> Internet Gateway(IGW)
We can create our own AMI by customizing an existing Amazon AMI to run as a NAT instance.
Reference Link: Creating Amazon EBS-backed AMI’s
Reference Link: Create NAT Instance
Note: NAT Instances can be more cost-effective than a dedicated NAT Gateway. However, this approach is not nearly as scalable, resilient or fault-tolerant as a NAT Gateway, mainly because failover between instances is managed by scripts. It also involves more maintenance work than a NAT Gateway, but the lower cost makes it worthwhile in many cases.
Reference Link: Comparison reference between NAT Instances and NAT Gateway
How to deal with the scalability and availability issue on NAT Instances?
This is where GlobalDots and the Terraform community come to the rescue. Here at GlobalDots, we created a module that provisions High Availability NAT instances by launching Auto Scaling groups with NAT instances in the specified Public Subnets to allow outbound internet traffic (i.e. Egress) from the Private Subnets. Each instance runs AWSnycast for route publishing.
How does this module work?
The module works by removing a NAT Instance from the route table if it becomes unavailable. When a NAT instance is terminated, the Auto Scaling group (ASG) spins up a new one and attaches the appropriate ENI to it.
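The failover idea can be sketched in a few lines. This is not the module's code or AWSnycast's implementation, only a simplified model under assumed names: withdraw_unhealthy, publish, the route table identifiers and the NAT instance identifiers are all hypothetical.

```python
def withdraw_unhealthy(route_tables, healthy):
    """Remove default routes whose NAT instance target is not reported healthy."""
    withdrawn = []
    for table_id, routes in route_tables.items():
        target = routes.get("0.0.0.0/0")
        if target is not None and not healthy.get(target, False):
            del routes["0.0.0.0/0"]
            withdrawn.append((table_id, target))
    return withdrawn


def publish(route_tables, table_id, target):
    """Point a route table's default route at a replacement NAT instance."""
    route_tables[table_id]["0.0.0.0/0"] = target


# Hypothetical route tables and health states, for illustration only.
tables = {
    "rtb-private-a": {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-instance-a"},
    "rtb-private-b": {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-instance-b"},
}
health = {"nat-instance-a": False, "nat-instance-b": True}

print(withdraw_unhealthy(tables, health))  # [('rtb-private-a', 'nat-instance-a')]
publish(tables, "rtb-private-a", "nat-instance-a2")
print(tables["rtb-private-a"]["0.0.0.0/0"])  # nat-instance-a2
```

In the sketch, the route is withdrawn as soon as its target is unhealthy and published again once a replacement instance is available, which mirrors the sequence described above.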
How is this module used?
The inputs, outputs and usage of this module are documented in the GitHub repository.
Repository Link: Globaldots/terrafrom-aws-nat-instances-ha
How is this module licensed?
This module is licensed under the Apache 2 license and is based on tf_aws_nat from the Terraform community.