- February 21, 2021
Cloud-based online services and applications face the challenge of optimizing end user experience in the face of ever-growing bandwidth demand. This is further complicated by network-related aspects of application performance, such as latency, congestion and jitter, which are inherent to the architecture of the internet.
These issues can have a very negative impact on end user experience and can vary widely depending on the customer's geographical location or the ISP/network that they use. End users on the US East Coast may have a great experience, while those in Europe see long latencies and very slow performance of your Web shop or application. Similarly, end users at a specific ISP may experience a very responsive service, whereas their neighbor down the street is confronted with a very sluggish web site with long page loading times due to slow delivery of ads in the pages.
In this article we discuss AWS network performance management problems and solutions.
1. Business problem
Network performance is essentially a black box for online service providers: they have little to no visibility into performance metrics like latency and congestion. Today, (Net) DevOps personnel have to diagnose network performance issues manually and redirect network traffic to avoid them. This is not an exact science and is mostly reactive in nature. In addition, putting in place the hardware needed to optimize the network-related aspects of cloud application performance is costly and complex. The absence of cost-effective, automated tools that meaningfully improve these performance metrics adds to the problem.
The business impact, however, is very real:
Network providers have a vested interest in BGP route selection. Not all routes through the internet cost the same. BGP route selections are often influenced by business interests of network providers and their wish to control next hop selection.
ISPs often choose to route traffic through network paths that bring them the most financial benefit, rather than paths chosen on network performance metrics. There have been documented cases of large ISPs intentionally creating congestion in some network nodes in order to charge service providers premium rates for non-congested paths. In effect, they create a lesser version of the internet so that they can charge more for the internet as usual.
2. Technical problem
The internet is a huge mesh of complex interconnected networks. It uses two groups of routing protocols to determine the path of traffic through those networks: Interior Gateway Protocols (IGPs) for intradomain routing, and the Border Gateway Protocol (BGP) for interdomain routing between Autonomous Systems (ASes). One way to understand network performance issues is to look at how BGP routes internet traffic.
BGP serves as the standardized routing protocol of the internet. It was designed in the early days of the internet with a focus on network reachability and stability. However, it does not optimize routing for performance metrics such as latency, congestion and packet loss. In addition, it has become very hard to analyze, manage and troubleshoot with the explosive growth of the internet.
BGP works by exchanging routing and reachability information between autonomous systems on the internet. It makes routing decisions based on a number of attributes, including reachability and the AS_PATH attribute. In practice, this often means choosing routes that are reachable and have the lowest number of AS hops. BGP has no means of evaluating different network routes based on performance metrics like latency, congestion and packet loss, so these crucial metrics are left out of routing decisions. As a result, network traffic often suffers from high latency, congestion and packet loss.
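To make the gap concrete, here is a minimal Python sketch comparing a selection rule that looks only at AS_PATH length with one that looks at measured latency. It is a deliberate simplification: real BGP best path selection weighs several other attributes before and after AS_PATH length. The `Route` type, the provider names, the AS numbers (taken from the documentation and private ranges) and the latency figures are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Route:
    """One candidate route to a destination prefix (hypothetical model)."""
    next_hop: str
    as_path: Tuple[int, ...]
    latency_ms: float  # measured out of band; BGP itself never sees this value


def shortest_as_path(routes: List[Route]) -> Route:
    """Simplified BGP-like choice: fewest AS hops wins."""
    return min(routes, key=lambda r: len(r.as_path))


def lowest_latency(routes: List[Route]) -> Route:
    """Performance-aware choice: lowest measured latency wins."""
    return min(routes, key=lambda r: r.latency_ms)


# Two illustrative routes to the same prefix.
routes = [
    Route("transit-a", (64500, 64510), latency_ms=180.0),
    Route("transit-b", (64501, 64511, 64512), latency_ms=45.0),
]

print("AS_PATH-based choice:", shortest_as_path(routes).next_hop)
print("Latency-based choice:", lowest_latency(routes).next_hop)
```

In this made-up example the route with fewer AS hops is also the slower one, which is exactly the situation a hop-count-driven decision cannot detect.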
GlobalDots probes upwards of 600,000 network prefixes in real time and collects performance data of every path through the network. This data is then processed and analyzed to determine the best path through the network with the lowest latency, congestion, bandwidth cost and packet loss. (Net) DevOps have the ability to create rules to automate network traffic routing and in the process minimize network
The Cloud/AWS deployment places the GlobalDots appliance between the virtual infrastructure and the transit providers. The connection to the virtual infrastructure is a physical connection (i.e. AWS Direct Connect over 1000BASE-LX or 10GBASE-LR). Each customer is connected to the GlobalDots premises using a unique VLAN identifier.
Within the GlobalDots premises, there exists a virtual routing table for each application that reflects the specific performance requirements of the customers. GlobalDots collects RIB (Routing Information Base) data from routeviews.org. RIB represents a special type of database which stores routing information received by every BGP speaker from other peers.
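As a rough mental model of a RIB, the sketch below stores, for each prefix, the route most recently announced by each peer. It is not the routeviews.org data format or the GlobalDots implementation; the `Rib` class, its method names, the peer labels and the AS numbers are hypothetical and exist only to illustrate the idea of a per-prefix, per-peer route store.

```python
from typing import Dict, Iterable, List, Tuple


class Rib:
    """Toy routing information base: prefix -> {peer: AS path}."""

    def __init__(self) -> None:
        self._routes: Dict[str, Dict[str, Tuple[int, ...]]] = {}

    def announce(self, prefix: str, peer: str, as_path: Iterable[int]) -> None:
        """Record (or replace) the route a peer advertises for a prefix."""
        self._routes.setdefault(prefix, {})[peer] = tuple(as_path)

    def withdraw(self, prefix: str, peer: str) -> None:
        """Remove a peer's route; drop the prefix once no peer offers it."""
        peers = self._routes.get(prefix)
        if peers is None:
            return
        peers.pop(peer, None)
        if not peers:
            del self._routes[prefix]

    def prefixes(self) -> List[str]:
        """All prefixes currently known, i.e. the candidates for probing."""
        return sorted(self._routes)

    def routes_for(self, prefix: str) -> Dict[str, Tuple[int, ...]]:
        """Copy of the per-peer routes for one prefix (empty if unknown)."""
        return dict(self._routes.get(prefix, {}))


rib = Rib()
rib.announce("203.0.113.0/24", "peer-a", [64500, 64510])
rib.announce("203.0.113.0/24", "peer-b", [64501, 64511, 64512])
rib.announce("198.51.100.0/24", "peer-a", [64500, 64505])
rib.withdraw("198.51.100.0/24", "peer-a")

print(rib.prefixes())
print(rib.routes_for("203.0.113.0/24"))
```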
Next, GlobalDots probes all prefixes in the RIB for specific metrics like latency, congestion, packet loss and bandwidth cost. All this performance data is processed and analyzed by a Spark cluster, and an optimized routing policy is generated for the metrics each customer has indicated. The policy is produced by matching the performance attributes of each destination network (prefix) to the customer's requirements.
Once prefixes have been matched against a customer's performance requirements, the customer can opt to override BGP's best path selection for those prefixes. Detailed analytic reports are generated and made available through the front-end dashboard, providing end-to-end visibility into network performance.
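The matching step can be sketched in a few lines of Python. This is an illustrative assumption about how such a policy could be built, not the GlobalDots algorithm: the `PathMetrics` and `Requirements` types, the thresholds, the tie-breaking rule (cheapest qualifying path, then lowest latency) and all measurements are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PathMetrics:
    """Probe results for one path to a prefix (hypothetical fields)."""
    provider: str
    latency_ms: float
    loss_pct: float
    cost_per_mbps: float


@dataclass(frozen=True)
class Requirements:
    """A customer's performance thresholds (hypothetical fields)."""
    max_latency_ms: float
    max_loss_pct: float


def pick_path(paths: List[PathMetrics], req: Requirements) -> Optional[PathMetrics]:
    """Return the cheapest path meeting the requirements, or None."""
    qualifying = [
        p for p in paths
        if p.latency_ms <= req.max_latency_ms and p.loss_pct <= req.max_loss_pct
    ]
    if not qualifying:
        return None  # no override: the prefix keeps its default BGP route
    return min(qualifying, key=lambda p: (p.cost_per_mbps, p.latency_ms))


def build_policy(measurements: Dict[str, List[PathMetrics]],
                 req: Requirements) -> Dict[str, str]:
    """Map each prefix with a qualifying path to the provider to prefer."""
    policy: Dict[str, str] = {}
    for prefix, paths in measurements.items():
        best = pick_path(paths, req)
        if best is not None:
            policy[prefix] = best.provider
    return policy


# Made-up probe data for two prefixes.
measurements = {
    "203.0.113.0/24": [
        PathMetrics("transit-a", latency_ms=180.0, loss_pct=0.1, cost_per_mbps=1.0),
        PathMetrics("transit-b", latency_ms=45.0, loss_pct=0.2, cost_per_mbps=2.0),
    ],
    "198.51.100.0/24": [
        PathMetrics("transit-a", latency_ms=300.0, loss_pct=3.0, cost_per_mbps=1.0),
    ],
}
requirements = Requirements(max_latency_ms=100.0, max_loss_pct=1.0)

policy = build_policy(measurements, requirements)
print(policy)
```

Prefixes with no qualifying path are simply left out of the policy in this sketch, so they would continue to follow whatever route BGP selects by default.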
Managing AWS network performance has specific problems you need to solve if you want to get the most out of it. If you have any questions about how we can help you optimize your cloud costs and performance, contact us today to help you out with your performance and security needs.