Load Balancing Primer
12 Nov 2021

Goal
- To review Load Balancing basics and understand how it helps modern web and data clusters.
Discussion
We have been playing with our helloworld toys for some time now. Time to take a break and do a theory session. Why? So far, we have built a single-server microservice and figured out mechanisms for effective deployability, plus some scalability/reliability via orchestration. Before we expand our service's features, we need to understand a few key backend technologies that serve as the entry point to a cluster running our service. Here are a few keywords we have heard of or used multiple times: load balancing, API gateway, reverse proxy, ingress, service mesh. Are these the same? How do they differ? Let's attempt to demystify them a bit, starting with load balancing. This post is basically notes from my exploration of the fundamentals of load balancing. The HAProxy documentation provides a ton of insight and is an excellent read for anyone interested in a survey of the technology. I will use the abbreviation LB to make it easier to type.
What is it?
- It is a “traffic cop” that helps to effectively distribute traffic across multiple components in a resource group (cluster / farm / pool).
Where is it used?
- In front of a web server cluster
- Between a web server and an app server or a cache cluster
- Between an app server and a database cluster
Benefits
- Higher availability of resources, coupled with lower waiting time for access to resources
- Higher throughput of service from these resources
- Less load / stress on individual resources, allowing them to perform in their optimal range
- Effective monitoring of resources in the cluster, identification of bottlenecks, etc.
Types of LB
Software vs Hardware
- Hardware LB is a dedicated machine with specialized processors tuned to increase throughput
- Software LB is software that can be deployed on commodity hardware to provide LB functionality. It is cheaper, more flexible, and can scale out with commodity hardware.
- Hardware LB is much more expensive compared to software, hard to scale quickly due to cost and configuration complexity, and needs specialized expertise to configure and maintain.
- Software LB also provides the option of running in the cloud. Cloud LB as a Service (LBaaS) offerings are available from vendors like Cloudflare.
- Virtual LB is also possible where software/firmware of a “physical” LB is run on virtual machines.
- DNS LB is a form of software LB where the DNS server itself spreads traffic, typically by rotating the order of the address records it returns (round-robin DNS).
- Cross data center LBs a.k.a multi-site LBs are also available.
Layer 4 vs Layer 7
Assuming familiarity with the OSI model here. Quick recall: Layer 4 is the Transport layer, dealing with protocols like TCP/UDP/QUIC, whereas Layer 7 is the Application layer, dealing with protocols like HTTP.
An LB can operate at either of these levels. The following are the differences:
- A Layer 4 LB only operates at the transport level, so it doesn’t look at the content of what is being transported. As a result, it cannot provide any “smart” features, but it is typically fast and lightweight.
- Essentially, it checks the IP and port and forwards packets with just a Network Address Translation (NAT).
- A Layer 7 LB, on the other hand, operates at the Application layer. Hence, it has access to the content: headers, URL, content type, message body, cookies, etc.
- They typically terminate the incoming connection, examine the content, and decide where to send it based on a routing algorithm.
- They can perform many complex operations like decryption, parsing, cookie matching, routing and even changing content in some scenarios.
- Due to the added complexity, they typically need more computing resources.
- One of the key reasons Layer 7 LBs are used is “sticky sessions”. Since a core function of an LB is distributing traffic, it typically routes every request in a stateless manner to any available server based on the routing algorithm. For an application that must maintain session state for a client, this becomes a problem if the session state is saved only on the application server. The typical solution is to route all requests of a session to the same server in the cluster, known as a sticky session (a toy version is sketched after this list).
- A Layer 7 LB can do TLS/SSL termination / offloading, which simplifies internal network communication. Essentially, certificates are managed only at the LB level and all internal traffic proceeds unencrypted. This avoids the expense and complexity of deploying certs on internal resources. As long as the LB and the resources are in the same data center, sharing a security ring-fence, this shouldn’t be an issue.
- They can also offload cacheable content. This comes in handy during a DDoS attempt, since the LB can have stale content served from an external CDN, reducing the impact of the DDoS on the actual cluster resources.
- Due to these features, a Layer 7 LB often surfaces (or amplifies) inherent issues in the cluster. So, logging is extremely important to take advantage of this ability.
- Sometimes a combination of Layer 4 and Layer 7 LBs is used, with an edge router sending traffic to a layer of L4 LBs, which in turn route to a larger layer of L7 LBs, each of which has its own cluster of resources to distribute to.
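To make sticky sessions concrete, here is a toy sketch of cookie-based stickiness at Layer 7. The backend names and the lb_sticky cookie are invented for illustration; a real L7 LB (HAProxy, nginx, etc.) also handles cookie expiry, server draining, and re-pinning when a pinned server dies.

```python
import random

# Hypothetical backend pool; names and the cookie name are made up.
BACKENDS = ["app1:8080", "app2:8080", "app3:8080"]
COOKIE = "lb_sticky"

def route(request_cookies):
    """Pick a backend for a request, pinning the session via a cookie.

    Returns (backend, set_cookie); set_cookie is None when the client
    already carries a valid pin.
    """
    pinned = request_cookies.get(COOKIE)
    if pinned in BACKENDS:
        # Existing session: keep sending it to the same server.
        return pinned, None
    # New session (or the pinned server left the pool): pick any server
    # and tell the client to present this pin on subsequent requests.
    backend = random.choice(BACKENDS)
    return backend, (COOKIE, backend)

backend, set_cookie = route({})         # first request: gets pinned
backend2, _ = route({COOKIE: backend})  # follow-up request: same backend
assert backend == backend2
```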
Load Balancing Algorithms
One of the key configurable features of an LB is the routing algorithm: the mechanism by which the LB decides which server in the cluster should receive the current request. Many options exist, and a given LB implementation may support all or some of them. Here is a non-exhaustive list:
- Round Robin: Each server is picked in round-robin fashion. A naive approach assumes equally spec’ed servers, but in many practical scenarios the nodes in a cluster are unequally spec’ed, so a weight is associated with each node (usually based on processing capacity). These weights can also be adjusted on the fly (e.g., to support slow-start when a server comes up). This is the smoothest and fairest option when connections involve similar workloads, and is usually a good choice when there aren’t many persistent connections (a persistent connection can upset the fairness of the algorithm). A weighted variant is sketched after this list.
- Least Connections: The server with the lowest number of active connections receives the next request; if several candidates exist, they are picked in round-robin fashion. Node weighting is also supported. This has the opposite property to round robin: it is a fair algorithm when long persistent connections are expected. It can be augmented with other factors like least latency / response time. (Also sketched after this list.)
- Resource Based: This requires an agent installed on every server that reports its current load and available resources to the LB, which then makes routing decisions armed with this information. It can be made more sophisticated by introducing agents that report on the health of the network infrastructure, the amount of congestion on the network, etc.
- First Available: Servers are numbered with an id, and the first available server in ascending order of ids receives the next request. Naturally, a maximum connection threshold per server needs to be configured. This option obviously doesn’t play fair, but it is meant to serve a different purpose: to always use the smallest number of servers needed, so that the others can be powered off. Of course, it also means that when load increases, more servers need to be powered up - something a Kubernetes-like controller could help achieve.
- Source IP: The source IP is hashed, and the hash is taken modulo the total weight of the servers, to determine the server that receives the next request. This ensures that a client IP is always mapped to the same server as long as the cluster topology remains the same, providing ‘best-effort’ stickiness for clients who refuse cookies. (A hash-based picker is sketched after this list.)
- URI: The URI of the request is hashed and mapped to a server the same way. This provides stickiness to a URI instead of the source IP, which helps achieve good cache hit rates for proxy caches.
- Random: A consistent hash of a randomly generated key is applied, respecting the weights of the servers, to determine the server that receives the next request.
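To make the algorithms concrete, here is a minimal sketch of weighted round robin, using the “smooth” variant (on each pick, every server accumulates its weight and the largest accumulator wins, keeping picks evenly interleaved). The server names and weights are invented for illustration.

```python
class SmoothWeightedRoundRobin:
    """Smooth weighted round robin: a server with weight w receives
    w out of every sum-of-weights picks, interleaved evenly."""

    def __init__(self, weighted_servers):
        # weighted_servers: list of (name, weight) pairs
        self.servers = [
            {"name": name, "weight": weight, "current": 0}
            for name, weight in weighted_servers
        ]
        self.total = sum(weight for _, weight in weighted_servers)

    def pick(self):
        for s in self.servers:
            s["current"] += s["weight"]
        best = max(self.servers, key=lambda s: s["current"])
        best["current"] -= self.total
        return best["name"]

# Hypothetical pool: "big" has five times the capacity of the others.
picker = SmoothWeightedRoundRobin([("big", 5), ("mid", 1), ("small", 1)])
print([picker.pick() for _ in range(7)])
# -> ['big', 'big', 'mid', 'big', 'small', 'big', 'big']
```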
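A least-connections picker is similarly small. This sketch tracks counts in-process; a real LB increments and decrements them as connections open and close, and a weighted variant would compare connections divided by server weight.

```python
class LeastConnections:
    """Route each new connection to the server with the fewest
    active connections; ties go to whichever server comes first."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Pick the least-loaded server and count the new connection.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when the connection closes.
        self.active[server] -= 1

lb = LeastConnections(["app1", "app2"])   # hypothetical pool
first = lb.acquire()                      # app1 (0 connections each, first wins)
second = lb.acquire()                     # app2 (app1 now has 1 active)
lb.release(first)
```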
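The Source IP and URI strategies both reduce to “hash a key, then map it onto the weighted pool”. Below is a naive modulo sketch with hypothetical server names; note that plain modulo remaps many keys whenever the topology changes, which is why consistent hashing is often preferred for these modes.

```python
import hashlib

def pick_by_hash(key, weighted_servers):
    """Map a key (source IP, URI, ...) onto a weighted server pool.

    Expands each server into `weight` buckets so that higher-weight
    servers own proportionally more of the hash space.
    """
    buckets = [name for name, weight in weighted_servers for _ in range(weight)]
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return buckets[int.from_bytes(digest[:8], "big") % len(buckets)]

servers = [("app1", 2), ("app2", 1)]          # hypothetical pool
print(pick_by_hash("203.0.113.7", servers))   # same client IP -> same server, every time
print(pick_by_hash("/products/42", servers))  # URI mode: same path -> same server
```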
Monitoring Resources
- LBs perform health checks on the resources in the cluster. These checks must be representative of the issues they are meant to catch.
- For example, a ping check won’t detect a crashed server process; to detect that, we need a connection to the specific port. A connection check, in turn, won’t detect whether the underlying database is reachable from the server. (A minimal TCP check is sketched after this list.)
- Health checks need to strike the right balance in how frequently they hit the resources. After all, we don’t want the health checks to behave like a DDoS!
- Besides health checks, another way to monitor a cluster is to sample some production traffic to see if we get the intended response.
- A combination of health checks and sampling can be employed to detect faults, with further health checks to detect the end of said fault.
- Yet another way to monitor is to employ a centralized agent, which executes the monitoring activities and periodically informs the LB. This is more effective in a multi-LB topology. This approach has the additional benefits of centralized reporting, actionable insights, etc., but may be less than ideal in terms of responsiveness or accuracy.
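To ground the discussion, here is a minimal TCP-connect health checker with a “fall” threshold, loosely modeled on the consecutive-failure counting HAProxy uses before marking a server down. The hosts, port, probe interval, and threshold are all invented for illustration.

```python
import socket
import time

def tcp_check(host, port, timeout=2.0):
    """Return True if a TCP handshake to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def monitor(backends, interval=5.0, fall=3):
    """Probe each backend every `interval` seconds; declare it down
    only after `fall` consecutive failures (avoids flapping)."""
    failures = {backend: 0 for backend in backends}
    while True:
        for host, port in backends:
            if tcp_check(host, port):
                failures[(host, port)] = 0
            else:
                failures[(host, port)] += 1
                if failures[(host, port)] == fall:
                    print(f"backend {host}:{port} marked DOWN")
        # Keep the probe rate modest: health checks shouldn't DDoS the pool.
        time.sleep(interval)

# monitor([("app1.internal", 8080), ("app2.internal", 8080)])  # hypothetical hosts
```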
Summary
This post is a quick introduction to the concept of Load Balancing. In the next post, we will explore a more specialized / sophisticated tool which subsumes the functionality of load balancing - an API gateway.