Securing our Communications - TLS
16 Nov 2021

Goal
- To secure communications to our service using TLS (real certificates from Let’s Encrypt) and terminate it at our API gateway.
Discussion
So far, we have dealt with plain old HTTP for our helloworld service. I have referred to TLS termination in my previous posts on load balancing and the API gateway. Now that we have the necessary machinery in place via the Ambassador gateway, it is time to enable HTTPS for our web service.
Quick recall: TLS stands for Transport Layer Security. It provides encryption, authentication and data integrity for communication between ‘clients’ and ‘servers’ (client and server can be a wide variety of things here). It evolved from SSL (Secure Sockets Layer). HTTPS is simply HTTP carried over TLS. One of the fundamental building blocks of TLS is the digital certificate, which is issued by a Certificate Authority (CA) to the entity that owns the server's domain.
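If the digital certificate feels abstract, it is easy to look at a real one. Here is a quick sketch using openssl against any public HTTPS site (example.com is just a stand-in); it prints who the certificate was issued to, which CA issued it, and its validity window.

# Fetch and inspect the certificate presented by a public HTTPS site
openssl s_client -connect example.com:443 -servername example.com </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates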
In our case, there are a few interesting nuances before we TLS away to glory. Let's dig deeper! First up, we are not actually on the web. Mine is just a private network, and I don't intend to expose it to the internet just yet (or ever :D), so there is no question of actual internet traffic reaching our kuberbabies, er, nodes. There are simpler ways of going about this, with a self-signed cert, a local DNS server, a local fake CA and such. But I wanted to get closer to the actual TLS experience, so we will do a little more.
We will use a real CA without spending real money, just like everyone else is doing these days. Enter Let's Encrypt. Let's Encrypt offers free certificates for our domain, with the catch that they expire frequently. So, with the help of some additional tooling, we can achieve our goal.
Let's Encrypt uses what is known as the ACME protocol and can validate that I own my domain in a couple of ways. It can validate that I own the domain name by asking a DNS server (DNS01 challenge), or it can validate that I control the host by attempting to fetch a file from my actual ‘server’ (HTTP01 challenge). Obviously, I cannot do HTTP01 with my private server, so I went with the DNS approach. We still need to demonstrate control over a real domain name, though; an IP address just doesn't cut it. So, yes, we have to pony up a little money.
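For the DNS01 challenge, the ACME server asks for a specific validation token to be published as a TXT record under _acme-challenge.<your-domain-name>; cert-manager will create and clean up that record for us later. While a challenge is in flight, you can observe it with a quick dig (the record name below assumes the same <your-domain-name> placeholder used throughout this post).

# The DNS01 validation token shows up here while the challenge is in progress
dig +short TXT _acme-challenge.<your-domain-name>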
The second need is to manage the issued certificate, since it will expire often. In a Kubernetes environment, this is usually handled by a tool called cert-manager. cert-manager supports a bunch of popular DNS providers that work well for this need. So, I registered a domain name for this purpose with AWS Route53, which is one such provider, and then created an A record pointing to my private IP address. This is already more than I would like to do (and absolutely not best or secure practice by any means). But, science!
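For reference, here is roughly what creating that A record looks like with the AWS CLI instead of the console; the hosted zone ID and private IP below are placeholders, not values from my setup.

# UPSERT an A record that points the domain at the private IP
aws route53 change-resource-record-sets \
  --hosted-zone-id <your-hosted-zone-id> \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "<your-domain-name>",
        "Type": "A",
        "TTL": 300,
        "ResourceRecords": [{ "Value": "<your-private-ip>" }]
      }
    }]
  }'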
Once this part is wired up, we then need to make changes to Ambassador Edge Stack to consume this certificate and begin supporting and terminating TLS for this host. Let’s get our hands dirty!
Infra
3 VM instances running on top of Hyper-V, provisioned with Ubuntu 20.04 Live Server. One will function as the Kubernetes Control Plane, with Ambassador, while the other two will function as the Kubernetes worker nodes.
Stack
The microservice hasn’t changed. So, it is the same stack as the helloworld post.
Containerization and Orchestration
- Docker and Kubernetes
TLS Termination, Load Balancing
- Ambassador Edge Stack, Let’s Encrypt, cert-manager, AWS Route53 (or equivalent) for DNS01 challenge, MetalLB
Architecture
Setup
Common note for all setup: to keep things simple and uniform, we will be adding cert-manager as well as all Ambassador configurations like Listener, Mapping etc. under the ambassador namespace. This simplifies setup and avoids some gnarly dependencies that would otherwise arise.
We will use Helm to install cert-manager first.
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install \
cert-manager jetstack/cert-manager \
--namespace ambassador \
--create-namespace \
--version v1.6.1 \
--set installCRDs=true
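Before moving on, it is worth checking that cert-manager came up in the ambassador namespace; something like this should show the cert-manager, cainjector and webhook pods in a Running state.

# Sanity check the cert-manager install
kubectl get pods -n ambassador | grep cert-manager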
Now, we are ready to set up the DNS provider. In my case it was AWS Route53. To avoid cluttering this post, I will simply point to the cert-manager documentation that explains this step: Route53 configuration.
I did the following:
- Created the IAM policy as outlined in the doc
- Created an IAM user, attached the policy from step 1, and grabbed the user's access key ID and secret access key (a rough CLI sketch follows this list)
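For those who prefer the CLI over the console, the two steps above map roughly to the following sketch; the user and policy names are made up, and policy.json stands for the Route53 policy from the cert-manager doc (not reproduced here).

# Create the policy and user, attach one to the other, then mint an access key
aws iam create-policy --policy-name cert-manager-route53 --policy-document file://policy.json
aws iam create-user --user-name cert-manager-dns01
aws iam attach-user-policy --user-name cert-manager-dns01 \
  --policy-arn arn:aws:iam::<your-account-id>:policy/cert-manager-route53
aws iam create-access-key --user-name cert-manager-dns01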
Once done, we will create a Kubernetes secret with the IAM user's secret access key. Note that I am creating this secret in the same namespace as cert-manager; all steps below follow the same rule.
kubectl create secret generic route53-secret --from-literal=secret-access-key="your-IAM-user-secret-key" -n ambassador
That covers Route53. Now, it is time to configure cert-manager. From here on, we will create a bunch of YAML files and apply them with ‘kubectl apply -f filename’. Be sure to replace any placeholder names I may have used below.
First, we will create a ClusterIssuer and a Certificate. These are concepts defined by cert-manager that help it do its work. A ClusterIssuer refers to a CA, in this case Let's Encrypt. A Certificate is the resource that maintains the issued digital certificate.
Below is the YAML that creates our ClusterIssuer. Notice the namespace, the reference to the Let's Encrypt staging server (we can switch to production later if needed), the reference to the domain name you just registered, and the secret we created a few steps above. Also note that we are going to use a dns01 challenge.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging-dns
  namespace: ambassador
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: <your-email-id>
    privateKeySecretRef:
      name: letsencrypt-staging
    solvers:
    - selector:
        dnsZones:
        - "<your-domain-name>"
      dns01:
        route53:
          region: us-east-1
          accessKeyID: <your-IAM-user-access-id>
          secretAccessKeySecretRef:
            name: route53-secret
            key: secret-access-key
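After applying this, a quick status check is worthwhile; the issuer should report Ready once it has registered an ACME account with the staging server (the resource name matches the YAML above).

kubectl get clusterissuer letsencrypt-staging-dns
# The events in describe are the place to look for details and errors
kubectl describe clusterissuer letsencrypt-staging-dns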
Once we create the ClusterIssuer, we can ask for a Certificate. We do that with the following YAML:
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: <your-domain-name>-com
  namespace: ambassador
spec:
  secretName: <your-domain-name>-com-tls
  issuerRef:
    name: letsencrypt-staging-dns
    kind: ClusterIssuer
  commonName: <your-domain-name>
  dnsNames:
  - <your-domain-name>
In a few moments, we should have a shiny new certificate, fully managed by cert-manager.
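Because the DNS01 challenge involves DNS propagation, "a few moments" can stretch into a few minutes. These checks (using the names from the YAML above) show when the Certificate turns Ready and confirm that the TLS secret has been created.

kubectl get certificate -n ambassador
kubectl describe certificate <your-domain-name>-com -n ambassador
# The issued cert and key land in this secret, which Ambassador will consume next
kubectl get secret <your-domain-name>-com-tls -n ambassador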
Now, it is the turn of Ambassador. To begin, we will delete the Listener and Mapping we created in our last post.
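If you don't remember the exact names from the last post, listing them first helps; a small sketch (the -A flag covers whichever namespace they ended up in):

kubectl get listeners,mappings -A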
kubectl delete listener <listener-name-from-get>
kubectl delete mapping <mapping-name-from-get>
Then, we will create two Listeners (one for HTTPS and another for HTTP which will just redirect to HTTPS), a Host and a Mapping.
First, we will create the Host. A Host tells Ambassador about a domain it should serve traffic for. The Host has a hostname, and this same hostname will be referenced in the Mapping we create next. Note that this Host is the one that refers to the certificate we created earlier, via the tlsSecret attribute. The Redirect action under requestPolicy is what redirects insecure HTTP traffic to HTTPS.
apiVersion: getambassador.io/v3alpha1
kind: Host
metadata:
  name: <your-domain-name>-host
  namespace: ambassador
  labels:
    fordomain: <your-domain-name>
spec:
  hostname: "<your-domain-name>"
  tlsSecret:
    name: <your-domain-name>-com-tls
  requestPolicy:
    insecure:
      action: Redirect
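Once applied, the Host should settle into a Ready state; a quick check, assuming the ambassador namespace as before:

kubectl get hosts -n ambassador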
The Listener YAML looks like this. Notice that we set the protocol to HTTPS even for the HTTP listener; together with securityModel: XFP, this lets Ambassador decide whether a request is secure based on the X-Forwarded-Proto header. Note the hostBinding, which selects the Host we just created via the matchLabels attribute.
apiVersion: getambassador.io/v3alpha1
kind: Listener
metadata:
  name: http-listener
  namespace: ambassador
spec:
  port: 8080
  protocol: HTTPS
  securityModel: XFP
  hostBinding:
    selector:
      matchLabels:
        fordomain: <your-domain-name>
---
apiVersion: getambassador.io/v3alpha1
kind: Listener
metadata:
  name: https-listener
  namespace: ambassador
spec:
  port: 8443
  protocol: HTTPS
  securityModel: XFP
  hostBinding:
    selector:
      matchLabels:
        fordomain: <your-domain-name>
Finally, we will create the Mapping. Here, I am going to add another nuance. In the first Mapping we created in the previous post, we didn't specify a "resolver". A resolver is a resource that defines how Ambassador discovers services in the Kubernetes cluster. This is key functionality, since it is how microservices are located on the network. As microservices can come and go in a cloud/distributed environment, a real-time service discovery mechanism is necessary for an API gateway/ingress like Ambassador.
Ambassador supports three mechanisms for service discovery. The default is Kubernetes service-level discovery. Here, Ambassador leaves Kubernetes to manage pod-level load balancing and only connects at the service level. While Kubernetes has a reasonable mechanism to do this, as we saw earlier, it is not quite what we want: we would like Ambassador to perform L7 load balancing for us.
The second option is Kubernetes endpoint-level discovery, which is what we will use here. In this approach, Ambassador bypasses the Kubernetes routing layer and load-balances directly across the pods. This way, we can choose the load-balancing algorithm and use other cool features like session affinity.
The third option is to use Consul to provide endpoint-level discovery information. This is useful in heterogeneous networks where we have both Kubernetes and non-Kubernetes services.
The first step is to create a KubernetesEndpointResolver.
apiVersion: getambassador.io/v3alpha1
kind: KubernetesEndpointResolver
metadata:
  name: endpoint
Now, we can use this resolver in our Mapping definition, along with a round robin routing policy. This makes Ambassador (technically, Envoy, the proxy that powers Ambassador) our load balancer. Ambassador supports round robin, least request, and two sticky-session policies: maglev and ring hash. I intend to explore a sticky-session implementation in a future post.
apiVersion: getambassador.io/v3alpha1
kind: Mapping
metadata:
  name: helloworld-backend
spec:
  hostname: "<your-domain-name>"
  prefix: /helloworld/
  service: helloworld:8000
  resolver: endpoint
  load_balancer:
    policy: round_robin
Phew. That was quite a bit of YAMLing. At the end of it, we can now invoke our service with a new HTTPS URL!
https://<your-domain-name>/helloworld/
What! We still see an HTTPS error? Yes, that's because we are using the Let's Encrypt staging endpoint, which is not the real deal. Once we have everything working and debugged, we can switch to their production endpoint, which issues a real, trusted certificate. Oh, and in case it is not obvious, if you try the same URL outside of your private network, you will get nothing :).
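From the command line, the same thing can be checked with curl; the -k flag is needed only because the staging CA is not trusted by default, and the openssl line shows which CA actually signed the served certificate. Both assume the placeholder domain used throughout.

# -k skips verification since the staging CA is untrusted; drop it after switching to production
curl -k https://<your-domain-name>/helloworld/
# Check the issuer of the certificate Ambassador is serving
openssl s_client -connect <your-domain-name>:443 -servername <your-domain-name> </dev/null 2>/dev/null \
  | openssl x509 -noout -issuer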
Code
No code. All config!
Summary
By using Let's Encrypt, cert-manager and Ambassador Edge Stack, we have established secure communication to our helloworld microservice and terminated TLS at the API gateway itself, so that the rest of the internal communication can stay simpler and unencrypted.