Road to startup infra - Part 3 - Kubernetes HA Cluster

In this post we will talk about how to bootstrap a highly available (HA) Kubernetes cluster.

The goal is to create an HA cluster on bare-metal-like infrastructure or a cloud provider, or even across multiple datacenters and providers.

My first attempt was to use Scaleway bare-metal cloud servers. They are cheap and they work, but after evaluation I migrated to Hetzner Cloud, because its network model is more robust.

Scaleway is a unique service; they even manufacture their own servers. The cluster was running on Scaleway when I started writing this post, so I will keep a summary of that setup here.

The procedures themselves are provider-agnostic.

Baremetal Cluster

The Scaleway cloud can be roughly summarized in this picture:

{{< figure src="/images/scaleway-overview.png" title="Scaleway Overview" >}}

That means a couple of things you may not expect:

  • Your server has only one IP, in a private range
  • They use NAT to attach your public IP to your instance
  • Your servers can talk to each other, but you can't guarantee they are on the same network; there could be internal routes and hops involved
  • Other customers can reach you on the internal network

They do this because the IPv4 winter is coming (addresses are running short). It's something we have to deal with and plan ahead for, and that's OK.

This has the advantage that you can move public IPs between hosts; in fact there is an API to do it programmatically. But I really wish they also implemented a VLAN to isolate customers, or even better, gave us control over the VLANs in a software-defined, isolated network.

Nice things:

  • You can use Terraform as if you were provisioning cloud virtual servers.
  • They offer virtual machines too.
  • They offer ARM hardware, some with a huge 128 GB of RAM.
  • Servers for every need, small or big.
  • Really affordable.
  • Snapshots, and hot snapshots (coming soon).

Not so nice:

  • They have datacenters only in Europe (Paris and Amsterdam); for me that means high latency (around 210 ms).
  • In principle you can't really trust the internal network; it's not so private after all.
  • Your internal IP address comes from DHCP and keeps changing after each reboot.

After testing I decided the latency is not a big deal for now, and as we will see, we can move services across continents easily.

Let's move on.

Cluster Overview

Let's face it, HA is hard... but not as hard as you may think, and not as expensive, depending on your demands. In fact this cluster is cheaper than my previous single monster machine, but that is more the merit of a good, cheap provider and my diligence in planning and keeping things minimal, without sacrificing scale and quality.

After careful thinking, trying to minimize costs while keeping good peace of mind, I came up with the following:

{{< figure src="/images/infra.png" title="Kubernetes Cluster" height="300px" width="300px" >}}

The cluster has 5 machines:

  • 3 Masters: these nodes run etcd and the Kubernetes API server. Each server is in a different datacenter: 2 in Germany, 1 in Finland.
  • 2 Workers: each worker is in a different datacenter, both in Germany.

For the Kubernetes API, load balancing is done at the DNS level. I think this is good enough because it only carries a few admin connections.

For application load balancing I started small: the ingress controller binds to ports 80 and 443 of the worker nodes, and DNS is set up accordingly. For larger traffic you should consider dedicated load balancers or your cloud provider's load balancer (which may be expensive).
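As a sketch of that starting point, an ingress controller can run as a DaemonSet bound straight to the nodes' ports via hostPort. The names and image version below are illustrative, not the exact manifest used in this cluster:

```yaml
# Illustrative: one ingress controller pod per worker node,
# exposing the node's ports 80/443 directly with hostPort.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
spec:
  selector:
    matchLabels:
      app: ingress-nginx
  template:
    metadata:
      labels:
        app: ingress-nginx
    spec:
      containers:
        - name: controller
          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.24.1
          ports:
            - containerPort: 80
              hostPort: 80
            - containerPort: 443
              hostPort: 443
```

With this in place, a plain A record pointing at the worker nodes is enough to start serving traffic.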

All machines, including the admin station, will connect to each other over a private, encrypted mesh VPN. This simulates a traditional internal network, and it can scale to many datacenters.

If a mesh VPN is new to you, you only need to know two things:

  1. It has no single point of failure. It's a peer-to-peer protocol; any node can come and go.
  2. It finds the best path and communicates directly. The latency loss is negligible (1-2 ms, mainly because of encryption) and performance is great. Much faster than OpenVPN, for example.

Kubernetes has a network that spans across nodes, so it is a mesh network too. What I'm describing here does the same, but with a different purpose: it is for administration, and it is a software-defined network for your servers. That means the Kubernetes network will run on top of this virtual network as if it were the physical one. That is why we don't need encryption in the Kubernetes network; in fact, we must disable it.

There is another advantage: you can take the IP address with you. That means moving a machine from one datacenter to another is as easy as it gets. Note that I have run into the situation (without this VPN) where moving the master, for example, is a headache: certificates and other installation details are bound to the IP address, making the move far from straightforward.


Terraform the machines

We will use Ubuntu 18.04 for the machines, and Terraform to create them.

One note: I have built this cluster many times, man; Terraform will help you move and scale faster. At first I used Terraform + Ansible; now I'm on SaltStack, and here is why.
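As a minimal Terraform sketch, assuming the Hetzner Cloud provider (the server names, types, locations and variables below are placeholders, not the exact configuration of this cluster):

```hcl
# Illustrative only: 3 masters across two locations, 2 workers.
provider "hcloud" {
  token = var.hcloud_token
}

resource "hcloud_server" "master" {
  count       = 3
  name        = "master-${count.index + 1}"
  image       = "ubuntu-18.04"
  server_type = "cx21"
  # Spread masters: two in Germany, one in Finland
  location    = count.index == 2 ? "hel1" : "nbg1"
  ssh_keys    = [var.ssh_key_id]
}

resource "hcloud_server" "worker" {
  count       = 2
  name        = "worker-${count.index + 1}"
  image       = "ubuntu-18.04"
  server_type = "cx31"
  # Two different German datacenters
  location    = count.index == 0 ? "nbg1" : "fsn1"
  ssh_keys    = [var.ssh_key_id]
}
```

On Scaleway the shape is the same, just with the Scaleway provider's server resource instead.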

For Ansible, follow these tracks: this is what worked best for me.

Change root password

Scaleway servers do not allow root password login by default; you use only SSH keys. But we want to be able to log in on the serial console in the dashboard in case something goes wrong in the future.

To do that with Ansible, run:

```shell
ansible -i inventories/scaleway/ all -m user \
  -a "name=root update_password=always password={{ 'putyourfancypassword' | password_hash('sha512') }}"
```

Prepare the hosts

Create an account at ZeroTier, then create a VPN network and a DHCP range for it. Set the network ID in roles/zerotier/defaults/main.yml, and the VPN network range in roles/security/defaults/main.yaml.
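For reference, the two defaults files could look something like this (the variable names, network ID and CIDR are illustrative guesses, not the exact contents of the roles in the repository):

```yaml
# roles/zerotier/defaults/main.yml (illustrative)
zerotier_network_id: "8056c2e21c000001"

# roles/security/defaults/main.yaml (illustrative)
vpn_network: "10.147.17.0/24"
```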

Create the inventory for the machines and run:

```shell
ansible-playbook -i inventories/scaleway playbooks/prepare.yaml
```

The playbook will:

  • Make sure Python is installed on all hosts (required for Ansible)
  • Install common packages
  • Harden the security of the system
  • Install a mesh VPN using ZeroTier
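A minimal sketch of such a playbook, assuming role names that mirror the paths above (not the exact repository layout):

```yaml
# playbooks/prepare.yaml (illustrative)
- hosts: all
  gather_facts: false
  pre_tasks:
    # raw works without Python on the target, so it can bootstrap Python itself
    - name: Ensure Python is present for Ansible modules
      raw: test -e /usr/bin/python || (apt-get -y update && apt-get -y install python-minimal)

- hosts: all
  become: true
  roles:
    - common
    - security
    - zerotier
```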

Create DNS records

Create a DNS record pointing to all of your masters' public IPs. This will be used for CLI administration.
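In zone-file terms this is simply a set of round-robin A records; the name and addresses below are placeholders:

```
; Round-robin A records for the Kubernetes API (illustrative)
k8s-api.example.com.  300  IN  A  203.0.113.10
k8s-api.example.com.  300  IN  A  203.0.113.11
k8s-api.example.com.  300  IN  A  203.0.113.12
```

A resolver will rotate through the addresses, which is plenty for a handful of kubectl connections.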

Installing the cluster in the private network

All cluster traffic will use the VPN, and we will use it for the setup as well.

After running the playbook, ZeroTier will be installed on all machines. You will need to authorize each machine in the ZeroTier dashboard before continuing.

After that, create a pharos-cluster.yml file with all machines and their private addresses.

To find the addresses quickly, you can use:

```shell
ansible -i inventories/scaleway all -m shell -a 'ip addr show zt5u4y6ejv'
```

Add the DNS name you created as the API endpoint in pharos-cluster.yml.

Install the cluster:

Set up pharos-cluster.yml, changing the Weave trusted_networks to the VPN CIDR.
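A sketch of what pharos-cluster.yml could look like; the addresses, endpoint and CIDR below are placeholders, and you should check the Pharos documentation for the exact schema:

```yaml
# pharos-cluster.yml (illustrative)
hosts:
  - address: 10.147.17.11        # ZeroTier (VPN) address of a master
    role: master
    user: root
  - address: 10.147.17.21        # ZeroTier (VPN) address of a worker
    role: worker
    user: root
network:
  provider: weave
  weave:
    trusted_networks:
      - 10.147.17.0/24           # VPN CIDR: the VPN already encrypts, so skip Weave's
api:
  endpoint: k8s-api.example.com  # the DNS record created earlier
```

Listing the VPN CIDR in trusted_networks is what disables the redundant encryption layer mentioned earlier.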

After pharos-cluster.yml is set up, run:

```shell
pharos up -c pharos-cluster.yml
```

Test ports again

Double check the open ports.

Run the check 3 times to verify the security of sensitive ports in the cluster: first from outside the cluster against the public IPs, second from the VPN, and third from the Scaleway private network (you can use one of your servers).

The results should match your firewall setup, and the VPN should show more open ports than the other two vantage points.
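nmap is the usual tool for this, but as a minimal sketch that runs anywhere, the helper below uses only bash's built-in /dev/tcp pseudo-device. The target IP and port list are placeholders for your own hosts:

```shell
#!/usr/bin/env bash
# check_port: report whether a TCP port accepts connections.
# Uses bash's /dev/tcp, so no extra tools are needed on the probing host.
check_port() {
  local host=$1 port=$2
  if timeout 2 bash -c ">/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "${host}:${port} open"
  else
    echo "${host}:${port} closed"
  fi
}

# Probe the ports the firewall is supposed to expose publicly
for port in 22 80 443 6443; do
  check_port 203.0.113.10 "$port"
done
```

Run the same loop from each of the three vantage points and compare the output against your firewall rules.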

Plus: Intelligent DNS load balancer

You can cut costs by using DNS as a load balancer. This is not a good idea for production in general, but you can use an intelligent load balancer that supports short TTLs and can remove records based on system status. NS1 is a provider that offers this service.

Be aware that an IP load balancer has a lot of advantages when serving consumer traffic.

About hardening system security

The security script will:

  • Stop SSH spammers. Using SSH keys and not permitting password logins means attackers will not get into your system unless your keys are compromised. This is the default on Scaleway systems. For peace of mind, and to quiet the annoying noise in the system log, the playbook installs sshguard to stop brute-force attempts.
  • Set up the firewall. Only a few specific ports will be allowed, such as the kube API, HTTP, HTTPS and SSH; everything else will be blocked, except from the VPN. All source addresses from the VPN will be accepted.

Pro Tip: Call me paranoid, but if your host system is compromised and your SSH keys or Scaleway keys are stolen, you are completely fucked. You could add a password to your SSH keys, but I'm too lazy to type a secure (huge) password each time. Instead, I use a fully encrypted Linux virtual machine dedicated to administering the cluster.


I think we have set up a highly secure and highly available Kubernetes cluster.

There is for sure more automation to be done: we could create the DNS record for the kube API using Terraform, authorize the machines in ZeroTier automatically, and also generate the inventory and pharos-cluster.yml. But even with some manual intervention, the setup process is smooth.

For day-2 operations we still need to polish some things, like adding backups for etcd (a crucial part of the system), adding durable storage for your applications, performing backups at the application level (databases, files, etc.), monitoring, and so on.

I have a roadmap for backup and storage; see you in the next post?

Leave your thoughts in the comments, and if you find anything that could be improved, please spread the word.


Cluster Health

To check the etcd cluster health in Pharos, use:

```shell
kubectl -n kube-system exec -t etcd-master1 -- ash -c \
  'etcdctl --endpoints=https://localhost:2379 \
    --ca-file=/etc/kubernetes/pki/ca.pem \
    --cert-file=/etc/kubernetes/pki/etcd/client.pem \
    --key-file=/etc/kubernetes/pki/etcd/client-key.pem \
    cluster-health'
```