EKS Installer

EKS vs EC2 (managed vs self-managed)

Before continuing, you should understand the benefits and drawbacks of creating an EKS cluster versus running your own Kubernetes control plane on AWS EC2 instances.

Price

EKS currently costs $0.10 per hour for an HA control plane.

An m5.large EC2 instance currently costs $0.096 per hour. Having an HA cluster with 3 x m5.large instances will cost: $0.096 x 3 instances = $0.288 per hour.
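
On a monthly basis (assuming roughly 730 hours per month; the figures are illustrative only and use the May 2020 hourly prices above), that works out to:

$ echo "0.10 * 730" | bc          # EKS control plane, USD per month
73.00
$ echo "(0.096 * 3) * 730" | bc   # 3 x m5.large control plane, USD per month
210.240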

EKS is cheaper in most scenarios.

You can reduce the self-managed control plane cost by running these instances as reserved instances, but that means paying upfront for months in advance.

This cost analysis was done in May 2020; all prices were taken from the official Amazon pricing lists.

Management

EKS is a fully managed service provided by AWS, meaning that you don’t need to worry about backups, recovery, availability, scalability, certificates… even authentication to the cluster is handled by the AWS IAM service.

You’ll have to set up all of these features yourself if you choose to host your own control plane. On the other hand, a self-managed setup gives you more room for customization: audit logs, Kubernetes API server feature flags, your own authentication provider and other platform services.

So, if you need to set up a non-default cluster, you should consider going with the self-managed cluster. Otherwise, EKS is a good option.

Day two operations

As mentioned before, EKS is responsible for making the Kubernetes control plane fully operational with a monthly uptime percentage of at least 99.95%.

source: https://aws.amazon.com/eks/sla

On the other hand, in a self-managed setup you have to take care of backups, disaster recovery strategies, the HA setup, certificate rotation, and control plane and worker updates.

Requirements

As mentioned in the common requirements, the operator responsible for creating an EKS cluster needs connectivity from their machine (bastion host, laptop with a configured VPN…) to the network where the cluster will be placed.

The machine used to create the cluster should have the following installed (a quick check follows the list):

  • OS tooling such as git, ssh, curl and unzip.
  • terraform version 0.15.4.
  • the latest aws CLI version.
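
A quick way to verify the tooling is in place (output will vary depending on your system):

$ git --version && ssh -V && curl --version | head -n 1 && unzip -v | head -n 1
$ terraform version    # expected: Terraform v0.15.4
$ aws --version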

Cloud requirements

This installer has two main requirements:

  • Enough permissions to create all the resources surrounding the EKS cluster (a quick credentials check follows this list).
  • Read and make sure to be compliant with: Cluster VPC considerations.
    • Pay special attention to the <cluster-name> placeholder. Its value has to be passed as an input variable of the installer.
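
A quick way to confirm which identity and account your credentials resolve to (a minimal check; it does not validate that the identity actually has all the required permissions):

$ aws sts get-caller-identity
{
    "UserId": "AIDAEXAMPLE",
    "Account": "0000000000123",
    "Arn": "arn:aws:iam::0000000000123:user/operator"
}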

Gather all input values

Before starting to use this installer, you should know the values of the following input variables:

  • cluster_name: Unique cluster name.
  • cluster_version: EKS version to use. Example: 1.20 or 1.21. Take a look at the available Amazon EKS Kubernetes versions.
  • network: The VPC ID where the cluster will be created.
  • subnetworks: Private subnet IDs where the cluster will be created. They must belong to network.
  • ssh_public_key: Cluster administrator public ssh key, used to access cluster nodes as the operator_ssh_user.
  • dmz_cidr_range: Network CIDR range from where the cluster’s control plane will be accessible.
  • eks_workers_asg_names: This field belongs to the node_pool object. It is relevant if you need to attach the instances to an external load balancer.

max_pods

WARNING

The max_pods attribute of a node_pool, introduced in v1.2.0, overrides the internally computed maximum number of pods a node in EKS can run with the default CNI. Modify this value only if you plan to deploy a CNI other than the default one.

The default max_pods value is computed from the per-instance-type limits documented by AWS.
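
For reference, with the default AWS VPC CNI the limit follows the formula max_pods = ENIs x (IPv4 addresses per ENI - 1) + 2. A quick sanity check for the m5.large used in the example node pool (ENI and IP figures taken from the AWS instance type limits):

$ echo "(3 * (10 - 1)) + 2" | bc   # m5.large: 3 ENIs, 10 IPv4 addresses each
29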

Getting started

Make sure to complete all the prerequisites before continuing, including cloud credentials, VPN/bastion/network configuration and gathering all the required input values.
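
For example, cloud credentials can be provided to both terraform and the aws CLI through the standard AWS environment variables (the values below are placeholders; named profiles or any other supported credential source work as well):

$ export AWS_ACCESS_KEY_ID="<your-access-key-id>"
$ export AWS_SECRET_ACCESS_KEY="<your-secret-access-key>"
$ export AWS_DEFAULT_REGION="eu-central-1"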

Create a new directory to save all terraform files:

$ mkdir /home/operator/sighup/my-cluster-at-eks
$ cd /home/operator/sighup/my-cluster-at-eks

Create the following files:

main.tf

variable "cluster_name" {}
variable "cluster_version" {}
variable "network" {}
variable "subnetworks" { type = list }
variable "dmz_cidr_range" {}
variable "ssh_public_key" {}
variable "node_pools" { type = list }
variable "tags" { type = map }
variable "eks_map_accounts" { type = list }
variable "eks_map_roles" { type = list }
variable "eks_map_users" { type = list }

module "my-cluster" {
  source  = "github.com/sighupio/fury-eks-installer//modules/eks?ref=v1.8.0"

  cluster_version  = var.cluster_version
  cluster_name     = var.cluster_name
  network          = var.network
  subnetworks      = var.subnetworks
  ssh_public_key   = var.ssh_public_key
  dmz_cidr_range   = var.dmz_cidr_range
  node_pools       = var.node_pools
  tags             = var.tags
  eks_map_accounts = var.eks_map_accounts
  eks_map_roles    = var.eks_map_roles
  eks_map_users    = var.eks_map_users
}

output "kube_config" {
  value = <<EOT
apiVersion: v1
clusters:
- cluster:
    server: ${module.my-cluster.cluster_endpoint}
    certificate-authority-data: ${module.my-cluster.cluster_certificate_authority}
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: aws
  name: aws
current-context: aws
kind: Config
preferences: {}
users:
- name: aws
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      command: aws
      args:
        - "eks"
        - "get-token"
        - "--cluster-name"
        - "${var.cluster_name}"
EOT
}

Create my-cluster.tfvars including your environment values:

cluster_name    = "my-cluster"
cluster_version = "1.20"
network         = "vpc-id0"
subnetworks = [
  "subnet-id1",
  "subnet-id2",
  "subnet-id3",
]
ssh_public_key = "ssh-rsa example"
dmz_cidr_range = "10.0.4.0/24"
tags = {}

eks_map_accounts = ["0000000000123"] # Basic auth via AWS Account Number
eks_map_roles = []
eks_map_users = []
node_pools = [
  {
    name : "m5-node-pool"
    version : null # To use same value as cluster_version
    min_size : 1
    max_size : 2
    instance_type : "m5.large"
    volume_size : 100
    subnetworks : null
    additional_firewall_rules: []
    labels : {
      "node.kubernetes.io/role" : "app"
      "sighup.io/fury-release" : "v1.2.0-rc1"
    }
    taints: [
      "sighup.io/role=app:NoSchedule"
    ]
    tags : {}
    max_pods : null
    eks_target_group_arns : []
  },
  {
    name : "t3-node-pool"
    version : "1.14" # To use the cluster_version
    os: null # To use the default one
    min_size : 1
    max_size : 1
    instance_type : "t3.micro"
    volume_size : 50
    subnetworks : null
    additional_firewall_rules: []
    labels : {}
    taints: []
    tags : {}
    max_pods : null
    eks_target_group_arns : []
  }
]

With these two files, the installer is ready to create everything needed to set up an EKS cluster running Kubernetes 1.20 with two different node pools (if you don’t modify the example node_pools value).

$ ls -lrt
total 16
-rw-r--r--  1 sighup  staff  1171 27 abr 16:35 my-cluster.tfvars
-rw-r--r--  1 sighup  staff  1128 27 abr 16:36 main.tf
$ terraform init
Initializing modules...
Downloading github.com/sighupio/fury-eks-installer?ref=v1.8.0 for my-cluster...
- my-cluster in .terraform/modules/my-cluster/modules/eks
Downloading terraform-aws-modules/eks/aws 16.2.0 for my-cluster.cluster...
- my-cluster.cluster in .terraform/modules/my-cluster.cluster
- my-cluster.cluster.fargate in .terraform/modules/my-cluster.cluster/modules/fargate
- my-cluster.cluster.node_groups in .terraform/modules/my-cluster.cluster/modules/node_groups

Initializing the backend...

Initializing provider plugins...
- Finding hashicorp/local versions matching ">= 1.4.0"...
- Finding latest version of hashicorp/cloudinit...
- Finding hashicorp/aws versions matching ">= 3.22.0, >= 3.37.0, 3.37.0"...
- Finding hashicorp/kubernetes versions matching ">= 1.11.1, 1.13.3"...
- Finding hashicorp/random versions matching ">= 2.1.0"...
- Finding terraform-aws-modules/http versions matching ">= 2.4.1"...
- Installing hashicorp/cloudinit v2.2.0...
- Installed hashicorp/cloudinit v2.2.0 (signed by HashiCorp)
- Installing hashicorp/aws v3.37.0...
- Installed hashicorp/aws v3.37.0 (signed by HashiCorp)
- Installing hashicorp/kubernetes v1.13.3...
- Installed hashicorp/kubernetes v1.13.3 (signed by HashiCorp)
- Installing hashicorp/random v3.1.0...
- Installed hashicorp/random v3.1.0 (signed by HashiCorp)
- Installing terraform-aws-modules/http v2.4.1...
- Installed terraform-aws-modules/http v2.4.1 (self-signed, key ID B2C1C0641B6B0EB7)
- Installing hashicorp/local v2.1.0...
- Installed hashicorp/local v2.1.0 (signed by HashiCorp)

Partner and community providers are signed by their developers.
If you'd like to know more about provider signing, you can read about it here:
https://www.terraform.io/docs/cli/plugins/signing.html

Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
$ terraform plan --var-file my-cluster.tfvars --out my-cluster.plan
<TRUNCATED OUTPUT>
Plan: 36 to add, 0 to change, 0 to destroy.

──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Saved the plan to: my-cluster.plan

To perform exactly these actions, run the following command to apply:
    terraform apply "my-cluster.plan"

Review the plan carefully before applying anything. It should create 36 resources.

$ terraform apply my-cluster.plan
<TRUNCATED OUTPUT>
Apply complete! Resources: 36 added, 0 changed, 0 destroyed.

Outputs:

kube_config = <sensitive>

To get your kubeconfig file, follow these simple commands (kubectl will use the aws CLI to authenticate against the cluster):

$ terraform output -raw kube_config > kube.config
$ kubectl cluster-info --kubeconfig kube.config
Kubernetes control plane is running at https://7F437751C93D18942B95FE6FEADFE62A.gr7.eu-central-1.eks.amazonaws.com
CoreDNS is running at https://7F437751C93D18942B95FE6FEADFE62A.gr7.eu-central-1.eks.amazonaws.com/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
$ kubectl get nodes --kubeconfig kube.config
NAME                                            STATUS   ROLES    AGE   VERSION
ip-10-0-182-105.eu-central-1.compute.internal   Ready    <none>   35s   v1.20.7-eks-135321
ip-10-0-192-64.eu-central-1.compute.internal    Ready    <none>   43s   v1.20.7-eks-135321
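
Alternatively, the aws CLI can generate an equivalent kubeconfig entry for you (a sketch, assuming the cluster name and region from the example above):

$ aws eks update-kubeconfig --name my-cluster --region eu-central-1 --kubeconfig kube.config
$ kubectl get nodes --kubeconfig kube.config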

Update control plane

To update the control plane, just modify cluster_version to the next available version:

$ diff my-cluster.tfvars my-cluster-updated.tfvars
2c2
< cluster_version = "1.20"
---
> cluster_version = "1.21"

After modifying cluster_version, execute:

$ terraform plan --var-file my-cluster-updated.tfvars --out my-cluster.plan
<TRUNCATED OUTPUT>
Plan: 4 to add, 3 to change, 4 to destroy.

Please read the output plan carefully. Once you understand the changes, apply it:

$ terraform apply my-cluster.plan
<TRUNCATED OUTPUT>
Apply complete! Resources: 4 added, 3 changed, 4 destroyed.

The update can take around 25 to 30 minutes.

After updating the control-plane you end up with:

  • The EKS control plane updated from Kubernetes version 1.20 to 1.21.
  • The m5-node-pool ready to roll out new nodes with version 1.21 (it is updated automatically because it uses cluster_version).
  • The t3-node-pool still running Kubernetes version 1.20.
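
To double check the control plane version after the update (a quick sketch, assuming the aws CLI is configured for the same account and region):

$ aws eks describe-cluster --name my-cluster --query "cluster.version" --output text
1.21
$ kubectl get nodes --kubeconfig kube.config   # node versions follow each node pool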

Update node pools

To update a node pool, just set the node pool’s version attribute to the same version as the control plane:

If you have set it to null, you don’t need to do anything else: node pools with a null version are updated alongside the control plane update procedure.

$ diff my-cluster.tfvars my-cluster-updated.tfvars
26c26
<     version : "1.20"
---
>     version : "1.21"

After that, run:

$ terraform plan --var-file my-cluster-updated.tfvars --out my-cluster.plan
<TRUNCATED OUTPUT>
Plan: 2 to add, 1 to change, 2 to destroy.

Review the plan before applying anything:

$ terraform apply my-cluster.plan
<TRUNCATED OUTPUT>
Apply complete! Resources: 2 added, 1 changed, 2 destroyed.

You will need to roll your nodes to get new ones with the new version installed.

Consider increasing the number of nodes, migrating workloads to the updated nodes, and then scaling back down to the original number of nodes, as sketched below.
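
A minimal sketch of the rollout for a single node, using one of the node names from the example output above (repeat for each old node; once the instance is terminated, the Auto Scaling group replaces it with a node running the new version):

$ kubectl cordon ip-10-0-182-105.eu-central-1.compute.internal --kubeconfig kube.config
$ kubectl drain ip-10-0-182-105.eu-central-1.compute.internal \
    --ignore-daemonsets --delete-emptydir-data --kubeconfig kube.config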

Lift and Shift node pool update

You can apply another node pool update strategy, called lift and shift: create a new node pool with the updated version, move all workloads to the new nodes, and then remove the old node pool or scale it down to 0 instances. A sketch of the workload migration step follows.
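
A sketch of moving workloads off the old pool, assuming its nodes carry the node.kubernetes.io/role=app label from the example tfvars (adjust the selector to whatever labels your old node pool actually uses):

$ kubectl cordon -l node.kubernetes.io/role=app --kubeconfig kube.config
$ kubectl get nodes -l node.kubernetes.io/role=app -o name --kubeconfig kube.config \
    | xargs -I{} kubectl drain {} --ignore-daemonsets --delete-emptydir-data --kubeconfig kube.config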

Tear down the environment

If you no longer need the cluster, go to the terraform directory where you created it (cd /home/operator/sighup/my-cluster-at-eks) and type:

$ terraform destroy --var-file my-cluster.tfvars
<TRUNCATED OUTPUT>
Plan: 0 to add, 0 to change, 36 to destroy.

Do you really want to destroy all resources?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

Type yes and press enter to continue with the destruction. It will take around 15 minutes.