Starting Bottlerocket in self-managed EKS node pools

Bottlerocket is a specialized operating system built for the container era. Unlike general-purpose operating systems, it can only connect to the Kubernetes control plane and run containers. That's it.

But it also has one great feature: an automatic update system. In EKS you typically have to implement the update workflow with some Infrastructure as Code mechanism, i.e. the IaC tool fetches the AMI ID from SSM and then you have to rotate all the instances somehow. None of that is needed with Bottlerocket, since it can retrieve and safely install updates on its own. Let's look at the whole picture!

The key differences

The official documentation for Bottlerocket is full of eksctl stuff. Personally, I don't use that tool in any production environment, so I had to figure out how to configure launch templates properly.

Thankfully it's pretty easy: the only differences (compared to the Amazon Linux AMI) are the contents of the user data and one additional IAM policy.

User-data

With the Amazon Linux AMI we were using a shell script that called the bootstrap script with a bunch of parameters. Bottlerocket uses a TOML-formatted configuration file instead, and I must say it's a huge relief: it's much harder to make mistakes there 😂

[settings.host-containers.control]
enabled = true

[settings.kubernetes.node-labels]
"kubernetes.eks.rocks/nodepool" = "main"
"bottlerocket.aws/updater-interface-version" = "2.0.0"

[settings.kubernetes]
api-server = "https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.sk1.eu-west-1.eks.amazonaws.com"
cluster-certificate = "LS0tLS1CRUdJTiBDRVJUN5RENDQWJDZ0F3SUJBZ0..."
cluster-name = "cluster"

settings.host-containers.control

This section of the configuration file enables the control container. This container runs the SSM agent, which lets you start a shell session and debug things directly from the AWS console (SSM Session Manager).
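
Once a node is up, you can open a shell on it without any SSH setup; a minimal sketch, assuming the AWS CLI with the Session Manager plugin installed and a placeholder instance ID:

# Open an interactive shell in the control container of a Bottlerocket node
aws ssm start-session --target i-0123456789abcdef0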

settings.kubernetes.node-labels

Well, this is straightforward: here you can specify all the labels for the node. There's also a similar section for taints, as sketched below.
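
For example, dedicating the pool to specific workloads via a taint; a minimal sketch (the dedicated key and experimental value are made up for illustration):

[settings.kubernetes.node-taints]
"dedicated" = "experimental:NoSchedule"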

But what about the strange bottlerocket.aws/updater-interface-version label, what's its purpose here? This label is essential for the automatic update system, and I'll get back to it in a bit.

settings.kubernetes

And here we supply the coordinates of the control plane the node should connect to. You can find all these details in the output of aws eks describe-cluster. Alternatively, they're always available as outputs in Infrastructure as Code tools (Terraform, CloudFormation, Pulumi), so you can assemble the TOML configuration file directly in the infrastructure definition 💪
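
For reference, this is the call that produced the output below (this demo cluster is literally named cluster):

aws eks describe-cluster --name cluster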

{
  "cluster": {
    "name": "cluster",
    "arn": "arn:aws:eks:eu-west-1:111111111111:cluster/cluster",
    "createdAt": "2021-03-24T11:37:45.333000+01:00",
    "version": "1.19",
    "endpoint": "https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.sk1.eu-west-1.eks.amazonaws.com",
    "roleArn": "arn:aws:iam::111111111111:role/cluster_eks_service_role_eu-west-1",
    "resourcesVpcConfig": {
      "subnetIds": [
        "subnet-xxxxxxxxxxxxxxxxx",
        "subnet-xxxxxxxxxxxxxxxxx"
      ],
      "securityGroupIds": [
        "sg-xxxxxxxxxxxxxxxxx"
      ],
      "clusterSecurityGroupId": "sg-xxxxxxxxxxxxxxxxx",
      "vpcId": "vpc-11111111111111111",
      "endpointPublicAccess": true,
      "endpointPrivateAccess": true,
      "publicAccessCidrs": [
        "0.0.0.0/0"
      ]
    },
    "kubernetesNetworkConfig": {
      "serviceIpv4Cidr": "172.20.0.0/16"
    },
    "logging": {
      "clusterLogging": [
        {
          "types": [
            "api",
            "audit",
            "authenticator",
            "controllerManager",
            "scheduler"
          ],
          "enabled": false
        }
      ]
    },
    "identity": {
      "oidc": {
        "issuer": "https://oidc.eks.eu-west-1.amazonaws.com/id/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
      }
    },
    "status": "ACTIVE",
    "certificateAuthority": {
      "data": "LS0tLS1CRUdJTiBDRVJUN5RENDQWJDZ0F3SUJBZ0..."
    },
    "platformVersion": "eks.2",
    "tags": {},
    "encryptionConfig": [
      {
        "resources": [
          "secrets"
        ],
        "provider": {
          "keyArn": "arn:aws:kms:eu-west-1:111111111111:key/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
        }
      }
    ]
  }
}

IAM

As mentioned before, the control container communicates with AWS SSM Session Manager, so we need to allow those actions. The standard list of AWS-managed policies has to be extended with arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore.

This is the full list of policies we need to attach to workers.

  • arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
  • arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
  • arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
  • arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
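
If you manage the role outside of CloudFormation, attaching these is one CLI call per policy; a sketch, assuming a hypothetical role named eks-node-role:

aws iam attach-role-policy \
    --role-name eks-node-role \
    --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore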

Infrastructure as code example

Now let's put this stuff together. First, the IAM role:

        "EKSNodeGroupInstanceRole": {
            "Type": "AWS::IAM::Role",
            "DeletionPolicy": "Retain",
            "UpdateReplacePolicy": "Retain",
            "Properties": {
                "AssumeRolePolicyDocument": {
                    "Version": "2012-10-17",
                    "Statement": [
                        {
                            "Effect": "Allow",
                            "Principal": {
                                "Service": [
                                    "ec2.amazonaws.com"
                                ]
                            },
                            "Action": [
                                "sts:AssumeRole"
                            ]
                        }
                    ]
                },
                "ManagedPolicyArns": [
                    "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
                    "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
                    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
                    "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
                ],
                "Path": "/"
            }
        },

This role is used by an instance profile.
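
The profile itself is just a thin wrapper around the role; a minimal sketch, reusing the EKSInstanceProfile logical name referenced below:

        "EKSInstanceProfile": {
            "Type": "AWS::IAM::InstanceProfile",
            "Properties": {
                "Path": "/",
                "Roles": [
                    {
                        "Ref": "EKSNodeGroupInstanceRole"
                    }
                ]
            }
        },

The launch template then consumes the profile by ARN: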

                    "IamInstanceProfile": {
                        "Arn": {
                            "Fn::GetAtt": [
                                "EKSInstanceProfile",
                                "Arn"
                            ]
                        }
                    },

And the instance profile is referenced in the launch template, together with the TOML configuration file:

        "EKSLaunchTemplateMain": {
            "Type": "AWS::EC2::LaunchTemplate",
            "Properties": {
                "LaunchTemplateData": {
                    "BlockDeviceMappings": [
                        {
                            "DeviceName": "/dev/xvda",
                            "Ebs": {
                                "Encrypted": true,
                                "DeleteOnTermination": true,
                                "VolumeSize": {
                                    "Ref": "MainPoolDiskSize"
                                },
                                "VolumeType": "gp3"
                            }
                        }
                    ],
                    "IamInstanceProfile": {
                        "Arn": {
                            "Fn::GetAtt": [
                                "EKSInstanceProfile",
                                "Arn"
                            ]
                        }
                    },
                    "ImageId": "ami-05104bcc510750440",
                    "InstanceType": "m5.xlarge",
                    "SecurityGroupIds": [
                        {
                            "Ref": "SomeSecurityGroupForEKSNodes"
                        }
                    ],
                    "UserData": {
                        "Fn::Base64": {
                            "Fn::Join": [
                                "\n",
                                [
                                    "[settings.host-containers.control]",
                                    "enabled = true",
                                    "[settings.kubernetes.node-labels]",
                                    "\"kubernetes.eks.rocks/nodepool\" = \"main\"",
                                    "\"bottlerocket.aws/updater-interface-version\" = \"2.0.0\"",
                                    "[settings.kubernetes.eviction-hard]",
                                    "\"memory.available\" = \"15%\"",
                                    "[settings.kubernetes]",
                                    {
                                        "Fn::Join": [
                                            "",
                                            [
                                                "api-server",
                                                " = ",
                                                "\"",
                                                {
                                                    "Ref": "EKSClusterAPIEndpoint"
                                                },
                                                "\""
                                            ]
                                        ]
                                    },
                                    {
                                        "Fn::Join": [
                                            "",
                                            [
                                                "cluster-certificate",
                                                " = ",
                                                "\"",
                                                {
                                                    "Ref": "EKSClusterCAAuthority"
                                                },
                                                "\""
                                            ]
                                        ]
                                    },
                                    {
                                        "Fn::Join": [
                                            "",
                                            [
                                                "cluster-name",
                                                " = ",
                                                "\"",
                                                {
                                                    "Ref": "EKSClusterName"
                                                },
                                                "\""
                                            ]
                                        ]
                                    }
                                ]
                            ]
                        }
                    },
                    "MetadataOptions": {
                        "HttpPutResponseHopLimit": 2,
                        "HttpEndpoint": "enabled",
                        "HttpTokens": "optional"
                    }
                }
            }
        },
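
A note on ImageId: it's pinned to a specific Bottlerocket AMI above. AWS publishes the current IDs as public SSM parameters, so you can look up the latest AMI for your Kubernetes version, architecture, and region instead of hardcoding it (the path below matches this 1.19/x86_64 cluster):

aws ssm get-parameter \
    --name /aws/service/bottlerocket/aws-k8s-1.19/x86_64/latest/image_id \
    --query Parameter.Value --output text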

Now, the launch template can be used in the autoscaling group:

        "EKSAutoscalingGroupMain": {
            "Type": "AWS::AutoScaling::AutoScalingGroup",
            "Properties": {
                "MaxSize": {
                    "Ref": "NodeMaxSizeMainPool"
                },
                "MinSize": {
                    "Ref": "NodeMinSizeMainPool"
                },
                "LaunchTemplate": {
                    "LaunchTemplateId": {
                        "Ref": "EKSLaunchTemplateMain"
                    },
                    "Version": {
                        "Fn::GetAtt": [
                            "EKSLaunchTemplateMain",
                            "LatestVersionNumber"
                        ]
                    }
                },
                "VPCZoneIdentifier": {
                    "Ref": "Subnets"
                },
                "Tags": [
                    {
                        "Key": {
                            "Fn::Sub": [
                                "kubernetes.io/cluster/${ClusterName}",
                                {
                                    "ClusterName": {
                                        "Ref": "EKSClusterName"
                                    }
                                }
                            ]
                        },
                        "PropagateAtLaunch": true,
                        "Value": "owned"
                    },
                    {
                        "Key": {
                            "Fn::Sub": [
                                "k8s.io/cluster-autoscaler/${ClusterName}",
                                {
                                    "ClusterName": {
                                        "Ref": "EKSClusterName"
                                    }
                                }
                            ]
                        },
                        "PropagateAtLaunch": true,
                        "Value": "owned"
                    },
                    {
                        "Key": "k8s.io/cluster-autoscaler/enabled",
                        "PropagateAtLaunch": true,
                        "Value": "TRUE"
                    },
                    {
                        "Key": "k8s.io/cluster-autoscaler/node-template/label/kubernetes.eks.rocks/nodepool",
                        "PropagateAtLaunch": true,
                        "Value": "main"
                    }
                ]
            },
            "UpdatePolicy" : {
                "AutoScalingScheduledAction": {
                    "IgnoreUnmodifiedGroupSizeProperties": "true"
                }
            }
        },

The update operator

Now we're approaching the most interesting part of Bottlerocket: updates. As mentioned before, Bottlerocket can download updates, stage them to the secondary (inactive) partition set, and boot into the updated image on the next restart.

However, this does not happen automatically. Bottlerocket does not know about its neighbours, so it can't just magically install updates and reboot the node at any time. This needs to be orchestrated, and that's where the official update operator comes in.

This infrastructure component has two parts: a DaemonSet whose agent communicates with the Bottlerocket API on each node, and a Deployment that runs the controller responsible for the orchestration.

Both parts can be installed with a single YAML file:

kubectl apply -f https://raw.githubusercontent.com/bottlerocket-os/bottlerocket-update-operator/develop/update-operator.yaml 
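
After a minute you should see the agent pods (one per node) plus the controller; a quick sanity check, assuming the manifest's default bottlerocket namespace:

kubectl get pods -n bottlerocket -o wide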

Once the operator is installed in the cluster, you can observe its activity in the nodes' annotations.
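
The examples below were captured with a command like this (the node name comes from the demo cluster further down):

kubectl describe node ip-10-233-156-60.eu-west-1.compute.internal | grep bottlerocket.aws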

bottlerocket.aws/action-active: prepare-update
bottlerocket.aws/action-state: busy
bottlerocket.aws/action-wanted: prepare-update
bottlerocket.aws/update-available: true

or

bottlerocket.aws/action-active: stabilize
bottlerocket.aws/action-state: ready
bottlerocket.aws/action-wanted: stabilize
bottlerocket.aws/update-available: false

or

bottlerocket.aws/action-active: reboot-update
bottlerocket.aws/action-state: ready
bottlerocket.aws/action-wanted: reboot-update
bottlerocket.aws/update-available: true

There are more combinations of these annotations, but I guess you get the idea: the update operator uses them to track the state of each node. When a node has an update available, the operator calls the Bottlerocket API and the update is staged to the inactive partition set.

The operator then cordons and drains the node and orchestrates its reboot. After the reboot, the node is running the updated operating system.
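
The whole rollout below was captured simply by watching the node list:

kubectl get nodes --watch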

Sample update of a three-node cluster

NAME                                           STATUS                     ROLES    AGE     VERSION
ip-10-233-156-60.eu-west-1.compute.internal    Ready,SchedulingDisabled   <none>   6m26s   v1.19.9
ip-10-233-157-61.eu-west-1.compute.internal    Ready                      <none>   8m30s   v1.19.9
ip-10-233-157-9.eu-west-1.compute.internal     Ready                      <none>   4m11s   v1.19.9
NAME                                           STATUS                        ROLES    AGE     VERSION
ip-10-233-156-60.eu-west-1.compute.internal    NotReady,SchedulingDisabled   <none>   6m58s   v1.19.9
ip-10-233-157-61.eu-west-1.compute.internal    Ready                         <none>   9m2s    v1.19.9
ip-10-233-157-9.eu-west-1.compute.internal     Ready                         <none>   4m43s   v1.19.9
NAME                                           STATUS   ROLES    AGE     VERSION
ip-10-233-156-60.eu-west-1.compute.internal    Ready    <none>   8m15s   v1.19.10
ip-10-233-157-61.eu-west-1.compute.internal    Ready    <none>   10m     v1.19.9
ip-10-233-157-9.eu-west-1.compute.internal     Ready    <none>   6m      v1.19.9
NAME                                           STATUS                     ROLES    AGE     VERSION
ip-10-233-156-60.eu-west-1.compute.internal    Ready                      <none>   9m27s   v1.19.10
ip-10-233-157-61.eu-west-1.compute.internal    Ready                      <none>   11m     v1.19.9
ip-10-233-157-9.eu-west-1.compute.internal     Ready,SchedulingDisabled   <none>   7m12s   v1.19.9
NAME                                           STATUS                        ROLES    AGE     VERSION
ip-10-233-156-60.eu-west-1.compute.internal    Ready                         <none>   10m     v1.19.10
ip-10-233-157-61.eu-west-1.compute.internal    Ready                         <none>   12m     v1.19.9
ip-10-233-157-9.eu-west-1.compute.internal     NotReady,SchedulingDisabled   <none>   7m46s   v1.19.9
NAME                                           STATUS                     ROLES    AGE     VERSION
ip-10-233-156-60.eu-west-1.compute.internal    Ready                      <none>   11m     v1.19.10
ip-10-233-157-61.eu-west-1.compute.internal    Ready                      <none>   13m     v1.19.9
ip-10-233-157-9.eu-west-1.compute.internal     Ready,SchedulingDisabled   <none>   9m12s   v1.19.10
NAME                                           STATUS   ROLES    AGE     VERSION
ip-10-233-156-60.eu-west-1.compute.internal    Ready    <none>   11m     v1.19.10
ip-10-233-157-61.eu-west-1.compute.internal    Ready    <none>   14m     v1.19.9
ip-10-233-157-9.eu-west-1.compute.internal     Ready    <none>   9m42s   v1.19.10
NAME                                           STATUS                     ROLES    AGE   VERSION
ip-10-233-156-60.eu-west-1.compute.internal    Ready                      <none>   13m   v1.19.10
ip-10-233-157-61.eu-west-1.compute.internal    Ready,SchedulingDisabled   <none>   15m   v1.19.9
ip-10-233-157-9.eu-west-1.compute.internal     Ready                      <none>   10m   v1.19.10
NAME                                           STATUS                        ROLES    AGE   VERSION
ip-10-233-156-60.eu-west-1.compute.internal    Ready                         <none>   13m   v1.19.10
ip-10-233-157-61.eu-west-1.compute.internal    NotReady,SchedulingDisabled   <none>   15m   v1.19.9
ip-10-233-157-9.eu-west-1.compute.internal     Ready                         <none>   11m   v1.19.10
NAME                                           STATUS                        ROLES    AGE   VERSION
ip-10-233-156-60.eu-west-1.compute.internal    Ready                         <none>   14m   v1.19.10
ip-10-233-157-61.eu-west-1.compute.internal    NotReady,SchedulingDisabled   <none>   16m   v1.19.10
ip-10-233-157-9.eu-west-1.compute.internal     Ready                         <none>   12m   v1.19.10
NAME                                           STATUS     ROLES    AGE   VERSION
ip-10-233-156-60.eu-west-1.compute.internal    Ready      <none>   15m   v1.19.10
ip-10-233-157-61.eu-west-1.compute.internal    NotReady   <none>   17m   v1.19.10
ip-10-233-157-9.eu-west-1.compute.internal     Ready      <none>   13m   v1.19.10
NAME                                           STATUS   ROLES    AGE   VERSION
ip-10-233-156-60.eu-west-1.compute.internal    Ready    <none>   15m   v1.19.10
ip-10-233-157-61.eu-west-1.compute.internal    Ready    <none>   17m   v1.19.10
ip-10-233-157-9.eu-west-1.compute.internal     Ready    <none>   13m   v1.19.10

As you can see, the nodes were updated one by one. Not a single application was harmed that day ❤️

Wrap-up

So now you know it's really easy to replace the general-purpose Amazon Linux AMI with the container-optimized Bottlerocket. Do you have to? Well, that's a good question!

First, Bottlerocket really simplifies the update process. Second, I believe container-optimized operating systems make real sense in the Kubernetes ecosystem: they have a smaller footprint, and because they don't ship unneeded libraries and binaries, they have a smaller attack surface.

If you don't have an opinion about this yet, I suggest the following: spin up your own EKS cluster and play with the Bottlerocket control container. Explore the API and make your own judgement about the maturity of this operating system. Personally, I'm not going back.
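
A good first stop once you have an SSM session in the control container is apiclient, Bottlerocket's built-in API client (exact subcommands vary slightly between versions):

# Dump the live settings tree as JSON, straight from the Bottlerocket API
apiclient -u /settings

# On recent versions, ask the updater whether a new image is available
apiclient update check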