How to Design a Kubernetes Cluster
Aug 31, 2020
Kubernetes is an orchestration system that deploys and manages containers across a cluster of servers (nodes). Nowadays Kubernetes is an essential part of many organizations, as it helps scale applications seamlessly. As its use grows across enterprises, managed offerings such as AKS, GKE, EKS, and OpenShift have matured to handle application management and service availability across cloud platforms.
The initial installation and provisioning of Kubernetes can be daunting, but creating a cluster is only the first step. We also need to look at how to set the cluster up for actual use, a step many people overlook.
In this post, I will help you develop a deeper understanding of Kubernetes by revealing some of the principles underpinning its design. It should also serve as a decent guide to setting up Kubernetes and Helm in a cluster shared by a few different teams.
Let’s take a problem statement and try to design a solution around it.
Problem Statement: You are a SysAdmin/SRE/Generally Awesome Person who has been tasked with setting up a cluster for a bunch of people developing with technologies like ML, ETL, AI, and Super Awesome Apps.
You want to enable the autonomy of these teams by building a cluster where no one runs into or blocks anyone else, and you want it to scale seamlessly without the worry of breaking anything in the deployment.
We are assuming here that you will be going forward with your preferred managed Kubernetes flavour, such as GKE, AKS, OpenShift, or EKS. If you are not choosing any of these and plan to roll your own, you can still look at OpenShift or build something with kops, but then there are other considerations to take care of, like managing availability and making sure that etcd behaves itself. That discussion is out of scope for this post.
Required Ingredients to design your Kubernetes cluster
- Labels: I am taking labels and selectors as the first point because they are a great, loosely coupled way to define how you organize services, define requirements, and so on. I am not going to rewrite the existing documentation, which is super fantastic, so I will point you straight there: Labels and Selectors.
Now that you have read about labels and selectors, let's assume a scenario where the people building apps have different needs (I/O, memory, etc.). Say one of my teams is working on an app that becomes more efficient with high-speed I/O. For that, I would label a node with diskspeed: high and then use a nodeSelector in my pod spec to pin that pod to that node, as sketched after this section.
If you, as the admin, don't want to be on the hook for the call when someone puts drivetype: SSD or diskspeed: reallysuperfast or whatever in a manifest and their pod cannot be scheduled, it is good to have a defined list of labels and selectors in place that your teams can consult while building their manifests. It also means you don't have to spec every node for the most demanding workload. This is a simple example of how a defined key/value set can benefit you, and publishing it in a central place means your teams won't drive you crazy.
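As a minimal sketch of the scenario above (the node name, pod name, and image are hypothetical placeholders, not from any published list):

```yaml
# First, label the node (run once as admin):
#   kubectl label nodes worker-1 diskspeed=high

# Then pin the pod to nodes carrying that label:
apiVersion: v1
kind: Pod
metadata:
  name: fast-io-app
  labels:
    app: fast-io-app
spec:
  nodeSelector:
    diskspeed: high      # must match a key/value pair from your published list
  containers:
  - name: app
    image: nginx:1.19    # placeholder image
```

If no node carries diskspeed: high, the pod simply stays Pending, which is exactly the support call that a published label list is meant to prevent.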
- RBAC: RBAC stands for Role-Based Access Control; for a detailed understanding, you can check the documentation here. You need to turn this on. It is really important to protect against not just bad actors taking over your system, but also a well-intentioned employee making a mistake. RBAC allows you to scope your users to a specific resource or set of resources, and it works hand-in-hand with namespaces because of the difference in scoping at the ClusterRole level vs. the Role level.
What is a ClusterRole? It allows you to grant subjects access to cluster-wide resources like nodes, or to all the pods across namespaces, whereas a Role only allows you to grant subjects access to resources within a single namespace.
You should predefine a set of ClusterRoles that will allow you to quickly grant access by creating/modifying the ClusterRoleBindings. This means that if you create them right off the bat, you won’t have to worry about dealing with one-off requests for access later.
At a high level, I would suggest this structure:
- Namespace Admin: manages all the goings-on in the namespace: Roles (note: not ClusterRoles, because we are within a namespace), RoleBindings, deployments, etc.
- Namespace Deployment Manager: allowed to "get", "list", "watch", "create", "update", "patch", and "delete" resources in that namespace.
- Cluster Reader: allowed to "get", "list", and "watch" resources in that namespace.
A concrete sketch of the Deployment Manager role follows this list.
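Here is a minimal sketch of the Namespace Deployment Manager role and its binding; the namespace, user, and resource list are illustrative assumptions, not prescriptions:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: team-ml            # hypothetical team namespace
  name: deployment-manager
rules:
- apiGroups: ["", "apps"]       # core API group plus apps (Deployments, ReplicaSets)
  resources: ["pods", "services", "configmaps", "deployments"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: team-ml
  name: deployment-manager-binding
subjects:
- kind: User
  name: jane                    # hypothetical team member
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role                    # a Role, not a ClusterRole: scoped to the namespace
  name: deployment-manager
  apiGroup: rbac.authorization.k8s.io
```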
There is also another option, ABAC, which stands for attribute-based access control, but RBAC is the more beneficial and widely adopted choice today.
- Namespaces: In addition to the RBAC and ClusterRoleBinding discussion above, I wanted to touch on namespaces. Using namespaces allows you to grant teams autonomy without sacrificing overall cluster security, i.e. avoiding the "hey, that guy deleted my deployment and deployed his own" situation. Give each team its own namespace, while maintaining separate namespaces for things like ingress or logging. This should be done during the one-time provisioning of each new project hosted on the cluster. Namespaces also allow you to implement resource quotas: beyond the issue above where someone removes someone else's work, you want to make sure that one workload does not affect anyone else's.
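Creating a namespace per team is a one-liner at provisioning time; the name below is the same illustrative one used throughout:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-ml
  labels:
    team: ml    # a label from your published list makes namespaces queryable too
```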
- Resource Quotas: This is a resource that allows you to limit the total amount of compute a namespace can consume. You can limit the requests and usage of CPU, storage, GPU, and object counts in a namespace. These quotas need to be balanced against autoscaling, because quotas are defined as hard numbers: if I add another node to my cluster, the quotas remain the same. So to be clear, cluster capacity and resource quotas are two separate elements, but they need to be managed together.
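A minimal sketch of such a quota; the numbers are assumptions you would size against your own nodes:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-ml-quota
  namespace: team-ml          # the hypothetical namespace from above
spec:
  hard:
    requests.cpu: "8"         # cap on the sum of CPU requests across all pods
    requests.memory: 16Gi
    limits.cpu: "16"          # cap on the sum of CPU limits
    limits.memory: 32Gi
    pods: "50"                # object counts can be capped as well
```

Note that these numbers do not grow when the autoscaler adds a node; revisit them whenever you change the cluster's size limits.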
- Autoscaling: Depending on your managed k8s provider, this may simply be a feature you turn on or off. Autoscaling allows your cluster to grow from a minimum node count to a maximum node count. This is wonderful from a cost perspective in a cloud scenario, as you don't want nodes hanging out there doing nothing, or overall utilization of 10% across your nodes.
There are also things to consider like the availability zones for your nodes. You should make sure that you have a minimum number of nodes across the required number of AZs for redundancy.
You need to balance this with Resource quotas. This means that you want to set it up so that you can leverage the extra compute when you need it and don’t restrict yourself to a lower cluster size.
Autoscaling lets you perform horizontal pod scaling and vertical pod scaling, with node scaling to accommodate the additional compute. This is very useful if the majority of your teams are in one time zone, so that most of your load arrives during 8 hours of the day and not so much at night. A sketch of a horizontal pod autoscaler follows.
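As a sketch, here is a HorizontalPodAutoscaler targeting the hypothetical Deployment from earlier (autoscaling/v2beta2 was the current API at the time of writing; the replica bounds and threshold are assumptions):

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: fast-io-app
  namespace: team-ml
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fast-io-app       # hypothetical Deployment name
  minReplicas: 2            # keep a redundancy floor across AZs
  maxReplicas: 10           # stays within the namespace's ResourceQuota
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # add pods when average CPU passes 70%
```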
- Helm: We won't discuss Helm a lot, but for a brief overview, it is like a package manager (Homebrew, apt, yum, etc.) for your k8s projects, and it's wonderful. It lets your teams create Helm charts and an internal repo to maintain all the dependencies, configure deployments between staging and prod, and make deployments more repeatable. I would suggest creating a base Helm chart that is very simple and works as a template for your team members. It can show simple templating and configuration via values.yaml, as sketched below.
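For instance, a base chart might expose just a couple of knobs in values.yaml (the registry path and keys are illustrative assumptions):

```yaml
# values.yaml - the per-environment knobs teams override:
replicaCount: 2
image:
  repository: registry.internal/team-ml/app   # hypothetical internal registry
  tag: "1.0.0"
```

and render them in templates/deployment.yaml:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}
    spec:
      containers:
      - name: app
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
```

Promoting between staging and prod then comes down to the values file you pass, e.g. helm install --name app -f values-prod.yaml ./base-chart (Helm 2 syntax).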
A Note on RBAC for Helm
If your cluster has RBAC enabled, you also need to account for Tiller, the in-cluster component that deploys charts (in Helm 2). There are various configurations you can apply, but I can't suggest anything better than the documentation. You want to scope Tiller to the namespace you are working in: maintain one Tiller per team, and create a service account that allows that Tiller to deploy to that namespace.
This allows individual teams to deploy their own charts without affecting anyone else.
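Here is a sketch of that per-team scoping, reusing the hypothetical namespace and Role from earlier (Helm 2 era):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: team-ml
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tiller-binding
  namespace: team-ml
subjects:
- kind: ServiceAccount
  name: tiller
  namespace: team-ml
roleRef:
  kind: Role
  name: deployment-manager    # the namespace-scoped Role sketched earlier
  apiGroup: rbac.authorization.k8s.io
```

Then initialize that team's Tiller with helm init --service-account tiller --tiller-namespace team-ml, and it can only touch what the Role allows.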