Why Container Orchestration?
Docker provides a runtime environment, helping us run containers and build images, and it brings the other advantages we covered in earlier posts. That said, one of its main limitations is that it can run containers only on a single host. In a microservices architecture where multiple containers interact with each other, Docker Compose helps us bring up all the application containers together, but again on a single host. This does not help when the application scales across multiple hosts as we move workloads into production environments. Enter the container orchestration engine, which helps us get around these problems to a large extent.
Container Orchestration Engine
Imagine a situation where you have a set of servers and you need to keep deciding which server a particular container should go to, based on compute KPIs or other parameters. Cumbersome as that may sound, it is a much bigger problem in production environments with hundreds of servers. A container orchestration engine creates a cluster out of these servers and continuously monitors resource usage and health status, among many other parameters, to help you schedule, deploy, scale and manage containerized application deployments across the datacenter. The primary ones in the market are Swarm from Docker, Mesos from Apache and Kubernetes from Google, which will be our point of focus in this post.
The best one seems to be Kubernetes from Google. It has undergone all the stress tests that Google itself faced while managing its large-scale datacenters. Being one of the first open source projects to graduate from the CNCF, its adoption has been remarkable, attracting some of the biggest names in the industry. While covering all of Kubernetes might take days, we will try to break the concepts down in an easy-to-understand manner in this post.
Let us start with the architecture.
All the servers that run Kubernetes can be part of a single cluster, and the client talks to this single cluster instead of the individual servers.
A Kubernetes cluster can consist of a single node or multiple nodes. From a process point of view, the cluster consists of a Kubernetes master node, which runs services like kube-apiserver, kube-controller-manager and kube-scheduler, and Kubernetes worker nodes (or minions, as they were called earlier), which run services like kubelet, kube-proxy and a container runtime.
API server (kube-apiserver) – CLI clients like kubectl, the Kubernetes dashboard and other tools like Red Hat OpenShift talk to the Kubernetes cluster through the API server.
Controller (kube-controller-manager) – The controller manager keeps track of all the objects running on the Kubernetes cluster and runs continuous control loops to manage their state. If it detects any drift in the cluster, it works to ensure that the desired state of each object is always maintained. We can think of this as a fault tolerance mechanism that Kubernetes employs. Controllers include those backing ReplicaSet, Deployment, StatefulSet, DaemonSet and Job objects. We will take a detailed look at these later.
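To make the idea of desired state concrete, here is a minimal sketch of a Deployment manifest (the names and image are illustrative, not from this post). The `replicas: 3` field declares the desired state; the Deployment controller's loop keeps three pods running and replaces any that fail.

```yaml
# Minimal Deployment sketch: the controller continuously reconciles
# the actual number of running pods against spec.replicas.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment      # illustrative name
spec:
  replicas: 3               # desired state the control loop maintains
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25   # any container image works here
```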
Scheduler (kube-scheduler) – It is responsible for pod placement. This is done based on constant monitoring of resource usage across the cluster. It also takes scheduling restrictions into account, such as affinity/anti-affinity rules and any resource limits that are configured. If the default scheduler does not satisfy your needs, you can create your own scheduler and direct pods to be scheduled by it.
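A sketch of how these scheduling constraints look in a pod spec (the `disktype=ssd` label and pod name are assumptions for illustration): a node affinity rule restricts placement to matching nodes, resource limits feed into the scheduler's placement decision, and an optional `schedulerName` would hand the pod to a custom scheduler instead of the default one.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ssd-pod             # illustrative name
spec:
  # To use a custom scheduler, name it here (hypothetical example):
  # schedulerName: my-custom-scheduler
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype   # assumes nodes carry a disktype=ssd label
            operator: In
            values: ["ssd"]
  containers:
  - name: app
    image: nginx:1.25
    resources:
      limits:               # limits the scheduler accounts for
        cpu: "500m"
        memory: "256Mi"
```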
etcd – A key-value store used by Kubernetes to hold all cluster data. All services, networks and other configuration data are stored in etcd; in other words, it stores the state of the cluster. Hence, if you have backed up etcd, the cluster can be restored in case of any failure. Multiple etcd instances can be run for high availability, and etcd can also run outside the Kubernetes cluster for additional stability.
Kubelet – The kubelet is the agent process that runs on each worker node. It talks to the API server on the master and makes sure that the containers in its pods are always healthy and running.
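One way the kubelet checks container health is through probes defined in the pod spec. A minimal sketch, assuming the container serves HTTP on port 80 and that `/` returns success when healthy: the kubelet polls the endpoint periodically and restarts the container if the probe fails.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-pod        # illustrative name
spec:
  containers:
  - name: web
    image: nginx:1.25
    livenessProbe:
      httpGet:
        path: /             # assumed to return 200 when healthy
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 10     # kubelet probes every 10s; failures trigger a restart
```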
Kube-proxy – Not exactly a designated proxy any more. Kube-proxy now takes care of the service networking aspects in Kubernetes. When a Kubernetes Service is created, kube-proxy updates the iptables rules on each node so that requests to the service IP are redirected to the appropriate pods. It also handles port mapping and port forwarding and maintains routing rules on the node.
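As a sketch of what triggers that iptables update, here is a minimal Service manifest (names are illustrative). Creating it gives the matching pods a stable virtual IP and port, which kube-proxy then wires up to the pod endpoints on every node.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-service         # illustrative name
spec:
  selector:
    app: web                # pods carrying this label back the service
  ports:
  - port: 80                # stable service port clients connect to
    targetPort: 80          # container port traffic is redirected to
  type: ClusterIP           # default type: a virtual IP handled by kube-proxy
```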
In the next post, we will try to cover more concepts from Kubernetes.