DockOne Technology Sharing (16): Kubernetes' Main Features and Lessons Learned

[Editor's Note] This talk introduces Kubernetes' main features and some practical experience. It starts with the ideas behind Kubernetes and its basic architecture, then briefly covers the main features: networking, resource management, storage, service discovery, load balancing, high availability, rolling upgrade, security, and monitoring.

Let's first look at some of the ideas behind Kubernetes and its basic architecture, and then briefly go through its main features: networking, resource management, storage, service discovery, load balancing, high availability, rolling upgrade, security, and monitoring.

Along the way I will point out some problems that need attention. The main goal is to help you quickly understand what Kubernetes does, so that you have a reference to fall back on when you study and use it later.

1. Some of the ideas behind Kubernetes:

  • Users do not need to care how many machines are required; they only need to care about the environment their software (service) needs in order to run. The service is the center: what matters is the API, how to split a system into small services, and how to integrate them through APIs.
  • Ensure that the system always runs in the state the user specified.
  • Provide not only a container service but also a way to upgrade the software system. Upgrading a system while keeping it highly available is the feature many users want most, and also the hardest to achieve.
  • Separate the things users need to worry about from the things they do not.
  • Better support for the microservice idea by drawing boundaries between services, which is why concepts such as label and pod were introduced.
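The second idea above, keeping the system in the user-specified state, can be illustrated with a minimal sketch. The function name `reconcile` and the action tuples are hypothetical and exist only for illustration; this is not how Kubernetes itself is implemented, just the general shape of comparing desired state with observed state.

```python
def reconcile(desired_replicas, running_pods):
    """Return the actions needed to move the observed state toward the desired state.

    desired_replicas: how many pods the user asked for.
    running_pods: names of the pods currently observed (hypothetical input).
    """
    diff = desired_replicas - len(running_pods)
    if diff > 0:
        # Too few pods: ask for `diff` new ones.
        return [("create", None)] * diff
    if diff < 0:
        # Too many pods: remove the surplus (here simply the last ones listed).
        return [("delete", name) for name in running_pods[diff:]]
    # Observed state already matches the desired state.
    return []


if __name__ == "__main__":
    print(reconcile(3, ["web-a"]))
    print(reconcile(1, ["web-a", "web-b", "web-c"]))
```

Running such a comparison repeatedly is what lets the system recover on its own when a pod disappears.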

For the Kubernetes architecture, refer to the official documentation.

It consists of kube-apiserver, kube-scheduler, and kube-controller-manager, the control tool kubectl, the state store etcd, kubelet and kube-proxy on the slave nodes, and an underlying network layer (you can use Flannel, Open vSwitch, Weave, and so on).

The design looks like a microservice architecture, but horizontal scaling of an individual component is not yet well supported; this is expected to be resolved in a future Kubernetes release.

2. Main features of Kubernetes

I will briefly introduce these features from the angles of networking, service discovery, load balancing, resource management, high availability, storage, security, and monitoring. Because time is limited, the treatment is necessarily brief.

For a more detailed introduction to service discovery, high availability, and monitoring, interested readers can refer to this article.

1) Network

Kubernetes' network approach addresses the following issues:

A. Tightly coupled communication between containers: solved by the Pod, whose containers reach each other over localhost.
B. Communication between Pods: solved by building a communication subnet using tunnels or routing, with tools such as Flannel, Open vSwitch, or Weave.
C. Communication between Pods and Services, and between external systems and Services: solved by introducing the Service.

Kubernetes assigns an IP address to each Pod, so there is no need to set up links between Pods or to manage port mappings between containers and the host.

Note: when a Pod is recreated its IP is reallocated, so network communication should not depend on Pod IPs; use the Service environment variables or DNS instead.
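As a sketch of the environment-variable approach in the note above: Kubernetes injects variables of the form `<NAME>_SERVICE_HOST` and `<NAME>_SERVICE_PORT` into containers for Services that already exist when the Pod starts. The helper name `service_address` and the service name `redis-master` below are illustrative assumptions.

```python
import os


def service_address(service_name, environ=None):
    """Look up a Service's virtual IP and port from the injected environment.

    The service name is upper-cased and dashes become underscores, e.g.
    "redis-master" -> REDIS_MASTER_SERVICE_HOST / REDIS_MASTER_SERVICE_PORT.
    Returns (host, port), or None when the variables are not present.
    """
    environ = os.environ if environ is None else environ
    prefix = service_name.upper().replace("-", "_")
    host = environ.get(prefix + "_SERVICE_HOST")
    port = environ.get(prefix + "_SERVICE_PORT")
    if host is None or port is None:
        return None
    return host, int(port)


if __name__ == "__main__":
    # Inside a Pod this reads the real environment; elsewhere it prints None.
    print(service_address("redis-master"))
```

Because the variables are set only at container start, a Service created after the Pod will not show up this way, which is one reason DNS is often the more convenient option.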

2) Service discovery and load balancing

Service discovery and load balancing are handled by kube-proxy and DNS. Before v1, a Service contained the fields portalIP and publicIPs, which specified the service's virtual IP and the IPs of the machines exposing the service, respectively; publicIPs could be set to any node in the cluster that runs kube-proxy, and more than one could be given. Traffic to portalIP is forwarded to the container's internal address through NAT. In v1, publicIPs is deprecated and marked as deprecatedPublicIPs, kept only for backward compatibility; portalIP is renamed clusterIP; and each entry in the service port list gains a nodePort field, which is the port on the node that maps to the service.
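To make the v1 fields concrete, here is a minimal sketch that builds a v1 Service manifest with a nodePort entry as a Python dictionary. The helper name, service name, selector, and port numbers are made-up examples; clusterIP is left out so that the cluster assigns it.

```python
import json


def nodeport_service(name, selector, port, target_port, node_port):
    """Build a v1 Service manifest that also exposes the service on every node."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "NodePort",
            "selector": selector,      # pods backing this service
            "ports": [{
                "port": port,              # port on the clusterIP
                "targetPort": target_port, # port inside the container
                "nodePort": node_port,     # port opened on each node
            }],
        },
    }


if __name__ == "__main__":
    manifest = nodeport_service("web", {"app": "web"}, 80, 8080, 30080)
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and submitted with kubectl create -f.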

DNS is provided as an add-on and requires installing skydns and kube2dns. kube2dns obtains each Service's clusterIP and port through the Kubernetes API, watches for Service changes so that updates are picked up promptly, and writes the IP information into etcd. skydns reads the DNS records from etcd and serves them on port 53. A DNS record looks roughly like servicename.namespace.tenx.domain, where "tenx.domain" is the primary domain configured in advance.

Note: once the cluster grows large, kube-proxy may run into access performance problems. You can consider replacing it with something else, such as HAProxy, and send traffic directly to the Service's endpoints or Pods. The Kubernetes project is also working on fixing this problem.
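The naming scheme above can be sketched as follows. The first helper builds the record name in the servicename.namespace.domain form given in the text. The second assumes SkyDNS's convention of storing a record in etcd under /skydns with the domain labels reversed, and a JSON value holding host and port; treat that key layout and both helper names as assumptions of this sketch rather than a description of kube2dns internals.

```python
import json


def service_dns_name(service, namespace, cluster_domain="tenx.domain"):
    """Return the DNS name for a service, e.g. redis.default.tenx.domain."""
    return ".".join([service, namespace, cluster_domain])


def skydns_record(service, namespace, cluster_ip, port, cluster_domain="tenx.domain"):
    """Return the (etcd key, JSON value) pair assumed for the service's record.

    The key reverses the labels of the DNS name under the /skydns prefix.
    """
    labels = service_dns_name(service, namespace, cluster_domain).split(".")
    key = "/skydns/" + "/".join(reversed(labels))
    value = json.dumps({"host": cluster_ip, "port": port}, sort_keys=True)
    return key, value


if __name__ == "__main__":
    print(service_dns_name("redis", "default"))
    print(skydns_record("redis", "default", "10.0.0.11", 6379))
```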
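One way to picture the HAProxy alternative: whenever the Service's endpoints change (for example, observed through a watch on the API server), regenerate the HAProxy backend section and reload HAProxy. The sketch below covers only the rendering step; the function name and the sample addresses are hypothetical, and the watch and reload logic is left out.

```python
def render_backend(service, endpoints):
    """Render an HAProxy backend section for the given (ip, port) endpoints.

    Endpoints are sorted so the output is stable and only changes when the
    endpoint set really changes.
    """
    lines = ["backend " + service, "    balance roundrobin"]
    for i, (ip, port) in enumerate(sorted(endpoints)):
        lines.append("    server %s-%d %s:%d check" % (service, i, ip, port))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_backend("web", [("10.1.2.3", 8080), ("10.1.1.7", 8080)]))
```

Comparing the newly rendered text with the previous version is a simple way to decide whether a reload is needed at all.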

3) Resource management

Resources can be limited at three levels: Container, Pod, and Namespace. The Container level relies mainly on what the container runtime itself supports, such as Docker's limits on CPU, memory, disk, and network. The Pod level constrains the resources of Pods created in the system, for example the maximum or minimum CPU and memory a Pod may request. The Namespace level is a per-user quota covering CPU and memory, and it can also limit the number of Pods, RCs, and Services.

The resource management model aims to be "simple, generic, accurate, and scalable".

The current resource allocation calculation is fairly simple and has no powerful features such as preemption. It takes the total resources on each node and the resources already in use, weights them, and computes which nodes a Pod should prefer. The node's actually available resources are not yet taken into account; that requires your own scheduler plugin. In fact, the kubelet can already obtain a node's resource usage, so once that data is collected the calculation becomes possible, and I expect later Kubernetes versions to support it.
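The weighting described above can be sketched roughly as follows: score each node from 0 to 10 by the fraction of CPU and memory that would remain free after placing the Pod, average the two, and prefer the highest score. The dictionary keys, function names, and numbers are illustrative assumptions, not the scheduler's actual data structures.

```python
def least_requested_score(requested, capacity):
    """Score 0..10: the more capacity left after the request, the higher."""
    if capacity == 0 or requested > capacity:
        return 0
    return (capacity - requested) * 10 // capacity


def node_priority(node, pod):
    """Average the CPU and memory scores for placing `pod` on `node`."""
    cpu = least_requested_score(node["cpu_used"] + pod["cpu"], node["cpu_total"])
    mem = least_requested_score(node["mem_used"] + pod["mem"], node["mem_total"])
    return (cpu + mem) // 2


def pick_node(nodes, pod):
    """Return the name of the node with the highest priority for the pod."""
    return max(nodes, key=lambda name: node_priority(nodes[name], pod))


if __name__ == "__main__":
    nodes = {
        "node-a": {"cpu_total": 4000, "cpu_used": 3000, "mem_total": 8192, "mem_used": 4096},
        "node-b": {"cpu_total": 4000, "cpu_used": 1000, "mem_total": 8192, "mem_used": 2048},
    }
    pod = {"cpu": 500, "mem": 1024}
    print(pick_node(nodes, pod))
```

Note that the inputs here are declared totals and requests, which matches the limitation mentioned above: real-time usage on the node plays no part in the score.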

4) High availability

This mainly concerns HA for the master node. The officially recommended approach uses etcd for master election: one kube-apiserver is elected from several masters, guaranteeing that at least one master is available. Access from outside goes through a load balancer. This approach works for HA but is not yet mature, and as far as I know the HA functionality will be improved in future releases.

A picture to help you understand:
That is, multiple kube-apiserver instances run in front of the etcd cluster, and pod-master is used to ensure that only the elected master is active. There are likewise several kube-scheduler and kube-controller-manager instances, but only one set of them, alongside kube-apiserver, runs at any given time.

5) Rolling upgrade

RC was designed from the start to make rolling upgrades easier: a service is updated by replacing its Pods one at a time, keeping service interruption to a minimum. The basic idea is to create a new RC with 1 replica, then gradually decrease the replicas of the old RC while increasing those of the new RC, and delete the old RC once it reaches 0.

This is provided by kubectl: you can specify the updated image and the interval between Pod replacements, and you can also roll back an upgrade that is in progress.

Similarly, Kubernetes supports deploying several versions at the same time, distinguished by label. With the Service unchanged, you can adjust which Pods back it, and test and monitor how the new Pods behave.
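The replica bookkeeping described above can be traced with a small simulation. It only records the (old, new) replica counts after each step; the function name is hypothetical, and a real rolling upgrade would also wait for each new Pod to become ready before continuing.

```python
def rolling_update_steps(old_replicas):
    """Return the sequence of (old_rc_replicas, new_rc_replicas) states.

    The new RC is scaled up by one, then the old RC is scaled down by one,
    until the new RC has taken over and the old RC is at 0 (and can be deleted).
    """
    old, new = old_replicas, 0
    steps = [(old, new)]
    while old > 0 or new < old_replicas:
        if new < old_replicas:
            new += 1                 # scale the new RC up first
            steps.append((old, new))
        if old > 0:
            old -= 1                 # then scale the old RC down
            steps.append((old, new))
    return steps


if __name__ == "__main__":
    for old, new in rolling_update_steps(3):
        print("old rc: %d  new rc: %d" % (old, new))
```

Scaling up before scaling down means the total number of Pods never drops below the original count during the upgrade, at the cost of briefly running one extra Pod.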

6) Storage

As we all know, containers themselves generally do not persist data. In Kubernetes, when a container exits abnormally, the kubelet simply restarts a new container from the original image. In addition, when several containers run in the same Pod, they often need to share some data. Kubernetes Volumes mainly address these two basic problems.

Docker also has a Volume concept, but it is relatively simple and its current support is limited; Kubernetes gives Volumes a clear definition and broad support. The core idea is that a Volume is just a directory that all containers in the same Pod can access. How the directory comes to be, what medium backs it, and what it contains are determined by the particular Volume type used.

To create a Pod with a Volume:
spec.volumes specifies the Volume information this Pod needs.
spec.containers.volumeMounts specifies which containers use the Volume.
Kubernetes supports a wide range of Volume types; many contributors have added support for different storage systems, which also reflects how active the Kubernetes community is.

  • emptyDir – Deleted together with the Pod; used for temporary space, crash-recovery checkpoints, and sharing runtime data; supports a RAM-backed filesystem.
  • hostPath – Similar to Docker's local volume; used to access local resources on the node (such as the local Docker).
  • gcePersistentDisk – GCE disk; available only on the Google Compute Engine platform.
  • awsElasticBlockStore – Similar to the GCE disk; the node must be an AWS EC2 instance.
  • nfs – Supports the Network File System.
  • rbd – Rados Block Device (Ceph).
  • secret – Passes sensitive information to a Pod through the Kubernetes API, using tmpfs (a RAM-backed filesystem).
  • persistentVolumeClaim – Requests resources from an abstract PV without having to care about the storage provider.
  • glusterfs
  • iscsi
  • gitRepo

Choose the storage type that fits your needs; with this much support, there is always one that suits 🙂
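As a sketch of how the spec.volumes and spec.containers.volumeMounts fields described earlier fit together, the following builds a Pod manifest in which two containers share one emptyDir volume, plus a small check that every mount refers to a declared volume. The Pod name, container names, images, and commands are invented for illustration.

```python
import json


def pod_with_shared_volume():
    """Build a v1 Pod whose two containers share an emptyDir volume."""
    mount = {"name": "scratch", "mountPath": "/data"}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "shared-data"},
        "spec": {
            # Declared once at Pod level...
            "volumes": [{"name": "scratch", "emptyDir": {}}],
            "containers": [
                # ...and mounted by each container that needs it.
                {"name": "writer", "image": "busybox",
                 "command": ["sh", "-c", "date > /data/now; sleep 3600"],
                 "volumeMounts": [mount]},
                {"name": "reader", "image": "busybox",
                 "command": ["sh", "-c", "sleep 5; cat /data/now; sleep 3600"],
                 "volumeMounts": [mount]},
            ],
        },
    }


def undeclared_mounts(pod):
    """Return mount names that do not match any entry in spec.volumes."""
    declared = {v["name"] for v in pod["spec"].get("volumes", [])}
    return [m["name"]
            for c in pod["spec"]["containers"]
            for m in c.get("volumeMounts", [])
            if m["name"] not in declared]


if __name__ == "__main__":
    pod = pod_with_shared_volume()
    assert undeclared_mounts(pod) == []
    print(json.dumps(pod, indent=2))
```

Switching storage types only changes the entry under volumes (for example nfs or rbd instead of emptyDir); the volumeMounts stay the same.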

7) Security

Some main principles:

  1. Infrastructure modules should exchange data and modify system state through the API server, and only the API server may access the back-end store (etcd).
  2. Users are divided into different roles: Developers / Project Admins / Administrators.
  3. Developers are allowed to define secret objects and have them attached to the relevant containers when the Pod starts.

Take secrets as an example: if the kubelet needs to pull a private image, Kubernetes supports the following approaches:

  1. Generate a .dockercfg file with docker login for global authorization.
  2. Create the user's secret object in each namespace and specify the imagePullSecrets property when creating the Pod (you can also set it on the serviceAccount).
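The second approach can be sketched as two manifests: a Secret of type kubernetes.io/dockercfg that carries the base64-encoded .dockercfg content, and a Pod that references it through imagePullSecrets. The registry address, credentials, and object names are placeholders, and the helper names are hypothetical.

```python
import base64
import json


def dockercfg_secret(name, namespace, registry, username, password, email):
    """Build a Secret holding .dockercfg-style credentials for one registry."""
    auth = base64.b64encode(("%s:%s" % (username, password)).encode()).decode()
    dockercfg = json.dumps({registry: {"auth": auth, "email": email}})
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/dockercfg",
        # Secret data values must be base64-encoded.
        "data": {".dockercfg": base64.b64encode(dockercfg.encode()).decode()},
    }


def private_image_pod(name, namespace, image, secret_name):
    """Build a Pod that pulls `image` using the named pull secret."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "containers": [{"name": name, "image": image}],
            "imagePullSecrets": [{"name": secret_name}],
        },
    }


if __name__ == "__main__":
    secret = dockercfg_secret("regkey", "dev", "registry.example.com",
                              "alice", "s3cret", "alice@example.com")
    pod = private_image_pod("app", "dev", "registry.example.com/team/app:1.0", "regkey")
    print(json.dumps([secret, pod], indent=2))
```

The Secret and the Pod must live in the same namespace, which is why the secret has to be created in every namespace that pulls private images.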

Authentication
The API server supports three authentication methods: client certificates, tokens, and HTTP basic authentication.

Authorization
Authorization is applied to all HTTP requests that arrive on the API server's secure port.
There are three modes: AlwaysDeny, AlwaysAllow, and ABAC. For other requirements you can implement the Authorizer interface yourself.
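To give a feel for the ABAC mode, here is a minimal sketch of attribute matching. It assumes a policy file with one JSON object per line and the attributes user, resource, namespace, and readonly; an attribute left out of a policy matches anything. Treat the exact attribute names and matching rules as assumptions of this sketch, not a specification of the API server's behavior.

```python
import json


def policy_allows(policy, request):
    """Return True if a single policy object permits the request."""
    for key in ("user", "resource", "namespace"):
        # An attribute that is unset in the policy matches any value.
        if policy.get(key) and policy[key] != request.get(key):
            return False
    # A readonly policy only permits read requests.
    if policy.get("readonly") and not request.get("readonly"):
        return False
    return True


def authorize(policy_lines, request):
    """Allow the request if any policy line permits it; deny otherwise."""
    policies = [json.loads(line) for line in policy_lines if line.strip()]
    return any(policy_allows(p, request) for p in policies)


if __name__ == "__main__":
    policy_file = [
        '{"user": "alice"}',
        '{"user": "kubelet", "readonly": true, "resource": "pods"}',
    ]
    print(authorize(policy_file, {"user": "kubelet", "resource": "pods", "readonly": True}))
    print(authorize(policy_file, {"user": "kubelet", "resource": "pods", "readonly": False}))
```

In this sketch a request is denied unless some policy line explicitly matches it, so an empty policy list behaves like AlwaysDeny.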

8) Monitoring

Older versions of Kubernetes needed an external cAdvisor, whose main job is to collect metrics for the containers on a host. In newer versions, the cAdvisor functionality is integrated into the kubelet, which provides monitoring while interacting with Docker.

Cluster-wide monitoring in Kubernetes is built mainly from the kubelets, Heapster, and a storage backend (such as InfluxDB). Heapster can collect metrics and event data across the whole cluster. It can run as a Pod on the k8s platform or standalone.

Note: Heapster has not yet reached version 1.0. It is convenient for monitoring small clusters, but on larger clusters its current caching approach consumes a lot of memory. Because it periodically fetches container information for the whole cluster, holding that information in memory becomes a problem; in addition, Heapster has to serve an API for retrieving these temporary metrics. If Heapster runs as a Pod, it is prone to OOM. It is therefore recommended to turn off the cache and run it standalone, independent of the k8s platform.

Q & A

Q: Does the kubelet itself run in a Pod?
A: It can run in a container or directly on the host; you can try the hyperkube all-in-one tool.

Q: What is the specific mechanism of rollback?
A: My impression is that it works through labels, replacing the already-upgraded Pods one by one, but I have not studied it closely.

Q: What is the difference between Mesos and Kubernetes? They seem to overlap a lot.
A: Mesos and Kubernetes have different focuses, with some overlap. Mesos is better at resource management and supports frameworks on top of it; k8s is designed natively for containers and cares more about application-related concerns.

Q: "For example, with HAProxy, sending traffic directly to the Service's endpoints or Pods": how does HAProxy route to Pods when Pod IPs are not fixed?
A: You can watch etcd or the API server for changes and update HAProxy accordingly. Alongside kube-proxy, HAProxy can only be used as an external load balancer; replacing kube-proxy itself requires additional development.

Q: Is there a distributed Volume solution you can recommend? How is the performance in practice?
A: For distributed volumes you can try rbd. As for performance, you need to test extensively and keep tuning. Some users have mentioned using MooseFS for storage, and GlusterFS is widely supported.

Q: Does k8s have a plugin specification, or do you modify the code directly?
A: Some parts are fairly well standardized and can be extended as plugins; others still require adjusting along with each release, or changing the source code.

Q: How does k8s monitor Docker events? For example, before a container exits unexpectedly, I want to emit some extra events and notify the load balancer. How can that be done?
A: I am not sure about listening to Docker events directly, but changes at the Pod and RC level can be watched.

Q: How do you set up dependencies and startup order between Pods in k8s?
A: At present there is no good mechanism for controlling dependencies and startup order between Pods; it is best to avoid these dependency and ordering problems at the application level.

Q: What about networking between the k8s cluster and the containers inside it? How do solutions like flannel perform, and are there good alternatives?
A: There are alternatives to flannel, but flannel is easier to deploy and fits the overall way Kubernetes works. As for performance, an overlay network always incurs some loss, so there is a trade-off to make. One user reported that Huawei's test results showed OVS performing better than flannel, but we have not tested this ourselves. For flannel's performance, see the test report on the CoreOS official blog.

Q: We first used containers as lightweight virtual machines, where a container could be reached by hostname. How would I go about doing that here?
A: The DNS add-on for the k8s network (kube2dns + skydns) should meet this need; you can give it a try.

Q: Is there a good monitoring tool?
A: You can refer to another article on DockOne.

The above content is based on the WeChat group sharing session of August 11, 2015. The speaker, Wang Lei, is currently responsible for the technical architecture, design, and development management of the container-as-a-service platform at TenxCloud (时速云). He was previously a Senior Software Engineer at the IBM China Development Lab and Team Lead of the IBM BPM product development team. He took part in the development of middleware products such as IBM Lotus Domino Server, BPM, and WebSphere Application Server, and was involved in the design and development of a next-generation business process management engine better suited to the IBM Bluemix cloud environment. DockOne organizes a targeted technology sharing session every week; interested readers are welcome to add the WeChat ID liyingjiesx and tell us which topics they would like to hear about.
