Unikernels are unfit for production

Recently I made the mistake of asking on Twitter whether anyone wanted to hear why unikernels are unfit for production. The response was overwhelming: either you suspect that unikernels are the wrong direction and want details to support that belief, or you are a unikernel supporter and want to know what the counter-arguments could possibly be. Either way, people genuinely want to understand the case against running unikernels in production.

So what is wrong with unikernels? Let's start with a definition: a unikernel is an application that runs entirely in the privileged mode of the microprocessor. (The terminology varies; on x86 this is called running in ring 0.) That is, in a unikernel there is no application in the traditional sense; instead, application functionality has been pulled into the operating system kernel. (The term "no OS" is misleading: it isn't that there is no operating system, but rather that the application is all OS.) Before discussing the problems, it's worth first exploring the motivations for unikernels, if only because they are so thin...

The main reason to implement functionality in the operating system kernel is performance: by avoiding a context switch across the user-kernel boundary, operations that would otherwise require such a switch can be made faster. In the unikernel context this argument is dubious at best: between the complexity of modern platform runtimes and the performance of modern microprocessors, applications are rarely limited by user-kernel context switches. The argument is further weakened by the fact that unikernels rely heavily on hardware virtualization to achieve any kind of multi-tenancy. As I described in detail in a previous blog post, virtualizing at the hardware level carries a real performance cost: the system that can actually see the hardware (the hypervisor) is separated from the system that can actually see the application (the guest operating system), and the resulting loss of efficiency in hardware utilization (DRAM, NICs, CPUs, I/O) cannot be engineered away, however much effort is spent. But I don't want to lean too hard on performance; suffice it to say that the claimed performance advantage of unikernels can be strongly contested.
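To put rough numbers on the user-kernel crossing, here is a minimal, illustrative micro-benchmark. This is a sketch, not a rigorous measurement: absolute numbers vary by platform, and it assumes `os.getpid()` actually enters the kernel on each call (which modern libc versions do).

```python
import os
import time

def cost_per_call(fn, n=200_000):
    """Average wall-clock cost of calling fn, in seconds per call."""
    t0 = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - t0) / n

def user_space_noop():
    return None

# os.getpid() crosses the user-kernel boundary on most platforms; the
# no-op stays entirely in user space. Both costs are typically tens to
# hundreds of nanoseconds -- far below what most applications are bound by.
plain = cost_per_call(user_space_noop)
syscall = cost_per_call(os.getpid)
print(f"user-space call: {plain * 1e9:.0f} ns")
print(f"getpid():        {syscall * 1e9:.0f} ns")
```

The point is not the exact numbers but their scale: shaving nanoseconds off boundary crossings rarely matters next to the efficiency lost to hardware virtualization.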

Another claim unikernel supporters make is that unikernels are "more secure", but it's unclear what this is actually based on. Yes, unikernels often run less software (and so present a smaller attack surface), but there is nothing about unikernels per se that dictates less software. Yes, unikernels often run new or different software (and so are not vulnerable to, say, the latest OpenSSL bug), but that is the vague security advantage of running any new, esoteric system. The security claim also seems to gloss over the protection boundary that unikernels depend on entirely: the boundary between guest operating systems provided by the underlying hypervisor. Hypervisor vulnerabilities certainly exist; we cannot treat every bug in the Linux kernel as a clear and present danger while pretending that hypervisor bugs do not happen. Worse, by denying application developers a user protection boundary, unikernels run contrary to the principle of least privilege: any vulnerability in the application is amplified, because the compromised application is the unikernel. In a world of container-based deployment, this turns what would be an annoying problem into a far more serious one. At best, unikernels are security theater; at worst, they are a security nightmare.

The final claim made for unikernels is that they are small, but again, there is nothing intrinsic to unikernels that makes them small! Speaking personally, I have implemented systems in both small kernels and large ones; you can absolutely build a very compact system without resorting to anything like a unikernel. (I am a huge fan of Alpine Linux, a very lean userland for Linux applications and/or Docker containers.) To the degree that unikernels don't have much code today, it is because they are early in their development, not because smallness falls out of the design. But measuring a unikernel by its code alone misses the larger system: because a unikernel runs as a guest operating system, the DRAM that the hypervisor assigns to that guest is consumed in its entirety, whether or not the application actually uses it. And because memory exhaustion remains one of the most pernicious application failure modes (especially in dynamic environments), memory sizing tends to be deliberately generous: sized to the worst case, with no tolerance for overflow. In the unikernel model, any such headroom is simply lost; nothing else can use it, because the hypervisor has no idea whether the memory is actually in use. (Contrast this with containers, where memory not used by an application in one container is available to other containers, or to the system itself.) Once the whole system is taken into account, the smallness argument becomes much less compelling, if not outright inverted.

To summarize the reasons for choosing unikernels: perhaps some performance, a bit of security theater, and a software crash diet. Thin as these reasons are, that's the end of the good news. From here on it's bad news: the costs of these professed advantages are substantial, and the resulting system is fragile.

The first drawback of unikernels lies in the mechanics of running an application at all. When the operating system boundary is removed, the interfaces by which an application interacts with the outside world or with persistent storage may be removed along with it, and we actually need those interfaces! Some unikernel projects (such as OSv and Rumprun) address this by implementing a "POSIX-like" interface to minimize disruption to applications. The good news: applications can more or less work. The bad news: did I mention that you need to port? An application can only expect "POSIX-like" behavior; older mechanisms will not carry over, process creation being the obvious one: there are no processes in a unikernel, so if your application depends on this 40-year-old mechanism, you're basically done. (Or worse.)
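As a concrete illustration of what "POSIX-like" omits, here is a minimal sketch of the classic fork/exec/wait idiom. It runs on any POSIX system, but has no equivalent in a unikernel, which has a single address space and no notion of a process. (The use of `true` as the child program is purely for illustration.)

```python
import os

def run_in_child(argv):
    """Classic UNIX process creation: fork a child, exec a program in it,
    wait for it, and return its exit code. A unikernel has no fork(), so
    this 40-year-old idiom simply cannot run there."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)  # child replaces itself with argv
        finally:
            os._exit(127)             # exec failed; never fall back into parent code
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)

print(run_in_child(["true"]))  # 0 on any POSIX system with `true` on PATH
```

Everything built on this pattern, from shells to supervisors to CGI-style spawning, presumes a multi-process world that a unikernel has defined away.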

And if POSIX-like unikernels weren't limiting enough, language-specific unikernels such as MirageOS are worse still, being deeply embedded in a particular language runtime. On the one hand, restricting implementation to a type-safe language conflates the language's reliability properties with the unikernel's actual reliability; on the other hand, good luck finding everything you need written in OCaml!

So these are the problems you may hit just getting your application to run, but suppose they have all been solved: either the POSIX interface a unikernel exposes is sufficient for your application (or platform), or the application is already written in OCaml or Erlang or Haskell. Your application runs in a unikernel, and now you hit the biggest problem with unikernels, the one that makes them entirely unsuitable for production, and the one that (for me at least, as someone who deploys things in actual production) is fatal: unikernels are undebuggable. There are no processes, so of course there is no ps, no htop, no strace; but there is also no netstat, no tcpdump, no ping! Not even the crustiest of old tools, and certainly no DTrace or MDB. From a debugging perspective, to call this primitive is too polite: this isn't the Paleolithic, it's the Cambrian. As someone who has spent his career building production systems and the tools to debug them, I cannot deny the need to debug production systems, and I find the tacit position of unikernel advocates seriously dangerous: a lack of operational empathy. The implied attitude is that production problems are easy to deal with: just restart the service when something goes wrong. That attitude, even when it is merely implied, infuriates anyone who has actually been responsible for operating systems in production. (And you needn't take my word as an outsider for it: listen to the cheers in my DockerCon 2015 talk when I stressed the need to debug systems rather than simply restart them.) To be blunt, the attitude is infuriating because it is wrong: if a production application starts to fail because of a seemingly minor issue such as listen drops, restarting the application destroys the very state you need to understand it (namely, its behavior under high load), and with it any hope of root-causing the problem (an insufficient backlog).
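To make the listen-drop scenario concrete, here is a minimal, hypothetical sketch of the condition involved: a server whose listen backlog is too small. When the backlog overflows under load, the kernel drops connections silently; nothing in the application itself reports it, and only system-level tooling can reveal what happened.

```python
import socket

# A TCP server with a deliberately tiny listen backlog. Under load,
# connections beyond the backlog are silently dropped by the kernel.
# On a full OS you can observe the drops from outside the application
# (e.g. `netstat -s | grep -i listen` on Linux shows overflow counts)
# and then fix the backlog. In a unikernel, with no such tooling,
# the evidence dies with the restart.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
srv.listen(1)                # backlog of 1: trivially overflowed under load
port = srv.getsockname()[1]
print("listening on 127.0.0.1 port", port)
srv.close()
```

The fix here is a one-word change to the `listen()` call, but you can only make it once observability tooling has told you that backlog overflow is what is happening.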

Could someone implement the necessary debugging tooling inside a unikernel? In a word: no. Debugging tools routinely cross the user-kernel boundary, and they are at their most powerful when driven by ad hoc queries from the command line. The infrastructure that would provide this capability has been deliberately stripped from unikernels in the name of minimalism; any unikernel that offered debugging tooling sophisticated enough for production use would violate its own dogma. Unikernels are unfit for production not merely in their implementation but in their very vision: when they fail in production, the failure cannot be understood, even by their own adherents, and so they can never improve.

Let me close by noting a belief I share with unikernel advocates: I agree that the container revolution demands a leaner, more secure, more efficient runtime than a shared Linux guest OS running on virtual hardware; at Joyent, our focus over the past few years has been to deliver exactly such a runtime with SmartOS and Triton. Where I differ from the unikernel advocates is in the solution: rather than give up on running secure containers on a multi-tenant substrate, we built on a container substrate already proven secure in production, and added to it the ability to run native Linux binaries. In other words, we chose to exploit the operating system rather than deny its existence, bringing to Linux and Docker not only secure containers but additional advantages like ZFS, Crossbow and DTrace. And this is the place to stress it one last time: our focus on production runs through everything we do, but it shows up especially in the wealth of tools available for debugging production systems; by bringing those tools to Linux containers, Triton has enabled debugging of production systems in ways that simply weren't possible before!

In the fullness of time, I think unikernels will prove most effective as a negative result: they will serve mainly to demonstrate an unviable approach to production systems. They will thus join transactional memory and the M:N scheduling model in the pantheon of systems software ideas with no future, ultimately dying in the trial of the real world. But you don't have to take my word for it: as my tweet put it, production systems that can't be debugged punish those who run them, and those who choose them will, fortunately, punish only themselves, not the rest of us!

Original article: Unikernels are unfit for production (translated by Cui Jingwen)
About the translator: Cui Jingwen is a senior software engineer at IBM, responsible for system testing of IBM's WebSphere business process management software, and previously worked at VMware on quality assurance for desktop virtualization products. Interests include virtualization, middleware technology, and business process management.
