
Nexedi does not use LXC containers or Docker because it is impossible to ensure their stable operation, especially in case of kernel ABI mismatch or missing containerisation of system calls. Instead, we use SlapOS nano-containers, which have provided similar benefits since 2010 in a more stable way while using fewer resources. SlapOS nano-containers will soon be deployed on embedded devices using Elbe embedded Linux with over-the-air (OTA) upgrade and secure boot.

Last Update:2020-04-14
Version:002
Language:en


(first version published in 2018, updated in 2020 with the idea of initrd)

Docker is a way to deploy so-called Linux containers. Docker's Linux containers actually consist of a chroot filesystem with multiple layers. Linux containers use so-called kernel namespaces to create an imperfect form of isolation (i.e. it does not isolate everything and has security holes) and additional resource control (i.e. it does not control all resources). They provide a significant improvement compared to a standard chroot.

Linux containers (including Docker) can work reliably under one condition: the host operating system should be the same as the guest operating system. If this condition is not met, there is no way to ensure that a Linux container will be reliable.

So, if the host operating system is Debian 9, then the guest container should also be Debian 9, with the same kernel version, the same kernel options and the same Debian 9 version. If the host operating system is Ubuntu 18.04, then the guest should be Ubuntu 18.04.

If one tries to deploy different host and guest operating systems (e.g. a Debian 9 guest on an Ubuntu 16.04 host), there is a significant probability that some kernel ABI required by the guest will not be implemented by the host. The guest system process will then crash or behave in an abnormal way.
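The "same host and guest" condition can be checked mechanically before a container is started. The following is a minimal sketch, not a tool used by Nexedi: it assumes both systems ship a standard os-release file, and the function names `parse_os_release` and `same_distribution` as well as the sample contents are hypothetical. It only compares the distribution and its version; it says nothing about kernel options.

```python
def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def same_distribution(host, guest):
    """Return True when ID and VERSION_ID are present and identical."""
    return all(
        host.get(key) is not None and host.get(key) == guest.get(key)
        for key in ("ID", "VERSION_ID")
    )


# Illustrative contents only (assumption): a Debian 9 guest on an
# Ubuntu 16.04 host, the mismatch described in the text.
HOST = 'ID=ubuntu\nVERSION_ID="16.04"\n'
GUEST = 'ID=debian\nVERSION_ID="9"\n'

if __name__ == "__main__":
    host = parse_os_release(HOST)
    guest = parse_os_release(GUEST)
    print("host and guest match:", same_distribution(host, guest))
```

In practice the host text would be read from /etc/os-release and the guest text from the same path inside the container image.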

Besides this problem, which is a general issue of binary compatibility in the Linux ecosystem and is not specific to containers, one should make sure that all system calls used by binaries executed in the guest container are actually already containerised by the host kernel. Some years ago, simple commands such as "free -h" would return negative and obviously meaningless values when executed inside a Linux container. The exact list of system calls that are containerised is constantly evolving and depends on the kernel version.
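One plausible source of such inconsistencies is that some kernel interfaces keep reporting host-wide figures inside a container. The sketch below illustrates the idea under stated assumptions: that /proc/meminfo shows the host's total memory while the container's actual limit is exposed through the cgroup v2 file memory.max. The function names and sample values are hypothetical, and this is not a reconstruction of the exact "free -h" bug.

```python
def meminfo_total_bytes(meminfo_text):
    """Return MemTotal from /proc/meminfo content, converted to bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # The value is expressed in kB, e.g. "MemTotal: 16384000 kB".
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def cgroup_limit_bytes(memory_max_text):
    """Return the cgroup v2 memory limit in bytes, or None if unlimited."""
    value = memory_max_text.strip()
    return None if value == "max" else int(value)


def views_disagree(meminfo_text, memory_max_text):
    """True when the cgroup limit is lower than the total seen in meminfo."""
    limit = cgroup_limit_bytes(memory_max_text)
    return limit is not None and limit < meminfo_total_bytes(meminfo_text)


# Sample values (assumption): a 16 GB host and a 512 MiB container limit.
SAMPLE_MEMINFO = "MemTotal:       16384000 kB\nMemFree:         1024000 kB\n"
SAMPLE_LIMIT = "536870912\n"

if __name__ == "__main__":
    print("inconsistent memory views:", views_disagree(SAMPLE_MEMINFO, SAMPLE_LIMIT))
```

A tool that mixes figures from both sources without noticing the disagreement can easily compute meaningless results.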

There are also, obviously, some bugs in the implementation of Linux kernel namespaces and containers, just as with any software. Those bugs are hard to identify and fix from inside the guest. The quality of container code is improving quickly, but it is still less stable and mature than a plain Linux kernel and flat system.

For all these reasons, Nexedi has decided not to use Docker or containers in production: we simply have no way to ensure, without spending considerable effort, that the result will be stable (until some kind of binary analysis utility can be applied to guest executable files to ensure the absence of ABI mismatch or of missing containerisation support for some system calls). We also do not have enough Linux kernel expertise to fix namespace-related containerisation bugs, even though Nexedi has already contributed to the Linux kernel. We might use Linux kernel namespaces in very rare cases though, as long as we have full control over them (e.g. to simulate complex networks and test them automatically). But we tend to use namespaces without containers, so that any kernel or low-level bug will be easier to fix.

SlapOS nano-containers


Rather than LXC or Docker, Nexedi uses SlapOS "profiles and instances", which one could view as nano-containers. A description of SlapOS can be found in:

One key idea in SlapOS is to recompile all dependencies based on a given combination of glibc, kernel and architecture (e.g. Xeon E5-2630), then cache source code as well as compiled binaries in a self-contained way. By doing so, we can ensure that each binary made with SlapOS is perfectly compatible with, and optimised for, each architecture and host operating system. We can also ensure that any software built by SlapOS can be rebuilt in the exact same way in 10 years or longer.
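The idea of caching binaries per combination of glibc, kernel and architecture can be illustrated with a cache key derived from those values. This is only a sketch of the general technique, not SlapOS's actual caching mechanism: the functions `build_cache_key` and `local_cache_key` and the key layout are assumptions made for illustration.

```python
import hashlib
import platform


def build_cache_key(glibc, kernel, arch, source_sha256):
    """Derive a deterministic cache key from the build environment.

    Hypothetical layout: a binary is reused only when the glibc version,
    kernel release, architecture and source digest are all identical.
    """
    payload = "\n".join((glibc, kernel, arch, source_sha256))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def local_cache_key(source_bytes):
    """Compute the key for the machine running this script."""
    libc_name, libc_version = platform.libc_ver()
    return build_cache_key(
        f"{libc_name}-{libc_version}",
        platform.release(),   # kernel release of the running system
        platform.machine(),   # e.g. "x86_64"
        hashlib.sha256(source_bytes).hexdigest(),
    )


if __name__ == "__main__":
    print(local_cache_key(b"example source archive"))
```

Because the key changes whenever any component changes, a binary built for one environment is never served to a different one, and the same inputs always lead to the same key.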

The whole build process is entirely automated. Provisioning, configuration, orchestration, monitoring, accounting, billing and disaster recovery are also entirely automated.

SlapOS is optimised to use as few resources as possible (see "SlapOS rationale"). A single bare metal server can run hundreds of independent database server processes. Each new instance of a SlapOS service can consume just a few kilobytes.

Thanks to SlapOS, Nexedi has been able over the last 10 years to deploy the same code base on a wide variety of architectures and on very different operating systems: SUSE Linux Enterprise Server, Red Hat Enterprise Linux, Debian, CentOS, Fedora, Ubuntu, etc.

Debian Secure Boot

In the future, Nexedi plans to create a Debian derivative with secure boot and a read-only system image. Our goal is to be able to quickly deploy an embedded operating system that can be remotely upgraded over the air (OTA) with predictable behaviour.

A tentative roadmap can be found here: https://www.nexedi.com/NXD-NayuOS.Roadmap.Elbe.2018. The same concepts should apply to servers and laptops.

We expect in particular to eliminate the risk of persistent malware that exists in traditional package-based distributions by booting the entire OS from a signed read-only image. We also expect to reduce the risk of failed upgrades in this way, by always keeping a fallback system image ready to boot.
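The fallback principle can be sketched as a small selection rule: boot the newest image whose signature verifies and which has not failed to boot, otherwise fall back to an older one. This is a hypothetical illustration, not the design of the planned system; the `Slot` structure, its fields and `choose_boot_slot` are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Slot:
    """One system image slot (hypothetical structure)."""
    name: str
    version: int
    signature_ok: bool  # result of verifying the image signature
    boot_ok: bool       # False when the last boot attempt failed


def choose_boot_slot(slots: Sequence[Slot]) -> Optional[Slot]:
    """Pick the newest slot that is both signed and known to boot."""
    candidates = [s for s in slots if s.signature_ok and s.boot_ok]
    if not candidates:
        return None  # nothing safe to boot
    return max(candidates, key=lambda s: s.version)


if __name__ == "__main__":
    slots = [
        Slot("A", version=1, signature_ok=True, boot_ok=True),
        # The freshly upgraded image failed to boot: fall back to slot A.
        Slot("B", version=2, signature_ok=True, boot_ok=False),
    ]
    chosen = choose_boot_slot(slots)
    print("booting slot:", chosen.name if chosen else "none")
```

With such a rule, an upgrade that fails to boot or fails signature verification never leaves the device without a usable system, as long as one previous image is kept.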

We may choose Elbe RFS to reduce our maintenance workload compared to traditional embedded system approaches based on Yocto or Buildroot. Alternatively, we may simply use plain Debian with a custom initrd.


Are Linux containers stable enough for production and why Nexedi uses SlapOS nano-containers instead ?
