Technical Advisory: containerd – containerd-shim API Exposed to Host Network Containers (CVE-2020-15257)
Vendor: containerd Project
Vendor URL: https://containerd.io/
Versions affected: 1.3.x, 1.2.x, 1.4.x, others likely
Systems Affected: Linux
Author: Jeff Dileo
CVE Identifier: CVE-2020-15257
Advisory URL: https://github.com/containerd/containerd/security/advisories/GHSA-36xw-fx78-c5r4
Risk: High (full root container escape for a common container configuration)
containerd is a container runtime underpinning Docker and common Kubernetes configurations. It handles abstractions related to containerization and provides APIs to manage container lifecycles. containerd-shim is a binary spawned by containerd that serves as the parent of a container and which implements container lifecycle and reconnection logic that it exposes to containerd through the containerd shim API. This API is exposed over an abstract namespace Unix domain socket that is accessible from the root network namespace. Due to this, non-user namespaced containers with host networking can access this API and cause containerd-shim to perform dangerous actions and spin up arbitrarily privileged containers, enabling container escapes and escalation to full root privileges on the host.
- containerd/ttrpc (via vendor/github.com/containerd/ttrpc/unixcreds_linux.go)
An attacker that is able to run or compromise a host network container running as UID 0 can escape the container, escalate privileges, and compromise the host.
containerd is a core container runtime that manages runc-based containers and is used by Docker (from which it was spun out) and Kubernetes, either through Docker or directly through the containerd CRI shim. Generally, containerd exists as a long-running service daemon that exposes gRPC APIs (e.g. those for containers and tasks) for container lifecycle management operations (e.g. container execution and supervision, image handling, etc.). To implement its APIs, containerd does not directly parent the containers that it creates and oversees on behalf of its clients. Instead, containerd spawns containerd-shim processes that manage the lifecycle of each container. containerd-shim stays alive for the duration of the container’s life to manage it and invokes the runc binary directly to spawn and run the container itself.
To serve its own gRPC (actually
ttrpc, an embedded gRPC implementation and
wire protocol) APIs (e.g. v1 and v2), containerd-shim listens on an abstract Unix
domain socket. These are Linux-specific Unix domain sockets that use
length-prefixed keys that begin with a null byte and may contain arbitrary
binary sequences. These containerd-shim sockets take different forms across
different containerd versions; however, a common behavior is that they embed a
trailing null byte in the abstract Unix domain socket sun_path key, which
prevents a number of common Unix tools (e.g. socat) from connecting to it.
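The naming behavior described above can be illustrated with a minimal Python sketch (Linux only). The socket label used here is hypothetical and is not a real containerd-shim socket name; the point is only that the leading null byte selects the abstract namespace and that a trailing null byte is a significant part of the key.

```python
import os
import socket


def abstract_name(label: str) -> bytes:
    # The leading NUL selects the abstract namespace; the trailing NUL is
    # part of the key, so a name without it refers to a different socket.
    return b"\0" + label.encode() + b"\0"


def open_abstract_pair(name: bytes):
    """Bind and listen on an abstract socket, then connect to it."""
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(name)
    server.listen(1)
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(name)
    conn, _ = server.accept()
    return server, client, conn


if __name__ == "__main__":
    # Hypothetical label, made unique per process for the demonstration.
    name = abstract_name(f"example-shim-demo-{os.getpid()}")
    server, client, conn = open_abstract_pair(name)
    print(server.getsockname())
```

A client that drops the trailing null byte addresses a different key and cannot connect, which is consistent with tools that treat the name as a C string failing to reach such sockets.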
While containerd-shim is more than capable of binding and listening on such a
socket itself when passed the
--socket CLI flag, it also supports receiving
an arbitrary socket file descriptor from its parent process. containerd uses
this approach and pre-creates and listen(2)s on the abstract Unix domain socket
before the containerd-shim child process is created so that it may be
initialized with a handle to it. containerd-shim then starts its containerd
shim API ttrpc server on the socket. As abstract Unix domain sockets are
otherwise permissionless, containerd-shim uses standard Unix domain socket
features to validate that incoming connections have the same UID and EUID
(effective UID) as the containerd-shim process itself (typically UID:0 and EUID:0).
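As a rough illustration of this kind of peer check, the following Python sketch reads the peer's credentials with the Linux SO_PEERCRED socket option and compares the UID with the current process's effective UID. It is not the ttrpc implementation; the function names are hypothetical, and only the UID comparison is shown.

```python
import os
import socket
import struct

# Layout of struct ucred on Linux: pid_t pid; uid_t uid; gid_t gid.
UCRED_FORMAT = "iII"


def peer_credentials(conn: socket.socket):
    """Return (pid, uid, gid) of the connected peer via SO_PEERCRED."""
    raw = conn.getsockopt(
        socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize(UCRED_FORMAT)
    )
    return struct.unpack(UCRED_FORMAT, raw)


def peer_is_same_user(conn: socket.socket) -> bool:
    """Accept only peers whose UID equals this process's effective UID."""
    _pid, uid, _gid = peer_credentials(conn)
    return uid == os.geteuid()


if __name__ == "__main__":
    a, b = socket.socketpair()
    print(peer_credentials(a), peer_is_same_user(a))
```

A check like this only compares numeric IDs. A container root user that is not in a user namespace reports the same UID as root on the host, so the comparison passes.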
However, unlike normal Unix domain sockets, which are bound to file paths,
abstract Unix domain sockets are tied to the network namespace of a process.
As a result, containers that use host networking (e.g.
docker run --network host alpine ...) will be able to access these sockets.
Furthermore, while most containerization platforms run their containers with
a minimal set of Linux capabilities (the constituent privileges of root), they
also do not run the containers in user namespaces, resulting in containers
that run as a privilege-dropped root user. Due to this, such containers run
by default with a host user namespace UID and EUID of 0. This combination
enables such containers to enumerate containerd-shim sockets (e.g. via
netstat -xl or /proc/net/unix) and successfully connect to them.
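The enumeration step can be sketched as a small Python parser over the contents of /proc/net/unix. The function name and the sample socket path in the usage note are hypothetical; the sketch assumes the usual eight-column layout, in which abstract socket names are printed with an "@" prefix.

```python
def find_shim_sockets(proc_net_unix_text: str):
    """Return abstract socket names that mention containerd-shim."""
    hits = []
    for line in proc_net_unix_text.splitlines():
        fields = line.split(maxsplit=7)
        # Skip the header row and unnamed sockets (which have no Path column).
        if len(fields) < 8 or fields[0] == "Num":
            continue
        path = fields[7]
        if path.startswith("@") and "containerd-shim" in path:
            hits.append(path)
    return hits


if __name__ == "__main__":
    with open("/proc/net/unix") as f:
        for name in find_shim_sockets(f.read()):
            print(name)
```

Given a line whose final column is, for example, "@/containerd-shim/example.sock@", the function returns that name, while path-bound and unnamed sockets are ignored.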
containerd-shim exposes a number of dangerous APIs that can be used to escape a container and execute privileged commands. Across the two main versions of containerd(-shim) in use, 1.2.x and 1.3.x, the following exploit primitives are exposed to users, among others:
- Arbitrary file reads
- Arbitrary file appends
- Arbitrary file writes
- Arbitrary command execution in the context of containerd-shim (root)
- Creating a container from a runc config.json file
- Starting a created container
As a result, it is trivial for an attacker to compromise the host if they can reach the containerd shim API.
Abstract namespace Unix domain sockets should not be used to communicate with containerd-shim. Instead, the connection should be performed over unnamed Unix domain sockets created with socketpair(2), or Unix domain sockets bound to a file path, like /run/containerd/containerd.sock and /run/containerd/containerd.sock.ttrpc. If this is not feasible, stricter access control checks would need to be performed to validate incoming shim API clients, and it may be necessary to modify the connection handshake to provide additional authentication data and/or identification. It should be noted that it is insufficient to check that the connecting process is not a child of containerd-shim itself as the process could still connect to the shim API of a different container’s containerd-shim.
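To illustrate the path-bound alternative, the following Python sketch binds a Unix domain socket inside a directory restricted to its owner and creates the socket file with mode 0600. It is a minimal sketch of the general idea rather than the fix that containerd adopted; the function name and the default file name are hypothetical.

```python
import os
import socket
import stat
import tempfile


def listen_on_private_path(directory: str, name: str = "shim.sock"):
    """Bind a Unix socket at directory/name, reachable only by the owner."""
    os.chmod(directory, 0o700)  # only the owner may traverse the directory
    path = os.path.join(directory, name)
    old_umask = os.umask(0o177)  # socket file is created with mode 0600
    try:
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)
    finally:
        os.umask(old_umask)
    server.listen(1)
    return server, path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        server, path = listen_on_private_path(d)
        print(path, oct(stat.S_IMODE(os.stat(path).st_mode)))
        server.close()
```

Because such a socket lives in the filesystem rather than in the network namespace, a host network container can reach it only if the path is also mounted into the container.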
For users running container workloads on vulnerable systems, this issue may be mitigated by disallowing host networking from any containers that are not user namespaced, or by ensuring that such containers are run with a non-zero UID/GID.
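On Docker hosts, one way to look for containers matching the risky combination is to examine the JSON produced by docker inspect. The Python sketch below is an assumption-laden illustration: it relies on the HostConfig.NetworkMode, HostConfig.UsernsMode, and Config.User fields, treats an empty Config.User as root, checks only the UID portion of the user setting, and takes the daemon's user namespace remapping state as a caller-supplied flag. The function names are hypothetical.

```python
ROOT_USERS = {"", "0", "root"}


def runs_as_root(user: str) -> bool:
    # Config.User may be "uid", "name", or "uid:gid"; only the user part is
    # checked, and an empty value is assumed to mean root.
    return user.split(":", 1)[0] in ROOT_USERS


def risky_containers(inspect_entries, daemon_userns_remap: bool = False):
    """Return names of host network containers running as UID 0 without userns."""
    risky = []
    for entry in inspect_entries:
        host_cfg = entry.get("HostConfig", {})
        host_network = host_cfg.get("NetworkMode") == "host"
        # Assumption: with daemon-wide remapping enabled, a container is user
        # namespaced unless it opted out with UsernsMode "host".
        user_namespaced = daemon_userns_remap and host_cfg.get("UsernsMode") != "host"
        user = entry.get("Config", {}).get("User", "")
        if host_network and not user_namespaced and runs_as_root(user):
            risky.append(entry.get("Name", "<unnamed>"))
    return risky
```

The input would typically be the parsed output of docker inspect for the running containers, for example json.loads applied to the command's output. Any container it reports is a candidate for review, not a confirmed finding.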
Users should update to the newest versions of containerd that include patches for this issue. Additionally, as any running containers created prior to updating containerd to a fixed version will remain vulnerable after the update, users will need to ensure that all containers are fully stopped and then restarted after the update is completed.
For users who are uncertain whether CVE-2020-15257 affects them, the command below can be used to quickly determine if a container created by a vulnerable version of containerd is still running. If any results are returned, a vulnerable containerd-shim process is running.
$ cat /proc/net/unix | grep 'containerd-shim' | grep '@'
6/03/20 - NCC Group emailed the security email of the containerd project (email@example.com) asking for a means of secure communication to disclose vulnerability information.
6/03/20 - NCC Group disclosed vulnerability to the containerd project along with exploit code targeting containerd 1.2.x and 1.3.x.
6/04-05/20 - After some initial conversation over email about possible remediations, communication migrated to GitHub.
6/05/20 - NCC Group discussed the (in)feasibility of relying on AppArmor/SELinux to remediate this issue.
6/12/20 - NCC Group requests an update.
6/15/20 - Issue is not accepted as a security vulnerability in containerd. The containerd project indicates that while a fix will be applied, it will not be backported to in-use branches. A sample patch is shared with NCC Group.
6/15-16/20 - Further replies and conversation occurred about the aforementioned patch's implementation and its incompatibility with prior versions of containerd. NCC Group provided information on an alternate approach that could work for all versions.
6/19-24/20 - Further development of a patch occurs by a containerd maintainer who requests and receives permission to make a public pull request. The implementation follows NCC Group's original recommendation and would be compatible across containerd versions.
7/10/20 - NCC Group requests an update and an estimate on when the fix will be merged and applied to older containerd branches.
7/13/20 - A containerd maintainer replies stating that the upcoming 1.4.0 release will forgo having the fix applied, and that instead, it will be applied as a fix in 1.4.1 and to at least the 1.3.x branch.
9/04/20 - After a lack of updates, NCC Group states an intention to publish a technical advisory for this issue, and asks if anyone can confirm if the fix has been applied/backported as the standing pull request was commented as having been pushed to the future 1.5.x release. NCC Group also asks for a timeline on when the issue will be fixed and states that they can wait up to 30 days (10/05/20) or until a fix is released to publish the advisory since the issue was not accepted as a vulnerability.
9/10/20 - A containerd maintainer replies stating that the issue is still not fixed and that the pull request is not likely to be merged soon. They ask for reconsideration of the backwards-incompatible fix.
9/10/20 - NCC Group replies with concerns about the approach of the backwards-incompatible fix, including a timing side channel in the implementation that would enable guessing the authentication secret, and a bias in the PRNG used to create it.
10/02/20 - A maintainer replies with a potential fix based on verifying that the PID of the connecting process is on the host mount namespace. Immediately after this, a containerd security advisor asks if NCC Group still plans to publish a technical advisory on 10/05/20 and if they would be open to having a conversation about the issue.
10/02/20 - NCC Group replies raising a concern over a possible race condition in the underlying mechanism of the potential fix. NCC Group also states that they can postpone publishing the advisory, and would be happy to converse about the issue if it would help to have it fixed. Over email, meeting availability is exchanged.
10/06/20 - NCC Group, a containerd security advisor and two containerd maintainers discuss the issue in a call and agree on a plan to remediate the issue as a vulnerability, with patches applied to supported branches of containerd.
10/06/20-11/04/20 - The containerd project works on implementing the fixes across several supported protocol versions, and backports the patches to the 1.4.x and 1.3.x branches.
10/16/20 - CVE-2020-15257 is issued for this vulnerability.
11/10-13/20 - NCC Group reviews and tests the patches, and provides feedback on the changes; no major issues are identified. Subsequent discussion resolves questions raised in the feedback.
11/13/20 - A follow-up call occurs to discuss disclosure timelines, patch releases, and embargo dates.
11/13-30/20 - Patches are provided under embargo to vendors and Linux distributions.
11/19-25/20 - A containerd security maintainer backports the patches to the end-of-life containerd 1.2.x for Linux distributions using that version. After discussion and analysis, a backport based on similar patches provided by Canonical and Google is selected for merging into the 1.2.x branch.
11/30/20 - containerd publishes a security advisory for this issue, CVE-2020-15257.
11/30/20 - NCC Group publishes this security advisory following the containerd publication.
Thanks to Michael Crosby, Samuel Karp, and Derek McGowan of the containerd project.