Resource Manager¶
Introduction¶
Industrial Edge apps may claim certain resources provided by the system. Resources can be hardware devices, external interfaces, software entities or the like. They are organized in resource classes, which define the type of a resource (e.g., a processor core, network interface, or a GPU). For each resource class, there can be multiple instances representing the actual devices, interfaces, etc. These instances are available for usage by Industrial Edge apps, and the mapping of instances to containers is done by the Resource Manager. This documentation provides information for app developers on how to specify resource claims.
Prerequisites¶
Using the Resource Manager requires an IEDK of at least version 1.16.0.
Overview¶
Each resource class is managed by a device plugin. This device plugin must be running on the system for the resource class to be available. The plugins automatically register with the Resource Manager and immediately provide claimable resources. When a plugin is not running, any attempt to allocate a resource of that type will fail (the Resource Manager will return an error and the app will fail to start).
Resource claims are specified in an app's Docker Compose file using extensions.
Essentially, as an app developer, you only have to add a <service>:x-resources:limits:<resname>:<count> entry in the Docker Compose file to claim <count> resources of type <resname> for the service <service>.
The number <count> can be any natural number.
Note Additionally, you have to add the entry runtime: iedge so that the extension field x-resources is handled correctly. If runtime: iedge is missing, the resource claim is silently ignored.
For example, here is a minimalistic docker-compose.yml, where the app claims one instance of resource class my_resource:
version: '2.4'
services:
my_app:
image: my_image
runtime: iedge
x-resources:
limits:
my_resource: 1
By default, all resource allocations are done exclusively. If one app has successfully claimed a resource, it is not available to any other app. Consequently, if all instances of a resource class are exhausted, an error is returned.
Note If there are multiple resource claims in the same limits section, only the last one is considered, which is consistent with Docker's standard behavior.
Advanced Usage¶
In case there are multiple services in the Docker Compose file, you must add an entry <service>:environment:IEDGE_SERVICE_NAME:<service> for each service.
Here is an example:
version: '2.4'
services:
foo:
image: my_image
runtime: iedge
environment:
- IEDGE_SERVICE_NAME=foo
x-resources:
limits:
resource_foo: 1
bar:
image: debian
runtime: iedge
environment:
- IEDGE_SERVICE_NAME=bar
x-resources:
limits:
resource_bar: 1
Note Not every service needs the x-resources tag, but IEDGE_SERVICE_NAME must always be set if there is more than one service.
Docker Compose files utilizing Industrial Edge's Resource Manager can also be used stand-alone for development and testing.
docker-compose up will work as expected, provided that the iedge container runtime is installed on the system and the Resource Manager is running in the background.
Apart from an additional variable <service>:environment:IEDGE_COMPOSE_PATH:<path-of-compose-file> indicating the path to the Docker Compose file, no other modifications are required.
This is not needed if an Industrial Edge app is built.
The above two environment variables are due to a current limitation of Docker Compose.
The iedge runtime needs access to the x-resources section, which docker-compose does not pass to the OCI runtime.
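For stand-alone development, the extra variable can simply be set in the service definition. A minimal sketch (the image name and path are illustrative; adjust the path to wherever your compose file lives):

```yaml
version: '2.4'
services:
  my_app:
    image: my_image
    runtime: iedge
    environment:
      - IEDGE_SERVICE_NAME=my_app
      # Absolute path of this compose file on the development host (illustrative):
      - IEDGE_COMPOSE_PATH=/home/dev/my_app/docker-compose.yml
    x-resources:
      limits:
        my_resource: 1
```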
CPU Isolation¶
Isolating processor cores is one of the "ingredients" for real-time applications.
To ensure that an app runs exclusively on one or more cores, use the siemens.com/isolated_core resource class, for example:
x-resources:
limits:
siemens.com/isolated_core: 1
This way, no other app can run on the same core(s).
Note The request specifies the number of isolated cores needed by an app, with the Resource Manager deciding which cores will be allocated.
The configuration passed to the container runtime contains an environment variable IEDGE_CPUSET_ISOLATED with a cpuset string specifying the isolated cores.
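Inside the container, an app can parse this variable and pin itself to the allocated cores. A minimal Python sketch, assuming the usual cpuset syntax (comma-separated entries, ranges written as a-b); os.sched_setaffinity is Linux-only, and the helper names are ours:

```python
import os

def parse_cpuset(cpuset):
    """Parse a cpuset string such as '2-3,5' into a set of core numbers."""
    cores = set()
    for part in cpuset.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cores.update(range(int(lo), int(hi) + 1))
        else:
            cores.add(int(part))
    return cores

def pin_to_isolated_cores():
    """Restrict the current process to the cores allocated by the Resource Manager."""
    cpuset = os.environ.get("IEDGE_CPUSET_ISOLATED")
    if cpuset:  # only present when isolated cores were claimed
        os.sched_setaffinity(0, parse_cpuset(cpuset))
```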
If your device provides the CPU isolation plugin, it is also prepared for executing real-time applications. This means that it ships with a real-time capable Linux kernel and comes with basic system tunings. Depending on your app's requirements, additional measures may be necessary that cannot or should not be dealt with by Industrial Edge as the underlying platform.
Network Interface Isolation¶
Unlike for CPU isolation, where the allocation of cores is solely done by the Resource Manager, claiming network interfaces is configurable. This way, an app can rely on always obtaining the same (physical) interface. The matching of interfaces with apps is accomplished using labels. As an Industrial Edge administrator, you can specify which interfaces shall be isolatable and attach one or more labels to them (optionally with a VLAN tag). However, when marking interfaces as isolatable, the administrator is limited to those interfaces the device builder allows to isolate.
Isolated network interfaces can be requested in docker-compose.yml, for example:
services:
my_service:
...
networks:
- my_isolated_network
networks:
my_isolated_network:
driver: iedge
driver_opts:
label: foobar
prefix: rt
ipam:
driver: "null"
The driver option prefix specifies the prefix of the network interface name inside the container.
The driver option label can be used to filter isolatable network interface candidates based on the given label.
Note Network interface isolation uses a different syntax compared to the conventional resource claiming syntax. Trying to isolate network interfaces using the conventional way will fail.
The Resource Manager cannot provide information about the isolated network interface to the application due to technical limitations.
Instead, the application has to rely on the prefix specified in the driver options in order to determine the isolated network interface name inside the container (e.g., rt0).
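For instance, the app can enumerate its network interfaces and keep those matching the configured prefix. A small Python sketch (the helper names are ours; /sys/class/net is the standard Linux location for interface names):

```python
import os

def match_prefix(names, prefix):
    """Filter interface names by the prefix configured in driver_opts."""
    return sorted(n for n in names if n.startswith(prefix))

def isolated_interfaces(prefix="rt"):
    """List the container's isolated interfaces, e.g., ['rt0', 'rt1']."""
    return match_prefix(os.listdir("/sys/class/net"), prefix)
```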
The Docker network plugin ensures that an optionally configured VLAN tag is used when communicating across the given network.
The ipam subsection ensures that the network interface is handed over to the container without any IP configuration.
When the network is deleted, the IP configuration is also reset before the interface moves back into the host's network namespace.
PTP Device Support¶
The Precision Time Protocol (PTP) is a protocol to synchronize clocks between devices. Some Network Interface Cards have hardware support for this protocol to achieve accurate time synchronization over the network. The device on the Network Interface Card which enables this hardware support is referred to as "PTP Device". If the Network Interface Card of an isolated network has a PTP device, then the PTP device is also mounted to the requesting Docker container as a read-only device.
The application inside a Docker container knows about the PTP device and other network-related information by reading the following environment variables:
- IEDGE_NETWORKS_PREFIXES: The prefix given in the driver_opts section of the iedge network in docker-compose.yml. If none is given, the default is eth.
- IEDGE_NETWORKS_LABELS: The label given in the driver_opts section of the iedge network in docker-compose.yml.
- IEDGE_NETWORKS_PTP_DEVICES: The path of the PTP devices mounted to the Docker container.
- IEDGE_NETWORKS_ANNOTATIONS: A JSON array containing all the labels and their VLAN tag of the isolated network. This information can be configured in the "Resource Manager" tab of the IED settings where NICs are assigned to labels and VLANs.
- IEDGE_NETWORKS_ISOLATED: The host name of the isolated and renamed network. Networks are renamed according to the prefix given in docker-compose.yml with an additional counter. If a container claims two isolated networks, both with prefix rt, one network will be called rt0 and the other rt1 inside the container.
The order of the entries is not deterministic, but the columns of the environment variables always match.
Let us consider a concrete example.
Assume a docker-compose.yml that claims two networks, foo and bar:
foo is called eno3 on the host, its PTP device is at /dev/ptp1, and its VLAN tag is 123. bar is called eno2 on the host, and its PTP device is at /dev/ptp0.
Then, both of the following variable assignments are possible:
IEDGE_NETWORKS_PREFIXES=foo,bar
IEDGE_NETWORKS_PTP_DEVICES=/dev/ptp1,/dev/ptp0
IEDGE_NETWORKS_ISOLATED=eno3,eno2
IEDGE_NETWORKS_ANNOTATIONS=[{"foo":123},{"bar":0}]
IEDGE_NETWORKS_LABELS=foo,bar
IEDGE_NETWORKS_PREFIXES=bar,foo
IEDGE_NETWORKS_PTP_DEVICES=/dev/ptp0,/dev/ptp1
IEDGE_NETWORKS_ISOLATED=eno2,eno3
IEDGE_NETWORKS_ANNOTATIONS=[{"bar":0},{"foo":123}]
IEDGE_NETWORKS_LABELS=bar,foo
As one can see, the order has changed, but the columns and their values still match.
Note A single Network Interface Card can have multiple labels with multiple different VLAN tags. This is reflected in the individual JSON array elements of the IEDGE_NETWORKS_ANNOTATIONS environment variable, where the keys are the labels and the values are their respective VLAN tags. The JSON array length remains unchanged and reflects the number of isolated networks such that the columns can still be matched. Given the above configuration of eno2 with label bar and no VLAN tag, and assuming eno3 has label foo with VLAN tag 123 and label foobar with VLAN tag 42, the value of IEDGE_NETWORKS_ANNOTATIONS could be either [{"foo":123,"foobar":42},{"bar":0}] or [{"bar":0},{"foo":123,"foobar":42}].
In all cases, the IEDGE_NETWORKS_* environment variables are matching:
If, for example, only the second (out of two) isolated networks has a PTP device associated, the value of IEDGE_NETWORKS_PTP_DEVICES will be ,/dev/ptp0 or /dev/ptp0,, respectively.
In case of three isolated networks, but none with a PTP device, the value of IEDGE_NETWORKS_PTP_DEVICES will be ,,.
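Since the positions across the IEDGE_NETWORKS_* variables correspond, an app can zip them into per-network records. A Python sketch under the assumptions described above (an empty field means no PTP device; the annotations variable is a JSON array; the function name is ours):

```python
import json
import os

def read_networks(env=os.environ):
    """Combine the column-matched IEDGE_NETWORKS_* variables into records."""
    prefixes = env.get("IEDGE_NETWORKS_PREFIXES", "").split(",")
    labels = env.get("IEDGE_NETWORKS_LABELS", "").split(",")
    ptp = env.get("IEDGE_NETWORKS_PTP_DEVICES", "").split(",")
    hosts = env.get("IEDGE_NETWORKS_ISOLATED", "").split(",")
    annotations = json.loads(env.get("IEDGE_NETWORKS_ANNOTATIONS", "[]"))
    return [
        {"prefix": p, "label": l, "ptp_device": d or None,
         "host_name": h, "annotations": a}
        for p, l, d, h, a in zip(prefixes, labels, ptp, hosts, annotations)
    ]
```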
Real-time Applications¶
As mentioned in the section on CPU isolation, executing real-time applications requires several measures to be taken. Some of them are taken care of by the platform, i.e., via device builders or Industrial Edge itself, whereas others are in the responsibility of the app developers. As a guidance for app development, we summarize these measures in the following.
| Measure | Significance | Platform | App Developers |
|---|---|---|---|
| Real-time capable Linux kernel | required | Device builders install kernel with PREEMPT-RT patch. | No action required. |
| Disable hyper-threading | recommended | Handled by Resource Manager. | No action required. |
| CPU isolation (user space) | required | Device builders configure a partitioning into housekeeping and isolatable cores and Resource Manager (CPU isolation plugin) allocates them on demand. | Specify desired number of isolated cores in Docker Compose file. |
| CPU isolation (kernel threads) | required | Device builders move kernel threads to housekeeping cores as far as possible (typically via TuneD). | No action required. |
| CPU isolation (RCU offloading) | optional | Device builders set rcu_nocbs= on the kernel cmdline (NOTE: this feature also needs to be enabled at compile time). | No action required. |
| CPU isolation (IRQs) | required | Device builders move interrupts to housekeeping cores (typically via TuneD). | No action required. |
| Scheduling policy and priority | required | Device builders set the priority of kernel threads (ksoftirq, ktimersoftd, ktimers, rcuc, cpuhp) to SCHED_FIFO 50, i.e., above the application. | Set the policy of relevant application threads to SCHED_FIFO and their priority to something below 50. |
| Real-time throttling | required | As a last resort, device builders leave kernel real-time throttling (via /proc/sys/kernel/sched_rt_*_us) active (as default behavior) to protect against starvation. If this kicks in (because the application hogs the CPU), the application will experience latency impact. | Ensure that a portion (e.g., 5%) of the CPU time is left to the kernel for housekeeping tasks. |
| Memory locking | recommended | None | Use mlock to prevent paging. |
| C-States | optional | If /dev/cpu_dma_latency is available, the Resource Manager passes it into the container. | Set desired C-state via /dev/cpu_dma_latency (if available). |
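The measures in the "App Developers" column can be sketched in a few lines. This is a sketch only, assuming Linux and sufficient privileges (e.g., CAP_SYS_NICE and CAP_IPC_LOCK); the helper names are ours:

```python
import ctypes
import os

KERNEL_THREAD_PRIORITY = 50  # device builders set kernel threads to SCHED_FIFO 50

def clamp_rt_priority(requested):
    """Keep the application priority strictly below the kernel threads (see table)."""
    return min(requested, KERNEL_THREAD_PRIORITY - 1)

def enter_realtime(priority=49):
    # Memory locking: prevent paging of current and future pages (mlockall).
    libc = ctypes.CDLL(None, use_errno=True)
    MCL_CURRENT, MCL_FUTURE = 1, 2
    if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
        raise OSError(ctypes.get_errno(), "mlockall failed")
    # Scheduling policy: SCHED_FIFO with a priority below the kernel threads.
    os.sched_setscheduler(0, os.SCHED_FIFO,
                          os.sched_param(clamp_rt_priority(priority)))
```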
Graphics Processing Units (GPUs)¶
Industrial Edge apps may utilize GPUs for compute-intensive applications, e.g., rendering, machine learning, or complex simulations. At present, only Nvidia GPUs are supported. Allocating a GPU is straightforward and follows the standard schema.
For example, to request one Nvidia GPU, add the following resource claim to your app's Docker Compose file:
x-resources:
limits:
nvidia.com/gpu: 1
Note The GPU's device drivers must be installed on the host system and match the kernel version. Ask your device builder for support in case the drivers are not available or unsuited.
If you are using a container provided by Nvidia, all the necessary libraries should be ready to use. When creating a custom container from scratch, be sure to set the following environment variables in your Dockerfile:
ENV CUDADIR /usr/local/cuda
ENV PATH ${CUDADIR}/bin:${PATH}
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib64:${CUDADIR}/lib64:${LD_LIBRARY_PATH}
Of course, you can install additional libraries, e.g., for machine learning, in your container.
Optional Resources¶
The Resource Manager supports resource preferences, which means that an app can still be started even if no instance of the requested resource class is available.
Such optional resource claims allow for fallback, e.g., from GPU to CPU as supported by many machine learning frameworks such as TensorFlow.
There is no need to provide an app in different versions (with or without GPU).
Apps can specify optional resources by adding the field optional in the resource definition part of docker-compose.yml.
Examples for the docker-compose.yml file:
# Isolation of CPU core(s)
x-resources:
definitions:
my_cpu:
type: siemens.com/isolated_core
optional: true
limits:
my_cpu: 1
# Nvidia GPUs
x-resources:
definitions:
my_gpu:
type: nvidia.com/gpu
optional: true
limits:
my_gpu: 1
Developers can claim both mandatory and optional resources in one app.
The semantics is defined as follows, where n denotes the number of available resources, m the number of claimed mandatory resources, and o the number of claimed optional resources:
- If n >= m+o, the app will get m+o resources.
- If m <= n < m+o, the app will get m resources and the optional resource claim is skipped.
- If n < m, the app will fail to start with an error indicating that there are not enough available resources.
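The semantics above can be written down directly. A Python sketch (the function name is ours):

```python
def granted_resources(n, m, o):
    """Return how many resources an app receives, given n available instances,
    m mandatory claims, and o optional claims."""
    if n >= m + o:
        return m + o  # both mandatory and optional claims are satisfied
    if n >= m:
        return m      # optional claim is skipped
    raise RuntimeError("not enough available resources: app fails to start")
```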
# Isolation of both mandatory and optional resources
x-resources:
definitions:
my_cpu:
type: siemens.com/isolated_core
optional: true
limits:
my_cpu: 1
siemens.com/isolated_core: 2
For network interfaces, the optional keyword is part of the driver_opts section:
# Network interfaces
services:
my_service:
...
networks:
- my_isolated_network
networks:
my_isolated_network:
driver: iedge
driver_opts:
label: foobar
prefix: rt
optional: "true"
ipam:
driver: "null"
Note The default value is false if no optional field is specified. Unlike for ordinary resource classes like CPUs and GPUs, the type of optional in network resource definitions is String, i.e., it must be optional: "true" or optional: "false". The reason is that Docker Compose does not allow Boolean values in the driver_opts section.