Resource Manager¶
Introduction¶
Industrial Edge apps may claim certain resources provided by the system. Resources can be hardware devices, external interfaces, software entities or the like. They are organized in resource classes, which define the type of a resource (e.g., a processor core, network interface, or a GPU). For each resource class, there can be multiple instances representing the actual devices, interfaces, etc. These instances are available for usage by Industrial Edge apps, and the mapping of instances to containers is done by the Resource Manager. This documentation provides information for app developers on how to specify resource claims.
Prerequisites¶
Using the Resource Manager requires an IEDK of at least version 1.16.0.
Overview¶
Each resource class is managed by a device plugin. This device plugin must be running on the system for the resource class to be available. The plugins automatically register with the Resource Manager and immediately provide claimable resources. When a plugin is not running, any attempt to allocate a resource of that type will fail (the Resource Manager will return an error and the app will fail to start).
Resource claims are specified in an app's Docker Compose file using extensions.
Essentially, as an app developer, you only have to add a <service>:x-resources:limits:<resname>:<count> entry in the Docker Compose file to claim <count> resources of type <resname> for the service <service>.
The number <count> can be any natural number.
Note Additionally, you have to add the entry runtime: iedge so that the extension field x-resources is handled correctly. If runtime: iedge is missing, the resource claim is silently ignored.
For example, here is a minimalistic docker-compose.yml, where the app claims one instance of resource class my_resource:
version: '2.4'
services:
my_app:
image: my_image
runtime: iedge
x-resources:
limits:
my_resource: 1
By default, all resource allocations are done exclusively. If one app has successfully claimed a resource, it is not available to any other app. Consequently, if all instances of a resource class are exhausted, an error is returned.
Note If there are multiple resource claims in the same limits section, only the last one is considered, which is consistent with Docker's standard behavior.
Advanced Usage¶
In case there are multiple services in the Docker Compose file, you must add an entry <service>:environment:IEDGE_SERVICE_NAME:<service> for each service.
Here is an example:
version: '2.4'
services:
foo:
image: my_image
runtime: iedge
environment:
- IEDGE_SERVICE_NAME=foo
x-resources:
limits:
resource_foo: 1
bar:
image: debian
runtime: iedge
environment:
- IEDGE_SERVICE_NAME=bar
x-resources:
limits:
resource_bar: 1
Note Not every service needs the x-resources tag, but IEDGE_SERVICE_NAME must always be set if there is more than one service.
Docker Compose files utilizing Industrial Edge's Resource Manager can also be used stand-alone for development and testing.
docker-compose up will work as expected, provided that the iedge container runtime is installed on the system and the Resource Manager is running in the background.
Apart from an additional variable <service>:environment:IEDGE_COMPOSE_PATH:<path-of-compose-file> indicating the path to the Docker Compose file, no other modifications are required.
This is not needed if an Industrial Edge app is built.
The above two environment variables are due to a current limitation of Docker Compose.
The iedge runtime needs access to the x-resources section, which docker-compose does not pass to the OCI runtime.
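For stand-alone development, the extra variable can simply be set in the service definition. A minimal sketch (the image name and path are illustrative; adjust the path to wherever your compose file lives):

```yaml
version: '2.4'
services:
  my_app:
    image: my_image
    runtime: iedge
    environment:
      - IEDGE_SERVICE_NAME=my_app
      # Absolute path of this compose file on the development host (illustrative):
      - IEDGE_COMPOSE_PATH=/home/dev/my_app/docker-compose.yml
    x-resources:
      limits:
        my_resource: 1
```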
CPU Isolation¶
Isolating processor cores is one of the "ingredients" for real-time applications.
To ensure that an app runs exclusively on one or more cores, use the siemens.com/isolated_core resource class, for example:
x-resources:
limits:
siemens.com/isolated_core: 1
This way, no other app can run on the same core(s).
Note The request specifies the number of isolated cores needed by an app, with the Resource Manager deciding which cores will be allocated.
The configuration passed to the container runtime contains an environment variable IEDGE_CPUSET_ISOLATED with a cpuset string specifying the isolated cores.
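Inside the container, an app can parse this variable and pin itself to the allocated cores. A minimal Python sketch, assuming the usual cpuset syntax (comma-separated entries, ranges written as a-b); os.sched_setaffinity is Linux-only, and the helper names are ours:

```python
import os

def parse_cpuset(cpuset):
    """Parse a cpuset string such as '2-3,5' into a set of core numbers."""
    cores = set()
    for part in cpuset.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cores.update(range(int(lo), int(hi) + 1))
        else:
            cores.add(int(part))
    return cores

def pin_to_isolated_cores():
    """Restrict the current process to the cores allocated by the Resource Manager."""
    cpuset = os.environ.get("IEDGE_CPUSET_ISOLATED")
    if cpuset:  # only present when isolated cores were claimed
        os.sched_setaffinity(0, parse_cpuset(cpuset))
```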
If your device provides the CPU isolation plugin, it is also prepared for executing real-time applications. This means that it ships with a real-time capable Linux kernel and comes with basic system tunings. Depending on your app's requirements, additional measures may be necessary that cannot or should not be dealt with by Industrial Edge as the underlying platform.
Network Interface Isolation¶
Unlike for CPU isolation, where the allocation of cores is solely done by the Resource Manager, claiming network interfaces is configurable. This way, an app can rely on always obtaining the same (physical) interface. The matching of interfaces with apps is accomplished using labels. As an Industrial Edge administrator, you can specify which interfaces shall be isolatable and attach one or more labels to them (optionally with a VLAN tag). However, when marking interfaces as isolatable, the administrator is limited to those interfaces the device builder allows to isolate.
Isolated network interfaces can be requested in docker-compose.yml, for example:
services:
my_service:
...
networks:
- my_isolated_network
networks:
my_isolated_network:
driver: iedge
driver_opts:
label: foobar
prefix: rt
ipam:
driver: "null"
The driver option prefix specifies the prefix of the network interface name inside the container.
The driver option label can be used to filter isolatable network interface candidates based on the given label.
Note Network interface isolation uses a different syntax compared to the conventional resource claiming syntax. Trying to isolate network interfaces using the conventional way will fail.
The Resource Manager cannot provide information about the isolated network interface to the application due to technical limitations.
Instead, the application has to rely on the prefix specified in the driver options in order to determine the isolated network interface name inside the container (e.g., rt0).
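For instance, the app can enumerate its network interfaces and keep those matching the configured prefix. A small Python sketch (the helper names are ours; /sys/class/net is the standard Linux location for interface names):

```python
import os

def match_prefix(names, prefix):
    """Filter interface names by the prefix configured in driver_opts."""
    return sorted(n for n in names if n.startswith(prefix))

def isolated_interfaces(prefix="rt"):
    """List the container's isolated interfaces, e.g., ['rt0', 'rt1']."""
    return match_prefix(os.listdir("/sys/class/net"), prefix)
```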
The Docker network plugin ensures that an optionally configured VLAN tag is used when communicating across the given network.
The ipam subsection ensures that the network interface is handed over to the container without any IP configuration.
When the network is deleted, the IP configuration is also reset before the interface moves back into the host's network namespace.
PTP Device Support¶
The Precision Time Protocol (PTP) is a protocol to synchronize clocks between devices. Some Network Interface Cards have hardware support for this protocol to achieve accurate time synchronization over the network. The device on the Network Interface Card which enables this hardware support is referred to as "PTP Device". If the Network Interface Card of an isolated network has a PTP device, then the PTP device is also mounted to the requesting Docker container as a read-only device.
The application inside a Docker container knows about the PTP device and other network-related information by reading the following environment variables:
- IEDGE_NETWORKS_PREFIXES: The prefix given in the driver_opts section of the iedge network in docker-compose.yml. If none is given, the default is eth.
- IEDGE_NETWORKS_LABELS: The label given in the driver_opts section of the iedge network in docker-compose.yml.
- IEDGE_NETWORKS_PTP_DEVICES: The path of the PTP devices mounted to the Docker container.
- IEDGE_NETWORKS_ANNOTATIONS: A JSON array containing all the labels and their VLAN tag of the isolated network. This information can be configured in the "Resource Manager" tab of the IED settings where NICs are assigned to labels and VLANs.
- IEDGE_NETWORKS_ISOLATED: The host name of the isolated and renamed network. Networks are renamed according to the prefix given in docker-compose.yml with an additional counter. If a container claims two isolated networks, both with prefix rt, one network will be called rt0 and the other rt1 inside the container.
The order of the entries is not deterministic, but the columns of the environment variables always match.
Let us consider a concrete example.
Assume a docker-compose.yml that claims two networks, foo and bar:
foo is called eno3 on the host, its PTP device is at /dev/ptp1, and its VLAN tag is 123. bar is called eno2 on the host, and its PTP device is at /dev/ptp0.
Then, both of the following variable assignments are possible:
IEDGE_NETWORKS_PREFIXES=foo,bar
IEDGE_NETWORKS_PTP_DEVICES=/dev/ptp1,/dev/ptp0
IEDGE_NETWORKS_ISOLATED=eno3,eno2
IEDGE_NETWORKS_ANNOTATIONS=[{"foo":123},{"bar":0}]
IEDGE_NETWORKS_LABELS=foo,bar
IEDGE_NETWORKS_PREFIXES=bar,foo
IEDGE_NETWORKS_PTP_DEVICES=/dev/ptp0,/dev/ptp1
IEDGE_NETWORKS_ISOLATED=eno2,eno3
IEDGE_NETWORKS_ANNOTATIONS=[{"bar":0},{"foo":123}]
IEDGE_NETWORKS_LABELS=bar,foo
As one can see, the order has changed, but the columns and their values still match.
Note A single Network Interface Card can have multiple labels with multiple different VLAN tags. This is reflected in the individual JSON array elements of the IEDGE_NETWORKS_ANNOTATIONS environment variable, where the keys are the labels and the values are their respective VLAN tags. The JSON array length remains unchanged and reflects the number of isolated networks such that the columns can still be matched. Given the above configuration of eno2 with label bar and no VLAN tag, and assuming eno3 has label foo with VLAN tag 123 and label foobar with VLAN tag 42, the value of IEDGE_NETWORKS_ANNOTATIONS could be either [{"foo":123,"foobar":42},{"bar":0}] or [{"bar":0},{"foo":123,"foobar":42}].
In all cases, the IEDGE_NETWORKS_* environment variables are matching:
If, for example, only the second (out of two) isolated networks has a PTP device associated, the value of IEDGE_NETWORKS_PTP_DEVICES will be ,/dev/ptp0 or /dev/ptp0,, respectively.
In case of three isolated networks, but none with a PTP device, the value of IEDGE_NETWORKS_PTP_DEVICES will be ,,.
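Since the positions across the IEDGE_NETWORKS_* variables correspond, an app can zip them into per-network records. A Python sketch under the assumptions described above (an empty field means no PTP device; the annotations variable is a JSON array; the function name is ours):

```python
import json
import os

def read_networks(env=os.environ):
    """Combine the column-matched IEDGE_NETWORKS_* variables into records."""
    prefixes = env.get("IEDGE_NETWORKS_PREFIXES", "").split(",")
    labels = env.get("IEDGE_NETWORKS_LABELS", "").split(",")
    ptp = env.get("IEDGE_NETWORKS_PTP_DEVICES", "").split(",")
    hosts = env.get("IEDGE_NETWORKS_ISOLATED", "").split(",")
    annotations = json.loads(env.get("IEDGE_NETWORKS_ANNOTATIONS", "[]"))
    return [
        {"prefix": p, "label": l, "ptp_device": d or None,
         "host_name": h, "annotations": a}
        for p, l, d, h, a in zip(prefixes, labels, ptp, hosts, annotations)
    ]
```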
Real-time Applications¶
As mentioned in the section on CPU isolation, executing real-time applications requires several measures to be taken. Some of them are taken care of by the platform, i.e., via device builders or Industrial Edge itself, whereas others are in the responsibility of the app developers. As a guidance for app development, we summarize these measures in the following.
| Measure | Significance | Platform | App Developers |
|---|---|---|---|
| Real-time capable Linux kernel | required | Device builders install kernel with PREEMPT-RT patch. | No action required. |
| Disable hyper-threading | recommended | Handled by Resource Manager. | No action required. |
| CPU isolation (user space) | required | Device builders configure a partitioning into housekeeping and isolatable cores and Resource Manager (CPU isolation plugin) allocates them on demand. | Specify desired number of isolated cores in Docker Compose file. |
| CPU isolation (kernel threads) | required | Device builders move kernel threads to housekeeping cores as far as possible (typically via TuneD). | No action required. |
| CPU isolation (RCU offloading) | optional | Device builders set rcu_nocbs= on the kernel cmdline (NOTE: this feature also needs to be enabled at compile time). | No action required. |
| CPU isolation (IRQs) | required | Device builders move interrupts to housekeeping cores (typically via TuneD). | No action required. |
| Scheduling policy and priority | required | Device builders set the priority of kernel threads (ksoftirq, ktimersoftd, ktimers, rcuc, cpuhp) to SCHED_FIFO 50, i.e., above the application. | Set the policy of relevant application threads to SCHED_FIFO and their priority to something below 50. |
| Real-time throttling | required | As a last resort, device builders leave kernel real-time throttling (via /proc/sys/kernel/sched_rt_*_us) active (as default behavior) to protect against starvation. If this kicks in (because the application hogs the CPU), the application will experience latency impact. | Ensure that a portion (e.g., 5%) of the CPU time is left to the kernel for housekeeping tasks. |
| Memory locking | recommended | None | Use mlock to prevent paging. |
| C-States | optional | If /dev/cpu_dma_latency is available, the Resource Manager passes it into the container. | Set desired C-state via /dev/cpu_dma_latency (if available). |
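The measures in the "App Developers" column can be sketched in a few lines. This is a sketch only, assuming Linux and sufficient privileges (e.g., CAP_SYS_NICE and CAP_IPC_LOCK); the helper names are ours:

```python
import ctypes
import os

KERNEL_THREAD_PRIORITY = 50  # device builders set kernel threads to SCHED_FIFO 50

def clamp_rt_priority(requested):
    """Keep the application priority strictly below the kernel threads (see table)."""
    return min(requested, KERNEL_THREAD_PRIORITY - 1)

def enter_realtime(priority=49):
    # Memory locking: prevent paging of current and future pages (mlockall).
    libc = ctypes.CDLL(None, use_errno=True)
    MCL_CURRENT, MCL_FUTURE = 1, 2
    if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
        raise OSError(ctypes.get_errno(), "mlockall failed")
    # Scheduling policy: SCHED_FIFO with a priority below the kernel threads.
    os.sched_setscheduler(0, os.SCHED_FIFO,
                          os.sched_param(clamp_rt_priority(priority)))
```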
Graphics Processing Units (GPUs)¶
Industrial Edge apps may utilize GPUs for compute-intensive applications, e.g., rendering, machine learning, or complex simulations. At present, only Nvidia GPUs are supported. Allocating a GPU is straightforward and follows the standard schema.
For example, to request one Nvidia GPU, add the following resource claim to your app's Docker Compose file:
x-resources:
limits:
nvidia.com/gpu: 1
Note The GPU's device drivers must be installed on the host system and match the kernel version. Ask your device builder for support in case the drivers are not available or unsuited.
If you are using a container provided by Nvidia, all the necessary libraries should be ready to use. When creating a custom container from scratch, be sure to set the following environment variables in your Dockerfile:
ENV CUDADIR /usr/local/cuda
ENV PATH ${CUDADIR}/bin:${PATH}
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib64:${CUDADIR}/lib64:${LD_LIBRARY_PATH}
Of course, you can install additional libraries, e.g., for machine learning, in your container.
Optional Resources¶
The Resource Manager supports resource preferences, which means that an app can still be started even if no instance of the requested resource class is available.
Such optional resource claims allow for fallback, e.g., from GPU to CPU as supported by many machine learning frameworks such as TensorFlow.
There is no need to provide an app in different versions (with or without GPU).
Apps can specify optional resources by adding the field optional in the resource definition part of docker-compose.yml.
Examples for the docker-compose.yml file:
# Isolation of CPU core(s)
x-resources:
definitions:
my_cpu:
type: siemens.com/isolated_core
optional: true
limits:
my_cpu: 1
# Nvidia GPUs
x-resources:
definitions:
my_gpu:
type: nvidia.com/gpu
optional: true
limits:
my_gpu: 1
Developers can claim both mandatory and optional resources in one app.
The semantics is defined as follows, where n denotes the number of available resources, m the number of claimed mandatory resources, and o the number of claimed optional resources:
- If n >= m+o, the app will get m+o resources.
- If m <= n < m+o, the app will get m resources and the optional resource claim is skipped.
- If n < m, the app will fail to start with an error indicating that there are not enough available resources.
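The semantics above can be written down directly. A Python sketch (the function name is ours):

```python
def granted_resources(n, m, o):
    """Return how many resources an app receives, given n available instances,
    m mandatory claims, and o optional claims."""
    if n >= m + o:
        return m + o  # both mandatory and optional claims are satisfied
    if n >= m:
        return m      # optional claim is skipped
    raise RuntimeError("not enough available resources: app fails to start")
```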
# Isolation of both mandatory and optional resources
x-resources:
definitions:
my_cpu:
type: siemens.com/isolated_core
optional: true
limits:
my_cpu: 1
siemens.com/isolated_core: 2
For network interfaces, the optional keyword is part of the driver_opts section:
# Network interfaces
services:
my_service:
...
networks:
- my_isolated_network
networks:
my_isolated_network:
driver: iedge
driver_opts:
label: foobar
prefix: rt
optional: "true"
ipam:
driver: "null"
Note The default value is false if no optional field is specified. Unlike for ordinary resource classes like CPUs and GPUs, the type of optional in network resource definitions is String, i.e., it must be optional: "true" or optional: "false". The reason is that Docker Compose does not allow Boolean values in the driver_opts section.