
Free Open Source IT / OT system latencies

Measurements of the latency within different OSIE system components.
  • Last Update: 2023-10-25
  • Version: 001
  • Language: en

Abstract

In this article we describe how to create a fully open source and open hardware industrial automation system. We use our OSIE (Open Source Industrial Edge) system and focus on the different latency measurements within the system.

Architecture

Figure 1: OSIE system architecture.

The general OSIE architecture is client / server based. It consists of an edge server running the controlling software, called Beremiz, to which multiple couplers are connected over a TSN-capable network. Each coupler is attached to a device called MOD-IO, which provides up to 4 relays in addition to multiple digital and analog IOs. The communication between the coupler and the MOD-IO is based on the I2C protocol. Up to 127 MOD-IO boards can be connected to one coupler over UEXT in a daisy-chain topology.
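To give an idea of what the coupler-to-MOD-IO link looks like in practice, below is a minimal sketch of switching a relay over I2C from Python using the smbus2 library. The bus number, the address (0x58) and the command code (0x10) are assumptions based on the standard MOD-IO documentation and may differ for a given board revision or jumper configuration.

# Minimal sketch: driving MOD-IO relays from the coupler over I2C.
# Assumptions: the MOD-IO sits on I2C bus 1 and uses its default
# address 0x58 with command 0x10 ("set relay states"); verify these
# against the board documentation and jumper settings.
from smbus2 import SMBus

MODIO_ADDR = 0x58        # default MOD-IO I2C address (assumed)
CMD_SET_RELAYS = 0x10    # command code for writing the relay bitmask (assumed)

def set_relays(bus: SMBus, mask: int) -> None:
    """Set the four relays according to the lower 4 bits of `mask`."""
    bus.write_byte_data(MODIO_ADDR, CMD_SET_RELAYS, mask & 0x0F)

with SMBus(1) as bus:
    set_relays(bus, 0b0001)  # close relay 1, open relays 2-4
    set_relays(bus, 0b0000)  # open all relays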

The communication between Beremiz and the couplers is based on OPC UA: Beremiz runs the control logic, while each coupler runs a thin OPC UA server which simply executes commands.

Beremiz and the couplers are connected over a TSN network implementing the IEEE 802.1Qbv and 802.1AS standards. The TSN network is required to provide deterministic behavior within the system.

Measurements

Our objective is to measure the five most important aspects of the system:

  1. Beremiz control cycle latency
  2. Keep-alive safety network latency
  3. Raw UDP keep-alive safety network implementation
  4. Linux kernel's PREEMPT_RT latency
  5. Coupler to MOD-IO latency

Measurements (1) and (2) were run on a real system setup consisting of an edge server (an x86 machine) connected over a Layer 2 (L2) un-managed switch to two couplers. All three machines were running Debian 11 patched with the PREEMPT_RT patch. The response times were captured by toggling a high / low level signal on a specific GPIO port of each coupler, recording it with a connected logic analyser and processing the traces in Jupyter notebooks.

Measurement (3) is based on a POC written in Python; no logic analyser was used. It uses the same setup and network topology.

For measurement (4) we used cyclictest.

For measurement (5) a logic analyser was used.

Figure 2: Real in-field measurement setup

In Figure 2 one can see the measurement setup, which consists of the following parts:

  1. An edge server (not visible on the image above) running Beremiz IDE/runtime and a virtual coupler (in master / publisher mode)
  2. A "dumb" Layer 2 un-managed switched to which two couplers (stm32mp1-2 and stm32mp1-3) and the MOD-IO are connected
  3. To each couplers' GPIO pins we attach a Salae logical analyser which helps us measure the signals with a high precision and frequency
  4. Based on the type of measurements different tests are conducted

Measurement 1: Beremiz PLC control cycle latency

We wanted to measure the latency we might expect when running a PLC program over the described setup and network topology. For the purposes of the test we developed a simple PLC program which sets the state of an OPC-UA node on the coupler side at every PLC cycle. This state goes from 0 -> 1 and then back from 1 -> 0 at each PLC cycle. A trick we used here is that we ran the coupler thin client in a special measurement mode where each change of state is directly applied to a GPIO pin on the coupler side. To this physical pin we attach the logic analyser, whose high sampling frequency allows us to monitor the length of a PLC cycle with nanosecond precision.

Please note that Beremiz is running in real-time mode and we use the default PLC cycle of 20 milliseconds. Below one can find the created diagrams and the derived conclusions.
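The coupler's measurement mode itself is not reproduced here, but the principle can be sketched as follows: a writable OPC UA boolean node whose value is mirrored onto a GPIO pin probed by the logic analyser. This is only a minimal illustration, assuming the asyncua library and the libgpiod 1.x Python bindings; the node name, GPIO chip and line offset are placeholders, not the identifiers used in OSIE.

# Minimal sketch of the coupler "measurement mode": an OPC UA boolean
# node whose value is mirrored onto a GPIO pin, so that a logic
# analyser attached to the pin can time each PLC cycle.
# Assumptions: asyncua for the server side, libgpiod 1.x Python
# bindings for the pin; names and offsets are illustrative only.
import asyncio
import gpiod
from asyncua import Server

GPIO_CHIP = "gpiochip0"   # assumed GPIO chip
GPIO_LINE = 17            # assumed line offset wired to the analyser

async def main():
    # GPIO output that the logic analyser probes
    chip = gpiod.Chip(GPIO_CHIP)
    line = chip.get_line(GPIO_LINE)
    line.request(consumer="osie-measure", type=gpiod.LINE_REQ_DIR_OUT)

    # Thin OPC UA server exposing one writable boolean node
    server = Server()
    await server.init()
    server.set_endpoint("opc.tcp://0.0.0.0:4840/")
    idx = await server.register_namespace("urn:osie:coupler")
    node = await server.nodes.objects.add_variable(idx, "CycleFlag", False)
    await node.set_writable()

    async with server:
        previous = False
        while True:
            value = await node.read_value()
            if value != previous:           # Beremiz flipped the flag
                line.set_value(int(value))  # mirror it to the GPIO pin
                previous = value
            await asyncio.sleep(0)          # tight poll to keep added latency low

asyncio.run(main())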

Link to the Jupyter notebook.

Link to Beremiz and coupler program.

Figure 3 & 4: PLC cycle duration diagrams.

Measurements (in milliseconds)
  Coupler 0 Coupler 1
Mean 20.00316 20.00316
Median 19.99182 19.99182
Min 12.08136 12.08136
Max 32.50304 32.50304
Standard deviation 0.78562 0.78562
Standard deviation (%) 3.92971 3.92971
Mode (most occurrences) 19.8326 19.8326

Conclusions

With a standard deviation of 3.93%, consistent across the two couplers, we can say that we provide determinism in 96.07% of cases.

Measurement 2: Keep-alive safety network latency

For this measurement we used the same trick and the same network / GPIO / Saleae setup shown in Figure 2, but with two modifications:

1) a master coupler broadcasts UDP datagrams to two slave couplers using open62541's OPC-UA PubSub implementation. Our goal was to understand how reliable PubSub is as a basis for our own keep-alive system.

2) we measure the time interval between consecutive PubSub datagrams received on the coupler side.

Figure 5: keep-alive system with one master and two slave couplers.

Again we run the coupler in a special mode where each message received from the master coupler toggles a dedicated GPIO pin (0 -> 1 and 1 -> 0). To this pin we connect the logic analyser, which allows us to effectively measure the duration of each subscription cycle.

Note: We used OPC-UA's default publish interval of 5 milliseconds.
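The processing in the linked Jupyter notebooks essentially boils down to computing the time between consecutive GPIO transitions in the logic analyser capture. A minimal sketch of that step, assuming a CSV export with one transition per row and a "Time [s]" column of timestamps (the actual export format and column names may differ):

# Minimal sketch of the notebook analysis: load the logic analyser
# export and compute publish-cycle statistics from the GPIO toggles.
# Assumption: a CSV with one transition per row and a "Time [s]"
# column; the real export format may differ.
import pandas as pd

transitions = pd.read_csv("coupler0_capture.csv")           # hypothetical file name
cycles_ms = transitions["Time [s]"].diff().dropna() * 1e3   # gap between toggles, in ms

print(f"mean   = {cycles_ms.mean():.5f} ms")
print(f"median = {cycles_ms.median():.5f} ms")
print(f"min    = {cycles_ms.min():.5f} ms")
print(f"max    = {cycles_ms.max():.5f} ms")
print(f"std    = {cycles_ms.std():.5f} ms "
      f"({100 * cycles_ms.std() / cycles_ms.mean():.5f} %)")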

Link to the Jupyter notebook.

Link to coupler program.

Figure 6 & 7: keep-alive publish cycle duration diagrams.

Measurements (in milliseconds)
  Coupler 0 Coupler 1
Mean 5.01265 5.03696
Median 5.2135 5.34774
Min 1.57323 1.76327
Max 36.26763 91.09253
Standard deviation 0.88046 1.05025
Standard deviation (%) 16.88809 19.63915
Mode (most occurrences) 5.09803 4.07077

Conclusions

As the data shows, this approach (with a standard deviation of up to 19%) does not deliver the results we expected. One of the reasons is the implementation of OPC-UA PubSub, which is intentionally restricted to the millisecond range. An explanation and the reasons for this can be found here. Unfortunately for us, this means that we cannot use this approach for a real keep-alive safety system.

Measurement 3: Raw UDP keep-alive safety network implementation

In this measurement a Python3 sender and receiver application was written, running on a master (x86 platform) and two slaves (the couplers). The sender sends datagrams (containing the current timestamp) over UDP, which are received by the subscribers (couplers). The time difference between consecutive datagrams is recorded, and plots / statistics are generated.
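The actual POC is linked below; as a minimal sketch of the idea (port, interval and loop count are illustrative, not the values used in the measurement), the master periodically sends a timestamped datagram and each coupler records the interval between consecutive datagrams:

# Minimal sketch of the raw UDP keep-alive POC: a sender that emits a
# timestamped datagram at a fixed interval and a receiver that records
# the gap between consecutive datagrams. Port, interval and sample
# count are placeholders, not the values used in the measurement.
import socket
import struct
import time

PORT = 50000            # assumed port
INTERVAL_S = 0.0005     # assumed publish interval (500 microseconds)

def sender(target_ip: str) -> None:
    """Master side: periodically send the current timestamp."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        sock.sendto(struct.pack("!d", time.time()), (target_ip, PORT))
        time.sleep(INTERVAL_S)

def receiver() -> None:
    """Coupler side: record the interval between consecutive datagrams."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", PORT))
    intervals_us = []
    last = None
    while len(intervals_us) < 10000:
        sock.recv(64)                     # payload content is not used here
        now = time.monotonic()
        if last is not None:
            intervals_us.append((now - last) * 1e6)
        last = now
    print(f"mean interval: {sum(intervals_us) / len(intervals_us):.2f} us")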

Link to the Jupyter notebook

Figure 8 & 9: keep-alive publish cycle duration diagrams using the raw UDP implementation in Python.

  Coupler 0 Coupler 1
Mean 393.41921 394.61847
Median 391.0 393.0
Min 343.0 374.0
Max 469.0 446.0
Standard deviation 8.24558 7.76911
Standard deviation (%) 2.10885 1.97687
Mode (most occurrences) 390.0 391.0

Conclusions

With standard deviations ranging from 1.97% to 2.11%, this approach shows promising results. Unlike the OPC-UA PubSub implementation, it also works in the microsecond rather than the millisecond range. We believe these numbers can be improved even further by using low-level C instead of a high-level language such as Python, where the GIL is likely to take its toll.

Measurement 4: Linux PREEMPT_RT latency

In this measurement we evaluate the latency of a Linux kernel patched with PREEMPT_RT, using cyclictest. We conducted the same tests on two different machines - a coupler and an edge server. We used the same kernel, but we spent much more time trying to fine-tune the coupler than the edge server.

Figure 10 & 11: Kernel latency.

   Coupler (Armv7) tsn-shuttle (x86_64)
Kernel Linux stm32mp1-3 5.15.49-rt47 Linux tsn-shuttle 5.15.49-rt47
isolcpus, rcu_nocbs set Not set
ETH IRQ affinity set Not set
ksoftirq priority set Not set

Conclusions

The results were quite stable and somewhat expected. We know what we can do with the Linux kernel, but there is still room for improvement in isolating the processes running on the coupler side. Fortunately, the Linux kernel already has the necessary tools, so that we can assign a real-time process to a dedicated CPU and run the remaining maintenance tasks on the remaining CPU(s).
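As an illustration of that last point, a process can request a real-time scheduling class and pin itself to an isolated CPU using only the Python standard library. This is a minimal sketch; the CPU number and priority are placeholders, and the chosen CPU must already be isolated on the kernel command line (e.g. isolcpus=1) for this to have the intended effect.

# Minimal sketch: pin the current process to an isolated CPU and give
# it a real-time scheduling class, leaving housekeeping work to the
# remaining CPUs. CPU number and priority are placeholders; requires
# root (or CAP_SYS_NICE) and a kernel with the CPU isolated.
import os

ISOLATED_CPU = 1     # assumed isolated CPU
RT_PRIORITY = 80     # assumed SCHED_FIFO priority

os.sched_setaffinity(0, {ISOLATED_CPU})                               # pin to one CPU
os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(RT_PRIORITY))  # real-time class

# ... run the latency-critical loop here ...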

Measurement 5: Coupler to MOD-IO latency

In this measurement we measure the time it takes for the relays of the MOD-IO to react. This is the time from the moment a signal to open a relay is sent until the moment the relay is actually opened.

Conclusions

Our research shows that the time from signal to actual relay-open state is around 5.25 ms. This time can be split into two main parts or challenges:

1) the connection between the coupler and the MOD-IO (see Figure 1, the UEXT cable) is based on the I2C protocol. The bandwidth of this protocol, even in high-speed mode, is only 3.4 Mbit/s, which can be considered too slow

2) the time for the real physical relay itself to react and actually open

To solve this problem we need to consider a new type of board which can directly use the existing GPIOs on the coupler, thus eliminating problem (1). This board should also have a much faster relay mechanism. A possible candidate is a solution based on solid state relays, whose reaction time can be as low as 50 ns.

Overall conclusions and future directions

  • We can use Beremiz for applications in the millisecond range, but can we use it at the microsecond level?
  • As Beremiz is running in real-time mode using pthreads, it is worth investigating whether a coupler application can benefit from this too
  • The OPC-UA PubSub implementation is intentionally restricted to the millisecond range. An explanation is available here.
  • Raw UDP seems promising and can deliver microsecond-range latency.
  • We can fine-tune the Linux kernel if needed, and it is NOT our bottleneck (yet)
  • MOD-IO reaction time is 5+ ms, which is our current bottleneck.
