CPS Engineering: Gap Analysis and Perspectives

Emmanuel LEDINOT THALES Research & Technology France

Abstract

Virtualization of computing and networking, IT-OT convergence, cybersecurity and AI-based enhancement of autonomy are significantly increasing the complexity of CPS and CPSoS. New challenges have emerged to demonstrate that these systems are safe and secure. We emphasize the role of control and emerging fields therein, like symbolic control or set-based fault-tolerant and decentralized control, to address safety. We have chosen three open verification problems we deem central in cost-effective development and certification of safety critical CPSoS. We review some promising threads of research that could lead in the long term to a scalable and powerful verification strategy. Its main components are set-based and invariant-based design, contracts, adversarial testing, algorithmic geometry of dynamics, and probabilistic estimation derived from compositional massive testing. To explore these orientations in collaborative projects, and to promote them in certification arenas, we propose to continue and upgrade an open innovation drone-based use case that originated from a collaborative research project in aeronautic certification reformation.

Introduction

From the 70s to the early 2010s, the development of systems featuring real-time control of physics has been named “embedded system engineering” or “automation”. Embedded systems, e.g. defence systems, power plants, transportation vehicles, health care devices, etc., have reached over these decades high levels of complexity, often measured by the size of embedded software (Klocs, Mlocs, etc.). Early on, these embedded systems featured networking aspects: first internally, to connect the sensors, actuators, micro-controllers and mission computers, then externally, by means of wireless communications. This outer connectivity has recently blossomed with mobile devices and with the Internet of Things, also named Machine2Machine or Internet of Everything.

Why then the emergence, in 2006-2010, of the “Cyber Physical System” appellation to denote systems that seem identical to what has been developed so far in Operational Technologies (OT)? In other words, is “CPS” some rebranding intended to cope with the latent perception that system engineering in general, and embedded system engineering in particular, are mature fields that no longer deserve to be in the limelight?

We start by analyzing the “Cyber” of “Cyber-Physical”. It is not “Cyber” as meant by the cyber of cyber-security, though cybersecurity admittedly is of prominent importance in CPS engineering. We support the opinion according to which some characteristics distinguish CPS from embedded systems and automation. We review some of them with safety and behavioral complexity in mind. Then, we propose three CPS engineering problems we think remain open for academic research and industrial practice alike. They have always been, and still are, open problems for embedded systems and system engineering in general. However, at the price of many, and occasionally huge, budget and schedule overruns, system engineering managed to cope with them, and in the end to deliver good quality products and services. Based on three long lasting problems we chose among many others, we argue that the lag of engineering w.r.t. product and operation complexity is likely to worsen significantly with CPS, and even more so with CPSoS.

In the third section, we present an open innovation CPSoS end-to-end development use case initiated in aviation certification research in 2015-2018. We propose to continue collaborative research on an extended and refactored version, in search of solutions to the three selected open problems. They all focus on behavioral specification, design and verification. Finally, we present three groups of research work that we deem to have some potential to fill these three gaps: set-based system engineering, geometric analysis of dynamics, and paradigm shift in safety engineering.

1. What are Cyber-Physical Systems?

A significant number of expert groups have addressed this question worldwide, for instance [P4C18] and [EF20] in Europe, yet without reaching consensus. We contribute some perspectives, motivated by certifiable autonomy and IT-OT convergence on one side, and by some feeling of underestimation on the other side.

In the late 40s, the mathematician Norbert Wiener perceived the prominent role of feedback to understand the behavior of natural and artificial systems. He coined the term Cybernetics [Wie48]. He derived it from the Greek “Kubernêtiké”, whose meaning is “the art of governing”. Since then, “Control” has superseded “Cybernetics”. However, “Cyber” did not disappear, as witnessed by “cyber-security”. The meaning of “cyber” seems to have shifted from “governing” to a synonym of “digital”.

In this paper, we promote the primal interpretation of “Cyber”, i.e. steering, governing, as the prominent feature of Cyber Physical Systems. This stance is helpful to understand why CPS engineering raises new challenges compared to embedded system engineering and IT system engineering. Control theory, fault-tolerant control, networked control, compositional control, symbolic control, and control-oriented safety are examples of the scientific and technological background we deem at the heart of safety critical CPS and CPSoS engineering.

Digital real-time control has been key in the constant progress of embedded systems since the 80s. However, up to the end of the 90s, control engineering was used on a physical domain per physical domain basis: mechanical control, electrical control, flight control, etc. Multi-physics modeling and control (e.g. Modelica [Mod96]) has emerged recently, mainly necessitated by hybrid car design and enabled by Gbyte-RAM computers^1. CPS enhance the multi-physics multi-control trend. With CPS, multi-physics control, also named generalized control, becomes the norm. Most importantly, it no longer remains local, as with embedded systems. It scales up to the global: from stamp-size SoCs up to smart cities, power grids or satellite constellations.

1.1 The Control-Compute-Communicate trilogy

One often defines cyber physical systems as systems where “physics, computation and communication are in interaction”. We follow this symmetry-inclined line of thought. In the sequel, we subsume “Physics” by “Control”. Control has historically developed as the science of governing physical devices (e.g. J. Watt’s flyball governor). However, with software-based virtualization of some physical and hardware resources like networks and computers, control has also pervaded the digital.

(^1) Needed for the sophisticated computer algebra transformations of Differential Algebraic Equation systems (DAEs) into numerical integration programs.

We review the six interactions of the trilogy to underline where CPS and CPSoS depart from embedded systems.

Computation for Control

Sensing and signal processing have always been computation-intensive in embedded systems. Think of radars for instance. They have remained so in spite of more than six decades of Moore's law. Today the need for high performance computing (HPC) is still increasing, driven as it is by the needs of AI-based machine vision (CNNs, DNNs^5, etc.), a “must-have” for autonomous systems. Sensor redundancy (radars, LIDARs, cameras), sensor fusion, image and video processing, NN^6 training and generalization, all have boosted the need for HPC to support the sensing part of control. Autonomous vehicles (cars, trains, tramways, drones, underwater autonomous vehicles, etc.) critically depend on visual sensing and vision-based control, both highly computation intensive. We regard autonomous vehicles as embedded systems when considered in isolation, and as CPS, CPSoS or SoCPS when connected and coordinated. The control scope and its life-cycle management make the difference, as proposed in 1.2.

Control for Computation

This notion seems to have appeared with low power (e.g. mobile devices) and data-centers (orchestrated virtual machines). In both cases, control senses the available resources and the load profile, to allocate the resources and schedule the tasks in compliance with the QoS and resource footprint objectives. Operations research algorithms optimize resource allocation. Consequently, timing analysis of the executive layer, key for control stability, has become significantly harder, and safety demonstrations in turn.

Connectivity for Computation

Parallel and distributed computing architectures are ubiquitous in CPS for sensing, Big Data, or to support coordination of entities by means of wireless networks (LTE, 5G, …). Examples of connectivity-intensive computation infrastructures are manufacturing plants of industry 4.0 or command and control of transportation infrastructures (ATM/ATC, railway, …).

Control and Computation for Connectivity

Virtualization of the network resources (Software Defined Networks) has paralleled virtualization of the computing resources. It has enabled optimized sharing of the physical resources to support transparency of technological heterogeneity and to support massive transfer services with high bandwidth variability. SDN relies on feedback scheduling, i.e. continuous sensing of the states of the physical and logical resources to adapt the configuration and the operating modes.

Connectivity for Control

A crane is a CPS and an embedded system^7. We regard a (4-crane)-crane, i.e. four coordinated cranes that together lift a load whose weight is beyond their respective capacity, as a CPS that no longer is an embedded system. Collaborative cranes are just one example of collaborative missions. The equivalent for defense systems, NCW^8, appeared in the 2000s. Swarms of coordinated drones are another typical example. Data-links, e.g. ADS-B^9 and CPDLC in aviation, LTE, 5G, RFID, and all types of wireless communications have enabled the design of collaborative systems that go far beyond mere interoperability, interconnection, remote monitoring, or remote maintenance. They have enabled collaborative control, whether centralized or decentralized. Thanks to generalized wireless connectivity, new physical hazards dependent on global control loops are being created at all scales of integration (tight-coupling) or interoperability (loose-coupling).

We now review some CPS aspects where engineering is challenged. They motivate the three open problems and associated research tracks proposed in this paper.

(^5) Convolutional Neural Networks, Deep Neural Networks
(^6) Neural Networks
(^7) We support the idea that the definition of CPS should extend that of embedded system, without excluding them from being (limit-case or “degenerated”) CPS.
(^8) Network Centric Warfare
(^9) Automatic Dependent Surveillance-Broadcast, Controller-Pilot Data Link Communications

Submitted to LITES journal, Leibniz Transactions on Embedded Systems, special issue on Distributed Hybrid Systems, 2021
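The feedback scheduling principle described above under “Control for Computation” and “Control and Computation for Connectivity” (sense the load and the available resources, then adapt the allocation) can be illustrated by a minimal Python sketch. It is not taken from any SDN controller or orchestrator: the function name `feedback_allocate`, the proportional gain, the utilization target and the bounds are all assumptions chosen for illustration.

```python
def feedback_allocate(loads, target_util=0.7, gain=0.5, capacity=1.0,
                      units=1.0, min_units=1.0, max_units=64.0):
    """Proportional feedback scheduler (illustrative assumption, not a real API).

    At each step the controller senses utilization = load / (units * capacity)
    and rescales the number of allocated resource units to track target_util.
    Returns the history of (units, utilization) pairs observed at each step.
    """
    history = []
    for load in loads:
        utilization = load / (units * capacity)       # sense
        history.append((units, utilization))
        error = utilization - target_util             # compare with the QoS objective
        units = units * (1.0 + gain * error / target_util)   # actuate
        units = max(min_units, min(max_units, units))        # respect resource bounds
    return history


# Constant load of 7 capacity units: the allocation should settle near 10 units.
trace = feedback_allocate([7.0] * 40)
final_units, final_util = trace[-1]
```

With these assumed parameters and away from the bounds, the update reduces to units ← (1 − gain) · units + gain · load / (target_util · capacity), a first-order filter. The allocation therefore itself has dynamics, which is one way to see why timing analysis of the executive layer becomes harder once such loops are in place.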

1.1.1 New challenges for the safe and secure

In embedded system engineering, safety is mainly addressed as architectural mitigation of component failures, i.e. as fault-tolerance of reliability events. For CPS engineering, we deem it necessary to address safety primarily as a controllability issue: controllability under disturbances and uncertainties [Lev12]^10. Among the many disturbance sources, physically initiated component failures remain a major one. However, with software-intensiveness and connectivity-intensiveness, the flaws at system specification level, i.e. the risks of overlooked interaction failures at design time, tend to be a concern on par with that of component failures.

Figure 3: pervasive control in IT-OT convergence. CPS/CPSoS: software intensive and connectivity intensive systems and systems of systems.

We observe with some concern the headlong rush towards generalized connectivity and interaction of mobile devices, machines, vehicles, infrastructures, and people. Remote monitoring of physical installations, predictive and remote maintenance are among the main business drivers of this race, amplified by the Big Data and AI-Machine Learning promises. However, even with state of the art engineering, can we trustworthily verify such CPS and CPSoS? What is behavioral verification coverage for distributed physics, systems, and systems of systems?

1.1.2 The logically separated over the physically shared

The promise of virtualization in OT is cost reduction, facilitation of deployment over massive numbers of end-points, and better adaptivity (“elastic resources”). However, virtualization has major drawbacks for safety assurance. First, it complicates substantiation of execution integrity. Second, sharing multiplies the single points of failure where traditional engineering cautiously resorted to physically

(^10) N. Leveson uses the term “system-theoretic”, while putting forth “control-loop structures”. We deem “control-theory” a theoretical setting more beneficial for safety than “system theory”. Both converge on the core idea of “state governance”.

actuate exclusively digital resources. From a safety perspective, there should be no issue with So-CPS since the hazards are related to physical damage^13, and the potential physical damages depend on a single component-system in spite of its communications with the other component-systems.

Second, what could be the difference between CPS and CPS-oS? The “-oS” denotes limited capability of federating the component-systems in spite of the existence of physical-control loops that span over them. From a safety assurance standpoint, this is potentially a dangerous situation. Contrary to the straightforward and common interpretation, we suggest that there should be no difference of scale between CPS and CPSoS^14. A smart city or a power grid may be regarded as a CPS, and not as a CPSoS, as long as a prime contractor, or any institution playing the same federating role, has the ability to master design and component-wise life-cycles wherever safety critical physics is governed. Global inter-system control is likely to be of the “directed”, “acknowledged”, or “collaborative” types since safety-criticality is at stake. When there is too much independence between the development and lifecycle management of the component-systems, there is potentially a lack of behavioral consistency on the physical aspects. One may use CPS-oS instead of CPS to manifest weaker coherency enforcement on the components, i.e. a safety-sensitive limitation of federating management.

1.3 Synopsis

We have proposed control, primarily of the physical but also of the digital, as a key characteristic of CPS, and physical control scope as an indicator to discriminate between CPS-oS and So-CPS. Whatever the scale of integration or interoperability, when physics is in interaction through the component-systems, we propose to choose between CPS and CPS-oS according to the answer to the following question: at inter-system level, is control under control?

What is at stake with the emerging digital society has been summarized in figure 4, which we have borrowed from [TCI21] and augmented slightly. We have supplemented the digital continuum with the governed digital-physical continuum, to manifest pervasive IT-OT control with all its shades of business, safety, and security criticality. We regard the informational-physical control continuum as a candidate central issue, if not concern, of the digital society underway.

(^13) The case would be different if safety were extended to business damage
(^14) The concept of system is recursive and scale-invariant

Figure 4: A conceptual reformulation of figure 3, as an augmented version of the IT-centric view of [TCI20]. The continuum is Digital-Physical, IT-OT, the two components interacting through controlled or uncontrolled digital-physical influence networks. We have used the STPA graphical notation for control loops.

Besides AI-Machine Learning, progress in distributed fault-tolerant and resilient control is needed to build a trustworthy digital society. Any accident is primarily a loss … of control.

2. Three Open Problems in CPS Engineering

We focus on three engineering problems we think are important to address the new IVVQ and safety engineering challenges that arise with CPS and CPSoS. They are all related to behavioral analysis. The rise of rework and “emergent properties”, correlated to that of distribution and networked control, motivates the first one. The lack of measures of behavioral verification coverage at system level is the second one. The third open problem is the widening fidelity gap between safety assessment models and system engineering models.

AI-enabled autonomy in passenger transport is a game changer for CPS and CPSoS IVVQ. In case of accident and legal consequences, the balance of responsibility options between the faulty-machine and the faulty-operation will no longer be the same [Frz12]. We perceive some recent renewed interest in formal verification techniques for the development of autonomous systems. A similar phenomenon occurred two decades ago with the advent of new cybersecurity threats in IT and open-world OT. In spite of the new scales of complexity addressed in section 1, leveraging formal specification, design and verification techniques at system level, primarily on safety critical control aspects, is the rationale that underpins the next three sections.

2.1 Compositional verification & certification (Pb#1)

In software engineering, modularity has been instrumental for scalability. Do we have any equivalent for CPS and CPSoS development, verification, and certification? On the system engineering side, the notion of functional chain is the standard approach to specification and unit system testing. On the safety assessment side, dysfunctional analysis relies on two standard methods:

  1. From the local to the global: identification of the component failures^15, and analysis of failure propagation, i.e. of the “cascading effects” or “domino effects” (FMECA^16),
  2. From the global to the local: Boolean modeling of the causes (FTA^17) of the macroscopic feared events identified by risk analysis.

Both analyses are chain, tree, or DAG^18 based, i.e. cycle-free. As explained in the previous section, CPS and CPSoS highly depend on control. They are cycle-intensive. Control loops are causality cycles, not causality chains, trees or DAGs. Therefore, control is difficult to modularize because of this twofold cyclic nature:

  - Spatial: the data-flow dependencies between the control loop entities form a spatial cycle (sensors -> controllers -> actuators -> plants -> sensors),
  - Temporal: these spatial loops compute state updates that are iterated over time, thus constituting temporal loops.

(^15) Random and systematic. For the latter (i.e. residual development faults, in other words development assurance failures), the initiators are prototypical representatives of unknown potential errors.
(^16) Failure Mode Effect and Criticality Analysis
(^17) Fault Tree Analysis: all the causes of a macroscopic feared event are asked by the safety norms to be represented by a Boolean combination of the anticipated possible component failures at meso- or micro-level.
(^18) Directed Acyclic Graph
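The contrast between cycle-free dysfunctional models and cyclic control can be made concrete with a small Python sketch. Everything in it is hypothetical and chosen for illustration: the fault tree (two redundant sensors and one actuator), the integrator plant, and the proportional gain. `top_event` is a pure Boolean function of component failures, evaluated once, whereas `closed_loop` goes around the spatial cycle sensor -> controller -> actuator -> plant at every iteration of the temporal loop.

```python
def top_event(failed):
    """Acyclic Boolean fault tree (hypothetical example).

    Feared event = (sensor_a AND sensor_b failed) OR actuator failed.
    The result depends only on the current failure flags: no cycle, no time.
    """
    both_sensors_lost = failed["sensor_a"] and failed["sensor_b"]   # AND gate
    return both_sensors_lost or failed["actuator"]                  # OR gate


def closed_loop(setpoint, steps, gain=0.5, x0=0.0):
    """Control loop with the twofold cyclic nature described in the text.

    Spatial cycle: the plant state feeds the sensor, whose measurement feeds
    the controller, whose command drives the actuator acting on the plant.
    Temporal cycle: the state update is iterated `steps` times.
    The plant is an assumed pure integrator: x[k+1] = x[k] + u[k].
    """
    x = x0
    trajectory = [x]
    for _ in range(steps):
        measurement = x                              # sensor (ideal)
        command = gain * (setpoint - measurement)    # controller
        effort = command                             # actuator (ideal)
        x = x + effort                               # plant state update
        trajectory.append(x)
    return trajectory


single_sensor_loss = top_event({"sensor_a": True, "sensor_b": False, "actuator": False})
trajectory = closed_loop(setpoint=1.0, steps=3)   # [0.0, 0.5, 0.75, 0.875]
```

In this sketch, the effect of a failure on `top_event` can be read off the gate structure, while its effect on `closed_loop` depends on when it occurs and on the whole state history, which is what makes modular reasoning harder.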


  1. Requirement coverage: the operational, functional and organic requirements are numbered and stored in requirement databases. The verification test plan defines as many test cases as needed to exercise each of the requirements sufficiently. 100% coverage of the requirements is commonly required for the higher levels of criticality. This coverage measure is enforced by means of traceability links between the requirements and the test cases,
  2. Structural coverage: we anchor activation counters to the elements of the structure to record how many times the previous V&V activities have activated them. The aim is to detect the parts left inactivated after^20 completion of the requirement-based testing campaign. These potentially not “truly dead” parts are regarded as sources of emergent properties, named unintended functions in aeronautic safety standards.

Structural coverage analysis is a behavioral coverage analysis substitute. It is one of the most important rigor modulation means of the safety assurance standards for all industrial domains (e.g. the DAL-dependent IC, DC, MC/DC coverage criteria). It is one of the most fundamental means of building trustworthiness for the digital. Even though nature solves variational invariants, nature is not a computer [Cop17]. At least, we do not have access to instructions on which we could hook activation counters: we have no behavioral coverage substitutes. How to provide behavioral coverage measures for the physical, and for the governed physical?

2.3 Safety assessment on high fidelity models (Pb#3)

There is no need to add the complications of CPS w.r.t. embedded systems to encounter the third problem. It was identified for embedded systems by the early 2000s [Ake06]. Its first aspect consists in the split between functional safety engineering and safety assessment at modeling & analysis level. System Engineering (SE) uses behavioral modeling and simulation. Safety Assessment (SA) uses FMECA and FTA models^21. For highly integrated software-intensive and distributed CPS we have questioned the validity of FTA models [Led20]. The fidelity of MBSA models w.r.t. SE models, though better than that of FTA models, is still a matter of concern^22. The situation is worsening with CPSoS. The “reality gap” is widening.

The second part of the model fidelity problem is a matter of risk analysis mindset. We follow [Lev11] and support the system-theoretic approach to risk analysis. We prefer the term “control-theoretic” to “system-theoretic” because we intend to draw more from control theory than from general system theory (cf. 4.1 on control invariants and 4.2 on observability). The basic view is the same: analyze accident risks primarily as control losses. Function/component losses or malfunctioning are causal factors of control losses.

(^20) As opposed to during. There is no structural coverage goal. The goal is detection of the possible useless implementation parts. They are deemed to constitute risks of harmful “emergent properties”. The goal of structural coverage analysis is not structure activation by any means.
(^21) On average in industry, as of writing this paper. Safety assessment may also resort to Model-based Safety Assessment (MBSA), to Monte Carlo Markov Chain estimations, etc. [Bou08].
(^22) For instance, synchronous abstraction generates causality cycle errors in SA models because of the twofold cyclic nature of control.
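To make the two digital coverage measures listed above (requirement coverage through traceability links, structural coverage through activation counters) concrete, here is a minimal Python sketch. The unit under test (`saturate`), the requirement identifiers, and the `probe` helper are hypothetical; real projects rely on qualified coverage tools rather than hand-placed counters.

```python
from collections import Counter

activations = Counter()          # activation counters, one per structural element

# Structural elements of the toy unit under test (here: its three branches).
ALL_ELEMENTS = {"cmd_nominal", "cmd_saturated_high", "cmd_saturated_low"}


def probe(label):
    """Record one activation of a structural element (hypothetical probe)."""
    activations[label] += 1


def saturate(command, low=-1.0, high=1.0):
    """Toy unit under test: a command limiter with three branches."""
    if command > high:
        probe("cmd_saturated_high")
        return high
    if command < low:
        probe("cmd_saturated_low")
        return low
    probe("cmd_nominal")
    return command


def run_requirement_tests():
    """Requirement-based campaign: each requirement is traced to a test case."""
    traceability = {
        "REQ-1 nominal commands pass through": lambda: saturate(0.3) == 0.3,
        "REQ-2 commands are limited to the upper bound": lambda: saturate(5.0) == 1.0,
    }
    return {req: test() for req, test in traceability.items()}


def unactivated_elements():
    """Structural coverage analysis, performed after the campaign."""
    return ALL_ELEMENTS - set(activations)


results = run_requirement_tests()      # every requirement is covered and passes
leftover = unactivated_elements()      # yet one branch was never activated
```

In this sketch, requirement coverage is 100% while the lower-bound branch is never exercised: either a requirement is missing or the branch is a candidate unintended function. No analogous counter can be attached to the physical part of a CPS, which is the gap the text points at.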

Figure 6: System safety as control game. Three types of internal adversaries (integrity safety) are represented on the lower half. The component failures and their associated “domino effects” analyzed by FMECAs constitute the prominent contributor. Among the external “adversaries” are many (unintended or intended) human errors, and many unfavorable environmental conditions (SOTIF). The six sources interact with the nominal behavior (first 6-fold parallel product), and superpose with one another (second type of parallel product). The grey area notionally represents the part of disturbed behaviors that remains controllable by the FDIR^23 and control players (controllability domain). Only low-order disturbance superpositions can stay in the controllability domain.

Component failures and the ensuing failure propagations remain a major source of potential control losses. For instance, on a medium-size aircraft one counts about 6500 component failure modes handled by crew and/or maintenance. However, inter-system specification flaws and human factor design errors are growing concerns. They lead to interaction failures, possibly without any component failure in the causal dependencies. Hence the crucial need to apply control-oriented risk analysis methods like STPA^24 to CPS and CPSoS. Such an analysis starts with control of safety-constrained states, not with local breakdowns, or with dubious causality-inverse modeling of accidents.

Figure 7: the control game consists in keeping the state subject to the safety constraints within the green zone (the nominal), or at worst within the amber zone, i.e. the robustness zone where dynamics should stay thanks to the degraded modes and the FDIR mechanisms. Out of the FDIR controllability domain, the system enters “red zones” of the accessible state space, i.e. accidents of any kind of severity. Red means contract violation, not catastrophic effect.

(^23) Fault Detection Isolation Recovery
(^24) System Theoretic Process Analysis
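The green, amber and red zones of the control game can be read as nested sets of the state space. The following Python sketch illustrates that reading on a scalar state with interval zones; the intervals, the names `Zone`, `classify` and `worst_zone`, and the sample trajectory are assumptions for illustration only, not the paper's model.

```python
from enum import Enum


class Zone(Enum):
    GREEN = "nominal"
    AMBER = "robustness zone (degraded modes, FDIR)"
    RED = "contract violation"


def classify(state, nominal, robust):
    """Classify a scalar safety-constrained state.

    `nominal` and `robust` are (low, high) intervals, with nominal assumed
    to be included in robust. Anything outside robust is RED.
    """
    n_low, n_high = nominal
    r_low, r_high = robust
    if n_low <= state <= n_high:
        return Zone.GREEN
    if r_low <= state <= r_high:
        return Zone.AMBER
    return Zone.RED


def worst_zone(trajectory, nominal, robust):
    """Worst zone visited along a non-empty trajectory."""
    severity = [Zone.GREEN, Zone.AMBER, Zone.RED]
    return max((classify(x, nominal, robust) for x in trajectory),
               key=severity.index)


NOMINAL = (-1.0, 1.0)     # assumed green interval
ROBUST = (-1.5, 1.5)      # assumed amber interval, containing the green one

verdict = worst_zone([0.0, 0.8, 1.2, 0.9], NOMINAL, ROBUST)   # Zone.AMBER
```

Winning the control game, in this simplified picture, means that no reachable trajectory under the considered disturbances ever yields `Zone.RED`.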

available modeling languages and verification tools, can we get full mastery of the governed digital-physical behaviors, at affordable price? We name such an ideal CPS/CPSoS engineering “Ω-engineering”. In spite of significant progress in hybrid system modelling, simulation, and formal verification, in spite of mature software and hardware development tools, we tend to consider that CPS/CPSoS engineering is still closer to α-engineering than to Ω-engineering. “μ” refers to “micro”. The μΩ project aims at reaching Ω-grade engineering on a μ-CPS/CPSoS, to start with, temporarily^26 getting rid of the scalability constraints. The goal is to explore the most disruptive techniques that could potentially meet the three aforementioned challenges.

3.2 The origins

3.2.1 The Overarching Properties Work Group (OPWG)

A joint EU-US group of avionics certification experts, academic researchers, software vendors and certification authorities worked from 2015 to 2018 to design a new approach to system, software and hardware certification. To test the viability of this new assurance method, an open source use case named μXAV, based on a “μ-aircraft” that embedded three coupled systems, was used [Led17]. The μΩ project reuses the μXAV use case and extends it. Its primary concern is now CPS engineering for any scale of system (from SoC to CPSoS), while keeping emphasis on safety engineering and cost-effective certification.

3.2.2 The Embedded France work group on safety norms

Embedded France is a non-profit organization that represents the French CPS ecosystem w.r.t. the French ministry of industry [EF20]. It coordinates a few work groups. One of them is devoted to the cross-domain comparison of system and software safety standards. When needed, it uses the μΩ use case to make concrete the issues debated on safety assurance principles.

3.2.3 The “Chaire ingénierie des systèmes complexes”

Founded by Ecole Polytechnique and THALES in 2003, this corporate sponsorship program aims at supporting research and education in CPS engineering. The μΩ open-innovation use case is meant to facilitate scientific collaboration between academia and industry on CPS engineering [ISC20].

3.3 The objectives

The μΩ project is a collaborative attempt to identify state-of-the-art CPS engineering at minimum effort, without sacrificing scalability analysis. The main focus is mastery of digital-physical behavioral complexity at affordable price for distributed, software intensive, safety critical and security critical CPS/CPSoS. We describe μXAV, the tiny CPS/CPSoS specified from 2015 to 2018. We give a status on its current development and present its planned evolutions.

3.4 Overview

μXAV is a pseudo-drone whose embedded system architectures are intended to comply with large aeroplane certification regulation (CS 25 [EAS20]). These architectures must be devoid of single point catastrophic failures. To meet this regulatory safety objective, duplex architectures, health monitoring and fault-tolerance mechanisms have been introduced. This is of course not the case on true drones, for obvious reasons of weight, cost, and overkill w.r.t. the true applicable safety regulation.

(^26) The use case is “small” but end-to-end. Scalability will be addressed everywhere on two aspects: algorithmic complexity and staff learning curve.

AV stands for Air Vehicle, X for any kind of mission/purpose, and μ for micro as in μΩ. μXAV is a cargo drone that transports payloads. It embeds a multi-system that mimics generalized control (electrical, mechanical, and hydraulic). It is not representative of the kind of CPSoS reviewed in the first section. Addition of some features typical of SoS and structure-varying systems is planned.

The repository of the 2015-2018 collaborative project is available at https://github.com/AdaCore/RESSAC_Use_Case. It contains about 20 specification documents (operational, functional, architectural, safety), at two levels of refinement (layer0 and layer1), and a few (incomplete) models.

3.5 The operational viewpoint

μXAV is capable of autonomous and remotely controlled missions. Its systems ensure flight for a large domain of payloads, mission profiles and weather conditions. Range, max take-off weight (MTOW), availability, mission reliability, and energy efficiency are the key performance indicators and values for the customer. Depending on the payload, the environmental conditions, and the battery charge, missions last from a few minutes to about 20 minutes.

μXAV is a product line. It supports variability domains on payloads, mission vignettes, internal architectures and operating procedures. The missions are composed of 7-phase sequences: pre-flight (mission preparation), take-off, climb, cruise, descent, landing, post-flight.

Presently, the drone’s dynamics are simplified. It is not a true hexacopter; flight mechanics is restricted to the 2D (Distance, Altitude) vertical plane and pitch. The drone is a 3D mechanical body for 0D platform design (set-based dimensioning and trade-off analysis), but it flies and navigates in the 2D vertical plane.

Mission preparation consists in defining the distance to travel, the cruise speed and the cruise altitude, the navigation mode and the navigation option.
The navigation mode (A or RP^27) defines whether the drone is autonomous or guided by the ground station. The navigation option specifies the reference navigation parameter used by cruise regulation (speed, altitude, or energy minimization^28). The continuous-time part of the dynamics is non-linear and switched, either by ground navigation commands or by onboard fault-tolerance reconfigurations. There are no intermediate waypoints and no air separation aspects yet. Simple air separation constraints will be introduced with the SoS extensions (see §3.14).

When too many failures or too disturbing environmental conditions occur, the ground station operator or the drone safety monitors can trigger emergency landing. Emergency landing plays the role of the High Assurance Controller (HAC) defined in the simplex design pattern [Wan13]. It aborts the mission and performs a vertical landing that should preserve the payload and the drone’s integrity. The mission management, energy management and flight control functions jointly play the role of the High Performance Controller (HPC).

Mission Management depends on a battery charge predictor and on a range predictor developed by Machine Learning. μXAV supports μ-experiments on certification of trustable embedded AI. For more details, see the functional specification documents in [RES18].

(^27) Autonomous (A), Remotely Piloted (RP). “Remotely Guided” would be more appropriate, since remote short-term control of the drone, i.e. piloting, is impossible.
(^28) Named low-power in IoT.
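The HAC/HPC arrangement described above can be illustrated by a minimal sketch of a simplex-style decision module. Everything named here is an assumption made for illustration: the `DroneState` fields, the three thresholds, the two stand-in control laws, and the choice to latch the emergency mode once it is engaged. None of it is taken from the μXAV specification.

```python
from dataclasses import dataclass


@dataclass
class DroneState:
    altitude_m: float        # current altitude above ground
    battery_charge: float    # remaining charge, between 0.0 and 1.0
    active_failures: int     # failures currently reported by health monitoring
    wind_gust_mps: float     # measured gust speed


# Hypothetical thresholds, chosen only for illustration.
MAX_FAILURES = 2
MIN_CHARGE = 0.15
MAX_GUST_MPS = 12.0


def safety_monitor_trips(state: DroneState, operator_abort: bool) -> bool:
    """Return True when the mission must be aborted."""
    return (
        operator_abort
        or state.active_failures > MAX_FAILURES
        or state.battery_charge < MIN_CHARGE
        or state.wind_gust_mps > MAX_GUST_MPS
    )


def high_performance_controller(state: DroneState, cruise_altitude_m: float) -> float:
    """Stand-in for the HPC: proportional climb-rate command toward cruise altitude."""
    return 0.5 * (cruise_altitude_m - state.altitude_m)


def high_assurance_controller(state: DroneState) -> float:
    """Stand-in for the HAC: constant-rate vertical descent until touchdown."""
    return -1.0 if state.altitude_m > 0.0 else 0.0


class SimplexSwitch:
    """Decision module. Assumption: once the HAC is engaged, it stays engaged."""

    def __init__(self) -> None:
        self.emergency = False

    def command(self, state: DroneState, cruise_altitude_m: float,
                operator_abort: bool = False):
        if safety_monitor_trips(state, operator_abort):
            self.emergency = True
        if self.emergency:
            return "HAC", high_assurance_controller(state)
        return "HPC", high_performance_controller(state, cruise_altitude_m)


nominal = DroneState(altitude_m=40.0, battery_charge=0.8,
                     active_failures=0, wind_gust_mps=3.0)
degraded = DroneState(altitude_m=40.0, battery_charge=0.8,
                      active_failures=3, wind_gust_mps=3.0)

switch = SimplexSwitch()
print(switch.command(nominal, 50.0))   # HPC is in charge
print(switch.command(degraded, 50.0))  # too many failures: HAC takes over
print(switch.command(nominal, 50.0))   # still HAC, because the switch is latched
```

The point of the pattern is that only the monitor, the switch and the simple HAC need the highest level of assurance; the HPC can be complex (here, it could include the learned predictors) without being trusted for safety.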

Figure 9 - Overview of the μΩ physical engineering process. It interfaces with the platform Finite Element Method (FEM) calculations but does not address this part. Two rounds of 0D-1D analysis: 1) defining the objectives, 2) checking they are met, for a product-line configuration domain.

The μΩ project addresses the 0D-1D engineering process and its interface with the 3D-4D one by means of Reduced Order Models (ROMs), surrogate models, ANNs^32, or empirical formulas. In Table 1, we give the platform design parameters, whether they contribute to the product-line variability domains, and their influence on the performance indicators.

| Sizing parameters | Product-line variability | Influences | Platform performance indicators | System performance indicators |
| --- | --- | --- | --- | --- |
| height, width, depth | yes | drag, mass, bay volumes | cross-section, Cx, usable volumes | maximal range (conditioned by payload and weather condition), take-off and landing safety envelopes |
| material density | yes | empty mass | calibrated weights | maximal range, flight level, speed limit (conditioned by payload and weather condition) |
| equipment-bay volume | yes | 3D-layout | pass/fail on a set of equipment configurations | pass/fail on a set of mission objectives |
| payload-bay volume | yes | 3D-layout | pass/fail on a set of payload configurations | pass/fail on a set of mission objectives |
| cooling air inlets | no | inner temperature field | cooling flow chart (conditioned by operating point and weather condition) | temperature-sensitivity chart on a predefined set of missions |

Table 1 – Platform modification parameters and associated effects

(^32) Artificial Neural Networks.

CPS Engineering: Gap Analysis and Perspectives. Submitted to LITES journal, Leibniz Transactions on Embedded Systems, special issue on Distributed Hybrid Systems, 2021

3.7 The environmental viewpoint

The system inputs are:

  • the ground station user-commands,
  • the field operator commands,
  • the external disturbances: wind gusts and icing,
  • the 35 random failure events (motors, batteries, micro-controllers, power switches, electrical pump, etc.).

See [RES18] for details on the field operator and ground station commands. Some of these inputs are subject to uncertainties. Set-based methods (intervals) and worst-case analysis address uncertainties for deterministic safety. For mission reliability, uncertainties are handled by means of probabilistic average-case analysis. For probabilistic aspects of safety, uncertainties are modeled probabilistically under a worst-case and extreme-value regime.

Figure 10 – The external input sources.

3.8 The embedded systems

The use case has been designed to explore engineering and assurance of a safety-critical generalized control loop that spans five coupled systems:

  • Mechanical platform and inertial sensors (MS),
  • Electrical Propulsion System (EPS),
  • Hydraulic Braking System (HBS),
  • Mission Management System (MMS),
  • Connectivity Management System (CMS).

The ground station was initially out of scope, but it will be brought into scope with the SoS extensions.

The main control loop is subject to perturbations. They are the “adversaries” of the safety differential game addressed in sections 2.3 and 4.3:
  • Environmental conditions: wind gusts, icing,
  • Human errors: wrong mission parameters at pre-flight, wrong navigation commands during flight; 8 different commands can be sent through the two channels (panel, station),
  • Failure modes: 35 component breakdowns may physically initiate cascading failures. They are the anticipated failure modes, diagnosable by the health monitoring function (F_HM), and addressed by maintenance. There are many more possible failure modes, some of which are unknown unknowns.
  • Systematic failures: some specific scenarios may activate unknown residual development errors (specification and/or implementation). This issue is addressed by meticulous design of the FDIR detectors (see §4.2) and by control-oriented functional FMEA supported by STPA.
  • Fault-tolerance: 30 FDIR^33 recovery transitions, potentially generating control instability (e.g. asynchronous delays between micro-controllers, value jumps of software variables possibly leading to transient actuation shocks).

(^33) Fault Detection Isolation and Recovery
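To illustrate the set-based (interval) worst-case treatment of uncertain inputs mentioned in §3.7, the following minimal sketch propagates interval bounds through a toy endurance budget. The `Interval` class, the energy and power figures, and the endurance formula are all assumptions made for illustration; none of them comes from the μXAV specification. Plain floating-point arithmetic is used, so the bounds are not outward-rounded as a sound interval library would require.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] enclosing an uncertain quantity."""
    lo: float
    hi: float

    def __post_init__(self) -> None:
        if self.lo > self.hi:
            raise ValueError("lower bound exceeds upper bound")

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other: "Interval") -> "Interval":
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    def __truediv__(self, other: "Interval") -> "Interval":
        if other.lo <= 0.0 <= other.hi:
            raise ZeroDivisionError("divisor interval contains zero")
        return self * Interval(1.0 / other.hi, 1.0 / other.lo)


def worst_case_safe(mission_min: float, endurance_min: Interval) -> bool:
    """The mission is safe only if it fits within the lowest possible endurance."""
    return mission_min <= endurance_min.lo


# Hypothetical uncertain inputs (illustrative values only).
usable_energy_wh = Interval(18.0, 22.0)   # battery energy, ageing and temperature
hover_power_w = Interval(60.0, 80.0)      # depends on the payload mass
gust_penalty_w = Interval(0.0, 10.0)      # extra power drawn under wind gusts

total_power_w = hover_power_w + gust_penalty_w
endurance_h = usable_energy_wh / total_power_w
endurance_min = endurance_h * Interval(60.0, 60.0)

print(endurance_min)
print(worst_case_safe(10.0, endurance_min))
print(worst_case_safe(15.0, endurance_min))
```

With these figures, a 10-minute mission is accepted and a 15-minute mission is rejected, although the latter would succeed for many combinations of the inputs. That conservatism is the price of a deterministic guarantee; the probabilistic average-case analysis used for mission reliability would judge the same mission differently.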


  • Health Monitoring (F_HM): the electronic devices are self-diagnosed by built-in tests. The physical devices are monitored and diagnosed indirectly by “virtual sensing”, i.e. by augmented observability. Two kinds of methods are used and compared on-line:
      o Model-based: the specifying Simulink models used for design and V&V of the control laws are enriched with models of faults for the motors, batteries, pump, and brakes. Then, using analytic redundancy methods, one can detect and partially isolate at run-time the occurrences of the modeled faults [Fri17],
      o Data-based: using the wide spectrum of machine learning techniques, deviations from normality can be detected. For frequent failures whose signatures can be learnt, one can provide some isolation and diagnostic capabilities by adding classifiers of fault effects.

Figure 12 – The nominal behavior. Left: the phase/mode/option Statechart diagram. Right: the functional flows exchanged between the core functions.

Figure 12 shows how μXAV features switched control^35. There are three sources of regulation mode switching:
  1. Mission phase sequencing,
  2. Navigation commands that change the reference values (emitted by the ground station, RP mode),
  3. Failure modes that trigger fault-tolerance recoveries.

Figure 13 presents the overall control structure analyzed with STPA [Lev11]. For the sake of clarity, it does not contain the FDIR loops.

(^35) Non-linear control, both on the plant side (non-linear ODEs) and on the controller side (e.g. saturations).
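The model-based branch of health monitoring relies on analytic redundancy: a model predicts what a sensor should read, and a persistent gap between prediction and measurement (the residual) signals a fault. The sketch below shows the principle on a toy first-order motor model. The model, its parameters, the injected fault (a drop of the motor gain), the threshold and the persistence count are all hypothetical; the actual μXAV detectors are derived from Simulink models that are not reproduced here.

```python
def simulate_motor(commands, gain, pole, fault_at=None, fault_gain=None):
    """Toy first-order model: speed[k+1] = pole * speed[k] + (1 - pole) * g * u[k].

    From sample `fault_at` onward, the gain g is replaced by `fault_gain`,
    which mimics a loss of motor efficiency.
    """
    speed = 0.0
    speeds = []
    for k, u in enumerate(commands):
        faulty = fault_at is not None and k >= fault_at
        g = fault_gain if faulty else gain
        speed = pole * speed + (1.0 - pole) * g * u
        speeds.append(speed)
    return speeds


def detect_fault(measured, predicted, threshold, persistence):
    """Return the first sample index at which the absolute residual has exceeded
    `threshold` for `persistence` consecutive samples, or None if it never does."""
    run = 0
    for k, (y, y_hat) in enumerate(zip(measured, predicted)):
        run = run + 1 if abs(y - y_hat) > threshold else 0
        if run >= persistence:
            return k
    return None


commands = [1.0] * 60                                   # constant throttle
predicted = simulate_motor(commands, gain=100.0, pole=0.8)
healthy = simulate_motor(commands, gain=100.0, pole=0.8)
faulty = simulate_motor(commands, gain=100.0, pole=0.8,
                        fault_at=30, fault_gain=60.0)

print(detect_fault(healthy, predicted, threshold=5.0, persistence=3))
print(detect_fault(faulty, predicted, threshold=5.0, persistence=3))
```

The persistence count trades detection delay against false alarms: here the fault injected at sample 30 is reported at sample 32. Isolation, i.e. deciding which component failed, requires several residuals with different fault sensitivities and is not covered by this sketch.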

Figure 13 – The control loop structure analyzed with STPA. Safety is addressed primarily as a problem of controllability under disturbances and uncertainties. Failure modes and failure propagations are regarded as one among many causes of unsafe control actions.

The logical architecture and some aspects of the physical architecture are represented in Figure 14. The four systems (MMS, CMS, EPS, HBS) have dual-channel computing and communication resources to comply with the “no single cause catastrophic failure” CS25.1309 requirement.

Figure 14 – The μXAV duplex multi-system architecture

3.9 The safety viewpoint

The air-vehicle-level Functional Hazard Analysis (AFHA), conforming to the ARP4761 template, is available in the safety assessment folder of [RES18]. The μΩ project has scheduled completion of the ARP4761-compliant safety assessment process for the μXAV, i.e. completion of the PASA^36, of the four PSSAs^37,
(^36) Preliminary Aircraft Safety Assessment
(^37) Preliminary System Safety Assessment