

































This paper traces the evolution from embedded system engineering to cyber-physical systems (CPS) engineering, focusing on the challenges and research areas specific to CPS. It covers safety engineering, control theory, networked control, and the interaction between physics, computation, and communication. It also proposes research areas such as set-based system engineering, geometric analysis of dynamics, and a paradigm shift in safety engineering.


































Emmanuel LEDINOT THALES Research & Technology France
Virtualization of computing and networking, IT-OT convergence, cybersecurity and AI-based enhancement of autonomy are significantly increasing the complexity of CPS and CPSoS. New challenges have emerged to demonstrate that these systems are safe and secure. We emphasize the role of control and emerging fields therein, like symbolic control or set-based fault-tolerant and decentralized control, to address safety. We have chosen three open verification problems we deem central in cost-effective development and certification of safety critical CPSoS. We review some promising threads of research that could lead in the long term to a scalable and powerful verification strategy. Its main components are set-based and invariant-based design, contracts, adversarial testing, algorithmic geometry of dynamics, and probabilistic estimation derived from compositional massive testing. To explore these orientations in collaborative projects, and to promote them in certification arenas, we propose to continue and upgrade an open innovation drone-based use case that originated from a collaborative research project in aeronautic certification reformation.
From the 70s to the early 2010s, the development of systems featuring real-time control of physics has been named “embedded system engineering” or “automation”. Embedded systems, e.g. defence systems, power plants, transportation vehicles, health care devices, etc., have reached over these decades high levels of complexity, often measured by the size of embedded software (Klocs, Mlocs, etc.). Early on, these embedded systems featured networking aspects: first internally, to connect the sensors, actuators, micro-controllers and mission computers, then externally, by means of wireless communications. This outer connectivity has recently blossomed with mobile devices and with the Internet of Things, also named Machine2Machine or Internet of Everything.

Why then the emergence, around 2006-2010, of the “Cyber Physical System” appellation to denote systems that seem identical to what has been developed so far in Operational Technologies (OT)? In other words, is “CPS” some rebranding intended to cope with the latent perception that system engineering in general, and embedded system engineering in particular, are mature fields that no longer deserve to be in the limelight?

We start by analyzing the “Cyber” of “Cyber-Physical”. It is not “Cyber” as meant in cyber-security, though cybersecurity admittedly is of prominent importance in CPS engineering. We support the opinion that some characteristics distinguish CPS from embedded systems and automation. We review some of them with safety and behavioral complexity in mind.

Then, we propose three CPS engineering problems we think remain open for both academic research and industrial practice. They have always been, and still are, open problems for embedded systems and system engineering in general. However, at the cost of many, and occasionally huge, budget and schedule overruns, system engineering managed to cope with them, and in the end to deliver good quality products and services. Based on three long-lasting problems we chose among many others, we argue that the lag of engineering w.r.t. product and operation complexity is likely to worsen significantly with CPS, and even more so with CPSoS.

In the third section, we present an open innovation CPSoS end-to-end development use case initiated in aviation certification research in 2015-2018. We propose to continue collaborative research on an extended and refactored version, in search of solutions to the three selected open problems. They all focus on behavioral specification, design and verification. Finally, we present three groups of research work we deem to have some potential to fill these three gaps: set-based system engineering, geometric analysis of dynamics, and paradigm shift in safety engineering.
A significant number of expert groups have addressed this question worldwide, for instance [P4C18] and [EF20] in Europe, yet without reaching consensus. We contribute some perspectives, motivated by certifiable autonomy and IT-OT convergence on one side, and by some feeling of underestimation on the other.

In the late 40s, the mathematician Norbert Wiener perceived the prominent role of feedback in understanding the behavior of natural and artificial systems. He coined the term Cybernetics [Wie48]. He derived it from the Greek “Kubernêtiké”, whose meaning is “the art of governing”. Since then, “Control” has superseded “Cybernetics”. However, “Cyber” did not disappear, as witnessed by “cyber-security”. The meaning of “cyber” seems to have shifted from “governing” to some synonym of “digital”.

In this paper, we promote the primal interpretation of “Cyber”, i.e. steering, governing, as the prominent feature of Cyber Physical Systems. This stance is helpful to understand why CPS engineering raises new challenges compared to embedded system engineering and IT system engineering. Control theory, fault-tolerant control, networked control, compositional control, symbolic control, and control-oriented safety are examples of the scientific and technological background we deem at the heart of safety critical CPS and CPSoS engineering.

Digital real-time control has been key in the constant progress of embedded systems since the 80s. However, up to the end of the 90s, control engineering was applied one physical domain at a time: mechanical control, electrical control, flight control, etc. Multi-physics modeling and control (e.g. Modelica [Mod96]) has emerged recently, mainly necessitated by hybrid car design and enabled by Gbyte-RAM computers^1. CPS enhance the multi-physics multi-control trend. With CPS, multi-physics control, also named generalized control, becomes the norm. Most importantly, it no longer remains local, as with embedded systems. It scales up to the global: from stamp-size SoCs up to smart cities, power grids or satellite constellations.
One often defines cyber physical systems as systems where “physics, computation and communication are in interaction”. We follow this symmetry-inclined line of thought. In the sequel, we subsume “Physics” under “Control”. Control has historically developed as the science of governing physical devices (e.g. J. Watt’s flyball governor). However, with software-based virtualization of some physical and hardware resources like networks and computers, control has also pervaded the digital.

(^1) Needed for the sophisticated computer algebra transformations of Differential Algebraic Equation systems (DAEs) into numerical integration programs.
We review the six interactions of the trilogy to underline where CPS and CPSoS depart from embedded systems.

Computation for Control

Sensing and signal processing have always been computation-intensive in embedded systems. Think of radars for instance. They have remained so in spite of more than six decades of Moore's law. Today the need for high performance computing (HPC) is still increasing, driven by the needs of AI-based machine vision (CNNs, DNNs^5, etc.), a “must-have” for autonomous systems. Sensor redundancy (radars, LIDARs, cameras), sensor fusion, image and video processing, NN^6 training and generalization, all have boosted the need for HPC to support the sensing part of control. Autonomous vehicles (cars, trains, tramways, drones, underwater autonomous vehicles, etc.) critically depend on visual sensing and vision-based control, both highly computation intensive. We regard autonomous vehicles as embedded systems when considered in isolation, and as CPS, CPSoS or SoCPS when connected and coordinated. The control scope and its life-cycle management make the difference, as proposed in 1.2.

Control for Computation

This notion seems to have appeared with low power (e.g. mobile devices) and data-centers (orchestrated virtual machines). In both cases, control senses the available resources and the load profile, to allocate the resources and schedule the tasks in compliance with the QoS and resource footprint objectives. Operations research algorithms optimize resource allocation. Consequently, timing analysis of the executive layer, key for control stability, has become significantly harder, and so have safety demonstrations in turn.

Connectivity for Computation

Parallel and distributed computing architectures are ubiquitous in CPS for sensing, Big Data, or to support coordination of entities by means of wireless networks (LTE, 5G, …).
Examples of connectivity-intensive computation infrastructures are manufacturing plants of industry 4.0 or command and control of transportation infrastructures (ATM/ATC, railway, …).

Control and Computation for Connectivity

Virtualization of the network resources (Software Defined Networks) has paralleled virtualization of the computing resources. It has enabled optimized sharing of the physical resources to support transparency of technological heterogeneity and to support massive transfer services with high bandwidth variability. SDN relies on feedback scheduling, i.e. continuous sensing of the states of the physical and logical resources to adapt the configuration and the operating modes.

Connectivity for Control

A crane is a CPS and an embedded system^7. We regard a (4-crane)-crane, i.e. four coordinated cranes that jointly lift a load whose weight is beyond their respective capacity, as a CPS that no longer is an embedded system. Collaborative cranes are just one example of collaborative missions. The equivalent for defense systems, NCW^8, appeared in the 2000s. Swarms of coordinated drones are another typical example. Data-links, e.g. ADS-B^9, CPDLC in aviation, LTE, 5G, RFID, and all types of wireless communications have enabled the design of collaborative systems that go far beyond mere interoperability, interconnection, remote monitoring, or remote maintenance. They have enabled collaborative control, whether centralized or decentralized. Thanks to generalized wireless connectivity, new physical hazards dependent on global control loops are being created at all scales of integration (tight-coupling) or interoperability (loose-coupling).

(^5) Convolutional Neural Networks, Deep Neural Networks
(^6) Neural Networks
(^7) We support the idea that the definition of CPS should extend that of embedded system, without excluding them from being (limit-case or “degenerate”) CPS.
(^8) Network Centric Warfare
(^9) Automatic Dependent Surveillance-Broadcast, Controller-Pilot Data Link Communications

CPS Engineering: Gap Analysis and Perspectives. Submitted to LITES journal, Leibniz Transactions on Embedded Systems, special issue on Distributed Hybrid Systems, 2021.

We now review some CPS aspects where engineering is challenged. They motivate the three open problems and associated research tracks proposed in this paper.
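The feedback scheduling idea described above (sensing the load and the resource state, then adapting the allocation) can be illustrated with a minimal Python sketch. It is not taken from the paper: the `FeedbackScheduler` class, the proportional control law, the utilisation set-point and the worker bounds are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class FeedbackScheduler:
    """Proportional feedback on resource utilisation (hypothetical example)."""
    target_utilisation: float = 0.7   # assumed QoS set-point
    gain: float = 0.5                 # assumed proportional gain
    min_workers: int = 1              # assumed resource footprint bounds
    max_workers: int = 16

    def step(self, workers: int, load: float) -> int:
        """Return the worker count for the next period.

        `load` is the sensed demand, in units of one fully busy worker.
        """
        utilisation = load / workers
        error = utilisation - self.target_utilisation
        # Scale the allocation in proportion to the relative error.
        proposed = workers * (1.0 + self.gain * error / self.target_utilisation)
        return max(self.min_workers, min(self.max_workers, round(proposed)))


def simulate(loads, workers=2, scheduler=None):
    """Run the sense/allocate loop over a load profile and log the allocations."""
    scheduler = scheduler or FeedbackScheduler()
    history = []
    for load in loads:
        workers = scheduler.step(workers, load)
        history.append(workers)
    return history
```

Under a constant load, the allocation settles near the utilisation set-point and stays within the configured bounds. The point made in the text still holds for such a loop: because the allocation changes at run time, the timing behavior of the tasks it hosts is harder to bound than under a static schedule.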
In embedded system engineering, safety is mainly addressed as architectural mitigation of component failures, i.e. as fault-tolerance of reliability events. For CPS engineering, we deem it necessary to address safety primarily as a controllability issue, controllability under disturbances and uncertainties [Lev12]^10. Among the many disturbance sources, physically initiated component failures remain a major one. However, with software-intensiveness and connectivity-intensiveness, the flaws at system specification level, i.e. the risks of overlooked interaction failures at design time, tend to be a concern on par with that of component failures.

Figure 3: Pervasive control in IT-OT convergence. CPS/CPSoS: software intensive and connectivity intensive systems and systems of systems.

We observe with some concern the headlong rush towards generalized connectivity and interaction of mobile devices, machines, vehicles, infrastructures, and people. Remote monitoring of physical installations, predictive and remote maintenance are among the main business drivers of this race, amplified by the Big Data and AI-Machine Learning promises. However, even with state of the art engineering, can we trustworthily verify such CPS and CPSoS? What is behavioral verification coverage for distributed physics, systems, and systems of systems?
The promise of virtualization in OT is cost reduction, facilitation of deployment over massive numbers of end-points, and better adaptivity (“elastic resources”). However, virtualization has major drawbacks for safety assurance. First, it complicates substantiation of execution integrity. Second, sharing multiplies the single points of failure where traditional engineering cautiously resorted to physically

(^10) N. Leveson uses the term “system-theoretic”, while putting forth “control-loop structures”. We deem “control-theory” a theoretical setting more beneficial for safety than “system theory”. Both converge on the core idea of “state governance”.
actuate exclusively digital resources. From a safety perspective, there should be no issue with So-CPS since the hazards are related to physical damage^13, and the potential physical damages depend on a single component-system in spite of its communications with the other component-systems.

Second, what could be the difference between CPS and CPS-oS? The “-oS” denotes limited capability of federating the component-systems in spite of the existence of physical-control loops that span over them. From a safety assurance standpoint, this is potentially a dangerous situation. Contrary to the straightforward and common interpretation, we suggest that there should be no difference of scale between CPS and CPSoS^14. A smart city or a power grid may be regarded as a CPS and not as a CPSoS as long as a prime contractor, or any institution playing the same federating role, has the ability to master design and component-wise life-cycles wherever safety critical physics is governed. Global inter-system control is likely to be of the “directed”, “acknowledged”, or “collaborative” types since safety-criticality is at stake. When there is too much independence between the development and lifecycle management of the component-systems, there is potentially a lack of behavioral consistency on the physical aspects. One may use CPS-oS instead of CPS to signal weaker coherency enforcement on the components, i.e. a safety-sensitive limitation of federating management.

1.3 Synopsis

We have proposed control, primarily of the physical but also of the digital, as a key characteristic of CPS, and physical control scope as an indicator to discriminate between CPS-oS and So-CPS. Whatever the scale of integration or interoperability, when physics is in interaction through the component-systems, we propose to choose between CPS and CPS-oS according to the answer to the following question: at inter-system level, is control under control?

What is at stake with the emerging digital society has been summarized in figure 4, which we have borrowed from [TCI21] and slightly augmented. We have supplemented the digital continuum with the governed digital-physical continuum, to manifest pervasive IT-OT control with all its shades of business, safety, and security criticality. We regard the informational-physical control continuum as a candidate central issue, if not concern, of the digital society underway.

(^13) The case would be different if safety were extended to business damage.
(^14) The concept of system is recursive and scale-invariant.
Figure 4: A conceptual reformulation of figure 3, as an augmented version of the IT-centric view of [TCI20]. The continuum is Digital-Physical, IT-OT, the two components interacting through controlled or uncontrolled digital-physical influence networks. We have used the STPA graphical notation for control loops. Besides AI-Machine Learning, progress in distributed fault-tolerant and resilient control is needed to build a trustworthy digital society. Any accident is primarily a loss … of control.
We focus on three engineering problems we think are important to address the new IVVQ and safety engineering challenges that arise with CPS and CPSoS. They are all related to behavioral analysis. The rise of rework and “emergent properties”, correlated to that of distribution and networked control, motivates the first one. The lack of measures of behavioral verification coverage at system level is the second one. The third open problem is the widening fidelity gap between safety assessment models and system engineering models.

AI-enabled autonomy in passenger transport is a game changer for CPS and CPSoS IVVQ. In case of accident and legal consequences, the balance of responsibility options between the faulty machine and the faulty operation will no longer be the same [Frz12]. We perceive some recent renewed interest in formal verification techniques for the development of autonomous systems. A similar phenomenon occurred two decades ago with the advent of new cybersecurity threats in IT and open-world OT. In spite of the new scales of complexity addressed in section 1, leveraging formal specification, design and verification techniques at system level, primarily on safety critical control aspects, is the rationale that underpins the next three sections.
In software engineering, modularity has been instrumental for scalability. Do we have any equivalent for CPS and CPSoS development, verification, and certification? On the system engineering side, the notion of functional chain is the standard approach to specification and unit system testing. On the safety assessment side, dysfunctional analysis relies on two standard methods:
Figure 6: System safety as control game. Three types of internal adversaries (integrity safety) are represented on the lower half. The component failures and their associated “domino effects” analyzed by FMECAs constitute the prominent contributor. Among the external “adversaries” are many (unintended or intended) human errors, and many unfavorable environmental conditions (SOTIF). The six sources interact with the nominal behavior (first 6-fold parallel product), and superpose with one another (second type of parallel product). The grey area notionally represents the part of disturbed behaviors that remains controllable by the FDIR^23 and control players (controllability domain). Only low-order disturbance superpositions can stay in the controllability domain.

Component failures and the ensuing failure propagations remain a major source of potential control losses. For instance, on a medium-size aircraft one counts about 6500 component failure modes handled by crew and/or maintenance. However, inter-system specification flaws and human factor design errors are growing concerns. They lead to interaction failures, possibly without any component failure in the causal dependencies. Hence the crucial need to apply control-oriented risk analysis methods like STPA^24 to CPS and CPSoS. STPA starts with control of safety-constrained states, not with local breakdowns, or with dubious causality-inverse modeling of accidents.

Figure 7: The control game consists in keeping the state subject to the safety constraints within the green zone (the nominal), or at worst within the amber zone, i.e. the robustness zone where dynamics should stay thanks to the degraded modes and the FDIR mechanisms. Out of the FDIR controllability domain, the system enters “red zones” of the accessible state space, i.e. accidents of any kind of severity. Red means contract violation, not catastrophic effect.

(^23) Fault Detection Isolation Recovery
(^24) System Theoretic Process Analysis
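The green/amber/red reading of the control game in figure 7 can be made concrete with a small Python sketch. It is only an illustration under strong assumptions: the safety constraints are taken to be simple per-variable bounds (boxes), the amber box is assumed to contain the green one, and the variable names and numeric limits are hypothetical.

```python
from enum import Enum


class Zone(Enum):
    GREEN = "nominal"
    AMBER = "robustness"
    RED = "contract violation"


def classify(state, nominal, robust):
    """Classify a state against box-shaped safety constraints.

    state   : dict mapping variable name -> value
    nominal : dict mapping variable name -> (low, high), the green zone
    robust  : dict mapping variable name -> (low, high), the amber zone,
              assumed to contain the green zone
    """
    def inside(bounds):
        return all(low <= state[name] <= high
                   for name, (low, high) in bounds.items())

    if inside(nominal):
        return Zone.GREEN
    if inside(robust):
        return Zone.AMBER
    return Zone.RED


# Hypothetical bounds on two state variables of a small air vehicle.
NOMINAL = {"altitude_m": (20.0, 100.0), "pitch_deg": (-10.0, 10.0)}
ROBUST = {"altitude_m": (5.0, 120.0), "pitch_deg": (-25.0, 25.0)}
```

For example, `classify({"altitude_m": 50.0, "pitch_deg": 15.0}, NOMINAL, ROBUST)` falls in the amber zone: the pitch is outside the nominal bounds but still inside the robustness bounds. Real controllability domains are rarely boxes; the box shape is used here only to keep the sketch short.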
available modeling languages and verification tools, can we get full mastery of the governed digital-physical behaviors, at an affordable price? We name such an ideal CPS/CPSoS engineering “Ω-engineering”. In spite of significant progress in hybrid system modelling, simulation, and formal verification, in spite of mature software and hardware development tools, we tend to consider that CPS/CPSoS engineering is still closer to α-engineering than to Ω-engineering. “μ” refers to “micro”. The μΩ project aims at reaching Ω-grade engineering on a μ-CPS/CPSoS, to start with, temporarily^26 getting rid of the scalability constraints. The goal is to explore the most disruptive techniques that could potentially meet the three aforementioned challenges.

3.2 The origins
A joint EU-US group of avionics certification experts, academic researchers, software vendors and certification authorities worked from 2015 to 2018 to design a new approach to system, software and hardware certification. To test the viability of this new assurance method, an open source use case named μXAV, based on a “μ-aircraft” that embedded three coupled systems, was used [Led17]. The μΩ project reuses the μXAV use case, and extends it. Its primary concern is now CPS engineering for any scale of system (from SoC to CPSoS), while keeping emphasis on safety engineering and cost-effective certification.
Embedded France is a non-profit organization that represents the French CPS ecosystem to the French ministry of industry [EF20]. It coordinates a few work groups. One of them is devoted to the cross-domain comparison of system and software safety standards. When needed, it uses the μΩ use case to make concrete the issues debated on safety assurance principles.
Founded by Ecole Polytechnique and THALES in 2003, this corporate sponsorship program aims at supporting research and education in CPS engineering. The μΩ open-innovation use case is meant to facilitate scientific collaboration between academia and industry on CPS engineering [ISC20].

3.3 The objectives

The μΩ project is a collaborative attempt to identify state-of-the-art CPS engineering at minimum effort, without sacrificing scalability analysis. The main focus is mastery of digital-physical behavioral complexity at an affordable price for distributed, software intensive, safety critical and security critical CPS/CPSoS. We describe μXAV, the tiny CPS/CPSoS specified from 2015 to 2018. We give a status on its current development and present its planned evolutions.

3.4 Overview

μXAV is a pseudo-drone whose embedded system architectures are intended to comply with large aeroplane certification regulation (CS 25 [EAS20]). These architectures must be devoid of single point catastrophic failures. To meet this regulatory safety objective, duplex architectures, health monitoring and fault-tolerance mechanisms have been introduced. This is of course not the case on real drones, for obvious reasons of weight, cost, and overkill w.r.t. the truly applicable safety regulation.

(^26) The use case is “small” but end-to-end. Scalability will be addressed everywhere on two aspects: algorithmic complexity and staff learning curve.
AV stands for Air Vehicle, X for any kind of mission/purpose, and μ for micro as in μΩ. μXAV is a cargo drone that transports payloads. It embeds a multi-system that mimics generalized control (electrical, mechanical, and hydraulic). It is not representative of the kind of CPSoS reviewed in the first section. Addition of some features typical of SoS and structure varying systems is planned. The repository of the 2015-2018 collaborative project is available at https://github.com/AdaCore/RESSAC_Use_Case. It contains about 20 specification documents (operational, functional, architectural, safety), at two levels of refinement (layer0 and layer1), and a few (incomplete) models.

3.5 The operational viewpoint

μXAV is capable of autonomous and remotely controlled missions. Its systems ensure flight for a large domain of payloads, mission profiles and weather conditions. Range, max take-off-weight (MTOW), availability, mission reliability, and energy efficiency are the key performance indicators and values for the customer. Depending on the payload, on the environmental conditions, and on the battery charge, missions last from a few minutes to about 20 minutes.

μXAV is a product-line. It supports variability domains on payloads, mission vignettes, internal architectures and operating procedures. The missions are composed of 7-phase sequences: pre-flight (mission preparation), take-off, climb, cruise, descent, landing, post-flight. Presently, the drone’s dynamics is simplified. It is not a true hexacopter; flight mechanics is restricted to the 2D-(Distance, Altitude) vertical plane and pitch. The drone is a 3D mechanical body for 0D platform design (set-based dimensioning and trade-off analysis). But it flies and navigates in the 2D vertical plane.

Mission preparation consists in defining the distance to travel, the cruise speed and the cruise altitude, the navigation mode and the navigation option.
The navigation mode (A or RP^27) defines whether the drone is autonomous, or guided by the ground station. The navigation option specifies the reference navigation parameter used by cruise regulation (speed, altitude, or energy minimization^28). The continuous-time part of the dynamics is non-linear and switched, either by ground navigation commands, or by onboard fault tolerance reconfigurations. There are no intermediate waypoints, and no air separation aspects, yet. Simple air separation constraints will be introduced with the SoS extensions (see §3.14).

When too many failures or too disturbing environmental conditions occur, the ground station operator or the drone safety monitors can trigger emergency landing. Emergency landing plays the role of the High Assurance Controller (HAC) defined in the simplex design pattern [Wan13]. It aborts the mission and performs a vertical landing that should preserve the payload and the drone’s integrity. The mission management, energy management and flight control functions jointly play the role of the High Performance Controller (HPC). Mission Management depends on a battery charge predictor and on a range predictor developed by Machine Learning. μXAV supports μ-experiments on certification of trustable embedded AI. For more details, see the functional specification documents in [RES18].

(^27) Autonomous (A), Remotely Piloted (RP). Remotely Guided would be more appropriate as remote short-term control of the drone, i.e. piloting, is impossible.
(^28) Named low-power in IoT.
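The simplex arrangement described above, with emergency landing as the High Assurance Controller and the mission functions as the High Performance Controller, can be sketched as a small decision module in Python. This is a hedged illustration and not the μXAV design: the `DroneState` fields, the thresholds, the command strings and the choice to latch the emergency decision are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class DroneState:
    battery_pct: float      # remaining battery charge, in percent
    active_failures: int    # number of currently detected failures
    wind_gust_mps: float    # sensed gust speed, in m/s


def high_performance_controller(state: DroneState) -> str:
    """Stand-in for mission management, energy management and flight control."""
    return "continue_mission"


def high_assurance_controller(state: DroneState) -> str:
    """Stand-in for the emergency landing function."""
    return "emergency_vertical_landing"


class SimplexSwitch:
    """Decision module that hands control to the HAC when the envelope is left."""

    def __init__(self, min_battery_pct=15.0, max_failures=2, max_gust_mps=12.0):
        # Hypothetical safety-monitor thresholds.
        self.min_battery_pct = min_battery_pct
        self.max_failures = max_failures
        self.max_gust_mps = max_gust_mps
        self.emergency = False

    def envelope_violated(self, state: DroneState) -> bool:
        return (state.battery_pct < self.min_battery_pct
                or state.active_failures > self.max_failures
                or state.wind_gust_mps > self.max_gust_mps)

    def command(self, state: DroneState, operator_abort: bool = False) -> str:
        # Either the ground operator or the onboard monitors can trigger it.
        if operator_abort or self.envelope_violated(state):
            self.emergency = True  # latched: the mission is aborted
        if self.emergency:
            return high_assurance_controller(state)
        return high_performance_controller(state)
```

Latching the switch reflects the statement that emergency landing aborts the mission: once triggered, control does not return to the high performance functions even if the sensed state improves. Whether the actual specification behaves this way is not stated here and would need to be checked against [RES18].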
Figure 9: Overview of the μΩ physical engineering process. It interfaces with the platform Finite Element Method (FEM) calculations but does not address this part. Two rounds of 0D-1D analysis: 1) defining the objectives, 2) checking they are met, for a product-line configuration domain.

The μΩ project addresses the 0D-1D engineering process and its way to interface with the 3D-4D one by means of Reduced Order Models (ROMs), surrogate models, ANNs^32, or empirical formulas. In table 1, we give the platform design parameters, whether they participate in the product-line variability domains, and their influence on the performance indicators.

| Sizing Parameters | Product-line Variability | Influences | Platform Performance Indicators | System Performance Indicators |
| height, width, depth | yes | drag, mass, bay volumes | cross-section, Cx, usable volumes | maximal range (conditioned by payload and weather condition), take-off and landing safety envelopes |
| material density | yes | empty mass | calibrated weights | maximal range, flight level, speed limit (conditioned by payload and weather condition) |
| equipment-bay volume | yes | 3D-layout | pass/fail on a set of equipment configurations | pass/fail on a set of mission objectives |
| payload-bay volume | yes | 3D-layout | pass/fail on a set of payload configurations | pass/fail on a set of mission objectives |
| cooling air inlets | no | inner temperature field | cooling flow chart (conditioned by operating point and weather condition) | temperature-sensitivity chart on a predefined set of missions |

Table 1: Platform modification parameters and associated effects

(^32) Artificial Neural Networks.
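Checking an objective over a whole product-line configuration domain, rather than at a single design point, can be illustrated with interval arithmetic in Python. The sketch below is not the μΩ sizing model: the thin-shell mass formula, the parameter ranges and the mass limits are hypothetical, chosen only to show how worst-case bounds yield a pass, fail or undetermined verdict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] with the two operations needed below."""
    lo: float
    hi: float

    def __add__(self, other):
        other = _as_interval(other)
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        other = _as_interval(other)
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    __radd__ = __add__
    __rmul__ = __mul__


def _as_interval(x):
    return x if isinstance(x, Interval) else Interval(float(x), float(x))


def check_upper_limit(quantity: Interval, limit: float) -> str:
    """Verdict of 'quantity <= limit' over the whole configuration domain."""
    if quantity.hi <= limit:
        return "pass"          # holds for every configuration
    if quantity.lo > limit:
        return "fail"          # violated for every configuration
    return "undetermined"      # depends on the configuration


# Hypothetical variability domains (SI units).
height, width, depth = Interval(0.2, 0.3), Interval(0.4, 0.5), Interval(0.4, 0.5)
density = Interval(1500.0, 1800.0)   # kg/m^3
payload = Interval(0.0, 2.0)         # kg
SHELL_THICKNESS = 0.002              # m, assumed constant
EQUIPMENT_MASS = 1.5                 # kg, assumed constant

# Assumed thin-shell model: mass = density * surface area * thickness.
area = 2.0 * (height * width + height * depth + width * depth)
take_off_mass = density * area * SHELL_THICKNESS + EQUIPMENT_MASS + payload
```

With these assumed numbers the take-off mass lies in roughly [3.4, 7.5] kg, so `check_upper_limit(take_off_mass, 8.0)` passes for the whole domain, while a limit of 6.0 kg gives "undetermined" and would call for splitting the domain or narrowing the variability ranges. Interval bounds can be pessimistic when a variable occurs several times in an expression; here all quantities are non-negative and the formula is monotone, so the bounds are tight.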
3.7 The environmental viewpoint

The system inputs are: the ground station user commands; the field operator commands; the external disturbances (wind gusts and icing); and the 35 random failure events (motors, batteries, micro-controllers, power switches, electrical pump, etc.). See [RES18] for detail on the field operator and ground station commands. Some of these inputs are subject to uncertainties. Set-based methods (intervals) and worst-case analysis address uncertainties for deterministic safety. For mission reliability, uncertainties are handled by means of probabilistic average-case analysis. For probabilistic aspects of safety, probabilistic modeling of uncertainties is done under worst-case and extreme value regimes.

Figure 10: The external input sources.

3.8 The embedded systems

The use case has been designed to explore engineering and assurance of a safety critical generalized control loop that spans over five coupled systems:
Figure 13: The control loop structure analyzed with STPA. Safety is addressed primarily as a problem of controllability under disturbances and uncertainties. Failure modes and failure propagations are regarded as one among many causes of unsafe control actions.

The logical architecture and some aspects of the physical architecture are represented in figure 14. The four systems (MMS, CMS, EPS, HBS) have dual-channel computing and communication resources to be compliant with the “no single cause catastrophic failure” CS25.1309 requirement.

Figure 14: The μXAV duplex multi-system architecture

3.9 The safety viewpoint

The air vehicle level Functional Hazard Analysis (AFHA), conformant to the ARP4961 template, is available in the safety assessment folder of [RES18]. The μΩ project has scheduled completion of the ARP4961-compliant safety assessment process for the μXAV, i.e. completion of the PASA^36, of the four PSSAs^37,

(^36) Preliminary Aircraft Safety Assessment
(^37) Preliminary System Safety Assessment