




This technical brief introduces Policy-based Autonomous Clustering for the Edge (PACE), manageability software that allows edge nodes to cluster and manage themselves via distributed consensus. It explains the main goal of PACE, its key features, advantages, and architectural elements. It describes how PACE fits within the bigger picture solution, how it differs from other configuration management tools, and how it uses distributed consensus. It also provides use cases, a Q&A, and examples of how PACE manages self-organizing, dynamic clusters.





Policy-based Autonomous Clustering for the Edge (PACE) is manageability software that allows edge nodes to cluster and manage themselves via distributed consensus. PACE is most valuable where skilled IT staff are not available and autonomy is therefore required. It also adds value where the cost of a traditional manageability infrastructure, in terms of money, space, or power, is too high, and/or where the chance of faults and partitions is high.
PACE provides autonomous infrastructure-less manageability. A PACE software agent on each node identifies other available nodes, forms a cluster, and maintains a common view of the participants and their characteristics with eventual consistency. Each agent utilizes a local copy of a common policy to achieve data consistency and determine its role(s) in the cluster. Use-case specific plugins then implement actions based on these roles. Examples of these actions include device configuration, service provisioning, resource allocation, or other functions.
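The idea that every agent derives its role(s) from a shared view and a shared policy can be sketched as follows. This is a minimal illustration, not PACE's implementation: `Node`, `assign_roles`, `my_roles`, and the policy shape (role name mapped to a desired node count) are all hypothetical. The point it shows is that a deterministic function of the view needs no coordinator, because any two agents holding the same view compute the same answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: str  # hypothetical identifier, e.g. derived from the node certificate


def assign_roles(view: set[Node], policy: dict[str, int]) -> dict[str, list[str]]:
    """Map each role to the node IDs that should hold it.

    `policy` maps a role name to the number of nodes wanted for that role
    (an assumed, simplified policy shape). Sorting by node_id makes the
    result depend only on the view, not on which node runs the function.
    """
    ordered = sorted(node.node_id for node in view)
    return {role: ordered[:count] for role, count in policy.items()}


def my_roles(me: str, view: set[Node], policy: dict[str, int]) -> list[str]:
    """Roles the local agent should hand to its plugins."""
    assignment = assign_roles(view, policy)
    return sorted(role for role, holders in assignment.items() if me in holders)
```

For example, with three nodes `a`, `b`, `c` and a policy asking for one `bastion` and three `dms-agent` holders, node `b` would compute that it holds only `dms-agent`, and node `a` would compute that it holds both.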
The main goal of PACE is to provide secure, policy-based decision-making in a self-organizing cluster. Unlike most system management solutions, PACE utilizes distributed consensus for decision-making. As a result, it doesn't require dedicated resources for control functionality, and it is particularly resilient to failures and partitions.
As an example, imagine a retail environment in which services may need to be deployed onto edge servers, point-of-sale terminals, or digital sign controllers. Services in this use case could include business-level software, such as the point-of-sale software. Alternatively, the services may be management components, such as Kubernetes (K8s) cluster software (servers and worker agents), Device Management Service (DMS) agents, or SSH bastion host software. A policy for this cluster might state that the cluster needs a K8s server and a K8s worker, that the cluster requires an SSH bastion host, and that each host in the network has a DMS agent. PACE would run on each host in the network to deploy and configure software based on this policy.
Continuing this example, imagine that initially there is only one host in a particular retail location. In this case, the stated policy would require that this node act as the K8s server and worker (and thus run the server software and worker agent software), that this host run a DMS agent, and that this host act as an SSH bastion host. If a second node is introduced into this network, these functions can now be spread across the cluster. For example, the second host need only run the K8s worker agent and DMS agent. The K8s server and SSH bastion host functionalities need only run on one of the two nodes. Policy-based Autonomous Clustering for the Edge (PACE) could move one of these functions to the new node, using characteristics of the nodes to make the choice (e.g., if one of the nodes is in a more secure location, it may be better suited for one of these functions).
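The one-node and two-node cases above can be worked through in a short sketch. The function name `place_functions`, the `secure` characteristic, and the specific tie-breaking rules are assumptions made for illustration. The text only says that node characteristics could inform the choice.

```python
def place_functions(nodes: dict[str, dict]) -> dict[str, set[str]]:
    """Return the functions each node should run.

    `nodes` maps a node name to its characteristics, e.g. {"secure": True}.
    Every node runs a K8s worker and a DMS agent. The SSH bastion and the
    K8s server each run on exactly one node.
    """
    names = sorted(nodes)
    placement = {name: {"k8s-worker", "dms-agent"} for name in names}

    # Assumed rule: prefer a node in a secure location for the bastion host,
    # breaking ties by name so every node reaches the same decision.
    bastion = min(names, key=lambda n: (not nodes[n].get("secure", False), n))
    placement[bastion].add("ssh-bastion")

    # Assumed rule: spread load by putting the K8s server on a different
    # node whenever the cluster has more than one.
    others = [n for n in names if n != bastion]
    server = others[0] if others else bastion
    placement[server].add("k8s-server")
    return placement
```

With a single node, that node receives all four functions. When a second node in a secure location joins, the bastion function moves to it and the K8s server stays on the original node.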
The PACE policy defines the identities of nodes that are allowed to participate in the cluster, roles that these nodes take on, and rights, which are made up of role members. The intent in this example is that a right controls which services a node would provide. However, the PACE service does not itself manage specific services. Rather, PACE is extensible, using a plugin architecture, so that a given right may be associated with a variety of different services, configurations, or resources that need to be managed in a cluster. In our example, one or more plugins would interpret rights and provide for deployment, configuration, and credential management of specific services (K8s, DMS agent, SSH).
For example, a given plugin might provide for management of K8s control plane and worker rights. A node that has K8s control plane rights should run the K8s server software and have credentials for the control plane. A node that has K8s worker rights should run the K8s worker agent and have worker credentials. Our example PACE K8s plugin would be able to deploy K8s software, configure it (e.g., K8s servers need to know the K8s cluster information, such as the IP addresses of the other servers), and provide credentials (e.g., generate control plane credentials for a new cluster and share those credentials to new members of the control plane). If a node loses a K8s control plane right or a K8s worker right, the PACE K8s plugin would be able to remove the associated credentials and/or rotate shared credentials to remove the node's access.
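A plugin of this kind essentially reacts to changes in the set of rights a node holds. The sketch below shows that reaction as a pure function that turns a before/after pair of right sets into an ordered action plan. The action strings and the function name `plan_actions` are hypothetical. A real plugin would perform the deployment and credential operations rather than return labels.

```python
def plan_actions(old_rights: set[str], new_rights: set[str]) -> list[str]:
    """Compute what a plugin should do when a node's rights change."""
    actions: list[str] = []

    # Newly granted rights: deploy the software, configure it, then
    # provide the credentials that go with the right.
    for right in sorted(new_rights - old_rights):
        actions += [
            f"deploy:{right}",
            f"configure:{right}",
            f"issue-credentials:{right}",
        ]

    # Lost rights: remove access first, then stop the service.
    for right in sorted(old_rights - new_rights):
        actions += [f"revoke-credentials:{right}", f"stop:{right}"]

    return actions
```

A node that already holds `k8s-worker` and gains `k8s-control-plane` would get only the three control-plane actions. A node that loses `k8s-worker` would get a revoke followed by a stop.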
In the above examples, PACE provides cluster membership management and dynamic, policy-driven management of rights, while plugins associated with PACE provide the service-specific logic to deploy, configure, and credential such services. PACE differs from other configuration management tools because it allows for management of self-organizing, dynamic clusters, and it allows the user to express policy about how services or configuration should be deployed across the cluster as a whole (rather than individual nodes), as cluster membership evolves.
Given the existing ecosystem of orchestration and platform management tools, it’s useful to consider where Policy-based Autonomous Clustering for the Edge (PACE) fits within the bigger picture solution.
PACE is similar to solutions for Orchestration or Device Management, in that it manages the configuration, deployment, and lifecycle of things like applications and resources.
Like Orchestration and Device Management solutions, PACE is policy-driven: the user specifies a policy that defines the desired outcome. The difference is one of focus. Device Management solutions tend to be device focused (they define specifically how each device should be configured), whereas PACE and Orchestration solutions tend to be systems focused. A systems focus means these solutions define how an overall collection of devices should be configured, without concern for a specific device.
Furthermore, what makes PACE different from many existing solutions is that decision-making is distributed, rather than centralized. In a system that uses centralized decision-making, all nodes send telemetry back to a central control point, which makes decisions, and sends instructions back to all nodes. There are three disadvantages to this approach. First, the control point itself consumes significant additional resources. Second, while the control point may itself include redundancy, it introduces a risk of failure or disconnection. For example, a Kubernetes cluster can have 3 redundant control nodes, but if two of them fail, or become disconnected from the worker nodes, the cluster management no longer functions. Third, the control plane itself typically requires a system manager to configure and maintain its function.
PACE, on the other hand, uses distributed consensus for autonomous control. Nodes discover each other and share information about their state, and this state is gossiped with eventual consistency. Each node has a copy of the policy and executes this policy against the eventually consistent state, such that every node makes the same
Figure 1 : Centralized versus Distributed Management
Figure 2 : The Policy-based Autonomous Clustering for the Edge (PACE) pattern of operation
PACE is also much more tolerant of failures, continuing to function down to a single node without human intervention. In the case of partitioning, each partition can continue to execute the user-specified policy, regardless of its size.
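One common way to obtain an eventually consistent shared view through gossip is to version each node's state and keep the newest version on every exchange. The sketch below assumes that scheme. The source does not describe PACE's actual gossip protocol, so the per-node version counter and the `merge` function are illustrative only.

```python
# A view maps node_id -> (version, attributes). The version is an assumed
# per-node counter that the owning node increments whenever its state changes.
View = dict[str, tuple[int, dict]]


def merge(local: View, remote: View) -> View:
    """Combine two views, keeping the newest known state for every node."""
    merged = dict(local)
    for node_id, (version, attrs) in remote.items():
        if node_id not in merged or version > merged[node_id][0]:
            merged[node_id] = (version, attrs)
    return merged
```

Because the merge keeps the higher version regardless of which side supplied it, two nodes that exchange views in either order end up with the same result. That property is what lets each node evaluate the policy locally and still agree with its peers once gossip has propagated.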
Each PACE node has an identity that is based on a public/private key and certificate. It is intended that this certificate be issued by a user-provided certificate authority, either in an existing or new certificate chain.
The PACE policy allows the user to describe which certificates and certificate chains are authorized to participate.
The PACE policy is a YAML-based definition of the desired cluster behavior. New policies can be injected into a cluster, and nodes will gossip to ensure that each node has the latest policy.
A Policy-based Autonomous Clustering for the Edge (PACE) policy consists of four components:
▪ Identities defines certificates and certificate chains for the node identities that may participate in the cluster.
▪ Roles allows for a dynamic, function-based selection of nodes that might take on specific roles.
▪ Rights takes collections of Roles and assigns them specific rights, which are interpreted by plugins to provide specific functionality (e.g., deploy a specific service, and enable a specific configuration).
▪ Auth specifies the nodes in the cluster authorized to take on specific tasks, such as key rotation or policy injection.
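The four components can be pictured as a small data structure. The sketch below is not the PACE YAML schema, which this brief does not reproduce in text. The field types, the example role and right names, and the `rights_for` helper are assumptions chosen only to show how Rights are built from collections of Roles.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    # Certificates or certificate chains allowed to participate (assumed: fingerprints).
    identities: list[str] = field(default_factory=list)
    # Role name -> selection expression evaluated against node characteristics.
    roles: dict[str, str] = field(default_factory=dict)
    # Right name -> the roles whose members hold that right.
    rights: dict[str, list[str]] = field(default_factory=dict)
    # Privileged task (e.g. policy injection) -> roles authorized to perform it.
    auth: dict[str, list[str]] = field(default_factory=dict)


def rights_for(policy: Policy, node_roles: set[str]) -> set[str]:
    """Rights held by a node, given the roles it currently occupies."""
    return {
        right
        for right, roles in policy.rights.items()
        if node_roles & set(roles)
    }
```

With rights defined as `{"k8s-control-plane": ["server"], "dms-agent": ["server", "terminal"]}`, a node holding only the `terminal` role would receive the `dms-agent` right, and plugins would act on that right.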
The PACE engine manages the cluster and interprets the policy. Use-case-specific functionality is provided through plugins. Plugins can provide a variety of different functionality:
▪ Plugins can take actions based on Rights that are assigned to a given node. Example actions might include deployment and configuration of a service, resource allocation, or device configuration.
▪ Plugins can communicate with each other. For example, a plugin designed to deploy a client service might contact a plugin that deploys the server service endpoint to obtain authentication credentials for the service. PACE provides a private and authenticated communication channel, as well as the policy for determining which endpoints to authorize.
▪ Plugins can extend the policy by providing new functions to be used in Role definitions.
▪ Plugins can add node characteristics, which can in turn be used when interpreting policy. For example, a plugin could determine whether a node has a GPU, and a policy function could then be used to constrain service deployment to nodes that have GPUs.
▪ Plugins can inject new policy. A plugin can, for example, provide an API that allows the user to configure resources by modifying the policy.
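The GPU example combines two plugin capabilities: contributing a node characteristic and contributing a function that Role definitions can call. A minimal sketch of that pairing follows. `PluginRegistry`, its method names, and the `gpu_count` attribute are hypothetical and do not represent PACE's plugin interface.

```python
from typing import Callable

Attributes = dict[str, object]


class PluginRegistry:
    """Hypothetical registry for plugin-supplied probes and policy functions."""

    def __init__(self) -> None:
        self._probes: list[Callable[[], Attributes]] = []
        self._functions: dict[str, Callable[[Attributes], bool]] = {}

    def add_probe(self, probe: Callable[[], Attributes]) -> None:
        """Register a callable that reports characteristics of the local node."""
        self._probes.append(probe)

    def add_function(self, name: str, fn: Callable[[Attributes], bool]) -> None:
        """Register a predicate that Role definitions may reference by name."""
        self._functions[name] = fn

    def characteristics(self) -> Attributes:
        """Merge the output of all probes into one attribute set."""
        merged: Attributes = {}
        for probe in self._probes:
            merged.update(probe())
        return merged

    def evaluate(self, name: str, attrs: Attributes) -> bool:
        """Apply a registered policy function to a node's attributes."""
        return self._functions[name](attrs)


registry = PluginRegistry()
# A real probe would inspect the hardware; this one returns a fixed value.
registry.add_probe(lambda: {"gpu_count": 1})
registry.add_function("has_gpu", lambda attrs: int(attrs.get("gpu_count", 0)) > 0)
```

A Role definition could then select nodes for which `registry.evaluate("has_gpu", attrs)` is true, constraining a service to GPU-equipped nodes.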
Because Policy-based Autonomous Clustering for the Edge (PACE) is extensible, it can be used for a wide variety of system management tasks.
A core usage of PACE is to manage the systems-level services on a cluster of nodes.
As an example, think back to the retail establishment, mentioned earlier, that wishes to manage a set of devices that are providing functions like point-of-sale, inventory management, and customer tracking. Devices might include point-of-sale terminals, storage services, and video analytics servers.
Kubernetes will be used to manage applications, but this requires one or more nodes to run the Kubernetes control plane software and other nodes to be configured as workers. In addition, a Device Management Service (DMS) will be used to keep the operating systems up to date, which requires a DMS agent on each node. Finally, system managers want to be able to remotely log into the cluster, so one node is configured as a bastion host.
Figure 7 : PACE employed in a retail use case
Figure 6 : An example PACE policy
Which nodes should run which functions? Perhaps at first, the business is small, and there is only one node. So, that node will run all of the services. As a second node is added, only one node needs to be the Kubernetes control plane and only one node needs to be the bastion host. As the business grows, a subset of nodes may be designated as the enterprise servers and might form a multi-node Kubernetes control plane. Moving between these configurations would typically require a system manager to decide what functions will run where, install software, and distribute appropriate credentials.
With PACE, the system manager simply deploys PACE on each node, along with a policy that describes how the system should behave. The policy might state that each node needs to run the DMS agent, and would determine, based on cluster size and characteristics, where the bastion host should be located, how many nodes to use in the Kubernetes control plane, and which nodes they should be. These functions can automatically be deployed and credentialed, as resources are added or removed from the cluster.
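A policy rule that sizes the Kubernetes control plane from the cluster size and node characteristics might look like the sketch below. The threshold (one control-plane node below three members, three otherwise), the `server` tag, and the function name are assumptions for illustration, not values taken from PACE.

```python
def control_plane_nodes(nodes: dict[str, set[str]]) -> list[str]:
    """Pick the nodes that should run the Kubernetes control plane.

    `nodes` maps a node name to its set of characteristic tags.
    """
    # Assumed sizing rule: a single control-plane node for small clusters,
    # three once the cluster is large enough to support them.
    size = 3 if len(nodes) >= 3 else 1

    # Prefer nodes tagged as enterprise servers, then order by name so the
    # choice is the same on every node that evaluates the rule.
    ranked = sorted(nodes, key=lambda n: ("server" not in nodes[n], n))
    return ranked[:size]
```

As the business in the example grows from one node to several, re-evaluating this rule on each membership change yields the new control-plane set without a system manager deciding it by hand.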
Note that, if the system manager wishes to change the policy that describes which services are to be deployed and where, they need only inject a new policy into one of the nodes. Policy-based Autonomous Clustering for the Edge (PACE) will share this policy with all nodes in the cluster and implement the new instructions.
As an extension of the previous use case, imagine that a system integrator wants to define an API that allows the system services deployed on a cluster to be managed from a cloud service. A thin API service can be developed that serves the API and translates the API calls into policy. On each new API call, the policy is adjusted and injected into the cluster, changing the set of services deployed in response to the API calls.
Note that this API service can be very thin, since the implementation of the policy is at each node, and no additional agents on the nodes are required. The API service can also be stateless, since the state is in the policy, which is distributed throughout the cluster. As a result, the API service can also be deployed by PACE to any node in the cluster, as defined in the policy.
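The reason such an API service can stay thin and stateless is that each call is just a transformation from the current policy to a new policy. The sketch below shows that transformation. The call format, the `services` and `version` keys, and the function name are hypothetical. In particular, the version counter is only an assumed way of marking which policy is the latest.

```python
import copy


def apply_api_call(policy: dict, call: dict) -> dict:
    """Translate one API call into a new policy to inject into the cluster.

    The input policy is not modified, so the service itself holds no state.
    """
    new_policy = copy.deepcopy(policy)
    services = new_policy.setdefault("services", {})

    if call["op"] == "deploy":
        services[call["service"]] = {"replicas": call.get("replicas", 1)}
    elif call["op"] == "remove":
        services.pop(call["service"], None)
    else:
        raise ValueError(f"unsupported operation: {call['op']}")

    # Assumed convention: a higher version marks the newer policy.
    new_policy["version"] = policy.get("version", 0) + 1
    return new_policy
```

The service would read the current policy from the cluster, apply the call, and inject the result into any one node. Everything after that is handled by the nodes themselves.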
The above examples have focused on managing services. But PACE can also manage device resources, such as virtual machines. For example, imagine a system manager who wishes to create an API that allows callers to spin up or shut down virtual machines (VMs) that are deployed in a cluster. Each node might have a certain amount of capacity for deploying VMs. When a user places an API request to deploy a VM, the API finds a node that has spare capacity and deploys the VM. Information about the new VM is reported by the API. The API can also report on available resources and running VMs.
Now imagine that the API is just a thin service that creates a Policy-based Autonomous Clustering for the Edge (PACE) policy. When a user requests a new VM, an entry is added to the policy and the PACE cluster determines where to deploy the VM. Each node in the PACE cluster has attributes that represent the available capacity, allowing the nodes to collaboratively determine where to deploy the new VM. PACE provides secure inter-process communication that can allow the API to determine the status of the VM as it boots.
Again, the API service is thin and stateless, allowing it to be deployed by PACE, according to the policy.
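Placement from capacity attributes can be sketched as a deterministic selection over the shared view. The rule used here (most free capacity wins, ties broken by node name) and the use of free CPU count as the capacity attribute are assumptions. The source only states that nodes advertise available capacity and decide collaboratively.

```python
from typing import Optional


def place_vm(free_cpus: dict[str, int], needed: int) -> Optional[str]:
    """Choose the node that should host a VM needing `needed` CPUs.

    Returns None when no node has enough spare capacity.
    """
    candidates = [node for node, free in free_cpus.items() if free >= needed]
    if not candidates:
        return None
    # Most free capacity first; the name breaks ties so that every node
    # evaluating the same view selects the same host.
    return min(candidates, key=lambda node: (-free_cpus[node], node))
```

Each node can run this against its view and deploy the VM only if the result is its own name, so no central scheduler is needed.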
System managers often face the task of updating operating systems or other critical software on devices in their infrastructure. A key aspect of this task is timing. It is often necessary to ensure that only a small subset of devices are deploying the update at a time, ensuring sufficient resources are available at all times. This factor may require a global sequencing of updates or at least ensuring that all devices that provide a specific function are not offline at the same time.
PACE can enable rolling updates by encoding the update constraints into a policy. For example, a plug-in extension to PACE could be created to detect the need for an operating system (OS) or other software update on a device and represent that fact as an attribute of the device.
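A rolling-update constraint of this kind can be expressed as a function over node attributes. In the sketch below, `needs_update` and `updating` are hypothetical attribute names that a plugin might publish, and `max_concurrent` is an assumed policy parameter limiting how many devices may be offline for an update at once.

```python
def next_to_update(nodes: dict[str, dict], max_concurrent: int) -> list[str]:
    """Return the nodes allowed to start updating now.

    `nodes` maps a node name to its attributes.
    """
    in_progress = [name for name, attrs in nodes.items() if attrs.get("updating")]
    free_slots = max(0, max_concurrent - len(in_progress))

    # Nodes that still need the update, in a fixed order shared by all nodes.
    waiting = sorted(
        name
        for name, attrs in nodes.items()
        if attrs.get("needs_update") and not attrs.get("updating")
    )
    return waiting[:free_slots]
```

A node would begin its update only when its own name appears in the result, which yields a global sequencing without a central scheduler.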
Figure 9 : PACE implementing an API for Deploying Services on a Cluster
Figure 8 : PACE implementing an API for managing VMs
However, if the set of devices changes, perhaps due to failure or separation, these functions may need to be taken over by other devices.
In this scenario, PACE is ideal for managing the functions of the cluster. A policy can define the set of functions and the requirements for their deployment in the cluster, as well as what to do if resources become too limited to perform all functions or if the cluster becomes partitioned. And because PACE utilizes self-organization and distributed consensus, it can provide cluster management in the face of partitioning, or loss of all but a single node.
In the above Figure 12, a mobile cluster can utilize PACE to provide management of tasks across the cluster. However, when such a cluster is attached to infrastructure (perhaps before deployment), it could be managed by that infrastructure. A PACE policy can be defined that allows for switching between these two scenarios.
When connected to an infrastructure that has a Kubernetes control plane, for example, PACE can deploy a Kubelet on each node, configured to attach to the infrastructure. In this scenario, workloads deployed via Kubernetes can be run on the cluster. When disconnected from the infrastructure, however, PACE will deactivate the Kubelet, and instead directly run the mission workloads.
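The switch between the two modes reduces to a policy decision keyed on connectivity. A minimal sketch follows. The function name and the way connectivity is supplied (as a boolean a plugin would compute) are assumptions.

```python
def desired_services(infrastructure_reachable: bool, mission_workloads: list[str]) -> list[str]:
    """Services a node should run, depending on infrastructure connectivity."""
    if infrastructure_reachable:
        # Attached: hand workload scheduling to the infrastructure's
        # Kubernetes control plane by running only a Kubelet.
        return ["kubelet"]
    # Detached: the Kubelet is deactivated and the mission workloads
    # are run directly.
    return sorted(mission_workloads)
```

Re-evaluating this whenever the connectivity attribute changes gives the behavior described above: workloads arrive via Kubernetes while attached, and run directly when the cluster is on its own.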
The above use cases have focused on using Policy-based Autonomous Clustering for the Edge (PACE) to manage the deployment of services or allocation of resources. PACE can also be used to control other services.
Consider a military example in which several ships form a PACE cluster. Each ship has its own data center, each managed by a Kubernetes cluster. PACE can be used to task each Kubernetes cluster, depending on situational factors, such as
▪ Which ships are present
▪ The attributes/status of the ships
▪ Location or other environmental factors
In this case, the policy specifies what tasks to inject into each Kubernetes cluster under various circumstances. A plugin would be used to inject a description of the desired tasks into each Kubernetes cluster.
In the previous examples, PACE has always formed a cluster based on whatever nodes it can discover. Typically, PACE will discover all nodes on the current LAN, but the user can also specify the addresses of PACE nodes to be discovered in the PACE configuration file. The latter could allow for clusters to form across LANs.
Note that only nodes with the same cluster ID, also specified in the PACE configuration file, will form a cluster.
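The discovery rules can be summarized in a few lines: candidates come from the LAN and from configured addresses, and only those with a matching cluster ID become peers. The sketch below models both sources as already-received announcements. The `Announcement` type and `peers` function are hypothetical, and the actual discovery mechanism and configuration file format are not shown in this brief.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Announcement:
    address: str     # where the announcing PACE node can be reached
    cluster_id: str  # cluster ID from that node's configuration file


def peers(
    own_cluster_id: str,
    lan_announcements: list[Announcement],
    configured: list[Announcement],
) -> list[str]:
    """Addresses of nodes that should join this node's cluster."""
    seen = list(lan_announcements) + list(configured)
    # Only nodes with the same cluster ID form a cluster. The set removes
    # nodes that were found both on the LAN and through configuration.
    return sorted({a.address for a in seen if a.cluster_id == own_cluster_id})
```

Nodes reached through configured addresses may sit on another LAN, which is how a cluster could span networks.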
It is also possible to dynamically form sub-clusters by using PACE to configure and launch PACE.
Figure 13 : A scenario in which PACE policy reacts to the presence or absence of network connectivity.
Figure 15 : UAVs Operating with Policy-based Autonomous Clustering for the Edge (PACE)
Figure 12 : PACE managing the workloads deployed into Kubernetes sub-clusters
Figure 14 : PACE in a master cluster deploying PACE to manage each sub-cluster
In this scenario, all nodes run an instance of PACE, with a policy that determines which nodes should be in which sub-cluster. The meta-cluster is the yellow cluster in Figure 15, while the sub-clusters are red, green, and blue. A plugin would be used by this instance of PACE to configure and launch another instance of PACE. This sub-instance would be configured with a specific cluster ID for the sub-cluster. Each sub-cluster instance of PACE could then be used to implement any of the above use cases.
Is Policy-based Autonomous Clustering for the Edge (PACE) a replacement for Kubernetes? No. Kubernetes, K3s, Nomad, and similar tools are widely used manageability solutions that provide rich functionality when used as intended. However, these solutions are not always appropriate for the edge, where lower overhead and autonomy may be required. In addition, when Kubernetes API compatibility and features are required, and autonomy is required (e.g., at a remote edge location), it may be appropriate to use PACE to deploy and manage Kubernetes at the edge.
Does eventual consistency cause unpredictable and/or chaotic behavior? Eventual consistency allows operation in situations where solutions that utilize strict consistency would be unable to perform manageability tasks. However, it does lead to cases where nodes may not be entirely in sync during decision-making. In simple cases, strict coordination between nodes is not required, and this kind of inconsistency is temporary and without consequence. We have demonstrated in the K8s example plugin that, in more complex situations, coordination between state machines at individual nodes can produce the desired results.
What about "split brain" scenarios? Manageability solutions designed for the cloud, like Kubernetes, are not designed to function in the face of partitioning. In some cases, manageability may stop functioning entirely, due to an inability to elect a leader. In other cases, two Kubernetes controllers may both mistakenly believe they are the leader at the same time, causing unintentional duplication. PACE is designed with partitioning in mind. When a partition occurs, every partition will form a sub-cluster and will execute the policy independently. It is up to the policy writer to decide what the behavior of each sub-cluster should be. In some cases, it may make sense for each sub-cluster to perform the same task. In other cases, it may make sense for specific sub-clusters to perform portions of the task (or none at all). Once partitioning resolves, the cluster will automatically re-join and the policy will again be executed for the entire cluster.
Customer is responsible for safety of the overall system, including compliance with applicable safety-related requirements or standards.
You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.