Data Driven Control System Design - Complete (DDCSD), Adaptive Control Notes

Complete, reorganized notes for the "Data Driven" course taught by Professor Garatti at the Politecnico di Milano. The course introduces the idea of adaptive control and implements three different methods: self-tuning, VRFT and Q-learning. In some parts the notes repeat similar steps, but this is very helpful for understanding what is going on. I hope they will be useful to those who purchase them!

Type: Notes

2024/2025

On sale since 03/02/2025

luca-varotto-1


Contents

  • Motivation
  • Adaptive control
  • Introduction
    • Uncertain system
    • Adaptation
    • Exploitation and exploration trade-off
    • Taxonomy
      • Offline
      • Online
      • Indirect
      • Direct
  • Self-tuning
    • Identification algorithm ID
      • Recursive implementation of the LS (RLS)
      • What if 𝜃 is time-varying?
      • LS with forgetting
    • Characterization of the performances
      • Performances in self-tuning
  • Virtual Reference Feedback Tuning-VRFT
    • 𝐻2 norm
    • Model reference control problem
    • Virtual reference control problem
      • VRFT cost function
    • PEM IDENTIFICATION
      • Asymptotic theory of PEM identification
    • Frequency interpretation
      • Open loop
      • Closed loop
    • VRFT VS MR via PEM frequency interpretation
      • Naïve approach
    • VRFT for plants affected by additive disturbance
      • Instrumental Variable (IV) identification
  • Reinforcement learning
    • MCDP-Markov Chain Decision Process
      • Find the optimal policy 𝜋∗
    • Q-learning
      • Q-LEARNING SCHEME
      • Q-LEARNING ALGORITHM
      • CHARACTERIZATION OF CONVERGENCE OF 𝑄𝑡 TO 𝑄∗

Data Driven Control System Design

The aim of the course is to learn methods that use data to design a controller able to work robustly in uncertain, time-varying environments.

Motivation

We start with an example: given a tank, we study its behaviour when its parameters are uncertain.

  • $V(t)$ is the volume of water at time $t$
  • $u(t)$ is the incoming flow of water
  • $p(t)$ is the outgoing flow of water, $p(t) = \alpha \, V(t)$, $\alpha \in [0,1]$

Equation of the system:

$$V(t+1) = V(t) + u(t) - p(t) = (1 - \alpha)\, V(t) + u(t)$$

We can rewrite it by substituting $x(t) = V(t)$ and $a = 1 - \alpha$:

$$x(t+1) = a \, x(t) + u(t)$$

The control goal is to follow a given setpoint $r(t)$.

Optimal control law: $u(t) = -a \, x(t) + r(t)$. In this way: $x(t+1) = r(t)$.

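What changes if $a$ is not well known?

In this case $a \in [\beta, \gamma]$.

To obtain the optimal control law we need the exact value of $a$, but this is not possible. We can use a guess $\hat{a}$ of $a$:

$$u(t) = -\hat{a} \, x(t) + r(t)$$

The system becomes:

$$x(t+1) = (a - \hat{a}) \, x(t) + r(t)$$

At equilibrium, with a constant setpoint $\bar{r}$:

$$\bar{x} = \frac{\bar{r}}{1 - a + \hat{a}}$$

Based on the value of $\hat{a}$ we can have an overshoot ($\hat{a} < a$) or an undershoot ($\hat{a} > a$).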
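The effect of a wrong guess can be checked numerically. The following is a minimal sketch, assuming the tank model x(t+1) = a·x(t) + u(t) (the model consistent with the control law above) and a constant setpoint; the function names and the numerical values are illustrative and are not taken from the notes.

```python
def simulate_tank(a, a_hat, r=1.0, steps=200, x0=0.0):
    """Simulate x(t+1) = a*x(t) + u(t) with the guessed law u(t) = -a_hat*x(t) + r."""
    x = x0
    for _ in range(steps):
        u = -a_hat * x + r      # control law built on the guess a_hat
        x = a * x + u           # true system uses the real parameter a
    return x


def equilibrium(a, a_hat, r=1.0):
    """Closed-form equilibrium of x(t+1) = (a - a_hat)*x(t) + r."""
    return r / (1.0 - a + a_hat)


if __name__ == "__main__":
    for a_hat in (0.8, 0.9, 1.0):   # true a = 0.9 (assumed value)
        print(a_hat, simulate_tank(0.9, a_hat), equilibrium(0.9, a_hat))
```

With a guess below the true value the steady state exceeds the setpoint, with a guess above it the steady state falls short, and only the exact value gives perfect tracking.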

To add robustness to the control law we add an integrator.

However, the transient is poor: whatever 𝑘 we choose, the closed-loop system always has a slow pole, so the integrator control scheme has slow dynamics irrespective of 𝑘.

With this example we show that linear robust control can work but is very conservative. It is suitable only when the requirements are not strict.


Introduction

The system 𝑆 is a mathematical description of a part of reality. It links some quantities through some relations.

These quantities are classified as:

  • Dependent: their values depend on other quantities (e.g. state and output)
    • Observable
    • Non-observable (hidden)
  • Independent: their values do not depend on other quantities (e.g. input)
    • Tuneable (such as the input)
    • Exogenous (such as noise or disturbance)
      • Known
      • Unknown

Uncertain system

An uncertain system needs robust control.

The control problem is to decide the values of the tuneable quantities, based on the observations of the observable dependent quantities, so that their behaviour meets some design goals irrespective of the values taken by the independent uncertain quantities.

𝑠 represents an instance of the system for a given value of the uncertain quantities.

𝑠 ∈ 𝑆 → 𝑆 is the class of systems obtained for the different instances of the uncertain quantities.

The same holds for the controller: 𝑐 ∈ 𝐶 (𝑐 is an instance of the policy and 𝐶 is the class of controllers, for example PID).

Index of performance: 𝐽(𝑐, 𝑠)

It evaluates the behaviour of the dependent quantities when 𝑠 is operating according to 𝑐.

Example: 𝐽(𝑐, 𝑠) = max|𝑒𝑖𝑔|, the largest eigenvalue magnitude of the controlled system.

We can redefine the control problem as:

  • Find 𝑐 ∈ 𝐶: 𝐽(𝑐, 𝑠) ≤ 𝑘 ∀ 𝑠 ∈ 𝑆
  • That is, find a controller that permits achieving all the control goals

Two cases:

  1. There exists a control law 𝑐 ∈ 𝐶: 𝐽(𝑐, 𝑠) ≤ 𝑘 ∀ 𝑠 ∈ 𝑆
    • This is a lucky situation
    • In this case classic control is sufficient (no need for adaptation or data-driven methods)
    • It happens when the design goals are not too strict and the uncertainties are small (additive and parametric uncertainties)
  2. None of the admissible policies is a solution of the robust problem
    • We must change our class of admissible policies: 𝐶 → 𝐶′, an enlarged class of policies for which a solution can be found
    • This is tricky: 𝐶′ cannot be too big, otherwise we are not able to operate with it
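The two cases can be illustrated on the tank example. The sketch below is only an illustration under stated assumptions: the controller class 𝐶 is the set of laws u(t) = −â·x(t) + r(t) for a few candidate guesses â, the system class 𝑆 is a grid over a ∈ [β, γ], and the index 𝐽 is taken as the magnitude |a − â| of the closed-loop pole. All names and numbers are hypothetical.

```python
def performance_index(a_hat, a):
    """J(c, s): magnitude of the closed-loop pole of x(t+1) = (a - a_hat)*x(t) + r(t)."""
    return abs(a - a_hat)


def worst_case(a_hat, beta, gamma, grid=101):
    """Worst-case J over a grid of instances a in [beta, gamma]."""
    step = (gamma - beta) / (grid - 1)
    return max(performance_index(a_hat, beta + i * step) for i in range(grid))


def robust_solutions(beta, gamma, k, candidates):
    """Candidates a_hat that satisfy J(c, s) <= k for every instance s on the grid."""
    return [a_hat for a_hat in candidates if worst_case(a_hat, beta, gamma) <= k]


if __name__ == "__main__":
    candidates = [0.8, 0.85, 0.9, 0.95, 1.0]
    print(robust_solutions(0.8, 1.0, 0.12, candidates))  # loose goal: a fixed law exists
    print(robust_solutions(0.8, 1.0, 0.05, candidates))  # strict goal: no fixed law works
```

With a loose bound 𝑘 a single fixed law satisfies the goal for every instance (case 1); with a strict bound no fixed law in the class does (case 2), which is the situation that motivates enlarging 𝐶.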

Adaptation

Adaptation is a general method to find a proper enlarged policy class 𝐶′ so that a solution to the robust problem can be found.

For it to work, for each instance 𝑠 ∈ 𝑆 there must be at least one control law 𝑐 ∈ 𝐶 capable of controlling it.

Adaptation rule: we implement a policy 𝑐 ∈ 𝐶 that is not fixed. An additional layer, the tuner, modifies the control law based on the information that is gathered from the system.

The pro is that it follows a rule that is built on our knowledge.

The con is that it works as a second feedback → 𝐶′ is more complex and difficult to study (the policy 𝐶′ consists of using the policies in 𝐶 while permitting switching between control laws).

Remark:

  • Adaptation and data-driven CSD (Control System Design) are not related only to time-varying systems. Obviously, they help a lot with time-varying systems, but they are not used only for them.
  • A time-varying system does not necessarily need adaptive control
    • Example: 𝑎 = 0.9 + 0.05 ∙ sin(⋅). The system is time-varying but well known. Optimal control law: 𝑢(𝑡) = −𝑎 ∙ 𝑥(𝑡) + 𝑟(𝑡)

Adaptive control is used in the case of:

  • Time-invariant systems with uncertainties (if the requirements are strict)
  • Time-varying systems with uncertainties (in 99% of the cases, since it is difficult to achieve robust control with traditional methods)

Exploitation and exploration trade-off

In adaptive control schemes, the additional feedback leads to a trade-off between exploitation and exploration.

Exploitation, also called the primal effect, indicates that the behaviour of 𝑆 depends on the tuneable quantity (the input).

Exploration, also called the dual effect, indicates that the choice of the input influences the estimated model $\hat{S}$: the more the system is excited, the better the model $\hat{S}$. (But the aim of the controller is not to excite the system; it is to keep it at steady state.)

Therefore, exploitation and exploration are:

  • Opposing:
    • Exploitation → keep 𝑆 at steady state
    • Exploration → excite 𝑆 to improve the model $\hat{S}$
  • Intertwined:
    • The input impacts on $\hat{S}$, and $\hat{S}$ impacts on the input.

An adaptive scheme automatically determines a trade-off between them. The optimal trade-off is not obvious.

Taxonomy

            Offline                                   Online
Indirect    Basic identification + control design     Self-tuning
Direct      VRFT                                      Reinforcement learning (Q-learning)

[Figure: adaptive control scheme, with the system S and the tuner]

  1. Robust adaptive control
    • The model $\hat{S}$ is not completely trusted. 𝑢(𝑡) is decided accounting for an estimate of $S - \hat{S}$ (the model mismatch).
    • We remain cautious and avoid trusting $\hat{S}$ too much, so that the real system 𝑆 is not brought into operating conditions from which it is difficult to recover once we recognize that the system has not behaved as expected.

Direct

In the indirect approach we design the controller based on $\hat{S}$.

This controller is good if $J(C, \hat{S}) \approx J(C, S)$ (the performance indicators are approximately equal).

But enforcing this requirement is very difficult.

The direct approach skips the identification of the model $\hat{S}$ and goes straight from the data to a performance indicator: we use the data to find an estimate of $J(C, S)$ directly.

Example

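$f(x)$ is an unknown real function, $x \in [0,1]^d$.

The objective is to evaluate $\int f(x)\,dx$.

I measure $N$ samples $\{x_i, f(x_i)\}$, $i = 1, \dots, N$.

Indirect approach: a model $\hat{f}(x)$ is first fitted to the samples by minimizing a cost over $i = 1, \dots, N$, and $\int \hat{f}(x)\,dx$ is then used as the estimate of $\int f(x)\,dx$.

Analysing the difference between the two integrals: the error remains large even for large $N$ when $d$ is high.

Direct approach: the integral is seen as the expected value $E[f(x)]$ and is estimated directly by the sample mean

$$E[f(x)] \approx \frac{1}{N} \sum_{i=1}^{N} f(x_i)$$

Analysing the difference: the error is independent of $d$.

The number of samples $N$ needed to obtain an error smaller than $\epsilon$ (e.g. $\epsilon = 0.1$ and $d = 6$) is therefore very different in the first (indirect) case and in the second (direct) case.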
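The direct approach of the example can be sketched in a few lines. This is a minimal illustration, assuming the samples are drawn uniformly in the unit cube; the test function (the sum of the coordinates, whose integral over [0,1]^6 is 3) and the function name are hypothetical choices, not part of the notes.

```python
import random


def direct_estimate(f, d, n, seed=0):
    """Estimate the integral of f over [0,1]^d as the sample mean of f(x_i)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = [rng.random() for _ in range(d)]   # uniform sample in the unit cube
        total += f(x)
    return total / n


if __name__ == "__main__":
    # f(x) = x_1 + ... + x_6, exact integral = 3
    print(direct_estimate(sum, d=6, n=2000))
```

No model of the function is ever built: the data are mapped straight to the quantity of interest, which is the idea behind the direct methods of the course.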

Self-tuning

Online indirect method with a sub-tuner that follows the CEP (certainty equivalence principle) logic.

The main elements are:

  • Uncertain system 𝑆
  • Identification algorithm ID
  • The controller 𝐶

The uncertain system 𝑆 is described by:

$$y(t) = a_1(t)\,y(t-1) + \dots + a_n(t)\,y(t-n) + b_0(t)\,u(t-d) + \dots + b_m(t)\,u(t-d-m) + e(t)$$

It is a linear time-varying stochastic (SISO) system:

  • $y(t)$: output → dependent observable variable
  • $u(t)$: input → independent tuneable variable
  • $e(t)$: additive disturbance, $e(t) \sim WN(0, \lambda^2)$ → uncertain exogenous independent quantity
  • $y(0), \dots, y(-m+1)$: initial state → uncertain exogenous independent quantity

Robustness to these uncertainties is also achievable by means of standard control methods → adaptation is not necessary.

  • $a_i(t)$, $b_j(t)$: parameters → uncertain time-varying exogenous independent quantities

These are difficult to address with standard control techniques → they call for adaptation.

We have $b_0(t) \neq 0 \;\; \forall t$ and $d \geq 1$ ($d$ is the delay; the system is strictly proper).

Parameter vector:

$$\theta(t) = [\,a_1(t) \;\cdots\; a_n(t) \;\; b_0(t) \;\cdots\; b_m(t)\,]^T$$

Vector of regression variables:

$$\psi(t-1) = [\,y(t-1) \;\cdots\; y(t-n) \;\; u(t-d) \;\cdots\; u(t-d-m)\,]^T$$

We can rewrite the system in compact form:

$$A(z^{-1}; t)\, y(t) = B(z^{-1}; t)\, u(t-d) + e(t)$$

(By hypothesis $A(z^{-1}; t)$ and $B(z^{-1}; t)$ are coprime → no cancellation between numerator and denominator. In this way we avoid non-observable or non-reachable parts, and the control problem is well defined.)

Identification algorithm ID

𝑆 is a time-varying ARX (autoregressive with exogenous input) system.

We can use a PEM (Prediction Error Minimization) method to identify the model. One possible implementation is least squares (LS) identification, which fits ARX models very well.

Therefore, we search for the model that fits 𝑆 within the ARX model class (same model structure as before).

  • We assume that the order of 𝑆 is known → 𝑛, 𝑚 and 𝑑 are known.
  • We assume that 𝜃 is constant.

We can rewrite the model in compact form:

$$y(t) = \psi(t-1)^T \theta + e(t)$$

At time $t-1$, $\psi(t-1)$ is known.

A prediction of $y(t)$ given the information at $t-1$ is:

$$\hat{y}(t \mid t-1) = \psi(t-1)^T \theta$$

ID computes the estimate that minimizes the sum of the squared prediction errors:

$$\hat{\theta}_N = \arg\min_{\theta} \sum_{i=1}^{N} \left( y(i) - \psi(i-1)^T \theta \right)^2 = \left[ \sum_{i=1}^{N} \psi(i-1)\,\psi(i-1)^T \right]^{-1} \sum_{i=1}^{N} \psi(i-1)\, y(i)$$

This is the identification block in self-tuning.

This implementation of LS (least squares) is naive:

  • Computationally expensive → matrix inverse
  • Numerically unstable
  • A lot of memory is allocated ($u(1), \dots, u(t), y(1), \dots, y(t)$)

Recursive implementation of the LS (RLS)

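Two states:

  • Estimate: $\hat{\theta}_t = S(t)^{-1} \sum_{i=1}^{t} \psi(i-1)\, y(i)$
  • Information matrix: $S(t) = \bar{S} + \sum_{i=1}^{t} \psi(i-1)\,\psi(i-1)^T$

Recursion for the information matrix:

$$S(t) = \bar{S} + \sum_{i=1}^{t} \psi(i-1)\,\psi(i-1)^T = \bar{S} + \sum_{i=1}^{t-1} \psi(i-1)\,\psi(i-1)^T + \psi(t-1)\,\psi(t-1)^T = S(t-1) + \psi(t-1)\,\psi(t-1)^T$$

$$S(t+1) = S(t) + \psi(t)\,\psi(t)^T$$

Recursion for the estimate:

$$S(t+1)\,\hat{\theta}_{t+1} = \sum_{i=1}^{t+1} \psi(i-1)\,y(i) = \sum_{i=1}^{t} \psi(i-1)\,y(i) + \psi(t)\,y(t+1) = S(t)\,\hat{\theta}_t + \psi(t)\,y(t+1)$$

$$= \left[ S(t+1) - \psi(t)\,\psi(t)^T \right] \hat{\theta}_t + \psi(t)\,y(t+1) = S(t+1)\,\hat{\theta}_t + \psi(t) \left[ y(t+1) - \psi(t)^T \hat{\theta}_t \right]$$

$$\hat{\theta}_{t+1} = \hat{\theta}_t + S(t+1)^{-1}\,\psi(t) \left[ y(t+1) - \psi(t)^T \hat{\theta}_t \right]$$

Where:

  • $S(t+1)^{-1}\,\psi(t) \left[ y(t+1) - \psi(t)^T \hat{\theta}_t \right]$ is the innovation to be added to $\hat{\theta}_t$
  • $S(t+1)^{-1}\,\psi(t)$ is the weight of the information at $t+1$
  • $y(t+1) - \psi(t)^T \hat{\theta}_t$ is the prediction error for $y(t+1)$ made at time $t$ with the best current estimate of the parameter vector

Initialization: $\hat{\theta}_0$, $S(0) = \bar{S}$.

RLS-I form

$$\hat{\theta}_{t+1} = \hat{\theta}_t + S(t+1)^{-1}\,\psi(t) \left[ y(t+1) - \psi(t)^T \hat{\theta}_t \right]$$

$$S(t+1) = S(t) + \psi(t)\,\psi(t)^T$$

Pro:

  • The $\hat{\theta}_t$ returned by recursive least squares is the exact least squares estimate, without approximation

Con:

  • Not practical:
    • Computationally demanding → inverse of a matrix: $S(t+1)^{-1}$
    • Numerically unstable → $S(t+1) = S(t) + \psi(t)\,\psi(t)^T$ diverges asymptotically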

Matrix inversion lemma

Let $F$, $G$, $H$ and $K$ be matrices with suitable dimensions such that:

  • $F$, $H$ and $F + G H K$ are invertible

Then:

$$(F + G H K)^{-1} = F^{-1} - F^{-1} G \left( H^{-1} + K F^{-1} G \right)^{-1} K F^{-1}$$

RLS-III form

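It cures both problems.

The idea is that, instead of computing the inverse $S(t+1)^{-1}$ at every step, we can work directly with the inverse:

$$V(t) = S(t)^{-1}$$

We can then find a formula to compute $V(t+1)$ from $V(t)$ and $\psi(t)$ → this is not computationally demanding.

$$V(t+1) = S(t+1)^{-1} = \left( S(t) + \psi(t)\,\psi(t)^T \right)^{-1} = \left( V(t)^{-1} + \psi(t) \cdot 1 \cdot \psi(t)^T \right)^{-1}$$

Using the lemma with $F = V(t)^{-1}$, $G = \psi(t)$, $H = 1$ and $K = \psi(t)^T$ we obtain:

$$V(t+1) = V(t) - V(t)\,\psi(t) \left( 1 + \psi(t)^T V(t)\,\psi(t) \right)^{-1} \psi(t)^T V(t)$$

$$\hat{\theta}_{t+1} = \hat{\theta}_t + V(t+1)\,\psi(t) \left[ y(t+1) - \psi(t)^T \hat{\theta}_t \right]$$

  • Computationally inexpensive: the only inverse is that of the scalar $1 + \psi(t)^T V(t)\,\psi(t)$ (the only hard inverse is $\bar{S}^{-1}$, needed once at initialization)
  • Numerically stable → $V(t)$ does not diverge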
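The RLS-III recursion can be sketched in plain Python as follows. This is a minimal illustration, not the course's implementation: the first-order test system y(t+1) = a·y(t) + b·u(t) + e(t+1), the random binary input, the initial value V(0) = 100·I and all function names are assumptions made for the example.

```python
import random


def rls_step(theta, V, psi, y_next):
    """One RLS-III update; returns (theta_{t+1}, V(t+1)). V is assumed symmetric."""
    n = len(theta)
    V_psi = [sum(V[i][j] * psi[j] for j in range(n)) for i in range(n)]  # V(t) psi(t)
    denom = 1.0 + sum(psi[i] * V_psi[i] for i in range(n))               # scalar to invert
    V_new = [[V[i][j] - V_psi[i] * V_psi[j] / denom for j in range(n)]
             for i in range(n)]                                          # V(t+1)
    gain = [v / denom for v in V_psi]                                    # equals V(t+1) psi(t)
    error = y_next - sum(psi[i] * theta[i] for i in range(n))            # prediction error
    theta_new = [theta[i] + gain[i] * error for i in range(n)]
    return theta_new, V_new


def identify(a=0.7, b=0.5, steps=500, noise_std=0.01, seed=1):
    """Identify [a, b] of y(t+1) = a*y(t) + b*u(t) + e(t+1) from simulated data."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    V = [[100.0, 0.0], [0.0, 100.0]]
    y = 0.0
    for _ in range(steps):
        u = rng.choice([-1.0, 1.0])          # exciting input
        psi = [y, u]                         # regressor psi(t)
        y_next = a * y + b * u + rng.gauss(0.0, noise_std)
        theta, V = rls_step(theta, V, psi, y_next)
        y = y_next
    return theta


if __name__ == "__main__":
    print(identify())
```

Each step only divides by a scalar and stores the current estimate and the matrix V(t), so neither a matrix inversion nor the full data history is needed.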

What if 𝜃 is time-varying?

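For simplicity we use the I form:

$$\hat{\theta}_{t+1} = \hat{\theta}_t + S(t+1)^{-1}\,\psi(t) \left[ y(t+1) - \psi(t)^T \hat{\theta}_t \right]$$

$\psi(t)\,\psi(t)^T \geq 0$ → $S(t+1) = S(t) + \psi(t)\,\psi(t)^T$ diverges.

Because of this, $S(t+1)^{-1}$ tends to zero and the correction term in $\hat{\theta}_{t+1}$ vanishes.

The estimate of the parameters therefore also converges → if $\theta$ changes, the algorithm is not able to update the value of $\hat{\theta}$.

The identification is done through:

$$\hat{\theta}_t = \arg\min_{\theta} \sum_{i=1}^{t} \left( y(i) - \psi(i-1)^T \theta \right)^2$$

  • Every prediction error always has the same weight → this is an issue for time-varying parameters
  • $\hat{\theta}_t$ is built on data that refer to information that is too old

With a forgetting factor $\mu$ the recursion becomes:

$$\hat{\theta}_{t+1} = \hat{\theta}_t + S(t+1)^{-1}\,\psi(t) \left[ y(t+1) - \psi(t)^T \hat{\theta}_t \right]$$

$$S(t+1) = \mu\, S(t) + \psi(t)\,\psi(t)^T$$

With $\mu < 1$, $S(t)$ evolves as an asymptotically stable first-order system (it does not diverge) even if $\psi(t)\,\psi(t)^T$ is strictly positive.

$S(t)^{-1}$ does not go to zero and $\hat{\theta}_t$ is always reactive to new data.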

Issues with forgetting factor

We can have a singularity issue known as the blow-up phenomenon.

With $\mu < 1$:

$$S(t+1) = \mu\, S(t) + \psi(t)\,\psi(t)^T$$

and if $\psi(t)$ is not informative (e.g. $\psi(t) = [0 \; 0 \; \cdots \; 0]$) then $S(t+1) \to 0$.

Initially this is not a big problem, since the estimate is updated as:

$$\hat{\theta}_{t+1} = \hat{\theta}_t + S(t+1)^{-1}\,\psi(t) \left[ y(t+1) - \psi(t)^T \hat{\theta}_t \right]$$

and the product $S(t+1)^{-1} \cdot \psi(t)$ is:

  • Big ∙ 0 = 0 → $\hat{\theta}_{t+1} = \hat{\theta}_t$

The problem arises when the system becomes informative again, $\psi(t) \neq 0$, and the product $S(t+1)^{-1} \cdot \psi(t)$ is:

  • Big ∙ (≠ 0) ≠ 0 and its value is very high → $\hat{\theta}_{t+1}$ changes suddenly and strongly.

This behaviour is not good when identification is used in a control scheme.

This can occur every time $\psi(t)$ becomes non-informative, which does not necessarily mean equal to 0 as in the example.

If $\psi(t) = [1 \; 1 \; \cdots \; 1]$ → $\det(S(t))$ approaches 0, $S(t)$ approaches singularity and $S(t)^{-1}$ diverges → blow-up phenomenon.

We can prevent it by checking whether $S(t)$ is approaching singularity ($S(t)$ is available to us). If it is near a singularity, we change the strategy adopted for the forgetting factor.

We look at the value of $rcond(S(t))$, the reciprocal condition number (the ratio between the smallest and the largest eigenvalue of $S(t)$).

In this way $S(t)^{-1}$ does not diverge and we avoid the burst.
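A scalar sketch makes the blow-up visible. It is only an illustration under assumed values (μ = 0.9, S(0) = 1, a weakly informative sample ψ = 0.01 arriving after a silent period with ψ = 0); the function names are hypothetical.

```python
def update(s, psi, mu):
    """Scalar information update S(t+1) = mu*S(t) + psi^2 and gain S(t+1)^-1 * psi."""
    s_next = mu * s + psi * psi
    gain = psi / s_next
    return s_next, gain


def gain_after_silence(silent_steps, psi_new=0.01, mu=0.9, s0=1.0):
    """Gain applied to the first new sample after a period with psi(t) = 0."""
    s = s0
    for _ in range(silent_steps):
        s, _ = update(s, 0.0, mu)      # non-informative data: S decays as mu^t
    _, gain = update(s, psi_new, mu)   # informative data comes back
    return gain


if __name__ == "__main__":
    print(gain_after_silence(0))     # gain without a silent period
    print(gain_after_silence(100))   # gain after 100 silent steps
```

The same sample is weighted thousands of times more after the silent period, which is why the estimate jumps when the data become informative again.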

Characterization of the performances

Given a general self-tuning scheme, $C(\hat{\theta}_t)$ is a basic controller.

$C(\theta)$ is any linear controller whose parameters are functions of $\theta$:

$$u(t) = \alpha_1(\theta)\,u(t-1) + \dots + \alpha_{n_\alpha}(\theta)\,u(t-n_\alpha) + \beta_0(\theta)\,y(t) + \dots + \beta_{n_\beta}(\theta)\,y(t-n_\beta) + \gamma_0(\theta)\,r(t) + \dots + \gamma_{n_\gamma}(\theta)\,r(t-n_\gamma)$$

In compact form:

The specific parameters $\alpha_i(\theta)$, $\beta_i(\theta)$ and $\gamma_i(\theta)$ depend on the chosen control technique, which can be selected depending on the application.

The analysis of self-tuning is conditional on the chosen control technique: self-tuning can only be as good as what can be achieved by the chosen control scheme (PID, PI, etc.).

  • $C(\theta)$ is a PID → self-tuning performance will be like that of a PID
  • $C(\theta)$ is an LQG → self-tuning performance will be comparable to that of an LQG scheme

There is no need to specify the controller class completely, since these results are general.

We call $\Sigma(\theta, \theta)$ the standard linear control scheme in which we connect the controller $C(\theta)$ with the model $M(\theta)$.

We call $\Xi$ the set of parameter vectors $\theta$ for which $\Sigma(\theta, \theta)$ is satisfactory, where satisfactory means that with the chosen controller $C(\theta)$ the control specifications (design goals) are met.

If $C(\theta)$ is badly chosen with respect to the design goals, then the set $\Xi$ will be small.

Obviously $\Xi$ cannot be as large as the whole domain, since there are intrinsic limitations such as observability and reachability.

Convergence of RLS

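Irrespective of the experiments, $\hat{\theta}_t$ converges to some finite vector $\hat{\theta}_\infty$: the limit $\hat{\theta}_\infty$ always exists.

Demonstration

The RLS recursion is:

$$\hat{\theta}_{t+1} = \hat{\theta}_t + S(t+1)^{-1}\,\psi(t) \left[ y(t+1) - \psi(t)^T \hat{\theta}_t \right], \qquad S(t+1) = S(t) + \psi(t)\,\psi(t)^T$$

We define the LS estimation error $\tilde{\theta}_t$ as the difference between the estimate $\hat{\theta}_t$ and the parameter vector of the system, and we rewrite the recursion in terms of $\tilde{\theta}_t$.

Multiplying by $S(t+1)$ and using $S(t+1) = S(t) + \psi(t)\,\psi(t)^T$, we obtain an invariant of the LS algorithm, from which $\tilde{\theta}_t$ can be expressed through $S(t)^{-1}$.

Convergence: $\hat{\theta}_t$ converges if $S(t)^{-1}$ converges.

But we already know that $S(t)^{-1}$ converges (since $S(t)$ keeps growing).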

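Demonstration of the convergence of $S(t)^{-1}$

Positive definite

"A square and symmetric matrix $\Gamma$ is said to be positive definite if $X^T \Gamma X > 0 \;\; \forall X \neq 0$."

Given two symmetric square matrices $M$ and $N$ with the same dimensions, we say that $M \geq N$ if $M - N \geq 0$ (and then $X^T M X \geq X^T N X \;\; \forall X$).

Since $S(t+1) = S(t) + \psi(t)\,\psi(t)^T \geq S(t)$, we have $S(t+1)^{-1} \leq S(t)^{-1}$.

Decreasing function

$$S(t)^{-1} \geq S(t+1)^{-1} \;\leftrightarrow\; X^T S(t)^{-1} X \geq X^T S(t+1)^{-1} X \;\; \forall X$$

$$X^T \bar{S}^{-1} X \geq \dots \geq X^T S(t+1)^{-1} X \geq 0$$

Therefore $X^T S(t)^{-1} X$ is non-increasing and bounded from below, so it converges asymptotically.

Writing $S(t)^{-1} = V(t)$ with entries $V_{ij}(t)$, $i, j = 1, \dots, k$:

  • Choosing $X = [1 \; 0 \; \dots \; 0]^T$: $X^T S(t)^{-1} X = V_{11}(t) \to V_{11}^{\infty}$
  • Choosing $X = [0 \; 1 \; \dots \; 0]^T$: $X^T S(t)^{-1} X = V_{22}(t) \to V_{22}^{\infty}$

Each $V_{ii}(t)$ converges to some $V_{ii}^{\infty}$.

Then, choosing $X = [1 \; 1 \; 0 \; \dots \; 0]^T$:

$$X^T S(t)^{-1} X = V_{11}(t) + V_{22}(t) + 2\,V_{12}(t)$$

The left-hand side converges, and $V_{11}(t)$ and $V_{22}(t)$ converge, so $V_{12}(t)$ converges as well.

Each $V_{ij}(t)$ converges to some $V_{ij}^{\infty}$, hence $S(t)^{-1} \to V_\infty$.

Convergence of $\hat{\theta}_t$

  • If $S(t)^{-1} \to V_\infty = 0$, the estimate converges to the true parameter vector.
  • If $S(t)^{-1} \to V_\infty \neq 0$, the estimate still converges, but in general not to the true parameter vector.

In adaptive control $V_\infty = 0$ cannot be guaranteed!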

Results for self-tuning

We would now like to prove that the self-tuning scheme behaves like a linear control scheme with no uncertainty $\Sigma(\theta, \theta)$, irrespective of the fact that 𝑆 is unknown.

Naïve expectation:

  • Self-tuning behaves as $\Sigma(\theta, \theta)$ evaluated at the true parameter vector of 𝑆
    • Best possible result → self-tuning behaves as the control scheme we would design if the system were known
    • Impossible result in general

Since $\hat{\theta}_t \to \hat{\theta}_\infty$, we may instead expect that self-tuning behaves like $\Sigma(\hat{\theta}_\infty, \hat{\theta}_\infty)$.

I/O of self-tuning:

  • Input: $r(t)$
  • Output: $z(t) = [\,y(t) \;\; u(t)\,]^T$

We have already seen that $\hat{\theta}_t$ and $S(t)^{-1}$ converge. We are therefore interested in $z(t)$.

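Imaginary control scheme $\Sigma(\hat{\theta}_t, \hat{\theta}_t)$

  • $\hat{\theta}_t$ is the estimate obtained in the self-tuning scheme, where RLS is fed by the $y(t)$ and $u(t)$ of the real system and NOT by the signals of the imaginary scheme.
  • $\hat{\theta}_t$ is therefore seen as an exogenous input in this scheme: it does not depend on the signals of the imaginary scheme, which in general differ from $y(t)$ and $u(t)$.
  • $\Sigma(\hat{\theta}_t, \hat{\theta}_t)$ is a linear time-varying system.

This system does not exist in practice. We can simulate it and use it as a term of comparison for self-tuning.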

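There are 2 reasons why $\hat{\theta}_\infty \in \Xi$ robustly:

  1. $\hat{\theta}_\infty$ depends on:
    a. The true parameter vector of the system
    b. The dataset $u(i)$, $y(i)$
    c. $\hat{\theta}_0$ and $\bar{S}$ → the initialization of RLS

    For those basic controllers for which the complement of $\Xi$ is small enough (e.g. in pole placement it is a linear subspace, so its dimensionality is lower than that of the $\theta$ domain), if $\hat{\theta}_0$ is chosen with a perturbation, then the probability that $\hat{\theta}_\infty$ falls in the complement of $\Xi$ is zero. If we find a $\hat{\theta}_0$ that leads to the complement of $\Xi$, we can slightly perturb one of the components of $\hat{\theta}_0$ and $\hat{\theta}_\infty$ will NOT tend to the complement of $\Xi$.

  2. $\Xi$ is user chosen (it depends on the chosen basic controller, and it can be computed beforehand), so it is possible to edit the RLS algorithm so that, as soon as we see that $\hat{\theta}_t$ is approaching the complement of $\Xi$, we drift the estimate away from it without altering the properties of the algorithm. Therefore $\hat{\theta}_\infty \in \Xi$ is always guaranteed.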

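Proof of the theorem "$\hat{\theta}_\infty \in \Xi$ → the self-tuning scheme is self-optimal"

Before starting we make some observations/simplifications:

  • Satisfactory implies asymptotically stable (A.S.) → A.S. is always a requirement.
  • $\hat{\theta}_t \to \hat{\theta}_\infty \in \Xi$, therefore there is an instant $\bar{t}$ after which $\hat{\theta}_t \in \Xi \;\; \forall t \geq \bar{t}$. To simplify the discussion, we assume that $\hat{\theta}_t \in \Xi \;\; \forall t$ → no conceptual changes.

Estimated polynomials and regressor:

$$\hat{A}_t(z^{-1}) = 1 - \hat{a}_{1,t}\, z^{-1} - \dots - \hat{a}_{n,t}\, z^{-n}, \qquad \hat{B}_t(z^{-1}) = \hat{b}_{0,t} + \hat{b}_{1,t}\, z^{-1} + \dots + \hat{b}_{m,t}\, z^{-m}$$

$$\psi(t-1)^T = [\,y(t-1) \;\cdots\; y(t-n) \;\; u(t-d) \;\cdots\; u(t-d-m)\,]$$

The system 𝑆 is:

$$y(t) = [\,y(t-1) \;\cdots\; y(t-n) \;\; u(t-d) \;\cdots\; u(t-d-m)\,] \cdot [\,\hat{a}_{1,t} \;\cdots\; \hat{a}_{n,t} \;\; \hat{b}_{0,t} \;\cdots\; \hat{b}_{m,t}\,]^T + e(t)$$

[Figure: the self-tuning scheme redrawn as the imaginary control scheme with an additional input $e(t)$, generated by a non-linear feedback]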

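The whole proof of the theorem can now be split into three steps:

  1. $\Sigma(\hat{\theta}_t, \hat{\theta}_t)$ is asymptotically stable even though it is time-varying.

The system can be written in state-space form with:

  • $\xi(t)$: state
  • $r(t)$: input
  • $z(t)$: output

The system is A.S. if $\|\xi(t)\| \to 0$. For a time-invariant system this is equivalent to the state matrix $F$ being Hurwitz stable (here meaning that all its eigenvalues lie strictly inside the unit circle).

We cannot say that if $\hat{\theta}_t \in \Xi \;\; \forall t$, so that $F(\hat{\theta}_t)$ is Hurwitz stable at every $t$, then the time-varying system is A.S.

This is not valid because we are changing the parameters over time.

Example: take two systems, each with a Hurwitz stable state matrix.

Both systems are A.S.

But if we combine the dynamics by switching between them, the overall system may not be A.S. → $F(t)$ Hurwitz $\forall t$ is not enough to have A.S.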
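The point can be checked with a small numerical experiment. The two matrices below are a hypothetical pair chosen for illustration (they are not the ones used in the notes): each one has both eigenvalues equal to 0, so each system alone is asymptotically stable, yet alternating between them makes the state grow.

```python
def matvec(F, x):
    """Product of a 2x2 matrix with a 2-vector."""
    return [F[0][0] * x[0] + F[0][1] * x[1],
            F[1][0] * x[0] + F[1][1] * x[1]]


def simulate(matrices, x0, steps):
    """Simulate xi(t+1) = F(t) xi(t), cycling through the given matrices."""
    x = list(x0)
    for t in range(steps):
        x = matvec(matrices[t % len(matrices)], x)
    return x


# Assumed example matrices: both are nilpotent (all eigenvalues equal to 0).
F1 = [[0.0, 2.0], [0.0, 0.0]]
F2 = [[0.0, 0.0], [2.0, 0.0]]

if __name__ == "__main__":
    print(simulate([F1], [0.0, 1.0], 5))        # F1 alone: state goes to zero
    print(simulate([F2], [1.0, 0.0], 5))        # F2 alone: state goes to zero
    print(simulate([F1, F2], [0.0, 1.0], 10))   # alternating: state grows by 4 every 2 steps
```

Stability of every frozen matrix therefore says nothing about the switched system, which is why the proof looks at the limit scheme instead.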

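We can instead evaluate $\Sigma(\hat{\theta}_\infty, \hat{\theta}_\infty)$: in this case, since the system is LTI, for A.S. it is sufficient that $F(\hat{\theta}_\infty)$ is Hurwitz stable.

If $\hat{\theta}_t \to \hat{\theta}_\infty$, then $\Sigma(\hat{\theta}_t, \hat{\theta}_t)$ is asymptotically stable if $F(\hat{\theta}_\infty)$ is Hurwitz stable.

Intuitively, this is because from a certain moment on $\Sigma(\hat{\theta}_t, \hat{\theta}_t)$ is not switching anymore.

  2. Study of the perturbation $e(t)$:

We will see that $e(t) \to 0$ always. It is asymptotically vanishing irrespective of the operating conditions.

By analysing our feedback, we can prove that $\psi(t)$ is bounded $\forall t$.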

𝜉