Data Driven Control System Design

Contents

Motivation

Adaptive control

Introduction

Uncertain system

Adaptation

Exploitation and exploration trade-off

Taxonomy

Offline

Online

Indirect

Direct

Self-tuning

Identification algorithm ID

Recursive implementation of the LS (RLS)

What if 𝜃 is time-varying?

LS with forgetting

Characterization of the performances

Performances in self-tuning

Virtual Reference Feedback Tuning-VRFT

𝐻2 norm

Model reference control problem

Virtual reference control problem

VRFT cost function

PEM IDENTIFICATION

Asymptotic theory of PEM identification

Frequency interpretation

Open loop

Closed loop

VRFT VS MR via PEM frequency interpretation

Naïve approach

VRFT for plants affected by additive disturbance

Instrumental Variable (IV) identification

Reinforcement learning

MCDP-Markov Chain Decision Process

Find the optimal policy 𝜋∗

Q-learning

Q-LEARNING SCHEME

Q-LEARNING ALGORITHM

CHARACTERIZATION OF CONVERGENCE OF 𝑄𝑡 TO 𝑄∗