Deep Learning and Reinforcement Learning, Lecture notes of Algorithms and Programming

An introduction to deep learning and reinforcement learning. It compares a standard projection approximation with a neural-network approximation, and explains why neural networks are a good solution method in economics and how they can efficiently approximate complex functions. It also discusses AlphaGo, its surprising strategies, and its neural network training pipeline and architecture.

Typology: Lecture notes

2021/2022

Uploaded on 05/11/2023

Deep learning and reinforcement learning
Jesús Fernández-Villaverde^1 and Galo Nuño^2
October 15, 2021
^1 University of Pennsylvania
^2 Banco de España

A short introduction

A neural network

  • An artificial neural network (a.k.a. ANN or connectionist system) is an approximation to f(x) built as a linear combination of M generalized linear models of x of the form:

$$y \cong g^{NN}(x;\theta) = \theta_0 + \sum_{m=1}^{M} \theta_m \phi(z_m)$$

where φ(·) is an arbitrary activation function and:

$$z_m = \theta_{0,m} + \sum_{n=1}^{N} \theta_{n,m} x_n$$

  • M is known as the width of the model.
  • We can select θ such that $g^{NN}(x;\theta)$ is as close to f(x) as possible given some relevant metric (e.g., the $L^2$ norm).
  • This is known as “training” the network.
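A minimal sketch of $g^{NN}$ and of its training, written in plain Python so that every step is visible (it is not the authors' code). It assumes a single input (N = 1), φ = tanh, and full-batch gradient descent on the mean squared error over a grid of points, a discrete stand-in for the $L^2$ norm; the gradients come from the chain rule (backpropagation). The names `g_nn`, `mse`, and `train` and the values of `M`, `lr`, `epochs`, and `seed` are illustrative choices.

```python
import math
import random


def g_nn(x, theta):
    """g^NN(x; theta) = theta_0 + sum_m theta_m * phi(z_m), with phi = tanh and N = 1."""
    return theta["t0"] + sum(
        o * math.tanh(b + w * x)  # theta_m * phi(theta_{0,m} + theta_{1,m} * x)
        for o, b, w in zip(theta["out"], theta["bias"], theta["w"])
    )


def mse(xs, ys, theta):
    """Mean squared (discrete L^2) distance between g^NN and the targets f(x)."""
    return sum((g_nn(x, theta) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, M=8, lr=0.01, epochs=2000, seed=0):
    """Fit theta by full-batch gradient descent on mse (illustrative hyperparameters)."""
    rng = random.Random(seed)
    theta = {
        "t0": 0.0,                                        # theta_0
        "out": [rng.gauss(0.0, 0.5) for _ in range(M)],   # theta_m
        "bias": [rng.gauss(0.0, 0.5) for _ in range(M)],  # theta_{0,m}
        "w": [rng.gauss(0.0, 0.5) for _ in range(M)],     # theta_{1,m}
    }
    n = len(xs)
    for _ in range(epochs):
        g0, g_out, g_bias, g_w = 0.0, [0.0] * M, [0.0] * M, [0.0] * M
        for x, y in zip(xs, ys):
            h = [math.tanh(b + w * x) for b, w in zip(theta["bias"], theta["w"])]
            err = theta["t0"] + sum(o * hm for o, hm in zip(theta["out"], h)) - y
            d = 2.0 * err / n  # derivative of mse with respect to g^NN(x)
            g0 += d
            for m in range(M):
                g_out[m] += d * h[m]
                dz = d * theta["out"][m] * (1.0 - h[m] ** 2)  # chain rule: tanh' = 1 - tanh^2
                g_bias[m] += dz
                g_w[m] += dz * x
        theta["t0"] -= lr * g0
        for m in range(M):
            theta["out"][m] -= lr * g_out[m]
            theta["bias"][m] -= lr * g_bias[m]
            theta["w"][m] -= lr * g_w[m]
    return theta


# Usage: approximate f(x) = x^2 on a grid over [-1, 1].
xs = [-1.0 + 0.1 * i for i in range(21)]
ys = [x * x for x in xs]
theta_init = train(xs, ys, epochs=0)  # same seed, so these are the starting weights
theta_fit = train(xs, ys)
print(mse(xs, ys, theta_init), mse(xs, ys, theta_fit))
```

Libraries such as TensorFlow or PyTorch compute these gradients automatically; writing them out by hand here only makes the chain rule explicit.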

Comparison with other approximations

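  • Compare:

$$y \cong g^{NN}(x;\theta) = \theta_0 + \sum_{m=1}^{M} \theta_m \phi\left(\theta_{0,m} + \sum_{n=1}^{N} \theta_{n,m} x_n\right)$$

with a standard projection:

$$y \cong g^{CP}(x;\theta) = \theta_0 + \sum_{m=1}^{M} \theta_m \phi_m(x)$$

where φ_m is, for example, a Chebyshev polynomial.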

  • We exchange the rich parameterization of coefficients for the parsimony of basis functions.
  • Later, we will explain why this is often a good idea.
  • How we determine the coefficients will also be different, but this is somewhat less important.
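For contrast, a minimal sketch of the standard projection $g^{CP}$ with Chebyshev polynomials as the fixed basis functions φ_m. It assumes x ∈ [-1, 1] and picks the coefficients by interpolating f at the M + 1 Chebyshev nodes, which is one common choice among several (least squares is another); the function names are illustrative. Because the basis is fixed and only the θ_m enter, each coefficient comes out of a closed-form sum rather than an iterative search.

```python
import math


def chebyshev_basis(x, M):
    """[T_0(x), ..., T_M(x)] from the recurrence T_{k+1} = 2x T_k - T_{k-1}."""
    T = [1.0, x]
    for _ in range(2, M + 1):
        T.append(2.0 * x * T[-1] - T[-2])
    return T[: M + 1]


def fit_chebyshev(f, M):
    """theta_0..theta_M of g^CP by interpolating f at the M + 1 Chebyshev nodes."""
    n = M + 1
    nodes = [math.cos(math.pi * (k + 0.5) / n) for k in range(n)]
    rows = [(f(x), chebyshev_basis(x, M)) for x in nodes]
    # Discrete orthogonality of the T_j at these nodes gives each coefficient in closed form.
    theta = [2.0 / n * sum(fx * T[j] for fx, T in rows) for j in range(n)]
    theta[0] /= 2.0  # T_0 = 1, so theta_0 is the constant term of g^CP
    return theta


def g_cp(x, theta):
    """g^CP(x; theta) = theta_0 + sum_m theta_m T_m(x)."""
    return sum(t * T for t, T in zip(theta, chebyshev_basis(x, len(theta) - 1)))


# Usage: a smooth function on [-1, 1] with M = 10 non-constant basis functions.
theta = fit_chebyshev(math.exp, 10)
print(max(abs(g_cp(x / 50.0, theta) - math.exp(x / 50.0)) for x in range(-50, 51)))
```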

Why are neural networks a good solution method in economics?

  • From now on, I will refer to neural networks as including both single and multilayer networks.
  • With suitable choices of activation functions, neural networks can efficiently approximate extremely complex functions.
  • In particular, under certain (relatively weak) conditions:
    1. Neural networks are universal approximators.
    2. Neural networks break the “curse of dimensionality.”
  • Furthermore, neural networks are easy to code, stable, and scalable for multiprocessing.
  • Thus, neural networks have considerable option value as solution methods in economics.
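A back-of-the-envelope sketch of what is at stake as the number of inputs N grows. It assumes the multivariate projection uses a full tensor product of p + 1 Chebyshev polynomials per dimension (a common construction, not the only one) and counts the coefficients of the one-hidden-layer network defined earlier; the helper names are hypothetical. Counting coefficients only illustrates the issue: the formal results behind "breaking the curse of dimensionality" concern how the approximation error scales with N under conditions on f.

```python
def nn_param_count(N, M):
    """Coefficients in g^NN: theta_0, M outer theta_m, M biases theta_{0,m}, N*M weights theta_{n,m}."""
    return 1 + M + M + N * M


def tensor_chebyshev_count(N, p):
    """Coefficients in a full tensor-product basis with p + 1 polynomials per dimension."""
    return (p + 1) ** N


for N in (1, 2, 5, 10, 20):
    print(N, nn_param_count(N, M=10), tensor_chebyshev_count(N, p=9))
```

For a fixed width M the network's count grows linearly in N, while the tensor-product count grows as (p + 1)^N. Whether a fixed M is enough depends on f, which is where the formal approximation results come in.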

Current interest

  • Currently, neural networks are among the most active areas of research in computer science and applied math.
  • While the original idea goes back to the 1940s, neural networks were rediscovered in the second half of the 2000s.
  • Why?
    1. Suddenly, the large amounts of computation and data required to train the networks efficiently became available at a reasonable cost.
    2. New algorithms, such as backpropagation combined with gradient descent, became popular.
  • Some well-known successes and industrial applications.

AlphaGo

  • Big splash: AlphaGo vs. Lee Sedol in March 2016.
  • Silver et al. (2018): now applied to chess, shogi, and Go; DeepMind's related AlphaStar system plays StarCraft II.
  • Check also:
    1. https://deepmind.com/research/alphago/
    2. https://www.alphagomovie.com/
    3. https://deepmind.com/blog/article/alphastar-mastering-real-time-strategy-game-starcraft-ii
  • Very different than Deep Blue against Kasparov.
  • New and surprising strategies.
  • However, you need to keep this accomplishment in perspective.


Figure 1 | Neural network training pipeline and architecture. a, A fast rollout policy $p_\pi$ and supervised learning (SL) policy network $p_\sigma$ are trained to predict human expert moves in a data set of positions. A reinforcement learning (RL) policy network $p_\rho$ is initialized to the SL policy network, and is then improved by policy gradient learning to maximize the outcome (that is, winning more games) against previous versions of the policy network. A new data set is generated by playing games of self-play with the RL policy network. Finally, a value network $v_\theta$ is trained by regression to predict the expected outcome (that is, whether the current player wins) in positions from the self-play data set. b, Schematic representation of the neural network architecture used in AlphaGo. The policy network takes a representation of the board position s as its input, passes it through many convolutional layers with parameters σ (SL policy network) or ρ (RL policy network), and outputs a probability distribution $p_\sigma(a|s)$ or $p_\rho(a|s)$ over legal moves a, represented by a probability map over the board. The value network similarly uses many convolutional layers with parameters θ, but outputs a scalar value $v_\theta(s')$ that predicts the expected outcome in position s′.

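[Figure 1 diagram. Panel a: human expert positions train the rollout policy $p_\pi$ and the SL policy network $p_\sigma$ (classification); the RL policy network $p_\rho$ is trained by policy gradient through self-play; self-play positions train the value network $v_\theta$ (regression). Panel b: the policy network outputs $p_{\sigma/\rho}(a|s)$ for position s; the value network outputs $v_\theta(s')$ for position s′.]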
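The policy-gradient step in panel a can be illustrated on a much smaller problem. The sketch below is a toy REINFORCE update (a basic policy-gradient method), not AlphaGo's training code: the "game" is a single move chosen from a softmax policy over a few actions, each of which wins with a fixed, hypothetical probability, and the policy parameters move in the direction of (outcome − baseline) × ∇ log π(a). The running-average baseline and the values of `lr`, `steps`, and `seed` are illustrative assumptions.

```python
import math
import random


def softmax(prefs):
    """Policy pi(a) proportional to exp(prefs[a]); shift by the max for numerical stability."""
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(win_probs, steps=2000, lr=0.1, seed=0):
    """Toy REINFORCE: one move per 'game'; outcome 1 (win) or 0 (loss)."""
    rng = random.Random(seed)
    prefs = [0.0] * len(win_probs)  # policy parameters, one preference per action
    baseline = 0.0                  # running average of outcomes (variance reduction)
    for _ in range(steps):
        pi = softmax(prefs)
        a = rng.choices(range(len(pi)), weights=pi)[0]         # sample a move from pi
        outcome = 1.0 if rng.random() < win_probs[a] else 0.0  # play the hypothetical 'game'
        advantage = outcome - baseline
        # The gradient of log pi(a) with respect to prefs[k] is 1{k == a} - pi[k].
        for k in range(len(prefs)):
            prefs[k] += lr * advantage * ((1.0 if k == a else 0.0) - pi[k])
        baseline += 0.05 * (outcome - baseline)
    return softmax(prefs)


# Usage: three candidate moves with hypothetical winning probabilities.
print(reinforce([0.2, 0.5, 0.8]))
```

In AlphaGo the outcome comes from a full game against an earlier version of the network and π is a deep convolutional network, but the update has the same shape: make moves that led to wins more likely.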

Further advantages

  • Neural networks and deep learning often require less “inside knowledge” from experts in the area.
  • Results can be highly counterintuitive and yet deliver excellent performance.
  • Outstanding open-source libraries: TensorFlow, PyTorch, Flux.
  • More recently, the development of dedicated hardware (TPUs, AI accelerators, FPGAs) is likely to maintain an edge for the area.
  • The width of an ecosystem is key for its long-run success.

Digging deeper

A neuron

  • N observables: $x_1, x_2, \ldots, x_N$. We stack them in x.
  • Coefficients (or weights): $\theta_0$ (a constant), $\theta_1, \theta_2, \ldots, \theta_N$. We stack them in θ.
  • We build a linear combination of observations:

$$z = \theta_0 + \sum_{n=1}^{N} \theta_n x_n$$

Theoretically, we could build non-linear combinations, but this is unlikely to be a fruitful idea in general.

  • We transform this linear combination with an activation function: $y = g(x;\theta) = \phi(z)$. The activation function might have some coefficients γ of its own.
  • Why do we need an activation function?
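A single neuron takes only a few lines. The sketch below assumes three common activation choices (identity, logistic sigmoid, ReLU) plus the step function a perceptron uses for classification; the function names and weights are illustrative. It also shows one standard answer to the question just asked: with the identity as activation, feeding neurons into another neuron collapses to a single linear combination of x, so stacking adds nothing without a non-linear φ.

```python
import math


def neuron(x, theta, phi):
    """y = phi(z) with z = theta_0 + sum_n theta_n x_n; theta = [theta_0, theta_1, ..., theta_N]."""
    z = theta[0] + sum(t * xn for t, xn in zip(theta[1:], x))
    return phi(z)


def identity(z):
    return z


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


def step(z):
    return 1.0 if z >= 0.0 else 0.0  # perceptron-style classification output


# Two identity neurons feeding a third one ...
def stacked_linear(x):
    h1 = neuron(x, [0.5, 1.0, -2.0], identity)
    h2 = neuron(x, [-1.0, 3.0, 0.5], identity)
    return neuron([h1, h2], [0.2, 2.0, -1.0], identity)


# ... equal one neuron whose weights combine the originals:
# theta_0 = 0.2 + 2*0.5 - (-1.0) = 2.2, theta_1 = 2*1.0 - 3.0 = -1.0, theta_2 = 2*(-2.0) - 0.5 = -4.5
def collapsed(x):
    return neuron(x, [2.2, -1.0, -4.5], identity)


x = [0.3, -1.2]
print(stacked_linear(x), collapsed(x))
```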

Flow representation

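[Figure: flow representation of a perceptron. Inputs $x_1, \ldots, x_n$ are multiplied by weights $\theta_1, \ldots, \theta_n$ and summed into the net input $\sum_{i=1}^{n} \theta_i x_i$, which passes through an activation function to produce the classification output.]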