An introduction to deep learning and reinforcement learning. The notes compare the approximation obtained from a standard projection with that of a deep learning network, explain why neural networks are a good solution method in economics and how they can efficiently approximate complex functions, and discuss AlphaGo, its surprising strategies, and its neural network training pipeline and architecture.
Typology: Lecture notes
Jesús Fernández-Villaverde¹ and Galo Nuño²
October 15, 2021
¹ University of Pennsylvania
² Banco de España
$$y \cong g^{NN}(x;\theta) = \theta_0 + \sum_{m=1}^{M} \theta_m \, \phi(z_m)$$
where $\phi(\cdot)$ is an arbitrary activation function and:
$$z_m = \theta_{0,m} + \sum_{n=1}^{N} \theta_{n,m} x_n$$
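As a minimal sketch (our own illustration, not code from these notes), the single-hidden-layer network can be evaluated in plain Python. The function name `g_nn`, the argument layout (`theta_nm[n][m]` holds $\theta_{n,m}$, with 0-based indices), and the default choice $\phi = \tanh$ are assumptions; the hypothetical helper `relu` is included only as a second example activation.

```python
import math
from typing import Callable, Sequence


def relu(z: float) -> float:
    """Rectified linear unit, one possible choice of activation phi."""
    return max(0.0, z)


def g_nn(
    x: Sequence[float],
    theta0: float,
    theta_m: Sequence[float],
    theta0_m: Sequence[float],
    theta_nm: Sequence[Sequence[float]],
    phi: Callable[[float], float] = math.tanh,
) -> float:
    """Evaluate y ~= theta_0 + sum_m theta_m * phi(z_m) for one input vector x.

    Hypothetical layout (0-based): theta_m[m] is the output weight of hidden
    unit m, theta0_m[m] its intercept, and theta_nm[n][m] the weight on x[n].
    """
    y = theta0
    for m in range(len(theta_m)):
        # z_m = theta_{0,m} + sum_n theta_{n,m} * x_n
        z_m = theta0_m[m] + sum(theta_nm[n][m] * x[n] for n in range(len(x)))
        y += theta_m[m] * phi(z_m)
    return y


# Two inputs (N = 2) and two hidden units (M = 2), using ReLU as phi.
y = g_nn([1.0, 2.0], 0.5, [1.0, -2.0], [0.0, 1.0], [[1.0, 0.5], [1.0, -1.0]], phi=relu)
print(y)  # 3.5
```

Here $z_1 = 3$ and $z_2 = -0.5$, so ReLU zeroes out the second unit and $y = 0.5 + 1 \cdot 3 = 3.5$.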
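Substituting $z_m$, we can compare
$$y \cong g^{NN}(x;\theta) = \theta_0 + \sum_{m=1}^{M} \theta_m \, \phi\left(\theta_{0,m} + \sum_{n=1}^{N} \theta_{n,m} x_n\right)$$
with a standard projection:
$$y \cong g^{CP}(x;\theta) = \theta_0 + \sum_{m=1}^{M} \theta_m \, \phi_m(x)$$
where $\phi_m$ is, for example, a Chebyshev polynomial.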
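For comparison, here is a sketch of $g^{CP}$ for a scalar input $x \in [-1, 1]$, with Chebyshev polynomials $T_m$ generated by the recurrence $T_{m+1}(x) = 2x\,T_m(x) - T_{m-1}(x)$. The names `chebyshev_basis` and `g_cp` and the restriction to one dimension are our assumptions. The key contrast with the network is where the parameters sit: here $\theta$ enters linearly and the basis $\phi_m$ is fixed in advance, while in $g^{NN}$ the parameters $\theta_{0,m}$ and $\theta_{n,m}$ sit inside $\phi$, so the basis functions themselves are fitted to the data.

```python
import math
from typing import List, Sequence


def chebyshev_basis(x: float, M: int) -> List[float]:
    """Return [T_1(x), ..., T_M(x)] using T_0 = 1, T_1 = x and
    T_{m+1}(x) = 2 x T_m(x) - T_{m-1}(x)."""
    if M <= 0:
        return []
    basis = [x]
    t_prev, t_curr = 1.0, x
    for _ in range(M - 1):
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
        basis.append(t_curr)
    return basis


def g_cp(x: float, theta0: float, theta: Sequence[float]) -> float:
    """Evaluate y ~= theta_0 + sum_{m=1}^{M} theta_m * T_m(x), with M = len(theta)."""
    phis = chebyshev_basis(x, len(theta))
    return theta0 + sum(t * p for t, p in zip(theta, phis))


print(chebyshev_basis(0.5, 3))          # [0.5, -0.5, -1.0]
print(g_cp(0.5, 1.0, [2.0, 0.0, 1.0]))  # 1.0
```

Fitting $g^{CP}$ is therefore a linear least-squares problem in $\theta$, whereas fitting $g^{NN}$ is non-linear in $\theta$.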
Figure 1 | Neural network training pipeline and architecture. a, A fast rollout policy $p_\pi$ and supervised learning (SL) policy network $p_\sigma$ are trained to predict human expert moves in a data set of positions. A reinforcement learning (RL) policy network $p_\rho$ is initialized to the SL policy network, and is then improved by policy gradient learning to maximize the outcome (that is, winning more games) against previous versions of the policy network. A new data set is generated by playing games of self-play with the RL policy network. Finally, a value network $v_\theta$ is trained by regression to predict the expected outcome (that is, whether the current player wins) in positions from the self-play data set. b, Schematic representation of the neural network architecture used in AlphaGo. The policy network takes a representation of the board position $s$ as its input, passes it through many convolutional layers with parameters $\sigma$ (SL policy network) or $\rho$ (RL policy network), and outputs a probability distribution $p_\sigma(a \mid s)$ or $p_\rho(a \mid s)$ over legal moves $a$, represented by a probability map over the board. The value network similarly uses many convolutional layers with parameters $\theta$, but outputs a scalar value $v_\theta(s')$ that predicts the expected outcome in position $s'$.
[Figure 1 diagram. a (Data / Neural network): human expert positions, rollout policy $p_\pi$ and SL policy network $p_\sigma$ (classification), RL policy network $p_\rho$ (policy gradient, self play), self-play positions, value network $v_\theta$ (regression). b: policy network $p_{\sigma/\rho}(a \mid s)$ with input $s$; value network $v_\theta(s')$ with input $s'$.]
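The policy-gradient step in panel a can be illustrated with a toy example. The sketch below is a REINFORCE-style update for a softmax policy in a single, hypothetical position with three moves whose win probabilities (`win_prob`) we simply make up. The outcome is $z = +1$ for a win and $-1$ for a loss, and the log-probability of each sampled move is pushed up in proportion to $z$. This is not AlphaGo's implementation: there is no board, no convolutional network, and no self-play opponent, and the names `reinforce` and `softmax`, the step size, and the number of games are all our own assumptions.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(win_prob: Sequence[float], games: int = 2000,
              lr: float = 0.1, seed: int = 0) -> List[float]:
    """Toy REINFORCE: one position, softmax policy over len(win_prob) moves.

    Returns the final move probabilities p(a | s).
    """
    rng = random.Random(seed)
    logits = [0.0] * len(win_prob)
    for _ in range(games):
        probs = softmax(logits)
        a = rng.choices(range(len(probs)), weights=probs)[0]  # sample a move
        z = 1.0 if rng.random() < win_prob[a] else -1.0       # game outcome
        # d log p(a|s) / d logit_k = 1{k == a} - p_k; ascend z * gradient.
        for k in range(len(logits)):
            logits[k] += lr * z * ((1.0 if k == a else 0.0) - probs[k])
    return softmax(logits)


probs = reinforce([0.2, 0.5, 0.9])
print([round(p, 3) for p in probs])
```

Because only the sign of the final outcome drives the update, moves that tend to win gain probability mass over repeated games, which is the sense in which the RL policy network is "improved to maximize the outcome".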
$$z = \theta_0 + \sum_{n=1}^{N} \theta_n x_n$$
In theory, we could build non-linear combinations of the inputs, but this is unlikely to be a fruitful idea in general.
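A single neuron first forms the linear combination $z$ and then applies an activation $\phi$. The sketch below computes $z$ and evaluates three common activation functions; the names `neuron_input`, `sigmoid`, and `ACTIVATIONS`, and this particular selection of activations, are our own illustrative choices rather than a list taken from these notes.

```python
import math
from typing import Callable, Dict, Sequence


def neuron_input(x: Sequence[float], theta0: float, theta: Sequence[float]) -> float:
    """z = theta_0 + sum_n theta_n * x_n, a linear (affine) combination of inputs."""
    if len(x) != len(theta):
        raise ValueError("x and theta must have the same length")
    return theta0 + sum(t * xi for t, xi in zip(theta, x))


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow in math.exp for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


ACTIVATIONS: Dict[str, Callable[[float], float]] = {
    "sigmoid": sigmoid,
    "tanh": math.tanh,
    "relu": lambda z: max(0.0, z),
}

z = neuron_input([1.0, -2.0], 0.5, [2.0, 1.0])
print(z)  # 0.5
for name, phi in ACTIVATIONS.items():
    print(name, phi(z))
```

All the non-linearity lives in $\phi$; the combination of inputs itself stays linear, in line with the remark above.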
i=