Spinning Up: Deep Reinforcement Learning, Linear Programming Notes

UBA XXI Linear Programming

This technical documentation provides a complete guide to deep reinforcement learning (RL) using the Spinning Up library. It covers key concepts, algorithms, practical implementation, and advice for developing RL research projects. The documentation is intended for researchers and students who want to understand and apply advanced RL techniques.

Type: Notes

2024/2025

Uploaded on 26/01/2025

pymesolanas 🇦🇷

2 documents


Spinning Up Documentation

Release

Joshua Achiam

Feb 07, 2020


1 Introduction
- 1.1 What This Is
- 1.2 Why We Built This
- 1.3 How This Serves Our Mission
- 1.4 Code Design Philosophy
- 1.5 Long-Term Support and Support History
2 Installation
- 2.1 Installing Python
- 2.2 Installing OpenMPI
- 2.3 Installing Spinning Up
- 2.4 Check Your Install
- 2.5 Installing MuJoCo (Optional)
3 Algorithms
- 3.1 What’s Included
- 3.2 Why These Algorithms?
- 3.3 Code Format
4 Running Experiments
- 4.1 Launching from the Command Line
- 4.2 Launching from Scripts
5 Experiment Outputs
- 5.1 Algorithm Outputs
- 5.2 Save Directory Location
- 5.3 Loading and Running Trained Policies
6 Plotting Results
7 Part 1: Key Concepts in RL
- 7.1 What Can RL Do?
- 7.2 Key Concepts and Terminology
- 7.3 (Optional) Formalism
8 Part 2: Kinds of RL Algorithms
- 8.1 A Taxonomy of RL Algorithms
- 8.2 Links to Algorithms in Taxonomy
9 Part 3: Intro to Policy Optimization
- 9.1 Deriving the Simplest Policy Gradient
- 9.2 Implementing the Simplest Policy Gradient
- 9.3 Expected Grad-Log-Prob Lemma
- 9.4 Don’t Let the Past Distract You
- 9.5 Implementing Reward-to-Go Policy Gradient
- 9.6 Baselines in Policy Gradients
- 9.7 Other Forms of the Policy Gradient
- 9.8 Recap
10 Spinning Up as a Deep RL Researcher
- 10.1 The Right Background
- 10.2 Learn by Doing
- 10.3 Developing a Research Project
- 10.4 Doing Rigorous Research in RL
- 10.5 Closing Thoughts
- 10.6 PS: Other Resources
- 10.7 References
11 Key Papers in Deep RL
- 11.1 1. Model-Free RL
- 11.2 2. Exploration
- 11.3 3. Transfer and Multitask RL
- 11.4 4. Hierarchy
- 11.5 5. Memory
- 11.6 6. Model-Based RL
- 11.7 7. Meta-RL
- 11.8 8. Scaling RL
- 11.9 9. RL in the Real World
- 11.10 10. Safety
- 11.11 11. Imitation Learning and Inverse Reinforcement Learning
- 11.12 12. Reproducibility, Analysis, and Critique
- 11.13 13. Bonus: Classic Papers in RL Theory or Review
12 Exercises
- 12.1 Problem Set 1: Basics of Implementation
- 12.2 Problem Set 2: Algorithm Failure Modes
- 12.3 Challenges
13 Benchmarks for Spinning Up Implementations
- 13.1 Performance in Each Environment
- 13.2 Experiment Details
- 13.3 PyTorch vs Tensorflow
14 Vanilla Policy Gradient
- 14.1 Background
- 14.2 Documentation
- 14.3 References
15 Trust Region Policy Optimization
- 15.1 Background
- 15.2 Documentation
- 15.3 References
16 Proximal Policy Optimization
- 16.1 Background
- 16.2 Documentation
- 16.3 References
17 Deep Deterministic Policy Gradient
- 17.1 Background
- 17.2 Documentation
- 17.3 References
18 Twin Delayed DDPG
- 18.1 Background
- 18.2 Documentation
- 18.3 References
19 Soft Actor-Critic
- 19.1 Background
- 19.2 Documentation
- 19.3 References
20 Logger
- 20.1 Using a Logger
- 20.2 Logger Classes
- 20.3 Loading Saved Models (PyTorch Only)
- 20.4 Loading Saved Graphs (Tensorflow Only)
21 Plotter
22 MPI Tools
- 22.1 Core MPI Utilities
- 22.2 MPI + PyTorch Utilities
- 22.3 MPI + Tensorflow Utilities
23 Run Utils
- 23.1 ExperimentGrid
- 23.2 Calling Experiments
24 Acknowledgements
25 About the Author
26 Indices and tables
Python Module Index


1.2 Why We Built This

One of the single most common questions that we hear is

If I want to contribute to AI safety, how do I get started?

At OpenAI, we believe that deep learning generally—and deep reinforcement learning specifically—will play central roles in the development of powerful AI technology. To ensure that AI is safe, we have to come up with safety strategies and algorithms that are compatible with this paradigm. As a result, we encourage everyone who asks this question to study these fields.

However, while there are many resources to help people quickly ramp up on deep learning, deep reinforcement learning is more challenging to break into. To begin with, a student of deep RL needs to have some background in math, coding, and regular deep learning. Beyond that, they need both a high-level view of the field—an awareness of what topics are studied in it, why they matter, and what’s been done already—and careful instruction on how to connect algorithm theory to algorithm code.

The high-level view is hard to come by because of how new the field is. There is not yet a standard deep RL textbook, so most of the knowledge is locked up in either papers or lecture series, which can take a long time to parse and digest. And learning to implement deep RL algorithms is typically painful, because either

- the paper that publishes an algorithm omits or inadvertently obscures key design details,
- or widely-public implementations of an algorithm are hard to read, hiding how the code lines up with the algorithm.

While fantastic repos like garage, Baselines, and rllib make it easier for researchers who are already in the field to make progress, they build algorithms into frameworks in ways that involve many non-obvious choices and trade-offs, which makes them hard to learn from. Consequently, the field of deep RL has a pretty high barrier to entry—for new researchers as well as practitioners and hobbyists.

So our package here is designed to serve as the missing middle step for people who are excited by deep RL, and would like to learn how to use it or make a contribution, but don’t have a clear sense of what to study or how to transmute algorithms into code. We’ve tried to make this as helpful a launching point as possible.

That said, practitioners aren’t the only people who can (or should) benefit from these materials. Solving AI safety will require people with a wide range of expertise and perspectives, and many relevant professions have no connection to engineering or computer science at all. Nonetheless, everyone involved will need to learn enough about the technology to make informed decisions, and several pieces of Spinning Up address that need.

1.3 How This Serves Our Mission

OpenAI’s mission is to ensure the safe development of AGI and the broad distribution of benefits from AI more generally. Teaching tools like Spinning Up help us make progress on both of these objectives.

To begin with, we move closer to broad distribution of benefits any time we help people understand what AI is and how it works. This empowers people to think critically about the many issues we anticipate will arise as AI becomes more sophisticated and important in our lives.

Also, critically, we need people to help us work on making sure that AGI is safe. This requires a skill set which is currently in short supply because of how new the field is. We know that many people are interested in helping us, but don’t know how—here is what you should study! If you can become an expert on this material, you can make a difference on AI safety.


1.4 Code Design Philosophy

The algorithm implementations in the Spinning Up repo are designed to be

- as simple as possible while still being reasonably good,
- and highly-consistent with each other to expose fundamental similarities between algorithms.

They are almost completely self-contained, with virtually no common code shared between them (except for logging, saving, loading, and MPI utilities), so that an interested person can study each algorithm separately without having to dig through an endless chain of dependencies to see how something is done. The implementations are patterned so that they come as close to pseudocode as possible, to minimize the gap between theory and code.

Importantly, they’re all structured similarly, so if you clearly understand one, jumping into the next is painless.

We tried to minimize the number of tricks used in each algorithm’s implementation, and minimize the differences between otherwise-similar algorithms. To give some examples of removed tricks: we omit regularization terms present in the original Soft-Actor Critic code, as well as observation normalization from all algorithms. For an example of where we’ve removed differences between algorithms: our implementations of DDPG, TD3, and SAC all follow a convention of running gradient descent updates after fixed intervals of environment interaction. (By contrast, other public implementations of these algorithms usually take slightly different approaches from each other, making them a little bit harder to compare.)
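To make the fixed-interval convention concrete, here is a minimal sketch of such a schedule. It is not the Spinning Up source: the function and the parameter names `total_steps`, `update_after`, and `update_every` are illustrative assumptions, and the gradient step is replaced by a counter.

```python
def count_updates(total_steps, update_after, update_every):
    """Return (env_steps, gradient_updates) for a fixed-interval schedule.

    All names here are hypothetical; a real algorithm would take an
    environment step and a gradient step where the comments indicate.
    """
    gradient_updates = 0
    for t in range(total_steps):
        # ... one step of environment interaction would happen here ...

        # After a warm-up of `update_after` steps, run a batch of updates
        # every `update_every` steps. Each batch contains `update_every`
        # updates, i.e. one update per step in the interval.
        if t >= update_after and t % update_every == 0:
            for _ in range(update_every):
                gradient_updates += 1  # stand-in for one gradient descent step
    return total_steps, gradient_updates


steps, updates = count_updates(total_steps=1000, update_after=100, update_every=50)
print(steps, updates)
```

Because every algorithm that follows this pattern performs the same number of updates for the same amount of interaction, differences in their results are easier to attribute to the algorithms themselves rather than to scheduling details.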

All algorithms are “reasonably good” in the sense that they achieve roughly the intended performance, but don’t necessarily match the best reported results in the literature on every task. Consequently, be careful if using any of these implementations for scientific benchmarking comparisons. Details on each implementation’s specific performance level can be found on our benchmarks page.

1.5 Long-Term Support and Support History

Spinning Up is currently in maintenance mode. If there are any breaking bugs, we’ll repair them to ensure that Spinning Up can continue to help people study deep RL.

Support history so far:

- Nov 8, 2018: Initial release!
- Nov, 2018: Release was followed by a three-week period of high-bandwidth support.
- April, 2019: Approximately six months after release, we conducted an internal review of Spinning Up based on feedback from the community. The review surfaced interest in a few key features:
  - Implementations in Other Neural Network Libraries. Several people expressed interest in seeing Spinning Up use alternatives to Tensorflow v1 for the RL implementations. A few members of the community even developed their own PyTorch versions of Spinning Up algorithms, such as Kashif Rasul’s Fired Up, Kai Arulkumaran’s Spinning Up Basic, and Misha Laskin’s Torching Up. As a result, making this kind of “Rosetta Stone” for deep RL became a high priority for future work.
  - Open Source RL Environments. Many people expressed an interest in seeing Spinning Up use more open source RL environments (e.g. PyBullet) for benchmarks, examples, and exercises.
  - More Algorithms. There was some interest in seeing other algorithms included in Spinning Up, especially Deep Q-Networks.
- Jan, 2020: The PyTorch update to Spinning Up was released!
- Future: No major updates are currently planned for Spinning Up. In the event it makes sense for us to release an additional update, following what we found in the 6-month review, the next-highest priority features are to focus more on open source RL environments and adding algorithms.


CHAPTER 2 Installation

Table of Contents

Installation
- Installing Python
- Installing OpenMPI
  - Ubuntu
  - Mac OS X
- Installing Spinning Up
- Check Your Install
- Installing MuJoCo (Optional)

Spinning Up requires Python3, OpenAI Gym, and OpenMPI.

Spinning Up is currently only supported on Linux and OSX. It may be possible to install on Windows, though this hasn’t been extensively tested.^1

You Should Know

Many examples and benchmarks in Spinning Up refer to RL environments that use the MuJoCo physics engine. MuJoCo is proprietary software that requires a license, which is free to trial and free for students, but otherwise is not free. As a result, installing it is optional, but because of its importance to the research community (it is the de facto standard for benchmarking deep RL algorithms in continuous control), it is preferred.

Don’t worry if you decide not to install MuJoCo, though. You can definitely get started in RL by running RL algorithms on the Classic Control and Box2d environments in Gym, which are totally free to use.

(^1) It looks like at least one person has figured out a workaround for running on Windows. If you try another way and succeed, please let us know how you did it!


2.1 Installing Python

We recommend installing Python through Anaconda. Anaconda is a distribution that includes Python and many useful packages for Python, as well as an environment manager called conda that makes package management simple.

Follow the installation instructions for Anaconda here. Download and install Anaconda3 (at time of writing, Anaconda3-5.3.0). Then create a conda Python 3.6 env for organizing packages used in Spinning Up:

conda create -n spinningup python=3.6

To use Python from the environment you just created, activate the environment with:

conda activate spinningup

You Should Know

If you’re new to python environments and package management, this stuff can quickly get confusing or overwhelming, and you’ll probably hit some snags along the way. (Especially, you should expect problems like, “I just installed this thing, but it says it’s not found when I try to use it!”) You may want to read through some clean explanations about what package management is, why it’s a good idea, and what commands you’ll typically have to execute to correctly use it.

FreeCodeCamp has a good explanation worth reading. There’s a shorter description on Towards Data Science which is also helpful and informative. Finally, if you’re an extremely patient person, you may want to read the (dry, but very informative) documentation page from Conda.
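The "installed but not found" problem usually means the package went into a different environment than the one whose interpreter is running. The following minimal sketch, using only the Python standard library, reports which interpreter is active and where (if anywhere) it finds a given module. The helper name `describe_environment` and the example module names are hypothetical.

```python
import importlib.util
import sys


def describe_environment(module_names):
    """Report the running interpreter and which modules it can locate."""
    report = {
        "interpreter": sys.executable,  # path of the Python binary in use
        "prefix": sys.prefix,           # root of the active environment
        "modules": {},
    }
    for name in module_names:
        # find_spec returns None when a top-level module cannot be found
        # on this interpreter's search path.
        spec = importlib.util.find_spec(name)
        report["modules"][name] = spec.origin if spec is not None else None
    return report


report = describe_environment(["json", "a_module_that_is_not_installed"])
print("interpreter:", report["interpreter"])
print("prefix:", report["prefix"])
for name, origin in report["modules"].items():
    print(f"{name}: {origin if origin else 'NOT FOUND from this interpreter'}")
```

If the printed prefix is not the directory of the conda environment you meant to use, the environment is probably not activated in the current shell.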

2.2 Installing OpenMPI

2.2.1 Ubuntu

sudo apt-get update && sudo apt-get install libopenmpi-dev

2.2.2 Mac OS X

Installation of system packages on Mac requires Homebrew. With Homebrew installed, run the following:

brew install openmpi

2.3 Installing Spinning Up

git clone https://github.com/openai/spinningup.git
cd spinningup
pip install -e .
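As a rough sanity check of the prerequisites named at the start of this chapter (Python 3, OpenAI Gym, and OpenMPI), a short script along the following lines can help. This is a sketch under stated assumptions, not an official check: it assumes OpenMPI exposes its launcher on PATH as `mpirun` and that Gym is importable under the module name `gym`. The function name `check_prerequisites` is hypothetical.

```python
import importlib.util
import shutil
import sys


def check_prerequisites():
    """Return a dict of booleans, one per prerequisite named in the text."""
    return {
        # Spinning Up requires Python 3.
        "python3": sys.version_info.major == 3,
        # Assumption: the OpenMPI launcher is available on PATH as `mpirun`.
        "mpirun_on_path": shutil.which("mpirun") is not None,
        # Assumption: OpenAI Gym is importable under the module name `gym`.
        "gym_importable": importlib.util.find_spec("gym") is not None,
    }


status = check_prerequisites()
for name, ok in status.items():
    print(f"{name}: {'ok' if ok else 'missing'}")
```

A "missing" entry only says that this interpreter or this shell cannot see the component; it may still be installed elsewhere on the machine.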

You Should Know


CHAPTER 3 Algorithms

Table of Contents

Algorithms
- What’s Included
- Why These Algorithms?
  - The On-Policy Algorithms
  - The Off-Policy Algorithms
- Code Format
  - The Algorithm Function: PyTorch Version
  - The Algorithm Function: Tensorflow Version
  - The Core File

3.1 What’s Included

The following algorithms are implemented in the Spinning Up package:

Vanilla Policy Gradient (VPG)
Trust Region Policy Optimization (TRPO)
Proximal Policy Optimization (PPO)
Deep Deterministic Policy Gradient (DDPG)
Twin Delayed DDPG (TD3)
Soft Actor-Critic (SAC)


in Gym environments from the command line (though this is not the recommended way to run the algorithms—we’ll describe how to do that on the Running Experiments page).

3.3.1 The Algorithm Function: PyTorch Version

The algorithm function for a PyTorch implementation performs the following tasks in (roughly) this order:

1. Logger setup
2. Random seed setting
3. Environment instantiation
4. Constructing the actor-critic PyTorch module via the actor_critic function passed to the algorithm function as an argument
5. Instantiating the experience buffer
6. Setting up callable loss functions that also provide diagnostics specific to the algorithm
7. Making PyTorch optimizers
8. Setting up model saving through the logger
9. Setting up an update function that runs one epoch of optimization or one step of descent
10. Running the main loop of the algorithm:
    (a) Run the agent in the environment
    (b) Periodically update the parameters of the agent according to the main equations of the algorithm
    (c) Log key performance metrics and save agent
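The ordering above can be sketched as a skeleton in plain Python. This is a structural illustration only, not a Spinning Up implementation: `CoinEnv`, `make_actor_critic`, and `toy_algorithm` are hypothetical names, PyTorch is deliberately left out so the sketch runs with the standard library alone, and the "update" performs no learning.

```python
import random


class CoinEnv:
    """Hypothetical toy environment: guess a coin flip, reward 1 if correct."""

    def reset(self):
        self.coin = random.randint(0, 1)
        return 0  # constant dummy observation

    def step(self, action):
        reward = 1.0 if action == self.coin else 0.0
        return 0, reward, True  # observation, reward, done


def make_actor_critic():
    """Hypothetical stand-in for an actor-critic module: a random policy."""
    return {"act": lambda: random.randint(0, 1), "updates": 0}


def toy_algorithm(env_fn, actor_critic, seed=0, epochs=3, steps_per_epoch=20):
    log = []               # logger setup (a plain list here)
    random.seed(seed)      # random seed setting
    env = env_fn()         # environment instantiation
    ac = actor_critic()    # actor-critic built by the function passed in
    buffer = []            # experience buffer

    def compute_loss(batch):
        # Callable loss that also returns a diagnostic dictionary.
        mean_reward = sum(r for _, r in batch) / len(batch)
        return -mean_reward, {"batch_size": len(batch)}

    # Optimizer creation and model-saving setup are omitted in this toy.

    def update():
        # One "epoch" of optimization; here it only counts and clears.
        loss, info = compute_loss(buffer)
        ac["updates"] += 1
        buffer.clear()
        return loss, info

    for epoch in range(epochs):                  # main loop
        env.reset()
        for _ in range(steps_per_epoch):
            action = ac["act"]()                 # (a) run the agent
            _, reward, done = env.step(action)
            buffer.append((action, reward))
            if done:
                env.reset()
        loss, info = update()                    # (b) periodic update
        log.append({"epoch": epoch, "loss": loss, **info})  # (c) log metrics
    return log


for row in toy_algorithm(CoinEnv, make_actor_critic):
    print(row)
```

Passing the actor-critic constructor in as an argument, as the list describes, is what lets the same algorithm function be reused with different network architectures.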

3.3.2 The Algorithm Function: Tensorflow Version

The algorithm function for a Tensorflow implementation performs the following tasks in (roughly) this order:

1. Logger setup
2. Random seed setting
3. Environment instantiation
4. Making placeholders for the computation graph
5. Building the actor-critic computation graph via the actor_critic function passed to the algorithm function as an argument
6. Instantiating the experience buffer
7. Building the computation graph for loss functions and diagnostics specific to the algorithm
8. Making training ops
9. Making the TF Session and initializing parameters
10. Setting up model saving through the logger
11. Defining functions needed for running the main loop of the algorithm (e.g. the core update function, get action function, and test agent function, depending on the algorithm)
12. Running the main loop of the algorithm:
    (a) Run the agent in the environment
    (b) Periodically update the parameters of the agent according to the main equations of the algorithm
    (c) Log key performance metrics and save agent

3.3.3 The Core File

The core files don’t adhere as closely as the algorithms files to a template, but do have some approximate structure:

Tensorflow only: Functions related to making and managing placeholders
Functions for building sections of computation graph relevant to the actor_critic method for a particular algorithm
Any other useful functions
Implementations for an MLP actor-critic compatible with the algorithm, where both the policy and the value function(s) are represented by simple MLPs
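To illustrate the last item, here is a minimal sketch of an actor-critic whose policy and value function are both small MLPs. It is written with plain Python lists rather than PyTorch or Tensorflow so that it runs without extra dependencies. The names `mlp`, `forward`, and `MLPActorCritic`, the tanh activations, and the deterministic action are all assumptions made for illustration, not the contents of any Spinning Up core file.

```python
import math
import random


def mlp(sizes, rng):
    """Build a list of (weights, biases) layers for the given layer sizes."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Apply each layer; tanh on hidden layers, identity on the output."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [math.tanh(v) for v in x]
    return x


class MLPActorCritic:
    """Hypothetical container: one MLP for the policy, one for the value."""

    def __init__(self, obs_dim, act_dim, hidden_sizes=(8, 8), seed=0):
        rng = random.Random(seed)
        self.pi = mlp([obs_dim, *hidden_sizes, act_dim], rng)  # policy network
        self.v = mlp([obs_dim, *hidden_sizes, 1], rng)         # value network

    def step(self, obs):
        action = forward(self.pi, obs)   # deterministic output, for simplicity
        value = forward(self.v, obs)[0]  # scalar value estimate
        return action, value


ac = MLPActorCritic(obs_dim=3, act_dim=2)
action, value = ac.step([0.1, -0.2, 0.3])
print(len(action), value)
```

Keeping the policy and value networks as separate attributes of one object mirrors the idea that an algorithm function receives a single actor-critic and then queries whichever part it needs.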
