



























Material Type: Notes; Class: Computer Science 1: Programming; Subject: Computer Science; University: University of Colorado - Boulder; Term: Unknown 1989;




























Operating Systems
Programming Languages
Networking
Security
Theory
Artificial Intelligence
Supervised Learning
spam filters (hotmail.com)
ALVINN (autonomous vehicle navigation)

Unsupervised Learning
collaborative filtering (amazon.com)
fault monitoring

Reinforcement Learning
TD-Gammon (champion backgammon playing program)
elevator controller
adaptive home lighting/heating control
Suppose you are in one of two states
hungry
sleepy

Suppose you can take one of two actions
go to Turley’s
lie on bed

Reward contingencies
hungry -> go to Turley’s: reward
hungry -> lie on bed: no reward
sleepy -> go to Turley’s: no reward
sleepy -> lie on bed: reward
Reward depends on what action you take in a given state.
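
As a minimal sketch, the reward contingencies above can be written as a lookup table keyed by (state, action). The numeric values 1.0 and 0.0 are an assumption (the notes only say "reward" and "no reward"), and the names `REWARD`, `reward`, and `best_action` are hypothetical.

```python
# Reward table for the two-state, two-action example.
# ASSUMPTION: "reward" is encoded as 1.0 and "no reward" as 0.0.
REWARD = {
    ("hungry", "go to Turley's"): 1.0,
    ("hungry", "lie on bed"): 0.0,
    ("sleepy", "go to Turley's"): 0.0,
    ("sleepy", "lie on bed"): 1.0,
}


def reward(state, action):
    """Look up the reward for taking `action` in `state`."""
    return REWARD[(state, action)]


def best_action(state):
    """Return the action with the highest immediate reward in `state`."""
    actions = sorted({a for (_, a) in REWARD})
    return max(actions, key=lambda a: reward(state, a))


if __name__ == "__main__":
    for s in ("hungry", "sleepy"):
        print(s, "->", best_action(s))
```

The same action earns a reward in one state and nothing in the other, which is why the table needs both the state and the action as its key.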
Issues
Delayed reinforcement (e.g., car accident due to worn tires)
Occasional reinforcement (e.g., chess playing)
Short term versus long term rewards (e.g., skipping class)
Exploration versus exploitation (e.g., trying new restaurants)
Partially observable state (e.g., viral infection)
Multiple agents (e.g., multiple elevators)
[Figure: timeline over time intervals 1 to 7, showing for each interval t the state s^t, the action a^t, and the instantaneous reinforcement r^t.]
(Watkins, 1989; Watkins & Dayan, 1992)
Q(x,u): If action u is taken in state x, what is the minimum cost we can expect to obtain?

Policy based on Q values:

$$\pi(x_t) = \begin{cases} \arg\min_{u} Q(x_t, u) & \text{with probability } 1 - \theta \\ \text{random} & \text{with probability } \theta \end{cases}$$

where θ is the exploration rate.

Incremental update rule for Q values:

$$Q(x_t, u_t) \leftarrow (1 - \alpha)\, Q(x_t, u_t) + \alpha \min_{\hat{u}} \left[ c_t + \lambda\, Q(x_{t+1}, \hat{u}) \right]$$

where α is the learning rate, λ is the discount factor, and c_t is the cost incurred at time t.

Given fully observable state, infinite exploration, etc., guaranteed to converge on optimal policy.
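
A minimal sketch of tabular Q learning in the cost-minimizing form (Watkins, 1989), applied to the hungry/sleepy example, may make the policy and update rule concrete. Several things here are assumptions rather than part of the notes: the cost encoding (0 for the rewarded action, 1 otherwise), the dynamics (the next state is drawn at random, independent of the action), the parameter values, and all function names.

```python
import random
from collections import defaultdict

STATES = ["hungry", "sleepy"]
ACTIONS = ["go to Turley's", "lie on bed"]


def cost(x, u):
    """ASSUMED cost: 0 if the action suits the state, 1 otherwise."""
    good = {"hungry": "go to Turley's", "sleepy": "lie on bed"}
    return 0.0 if good[x] == u else 1.0


def choose_action(Q, x, theta, rng):
    """Random action with probability theta, else argmin_u Q(x, u)."""
    if rng.random() < theta:
        return rng.choice(ACTIONS)
    return min(ACTIONS, key=lambda u: Q[(x, u)])


def q_learning(steps=5000, alpha=0.1, lam=0.9, theta=0.2, seed=0):
    """Run tabular Q learning; alpha = learning rate, lam = discount factor."""
    rng = random.Random(seed)
    Q = defaultdict(float)  # all Q values start at 0
    x = rng.choice(STATES)
    for _ in range(steps):
        u = choose_action(Q, x, theta, rng)
        c = cost(x, u)
        # ASSUMED dynamics: next state is random, independent of the action.
        x_next = rng.choice(STATES)
        # Backed-up estimate: c_t + lam * min over next actions of Q.
        target = c + lam * min(Q[(x_next, u2)] for u2 in ACTIONS)
        Q[(x, u)] = (1 - alpha) * Q[(x, u)] + alpha * target
        x = x_next
    return Q


def greedy_policy(Q):
    """Policy with exploration switched off: argmin_u Q(x, u) per state."""
    return {x: min(ACTIONS, key=lambda u: Q[(x, u)]) for x in STATES}


if __name__ == "__main__":
    Q = q_learning()
    for x, u in greedy_policy(Q).items():
        print(x, "->", u)
```

Because costs are minimized rather than rewards maximized, both the action choice and the backup use `min`. The exploration rate `theta` matters here: without occasional random actions, the poorly suited actions would rarely be tried and their Q values would stay near the initial 0.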
The Adaptive House
Michael Mozer + *
Robert Dodier
Debra Miller
Marc Anderson
Josh Anderson ✩
Diane Lukianow ✩
Dan Bertini #
Tom Moyer
Matt Bronder *
Charles Myers ✩
Michael Colagrosso *
Tom Pennell
Robert Cruickshank #
James Ries ✩
Brian Daugherty *
Erik Skorpen ✩
Mark Fontenot ^
Joel Sloss ✩
Okechukwu Ikeako ✩
Lucky Vidmar
Paul Kooros ✩
Matthew Weeks ✩

University of Colorado
Institute of Cognitive Science
✩ Department of Electrical and Computer Engineering
Department of Mechanical Engineering
^ Department of Aerospace Engineering
http://www.cs.colorado.edu/~mozer/adaptive-house
Residence in Marshall, Colorado, outside of Boulder