Artificial Intelligence - Computer Science I Programming - Slides | CSCI 1300, Study notes of Computer Science

Material Type: Notes; Class: Computer Science 1: Programming; Subject: Computer Science; University: University of Colorado - Boulder; Term: Unknown 1989;

Uploaded on 02/10/2009

CSCI 1300
Artificial Intelligence Lecture
Mike Mozer
December 4, 2003
Computer Science

  Operating Systems
  Programming Languages
  Networking
  Security
  Theory
  Artificial Intelligence

Machine Learning

Supervised Learning
  spam filters (hotmail.com)
  ALVINN (autonomous vehicle navigation)

Unsupervised Learning
  collaborative filtering (amazon.com)
  fault monitoring

Reinforcement Learning
  TD-Gammon (champion backgammon playing program)
  elevator controller
  adaptive home lighting/heating control

Reinforcement Learning: A Simple Example

Suppose you are in one of two states:
  hungry
  sleepy

Suppose you can take one of two actions:
  go to Turley’s
  lie on bed

Reward contingencies:
  hungry -> go to Turley’s: reward
  hungry -> lie on bed: no reward
  sleepy -> go to Turley’s: no reward
  sleepy -> lie on bed: reward

Reward depends on what action you take in a given state.
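The reward contingencies above can be written as a lookup table keyed by (state, action). This is only a minimal sketch: the slide says "reward" or "no reward", so the numeric values 1 and 0, and the helper names `reward` and `best_action`, are assumptions made for illustration.

```python
# Reward contingencies from the slide, keyed by (state, action).
# The values 1 (reward) and 0 (no reward) are assumed for illustration.
REWARD = {
    ("hungry", "go to Turley's"): 1,
    ("hungry", "lie on bed"): 0,
    ("sleepy", "go to Turley's"): 0,
    ("sleepy", "lie on bed"): 1,
}

ACTIONS = ["go to Turley's", "lie on bed"]


def reward(state, action):
    """Return the reward for taking `action` in `state`."""
    return REWARD[(state, action)]


def best_action(state):
    """Pick the action with the highest immediate reward in `state`."""
    return max(ACTIONS, key=lambda action: reward(state, action))


for state in ("hungry", "sleepy"):
    print(state, "->", best_action(state))
```

Here the best action can be read straight off the table because the reward is immediate and known in advance. A learning agent would have to discover the table by trying actions and observing the outcomes.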

Reinforcement Learning in the Real World

Issues

  Delayed reinforcement (e.g., car accident due to worn tires)
  Occasional reinforcement (e.g., chess playing)
  Short term versus long term rewards (e.g., skipping class)
  Exploration versus exploitation (e.g., trying new restaurants)
  Partially observable state (e.g., viral infection)
  Multiple agents (e.g., multiple elevators)

time interval                  1    2    3    4    5    6    7
state                          s^1  s^2  s^3  s^4  s^5  s^6  s^7
action                         a^1  a^2  a^3  a^4  a^5  a^6  a^7
instantaneous reinforcement    r^1  r^2  r^3  r^4  r^5  r^6  r^7
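One common way to compare short term and long term rewards along such a sequence is to add up the instantaneous reinforcements r^1, r^2, ... with a discount factor, so that later reinforcement counts for less. The slide does not define this sum; the function name `discounted_return`, the discount value 0.9, and the example sequences below are all assumptions for illustration.

```python
def discounted_return(reinforcements, discount=0.9):
    """Sum r^1 + discount * r^2 + discount**2 * r^3 + ..."""
    total = 0.0
    for step, r in enumerate(reinforcements):
        total += (discount ** step) * r
    return total


# Hypothetical reinforcement sequences over seven time intervals.
immediate = [1, 0, 0, 0, 0, 0, 0]  # reinforcement arrives right away
delayed = [0, 0, 0, 0, 0, 0, 1]    # same reinforcement, six intervals later

print(discounted_return(immediate))
print(discounted_return(delayed))
```

With a discount below 1, the delayed sequence is worth less than the immediate one even though both contain the same single reinforcement.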

Elevator Control

Q learning

(Watkins, 1989; Watkins & Dayan, 1992)

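Q(x,u): If action u is taken in state x, what is the minimum cost we can expect to obtain?

Policy based on Q values:

$$\pi(x_t) = \begin{cases} \arg\min_{u} Q(x_t, u) & \text{with probability } 1 - \theta \\ \text{random} & \text{with probability } \theta \end{cases}$$

where θ is the exploration rate.

Incremental update rule for Q values:

$$Q(x_t, u_t) \leftarrow (1 - \alpha)\, Q(x_t, u_t) + \alpha \min_{\hat{u}} \left[ c_t + \lambda\, Q(x_{t+1}, \hat{u}) \right]$$

where α is the learning rate, λ is the discount factor, and c_t is the cost incurred at time t. The minimum over û matches the argmin in the policy, since Q values here are costs.

Given fully observable state, infinite exploration, etc., Q learning is guaranteed to converge on the optimal policy.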
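The policy and incremental update can be sketched in a few lines of tabular code. This is a minimal sketch that treats Q values as costs, so both the greedy choice and the update target take a minimum. It is not the elevator or Adaptive House controller: the toy problem reuses the hungry/sleepy example, and the cost values, the random next state, the parameter settings, and the function names are all assumptions.

```python
import random
from collections import defaultdict


def choose_action(Q, x, actions, theta, rng=random):
    """Random action with probability theta (exploration rate),
    otherwise the action with the lowest Q value (expected cost)."""
    if rng.random() < theta:
        return rng.choice(actions)
    return min(actions, key=lambda u: Q[(x, u)])


def q_update(Q, x, u, cost, x_next, actions, alpha, lam):
    """Move Q(x, u) toward cost + lam * min over u' of Q(x_next, u')."""
    target = cost + lam * min(Q[(x_next, u_next)] for u_next in actions)
    Q[(x, u)] = (1 - alpha) * Q[(x, u)] + alpha * target


# Hypothetical toy problem: cost 0 for the rewarded pairing, 1 otherwise.
STATES = ["hungry", "sleepy"]
ACTIONS = ["go to Turley's", "lie on bed"]
GOOD = {"hungry": "go to Turley's", "sleepy": "lie on bed"}


def train(steps=2000, alpha=0.1, lam=0.5, theta=0.2, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # every Q value starts at 0
    x = rng.choice(STATES)
    for _ in range(steps):
        u = choose_action(Q, x, ACTIONS, theta, rng)
        cost = 0.0 if u == GOOD[x] else 1.0
        x_next = rng.choice(STATES)  # assumed: next state is drawn at random
        q_update(Q, x, u, cost, x_next, ACTIONS, alpha, lam)
        x = x_next
    return Q


Q = train()
greedy = {x: min(ACTIONS, key=lambda u: Q[(x, u)]) for x in STATES}
print(greedy)
```

The exploration rate matters here: without the occasional random action, an action whose Q value has never been updated could not be compared fairly against the others.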

The Adaptive House

Michael Mozer
Robert Dodier
Debra Miller
Marc Anderson
Josh Anderson
Diane Lukianow
Dan Bertini
Tom Moyer
Matt Bronder
Charles Myers
Michael Colagrosso
Tom Pennell
Robert Cruickshank
James Ries
Brian Daugherty
Erik Skorpen
Mark Fontenot
Joel Sloss
Okechukwu Ikeako
Lucky Vidmar
Paul Kooros
Matthew Weeks

University of Colorado
  Department of Computer Science
  Institute of Cognitive Science
  Department of Civil, Environmental, and Architectural Engineering
  Department of Electrical and Computer Engineering
  Department of Mechanical Engineering
  Department of Aerospace Engineering

http://www.cs.colorado.edu/~mozer/adaptive-house

The adaptive house

Residence in Marshall, Colorado, outside of Boulder

Some of the gang

Bedrooms and bathrooms

Sensors

Water heater

Furnace