CSCI 4830, Sec 007
Professor Mozer
Machine Learning
Fall 2001

Assignment 6

Assigned: Tuesday, November 27, 2001
Due: Tuesday, December 11, 2001

In this assignment, you must program a simple simulated environment and construct a reinforcement learning agent that discovers the optimal (shortest) path through the environment to reach a goal. The agent's environment will look like:

[Figure: a grid world containing an initial state labeled "I", a goal state labeled "G", and black barrier cells]

Each cell in this grid is a state in the environment. The cell labeled "I" is the initial state of the agent. The cell labeled "G" is the goal state of the agent. The black cells are barriers: states that are inaccessible to the agent. At each time step, the agent may move one cell to the left, right, up, or down. The environment does not wrap around. Thus, if the agent is in the lower left corner and tries to move down, it will remain in the same cell. Likewise, if the agent is in the initial state and tries to move up (into a barrier), it will remain in the same cell.

You should implement a Q learning algorithm that selects moves for the agent. The algorithm should perform exploration by choosing the action with the maximum Q value 95% of the time, and choosing one of the remaining three actions at random the remaining 5% of the time. The simulation runs until the agent reaches the goal state. The reward at the goal is 0, but at every other state it is –1 (because it takes energy for the agent to move). Because this simulation has a finite running time, you do not need to use a discounting factor, i.e., you can set the parameter γ (described in Chapter 13 of the book) to one. Also, because the environment is deterministic, you can set the parameter α (which I will describe in class) to zero, which yields exactly the algorithm in Table 13.1 of the book.

Write up

Your write up should include the following:

• a graph showing the number of steps required to reach the goal as a function of learning trials (one "trial" is one run of the agent through the environment until it reaches the goal). When you make your graph, it will be very noisy if you plot each learning trial separately. Thus, you may want to plot the average number of steps over 10-trial blocks.
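For concreteness, below is a minimal Python sketch of the kind of simulation the assignment describes. The grid layout, barrier placement, and the positions of "I" and "G" are hypothetical, since the original figure is not reproduced here, and the reading of the reward (0 on entering the goal, –1 otherwise) is an assumption. The update rule shown is the deterministic Q-learning rule from Table 13.1 of the book with γ = 1; the names GRID, GAMMA, and EPSILON are this sketch's own.

```python
import random

# Hypothetical 5x5 grid (the assignment's actual figure is not recoverable):
# 'I' = initial state, 'G' = goal state, '#' = barrier, '.' = open cell.
GRID = [
    ".##..",
    ".I#..",
    ".....",
    "..#G.",
    ".....",
]
ROWS, COLS = len(GRID), len(GRID[0])
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def find(ch):
    for r in range(ROWS):
        for c in range(COLS):
            if GRID[r][c] == ch:
                return (r, c)

START, GOAL = find('I'), find('G')

def step(state, action):
    """Deterministic transition: off-grid or barrier moves leave the agent in place."""
    r, c = state[0] + action[0], state[1] + action[1]
    if 0 <= r < ROWS and 0 <= c < COLS and GRID[r][c] != '#':
        return (r, c)
    return state

# Q table: Q[state][action_index], initialized to zero.
Q = {(r, c): [0.0] * 4 for r in range(ROWS) for c in range(COLS)}

GAMMA = 1.0     # no discounting, as the assignment allows
EPSILON = 0.05  # explore 5% of the time

def choose_action(state):
    """95% greedy on Q; otherwise one of the remaining three actions at random."""
    best = max(range(4), key=lambda a: Q[state][a])
    if random.random() < EPSILON:
        return random.choice([a for a in range(4) if a != best])
    return best

def run_trial():
    """One run from the initial state to the goal; returns the number of steps."""
    state, steps = START, 0
    while state != GOAL:
        a = choose_action(state)
        nxt = step(state, ACTIONS[a])
        reward = 0 if nxt == GOAL else -1
        # Deterministic Q-learning update (Table 13.1 of the book):
        Q[state][a] = reward + GAMMA * max(Q[nxt])
        state, steps = nxt, steps + 1
    return steps

# Average steps over 10-trial blocks to smooth the learning curve.
steps_per_trial = [run_trial() for _ in range(200)]
for block in range(0, 200, 10):
    avg = sum(steps_per_trial[block:block + 10]) / 10
    print(f"trials {block + 1}-{block + 10}: avg steps = {avg:.1f}")
```

Because every non-goal step earns –1, the Q values of frequently tried actions become negative, which pushes the agent toward untried actions even when it acts greedily; the per-block averages printed at the end are exactly the smoothed learning curve the write-up asks for.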