Building an Intelligent Agent for Blackjack: Markov Decision Process

In this CS141 assignment, students build an intelligent agent capable of playing Blackjack, a card game commonly played in casinos. The agent formulates the game as a Markov Decision Process (MDP) and solves for an optimal policy. Students implement a Monte Carlo control algorithm to compute an optimal strategy. The assignment involves understanding the rules of Blackjack, formulating the game as an MDP, implementing the MDP interface, and solving the control problem.

CS141 Introduction to AI Professor Meinolf Sellmann

Blackjack

Due: 6:00 PM 11/24/09

“You cannot beat a roulette table unless you steal money from it.” —Albert Einstein

Introduction

Expert game playing is one of the claims to fame of modern AI. But the majority of AI’s popular success stories involve deterministic games, like chess and checkers. In this assignment, you will build an intelligent agent that is capable of playing the classic card game of Blackjack, which, unlike chess and checkers, is rife with randomness. You will achieve this goal by formulating the game as a Markov Decision Process (MDP), and solving for an optimal policy. Your BlackJackPlayer should even be able to tip the odds in its favor by counting cards.

CS141 Blackjack

Blackjack is a card game commonly played in casinos, in which one or more players compete against a select player called the dealer. The object of the game is to achieve a point value as close to 21 as possible without going over. The CS141 version of Blackjack is a slightly simplified version of the typical casino game. Each hand proceeds as follows:

  1. Each player places his or her bet before the hand begins.
  2. The initial cards are dealt to the players:
    • Each player is dealt two cards face up.
    • The dealer is dealt one card face up and one face down, the latter of which is revealed to the players later in the hand.
  3. The cards have the following point values:
    • Cards of rank 2–10 have point values equal to their rank.
    • Picture cards (Kings, Queens, Jacks) have point values equal to 10.
    • Aces have point values equal to 11, unless the player has a score above 21, in which case Aces take on a value of 1.
  4. If any of the players is dealt a natural (two cards that sum to 21), that player need not make any decisions. A natural pays 3:2, unless the dealer is also dealt 21, in which case no money changes hands.
  5. Otherwise, after the initial cards are dealt, the players choose between two different actions:
    • Stay: If a player chooses to stay, the player’s turn is over.
    • Hit: If a player chooses to hit, the player is dealt another card face up. If the player’s point value exceeds 21 as a result of the newly dealt card, the player busts and loses his or her bet. If the player’s point value is less than or equal to 21, the player again has the option of hitting or staying.
  6. After all the players have either gone bust or opted to stay, the dealer reveals his hidden card, and takes the following actions, deterministically:
    • If his score is ≤ 16, the dealer hits.
    • If his score is ≥ 17, the dealer stays.
  7. If the dealer busts by drawing cards that lead to a point value greater than 21, all the players who have not themselves busted win money equal to their bets, except players with a natural who win money equal to 150% of their bets. If the dealer accumulates a point value between 17 and 21, all the players with point values greater than the dealer win money equal to their bets, except players with a natural who win money equal to 150% of their bets. All the players with scores less than the dealer lose their bets. All the players with scores equal to the dealer push, or tie, and don’t win or lose any money.
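
The scoring, dealer, and payout rules above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not part of the support code: the class name BlackjackRulesSketch and its methods are hypothetical, cards are passed as point values (picture cards as 10, Aces as 11), Aces are demoted from 11 to 1 one at a time, and a player's natural is paid 3:2 unless the dealer also holds a two-card 21.

```java
import java.util.List;

/** Illustrative helper; NOT part of the CS141 support code (all names are hypothetical). */
final class BlackjackRulesSketch {

    /** Point value of a hand; cards are given as point values, with 11 standing for an Ace. */
    static int handValue(List<Integer> cards) {
        int total = 0;
        int acesAsEleven = 0;
        for (int c : cards) {
            total += c;
            if (c == 11) {
                acesAsEleven++;
            }
        }
        // Assumption: Aces drop from 11 to 1 one at a time while the hand is over 21.
        while (total > 21 && acesAsEleven > 0) {
            total -= 10;
            acesAsEleven--;
        }
        return total;
    }

    /** A natural is a two-card hand worth 21. */
    static boolean isNatural(List<Integer> cards) {
        return cards.size() == 2 && handValue(cards) == 21;
    }

    /** The dealer's fixed rule: hit on 16 or less, stay on 17 or more. */
    static boolean dealerHits(List<Integer> dealerCards) {
        return handValue(dealerCards) <= 16;
    }

    /** Net winnings of one player once both hands are final (negative means a loss). */
    static double settle(List<Integer> player, List<Integer> dealer, double bet) {
        int p = handValue(player);
        int d = handValue(dealer);
        if (p > 21) {
            return -bet;                                 // a busted player loses the bet
        }
        if (isNatural(player)) {
            // Assumption: only a dealer natural cancels the 3:2 payout.
            return isNatural(dealer) ? 0.0 : 1.5 * bet;
        }
        if (d > 21 || p > d) {
            return bet;                                  // dealer bust or higher score wins
        }
        return p < d ? -bet : 0.0;                       // lower score loses, equal scores push
    }
}
```

For example, BlackjackRulesSketch.handValue(List.of(11, 5, 10)) counts the Ace as 1 and returns 16, a total on which the dealer would still hit.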

Card Counting

In theory, we can assume an infinite deck, which implies that the probability of dealing each possible card is always 1/52. In practice, however, we play Blackjack with a finite set of decks (say 10). Under these circumstances, the probability of dealing a certain card depends on the number

Javadocs for all of the support code are available at http://cs.brown.edu/courses/cs141/blackjack/index.html. Here’s a summary of the important classes available to you.

BlackjackPlayer: Represents a player of blackjack. You must implement methods for betting and getting the next action. The methods for reshuffling and processing cards notify your player about events that happen during the game so that it may count cards.
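
To illustrate the kind of bookkeeping that the reshuffle and card-processing notifications make possible, here is a minimal sketch of a card counter. It does not use the support code's API: the class ShoeCountSketch and its methods reshuffle, cardSeen, and probabilityOf are hypothetical names, and cards are identified by point value only (10 covers tens and picture cards, 11 is the Ace).

```java
/** Hypothetical card tracker; class and method names are illustrative, not the support code's API. */
final class ShoeCountSketch {
    private final int decks;
    // remaining[v] = cards of point value v still in the shoe (10 = tens and pictures, 11 = Ace).
    private final int[] remaining = new int[12];
    private int total;

    ShoeCountSketch(int decks) {
        this.decks = decks;
        reshuffle();
    }

    /** Call when the shoe is reshuffled: every count returns to its full value. */
    void reshuffle() {
        for (int v = 2; v <= 9; v++) {
            remaining[v] = 4 * decks;
        }
        remaining[10] = 16 * decks;   // 10, J, Q, K in four suits
        remaining[11] = 4 * decks;    // Aces
        total = 52 * decks;
    }

    /** Call whenever a card with the given point value (2 to 11) is revealed. */
    void cardSeen(int value) {
        if (value < 2 || value > 11 || remaining[value] == 0) {
            throw new IllegalArgumentException("unexpected card value: " + value);
        }
        remaining[value]--;
        total--;
    }

    /** Probability that the next card dealt has the given point value (2 to 11). */
    double probabilityOf(int value) {
        return total == 0 ? 0.0 : (double) remaining[value] / total;
    }
}
```

With a single deck, the chance of a ten-valued card starts at 16/52 and drops to 15/51 once one such card has been seen. A player could feed probabilities like these into its betting and action decisions.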

BlackjackMDP: Represents a Markov Decision Process. You should implement either SARSA or Q-Learning to generate a map from states to values. We recommend implementing the ActionValueMap class and using it to encapsulate all possible values for the actions at a given state. Your player should then call getBestAction to select the best action in the given state.
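
As a rough picture of what a Q-Learning implementation involves, the sketch below keeps a table of action values and applies the standard update Q(s,a) ← Q(s,a) + α(r + γ max Q(s',a') − Q(s,a)). It is an assumption-laden illustration, not the support code: QLearningSketch, its String state encoding, and its method names are hypothetical and stand in for whatever BlackjackMDP and ActionValueMap actually provide. SARSA would differ only in using the value of the action actually taken in the next state instead of the maximum.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative tabular Q-Learning; NOT the support code's ActionValueMap (names are hypothetical). */
final class QLearningSketch {
    enum Action { HIT, STAY }

    // Maps a state label to one value per action, indexed by Action.ordinal().
    private final Map<String, double[]> q = new HashMap<>();
    private final double alpha;   // learning rate
    private final double gamma;   // discount factor

    QLearningSketch(double alpha, double gamma) {
        this.alpha = alpha;
        this.gamma = gamma;
    }

    /** Action values for a state, created as zeros on first access. */
    private double[] values(String state) {
        return q.computeIfAbsent(state, s -> new double[Action.values().length]);
    }

    double value(String state, Action action) {
        return values(state)[action.ordinal()];
    }

    /** Greedy action for a state; ties go to HIT. */
    Action bestAction(String state) {
        double[] v = values(state);
        return v[Action.STAY.ordinal()] > v[Action.HIT.ordinal()] ? Action.STAY : Action.HIT;
    }

    /** One Q-Learning update; pass nextState == null when the hand has ended. */
    void update(String state, Action action, double reward, String nextState) {
        double future = 0.0;
        if (nextState != null) {
            double[] next = values(nextState);
            future = Math.max(next[Action.HIT.ordinal()], next[Action.STAY.ordinal()]);
        }
        double[] v = values(state);
        int i = action.ordinal();
        v[i] += alpha * (reward + gamma * future - v[i]);
    }
}
```

In a training loop you would simulate many hands, call update after every action with the reward observed (for instance the net winnings at the end of the hand, zero otherwise), and occasionally pick a non-greedy action so that every state-action pair is explored.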

BlackjackSimulator: Simulates hands of blackjack. This is where the game is played, and this class also contains the main method.

What to Hand In

We need:

  1. A README containing a high level description of your solution, and any unresolved bugs.
  2. All code necessary to run your BlackjackPlayer class.
  3. Your written formulation of Blackjack as an MDP. This formulation can include a figure and/or a mathematical description.

Our handin script turns in everything in the current directory and all subdirectories. To hand in this project, navigate into the directory containing the files and directories you wish to hand in and run cs141-handin blackjack.

Extra Credit

Easy: Extend your Blackjack player and the simulator to allow for doubling down. This means that after looking at only your first two cards, you may double your initial bet and take exactly one more card, at which point the player’s turn is over.

Hard: Extend your Blackjack player and the simulator to allow for splitting. Splitting is an action that can only be taken after the initial cards are dealt, and only when the two cards have the same rank. The cards are split into two hands and the player must place an additional bet equal in value to their original bet. A new card is dealt to accompany each of the split cards and the two hands are played separately.
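
The eligibility conditions for these two extra actions can be written down as simple checks. The sketch below is only an illustration: ExtraActionsSketch and its methods are hypothetical names, ranks are represented as strings such as "8" or "K", and nothing here reflects how the simulator actually exposes hands or bets.

```java
/** Illustrative eligibility checks for the extra-credit actions; all names are hypothetical. */
final class ExtraActionsSketch {

    /** Doubling down is offered only while the hand still holds just its first two cards. */
    static boolean canDoubleDown(int cardsInHand) {
        return cardsInHand == 2;
    }

    /** Splitting needs exactly two cards of the same rank: "K","K" qualifies, "K","Q" does not. */
    static boolean canSplit(String firstRank, String secondRank, int cardsInHand) {
        return cardsInHand == 2 && firstRank.equals(secondRank);
    }

    /**
     * Total money at stake after the action. Doubling down doubles the bet on one hand;
     * splitting places one bet of the original size on each of the two hands.
     */
    static double totalStake(double originalBet, boolean doubledOrSplit) {
        return doubledOrSplit ? 2 * originalBet : originalBet;
    }
}
```

Note that the split check compares ranks, not point values, so a King and a Queen cannot be split even though both are worth 10.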