






Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
The Monty Hall Problem, covered through the lens of randomised trials in R
Typology: Study Guides, Projects, Research
1 / 12
This page cannot be seen from the preview
Don't miss anything!







knitr::opts_chunk$set(echo = TRUE) set.seed(666) library(knitr) ## for R Markdown library(modeest) ## for mode library(moments) ## for skew and kurtosis
library(SimMultiCorrData) ## for generating non-normal distributions ## nonnormvar
library(ggplot2) ## for plotting library(dplyr) ## for pipes
library(dplyr) ## for pipes
options(dplyr.summarise.inform = FALSE)
Read the data from the file stored in “./_data/montyHall_Initial.csv” Ensure you read character variables as factors Call the data.frame : mData
mData = read.csv("./_data/montyHall_Initial.csv")
Plot this data using ggplot: Use a stacked bar plot with the x axis plotting the strategy and the y axis counts. Stack the wins and loss counts on top of each other in the bar plot Hint: You will first have to make a grouped data.frame as we did in our earlier exercises # Group the data by strategy and outcome (win/loss) grouped_mData <- mData %>% group_by(strategy, result) %>% summarise(count = n())
# Plot the data using ggplot (Plot1 = ggplot(grouped_mData, aes(x = strategy, y = count, fill = result)) + geom_bar(stat = "identity") + labs(x = "Strategy", y = "Counts", title = "Monty Hall Problem - Initial Outcomes") + theme_minimal())
0
1
2
3
4
5
stay switch
lose win
Monty Hall Problem − Initial Outcomes
What is the observed probability of winning if you switch: pSwitch.Obs = 3/5 What is the observed probability of winning if you stay : pStay.Obs = 2/
What is the expected probability of winning if you switch: pSwitch.Exp = 2/3 What is the expected probability of winning if you stay : pStay.Exp = 1/
Why are the observed and expected different?
We would need more trials! Sample size is too small.
We want to evaluate if the two strategies give a significantly different chance of winning.
What is our NULL hypothesis with respect to the probability of winning?
temp2 = rData$strategy == "switch" temp2 = rData[temp2,] (temp2 = (temp2$result == "win"))
(pSwitch.ran = sum(temp2)/5)
(NullDiff = pSwitch.ran - pStay.ran)
The above is a single draw from the null distribution. How will you create a more complete distribution of null values? Save this complete distribution of null values in a variable called nullDiff.Vec
*** REMEMBER the first value of the nullDiff.Vec has to be the observed value: WHY??
This is to make sure that the observed value is in the the overall vector, so that we can test it’s p value NullDiff.vec = c(NullDiff.Obs) nSim = 1000 nSim.vec = 1:nSim
for (i in nSim.vec) { rData$result = sample(mData$result)
temp1 = rData$strategy == "stay" total.strat = sum(temp1) temp1 = rData[temp1,] temp1 = (temp1$result == "win")
(pStay.ran = sum(temp1)/total.strat)
temp2 = rData$strategy == "switch" total.strat = sum(temp2) temp2 = rData[temp2,] temp2 = (temp2$result == "win")
pSwitch.ran = sum(temp2)/total.strat
NullDiff = pSwitch.ran - pStay.ran
NullDiff.vec = c(NullDiff.vec, NullDiff) }
NullDiff.vec
What is the p-value of this test and what does it signify? *** HINT Remember the definition for the p-value
p_bool = NullDiff.vec >= NullDiff.Obs (p_num = sum(p_bool))
(p_value = p_num/nSim)
#Since P-Value > Critical Value, it means we accept the Null Hypothesis
plot the histogram of null values with a red line indicating the location of the observed value ggplot() + geom_histogram(aes(x = NullDiff.vec), bins = 10, fill = "blue", alpha = 0.5) + geom_vline(xintercept = 0.2, color = "red", linetype = "dashed") + labs(title = "Distribution of Null Differences", x = "Null Difference", y = "Count")
0
100
200
300
400
−1.0 −0.5 0.0 0.5 1.
Distribution of Null Differences
Make a function called fn.MontyTest The function should take two variables as input: (a) a data.frame in the form of mData: Call the input data obsData (b) Number of simulations : nSim. Set the default to 1000 It should return a list withe the following: (a) Observed value of test statistic (i.e., difference in probability of win in switch vs. stay) (b) Critical value of test statistic (i.e., the value that will be in the 5% upper tail of the null distribution) (c) P value of the test
HodgePodge = NullDiff.Obs fn.MontyTest = function (obsData, nSim) {
mData = obsData
#Part 1 temp1 = mData$strategy == "stay" temp1 = mData[temp1,] temp1 = (temp1$result == "win")
pStay.Obs = sum(temp1)/
temp2 = mData$strategy == "switch" temp2 = mData[temp2,] temp2 = (temp2$result == "win")
pSwitch.Obs = sum(temp2)/ NullDiff.Obs = pSwitch.Obs - pStay.Obs
cat("The test statistic of the observed data is: ", NullDiff.Obs, "\n")
#Part 2 rData = mData
NullDiff.vec <<- c(HodgePodge) nSim = 1000 nSim.vec = 1:nSim
for (i in nSim.vec) { rData$result = sample(mData$result)
temp1 = rData$strategy == "stay" total.strat = sum(temp1) temp1 = rData[temp1,] temp1 = (temp1$result == "win")
pStay.ran = sum(temp1)/total.strat
temp2 = rData$strategy == "switch" total.strat = sum(temp2) temp2 = rData[temp2,] temp2 = (temp2$result == "win")
pSwitch.ran = sum(temp2)/total.strat
NullDiff = pSwitch.ran - pStay.ran
NullDiff.vec <<- c(NullDiff.vec, NullDiff) }
critical_index = (1 - 0.05) * nSim NullDiff_sorted = sort(NullDiff.vec)
(exp_Stay = rbinom(5, 1, pStay))
exp_Switch <- ifelse(exp_Switch == 1, "win", "lose") exp_Stay <- ifelse(exp_Stay == 1, "win", "lose")
expData <- mData (expData$result = c(exp_Stay, exp_Switch))
Test if the expected data generated under the alternative hypothesis gives a significant result temp1 = expData$strategy == "stay" temp1 = expData[temp1,] temp1 = (temp1$result == "win")
pStay.exp = sum(temp1)/
temp2 = expData$strategy == "switch" temp2 = expData[temp2,] temp2 = (temp2$result == "win")
pSwitch.exp = sum(temp2)/
NullDiff.exp = pSwitch.exp - pStay.exp
bool_crit = critical_value <= NullDiff.exp
cat("It is", bool_crit, "that it is significant!\n")
The above is a single test of the data generated under the alternative hypothesis given the observed number os samples and the observed probability of wins in the switch and stay strategies
What we had wanted to find was what is the probability of finding a statistically significant result under the above conditions How will you do this? HodgePodge = NullDiff.exp
fn.MontyTest(expData, 1000)
sum(NullDiff.vec <= critical_value)/nSim
What is the probability of obtaining a significant result given the sample size and the probabilities of success in the two strategies?
What is does this value represent? it represents the probability of obtaining a significant result given the sample size and the probabilities of success in the two strategies HodgePodge = NullDiff.exp
fn.MontyTest(expData, 1000)
sum(NullDiff.vec <= critical_value)/nSim
Convert the above code to a function. Call the function :: fn.MontyPower The function should take the following parameters: nSwitch ## the sample size for the number of times the switch strategy is played nStay ## the sample size for the number of times the stay strategy is played pSwitch ## the probability of winning with the switch strategy :: this is 2/3 pStay ## the probability of winning with the stay strategy :: this is 1/3 nSim = 1000 ## number of power simulations It should return the Power (i.e., the proportion of significant tests at a Type I error rate of 0.05 given the parameter values of nSwitch, nStay, pSwitch and pStay) pSwitch = 2/ pStay = 1/ nSwitch= 10 nStay= 10
fn.MontyTest = function (expData, nSim, nStay, nSwitch, pSwitch, pStay) {
(exp_Switch = rbinom(nSwitch, 1, pSwitch)) (exp_Stay = rbinom(nStay, 1, pStay))
exp_Switch <- ifelse(exp_Switch == 1, "win", "lose") exp_Stay <- ifelse(exp_Stay == 1, "win", "lose")
expData <- data.frame( X = 1:(nSwitch + nStay), strategy = c(rep("stay", nStay), rep("switch", nSwitch)), result = c(exp_Stay, exp_Switch))}
HodgePodge = NullDiff.exp
fn.MontyTest(expData, nSim, nStay, nSwitch, pSwitch, pStay)
sum(NullDiff.vec <= critical_value)/nSim
Try your power calculation assuming that the number of times the person plays the game is 30 for each strategy fn.MontyTest(expData, 30, nStay, nSwitch, pSwitch, pStay) sum(NullDiff.vec <= critical_value)/
What doe the above result imply? the power is 33%