R code and exercises for analyzing data, performing hypothesis tests, and calculating confidence intervals. Topics include Markdown syntax, loading packages, data manipulation, and statistical analysis using R. Exercises use the 'penguin.data' and 'fertility.data' datasets.

Typology: Exercises

2017/2018


HW #7

Sam Harris

October 27, 2015

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document will be generated that includes both content and the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

```r
summary(cars)
##      speed           dist
##  Min.   : 4.0   Min.   :  2.00
##  1st Qu.:12.0   1st Qu.: 26.00
##  Median :15.0   Median : 36.00
##  Mean   :15.4   Mean   : 42.98
##  3rd Qu.:19.0   3rd Qu.: 56.00
##  Max.   :25.0   Max.   :120.00
```

You can also embed plots, for example:

Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.
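The plot itself does not survive in this copy; in an R Markdown source, a hidden-code plot chunk looks something like this (a sketch, reusing the built-in `cars` data):

````
```{r, echo=FALSE}
# echo=FALSE shows only the plot, not this code, in the knitted output
plot(cars)
```
````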
```r
require(mosaic)
## Loading required package: mosaic
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
##     filter, lag
##
## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union
##
## Loading required package: lattice
## Loading required package: ggplot2
## Loading required package: car
## Loading required package: mosaicData
##
## Attaching package: 'mosaic'
##
## The following object is masked from 'package:car':
##
##     logit
##
## The following objects are masked from 'package:dplyr':
##
##     count, do, tally
##
## The following objects are masked from 'package:stats':
##
##     binom.test, cor, cov, D, fivenum, IQR, median, prop.test,
##     quantile, sd, t.test, var
##
## The following objects are masked from 'package:base':
##
##     max, mean, min, prod, range, sample, sum

require(Lock5Data)
## Loading required package: Lock5Data
```

```r
set.seed(100210)
penguin.data <- data.frame(
  group    = rep(c("Control", "Experimental"), times = c(50, 50)),
  survived = rep(c("Yes", "No", "Yes", "No"), times = c(31, 19, 16, 34))
)
head(penguin.data)
##     group survived
## 1 Control      Yes
## 2 Control      Yes
## 3 Control      Yes
## 4 Control      Yes
## 5 Control      Yes
## 6 Control      Yes

prop((survived == "Yes") ~ group, data = penguin.data)
## TRUE.Control TRUE.Experimental
##         0.62              0.32

observed.difference <- diff(prop((survived == "Yes") ~ group, data = penguin.data))

prop((survived == "Yes") ~ shuffle(group), data = penguin.data)
## TRUE.Control TRUE.Experimental
##         0.46              0.48

-diff(prop((survived == "Yes") ~ shuffle(group), data = penguin.data))
## TRUE.Experimental
##             -0.14

random.differences <- do(10000) *
  -diff(prop((survived == "Yes") ~ shuffle(group), data = penguin.data))
names(random.differences) <- c("phat.control.minus.phat.metal")

P.value <- prop((~phat.control.minus.phat.metal >= observed.difference),
                data = random.differences)
P.value
##   TRUE
## 0.9992

histogram(~phat.control.minus.phat.metal, data = random.differences,
          groups = (phat.control.minus.phat.metal >= observed.difference),
          width = 1/50, cex = 10)
```

Exercise 1

```r
set.seed(109210)
fertility.data <- data.frame(group = rep(c("Fertile",
```

important and probably already stored somewhere.

d) A confidence interval. Once we take a sample, it is efficient to bootstrap and create a CI, and we can be fairly certain whether the population parameter falls within the CI.

a) The p-value means that, if the null hypothesis is true, there is a 2% chance that a sample would give results as extreme as or more extreme than those observed. These results are significant, which suggests that the increase in tax probably caused a decrease in soda consumption.

b) This p-value means that, if the null hypothesis is true, there is a 41% chance that a sample would give results as extreme as or more extreme than those observed. This is not a significant result, so the null hypothesis is plausible. Taxes probably don't affect soda consumption.

c) 0.02

a) The expected center of a bootstrap distribution is the sample mean. The expected center of a randomization distribution is the value specified by the null hypothesis.

b) The commuters could have been resampled at random, with each commuter replaced after being drawn.

```r
data("BootAtlantaCorr")
head(BootAtlantaCorr)
##   CorrTimeDist
## 1        0.828
## 2        0.805
## 3        0.820
## 4        0.815
## 5        0.826
## 6        0.827

lower <- quantile(~CorrTimeDist, data = BootAtlantaCorr, prob = 0.005)
upper <- quantile(~CorrTimeDist, data = BootAtlantaCorr, prob = 0.995)
CI <- c(lower, upper)
CI
##    0.5%   99.5%
## 0.70600 0.87501
```

The 99% confidence interval for the correlation in this setting is 0.706 to 0.875. This means that we are 99% confident that the correlation between distance and time of commutes for the population falls within this interval.
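The `BootAtlantaCorr` correlations used above are pre-computed. A distribution like it could be generated directly with mosaic's `resample()` and `do()`; the following is a sketch, assuming the `CommuteAtlanta` data frame from Lock5Data (500 Atlanta commuters with `Distance` and `Time` columns):

```r
require(mosaic)
require(Lock5Data)

data("CommuteAtlanta")

# Draw commuters with replacement and record each bootstrap
# sample's time-distance correlation.
set.seed(1)
boot.correlations <- do(10000) * cor(Time ~ Distance, data = resample(CommuteAtlanta))
names(boot.correlations) <- "CorrTimeDist"

# Percentile 99% CI: cut 0.5% from each tail, as above.
quantile(~CorrTimeDist, data = boot.correlations, prob = c(0.005, 0.995))
```

Because these are fresh random resamples, the endpoints will differ slightly from the stored `BootAtlantaCorr` values.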
c)

```r
lower.95 <- quantile(~CorrTimeDist, data = BootAtlantaCorr, prob = 0.025)
upper.95 <- quantile(~CorrTimeDist, data = BootAtlantaCorr, prob = 0.975)
CI.95 <- c(lower.95, upper.95)
CI.95
##  2.5% 97.5%
## 0.729 0.867

lower.90 <- quantile(~CorrTimeDist, data = BootAtlantaCorr, prob = 0.05)
upper.90 <- quantile(~CorrTimeDist, data = BootAtlantaCorr, prob = 0.95)
CI.90 <- c(lower.90, upper.90)
CI.90
##    5%   95%
## 0.742 0.859
```

The 95% confidence interval is 0.729 to 0.867. The 90% confidence interval is 0.742 to 0.859.

d) As the confidence level decreases, the interval becomes narrower.

a) The null hypothesis is assumed to be correct.

b) The data will center around the null hypothesis value, rho = 0.

c) I would write each lake's pH and mercury level on a card, so that 53 cards reflect the original sample. Then I would resample, replacing each card after drawing, to make a new sample. This new sample has a correlation; I would shift it by the observed correlation (-0.575) so that it is centered at 0, and repeat the process thousands of times to get a randomization distribution centered at the null hypothesis value of 0.

a) The relevant parameter is the correlation between patients' heart rate and systolic BP. The null hypothesis is that there is no correlation between heart rate and systolic BP, so the correlation equals 0. The alternative is that there is a correlation, so the correlation does not equal 0.

b) We assume the null hypothesis is correct.

c)

```r
require(mosaic)
heartrate.systolicbp.table <- read.file("~/Desktop/heartrate.systolicbp.csv")
## Reading data with readr::read_csv()
head(heartrate.systolicbp.table)
##   Heartrate Systolicbp
## 1        86        110
## 2        86        188
## 3        92        128
## 4       100        122
## 5       112        132
## 6       116        140

observed.correlation <- cor(Heartrate ~ Systolicbp, data = heartrate.systolicbp.table)
```

We will record the correlation between heart rate and systolic BP. The observed correlation is 0.279.

d) The randomization distribution will be centered at the null hypothesis value, 0.
e) We can resample the data, find each resample's correlation, and then shift it by the observed statistic (subtract 0.279), so that the resulting distribution is centered at 0.

f) Using StatKey, my first sample in the randomization distribution had a correlation of -0.573, with (Heart Rate, Systolic BP) pairs: (86, 188), (86, 110), (92, 128), (100, 122), (112, 132), (116, 140), (136, 190), (140, 138).

g) Correlation of a second sample: -0.241.
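The shift-to-zero approach in e) is one way to build the randomization distribution; mosaic's `shuffle()` offers another, permuting one variable to break any real association. The following is a sketch, assuming `heartrate.systolicbp.table` is loaded as in c):

```r
require(mosaic)

# Permuting Systolicbp severs its pairing with Heartrate, so the
# shuffled correlations are centered at the null value of 0.
set.seed(1)
random.correlations <- do(10000) * cor(Heartrate ~ shuffle(Systolicbp),
                                       data = heartrate.systolicbp.table)
names(random.correlations) <- "corr"

# Two-sided p-value: proportion of shuffled correlations at least
# as extreme as the observed 0.279, in either direction.
prop((~abs(corr) >= 0.279), data = random.correlations)
```

This mirrors the `shuffle(group)` test used for `penguin.data` earlier in the document, applied to a correlation instead of a difference in proportions.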