


A review for the second mid-term exam in a statistics course (bios 543) focusing on comparing means of continuous variables and looking for relationships between continuous variables. It covers descriptive statistics, continuous and categorical responses, hypothesis testing for means, and correlation analysis.



The second mid-term will cover the material on comparing the means of continuous variables and on looking for relationships between continuous variables (Sections 5–10).
Which statistical method should I use? As we discussed in the first review (which covered how to answer this question when the response is categorical), begin by considering the (random) response variable and its measurement level (the Y and its modeling type).
Descriptive Statistics
No matter what hypotheses we test, we still have to summarize the data first. If the variable to be summarized is continuous, first consider the shape of its distribution. Beyond visual inspection of the histogram, the primary tool for deciding whether normality is warranted is the normal quantile plot. Don’t be too quick to reject normality; we have a strong preference for it. Most of the statistics we use are robust and can handle moderate departures from normality. In large samples, we need a very strong argument to set this preference aside: recall that the central limit theorem (CLT) makes the sample mean approximately normal even when the individual observations are not. In smaller samples, normality may be justifiable unless there is a strong case against it. Use the recommendations in Section 5, pages 17-18, to guide you.

If normality is justifiable, report the n, mean, SD, and perhaps a 95% CI on the mean. If the data are clearly not normal, report the n, median, and either the IQR or the range. Note also that even if the main focus is on the continuous response variable, there may be categorical variables that you need to describe as well. For instance, if you are comparing two groups (e.g., males and females), then describe the groups.
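As a rough illustration of the reporting rule above, the sketch below uses Python's standard `statistics` module to produce either the normal-theory summary (n, mean, SD) or the distribution-free summary (n, median, IQR, range), plus the coordinates for a normal quantile plot. The function names are hypothetical, and the decision about normality is left to you (it is passed in as `assume_normal`). The plotting position (i - 0.5)/n is one common convention; statistical packages may use a slightly different one. Quartiles, and hence the IQR, also depend on the quantile method, so values may differ a little from other software.

```python
import statistics
from statistics import NormalDist


def describe(data, assume_normal):
    """Summarize a continuous sample.

    If normality is judged justifiable, report n, mean and SD;
    otherwise report n, median, IQR and range.
    """
    n = len(data)
    if assume_normal:
        return {"n": n, "mean": statistics.mean(data), "sd": statistics.stdev(data)}
    # Quartiles via the module's default ('exclusive') method.
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "n": n,
        "median": statistics.median(data),
        "iqr": q3 - q1,
        "range": max(data) - min(data),
    }


def normal_quantile_pairs(data):
    """Return (expected standard-normal quantile, observed value) pairs.

    Plotting these pairs gives a normal quantile plot: points close to a
    straight line suggest normality. Uses plotting position (i - 0.5) / n
    (an assumed convention).
    """
    n = len(data)
    z = NormalDist()  # standard normal, mean 0 and SD 1
    ordered = sorted(data)
    return [(z.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(ordered, start=1)]
```

For `[2, 4, 4, 4, 5, 5, 7, 9]`, `describe(data, True)` gives a mean of 5 and an SD of about 2.14, while `describe(data, False)` gives a median of 4.5, an IQR of 2.5 (with the default quantile method), and a range of 7.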
Continuous Y
Categorical responses, that is, qualitative response variables with a nominal measurement level, were covered earlier. Recall the first question:
¹ Continuous variables are variables for which taking an average makes some sense. They must, therefore, be numeric. In addition, the rank ordering of values may also be considered here (using nonparametric methods).
If the answer is “yes,” go back to the first review. Next comes the most difficult question.
What’s the question?
Next we consider the substantive question. Since the response is continuous, we’re probably interested in means, but what about them? Questions refer either to comparisons or to relationships. Looking to make a comparison? If you want to compare your observed mean to something, you may be interested in differences between your data and an external reference, or in differences between the groups within your study.
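To make the two kinds of comparison concrete, here is a minimal sketch of the test statistics: a one-sample t statistic against an external reference value `mu0`, and a two-sample t statistic for two groups within a study. The two-sample version uses Welch's unequal-variance form; that choice is an assumption, and the course may use the pooled-variance t test instead. Only the statistics and degrees of freedom are computed, because the Python standard library has no t distribution; a package such as SciPy (`scipy.stats.ttest_1samp`, or `scipy.stats.ttest_ind` with `equal_var=False`) would supply p-values. The function names are hypothetical.

```python
import math
import statistics


def one_sample_t(data, mu0):
    """Compare the sample mean to an external reference value mu0.

    Returns (t statistic, degrees of freedom).
    """
    n = len(data)
    se = statistics.stdev(data) / math.sqrt(n)  # standard error of the mean
    return (statistics.mean(data) - mu0) / se, n - 1


def welch_t(a, b):
    """Compare the means of two groups within a study (Welch's t).

    Returns (t statistic, Welch-Satterthwaite approximate degrees of freedom).
    """
    va = statistics.variance(a) / len(a)  # squared SE of mean of group a
    vb = statistics.variance(b) / len(b)  # squared SE of mean of group b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

For example, `welch_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])` returns t = -2 with 8 degrees of freedom, since both groups have the same variance and size.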
in Section 12 (which we have NOT covered yet).
How to use and interpret:
Grouped-values histogram
Mean and median
Range, variance, standard deviation
The empirical rule
Box plot
Percentiles, quartiles
Normal quantile plot
Standard error of the mean (how is this different from the SD?)
Confidence interval on the mean
Correlation, slope of the straight-line fit
R-square, “variance accounted for”
Questions
What is the expected value (mean) of the sample mean?
What is the relationship between the standard deviation of the data and the standard error of the sample mean?
When is the sample mean normally distributed?
How does the central limit theorem help? What problem does it solve? When does it make statistics easier?
When would reporting a mean not be useful (not defensible)?
When would reporting a standard deviation not be useful (not defensible)?
When would a calculated CI on a mean not be useful (not defensible)?
Under what circumstances is it OK to change the value of an outlier?
Under what circumstances is it OK to remove unusual values from your dataset?
In regression, when do we use the prediction interval and when do we use the confidence interval?
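Several of these questions (the expected value of the sample mean, the SD/SE relationship, and what the CLT buys us) can be explored with a small simulation. The sketch draws repeated samples from a strongly skewed exponential population with mean 1 and SD 1, an arbitrary choice made only for illustration, and looks at the distribution of the sample means. Their average should be close to the population mean, their SD close to 1/√n, and, by the CLT, their histogram much closer to normal than the population itself. The function name and the sample sizes are hypothetical.

```python
import math
import random
import statistics


def simulate_sample_means(n, reps, seed=1):
    """Return `reps` sample means, each from a sample of size n drawn from an
    Exponential(rate=1) population (mean 1, SD 1, right-skewed)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [statistics.mean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]


n = 25
means = simulate_sample_means(n, reps=4000)
print("mean of sample means:", round(statistics.mean(means), 3))   # near 1
print("SD of sample means:", round(statistics.stdev(means), 3))    # near the SE
print("theoretical SE = 1/sqrt(n):", round(1 / math.sqrt(n), 3))
```

Plotting a histogram of `means` next to one of the raw exponential draws shows the CLT at work: the raw values are heavily skewed, while the sample means are close to symmetric and bell-shaped.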