













































































Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
This document by Janette Walde covers advanced statistics concepts, including scales of measurement (nominal, interval, and ratio), statistical tests and scientific hypotheses, types of errors (Type I and II), acceptable levels of errors, statistical power, ways to increase power, and examples of computing power and power in a t-test. The document also discusses the importance of statistical power and the prevalence of lack of power in scientific studies.
Typology: Summaries
1 / 85
This page cannot be seen from the preview
Don't miss anything!














































































Janette Walde [email protected] Department of Statistics University of Innsbruck
“We are pattern-seeking story-telling animals.” (Edward Leamer)
Statistics does not hand truth to the user on a silver platter. However, statistics confines arbitrariness and provides comprehensible conclusions.
“Es gibt keine Tatsachen, es gibt nur Interpretationen.” (Friedrich Nietzsche)
(^3) Although knowing the most sophisticated analyzing instruments one may be confronted with limits in getting results or finding appropriate interpretations or applying tools in the given framework. This has to be accepted (“If we torture the data long enough, they will confess.”). (^4) Be aware: Never confuse statistical significance with biological significance.
(^1) Nominal Scale. Nominal data are attributes like sex or species, and represent measurement at its weakest level. We can determine if one object is different from another, and the only formal property of nominal scale data is equivalence. (^2) Ranking Scale. Some biological variables cannot be measured on a numerical scale, but individuals can be ranked in relation to one another. Two formal properties occur in ranking data: equivalence and greater than.
A statistical test is a confrontation of the real world (observations) to a theory (model) with the aim of falsifying the model.
As such the statistical test (as a scientific method) fits directly into the philosophy of science described by the English philosopher Karl Popper (1902–1994) (see e.g. The Logic of Scientific Discovery, 1972). Basically the philosophy says that 1) theories can not be empirically verified but only falsified and 2) scientific progress happens by having a theory until it is falsified. That is, if we observe a phenomenon (data) which under the model (theory) is very unlikely, then we reject the model (theory).
Suppose we have a coin, and that our hypothesis is that the coin is fair, i.e. that P(head) = P(tail) = 1 /2. Suppose we toss a coin n = 25 times and observe 21 heads. The probability of actually observing these data under the model is P(21 heads, 4 tails) = 0.0004. It is a very unlikely (but possible) event to see such data if the model is true. In this falsification process we employ the interpretation principle of statistics:
Unlikely events do not occur...
If we do not employ this principle we can never say anything at all on the basis of statistics (observations): An opponent can always claim that the present observations just are “an unfortunate outcome” which - no matter how unlikely they are - are possible. In practice the statistical interpretation principle needs more structure: In a large sample space, all possible outcomes will have a very small probability, so it will be unlikely to have the data one has. In addition there is also the question about how small a probability is needed in order to classify data as being unlikely.
Type I error (α) Typically α = 0.05 (This convention is due to R.A. Fisher) More stringent test α = 0.01 or α = 0. 001 Exploratory or preliminary experiments α = 0. 10
Type II error (β) Typically 0. Often unspecified and much less than 0.
The power of a significance test measures its ability to detect an alternative hypothesis. The power against a specific alternative is calculated as the probability that the test will reject H 0 when that specific alternative is true. Statistical power is the probability of rejecting H 0 given population effect size (ES), α and sample size (n). This calculation also requires knowledge of the sampling distribution of the test statistic under the alternative hypothesis.
Does exercise make strong bones?
Can a 6-month exercise program increase the total body bone mineral content (TBBMC) of young women? A team of researchers is planning a study to examine this question. Based on the results of a previous study, they are willing to assume that σ = 2 for the percent change in TBBMC over the 6-month period. A change in TBBMC of 1% would be considered important, and the researcher would like to have a reasonable chance of detecting a change this large or larger. Are 25 subjects a large enough sample for this project?
(^1) State the hypotheses: let μ denote the mean percent change:
H 0 : μ = 0 Ha : μ > 0
(^2) Calculate the rejection region: The z test rejects H 0 at the α = 0.05 level whenever:
z =
¯x − μ 0 σ/
n
x¯ 2 /
That is we reject H 0 when ¯x ≥ 0 .658.
Increase α. A 5% test of significance will have a greater chance of rejecting the alternative than a 1% test because the strength of evidence required for rejection is less. Consider a particular alternative that is farther away from μ 0. Values of μ that are in Ha but lie close to the hypothesized value μ 0 are harder to detect than values of μ that are far from μ 0. Increase the sample size. More data will provide more information about ¯x so we have a better chance of distinguishing values of μ.
Decrease σ. This has the same effect as increasing the sample size: it provides more information about μ. Improving the measurement process and restricting attention to a subpopulation are two common ways to decrease σ.