Sample Size Estimation: How Many Individuals Should Be Studied?¹
John Eng, MD
Index terms: Radiology and radiologists, research; Statistical analysis
Published online 10.1148/radiol.2272012051
Radiology 2003; 227:309–313
¹ From the Russell H. Morgan Department of Radiology and Radiological Science, Johns Hopkins University, 600 N Wolfe St, Central Radiology Viewing Area, Rm 117, Baltimore, MD 21287. Received December 17, 2001; revision requested January 29, 2002; revision received March 7; accepted March 13. Address correspondence to the author (e-mail: [email protected]).
© RSNA, 2003

The number of individuals to include in a research study, the sample size of the study, is an important consideration in the design of many clinical studies. This article reviews the basic factors that determine an appropriate sample size and provides methods for its calculation in some simple, yet common, cases. Sample size is closely tied to statistical power, which is the ability of a study to enable detection of a statistically significant difference when there truly is one. A trade-off exists between a feasible sample size and adequate statistical power. Strategies for reducing the necessary sample size while maintaining a reasonable power will also be discussed.

How many individuals will I need to study? This question is commonly asked by the clinical investigator and exposes one of many issues that are best settled before actually carrying out a study. Consultation with a statistician is worthwhile in addressing many issues of study design, but a statistician is not always readily available. Fortunately, many studies in radiology have simple designs for which determination of an appropriate sample size (the number of individuals that should be included for study) is relatively straightforward.

Superficial discussions of sample size determination are included in typical introductory biostatistics texts (1–3). The goal of this article is to augment these introductory discussions with additional practical material. First, the need for considering sample size will be reviewed. Second, the study design parameters affecting sample size will be identified. Third, formulae for calculating appropriate sample sizes for some common study designs will be defined. Finally, some advice will be offered on what to do if the calculated sample size is impracticably large. To assist the reader in performing the calculations described in this article and to encourage experimentation with them, a World Wide Web page has been developed that closely parallels the equations presented in this article. This page can be found at www.rad.jhmi.edu/jeng/javarad/samplesize/.

Even if a statistician is readily available, the investigator may find that a working knowledge of the factors affecting sample size will result in more fruitful communication with the statistician and in better research design. A working knowledge of these factors is also required to use one of the numerous Web pages (4–6) and computer programs (7–9) that have been developed for calculating appropriate sample sizes. It should be noted that Web pages for calculating sample size are typically limited for use in situations involving the well-known parametric statistics, which are those involving the calculation of summary means, proportions, or other parameters of an assumed underlying statistical distribution such as the normal, Student t, or binomial distributions. The calculation of sample size for nonparametric statistics such as the Wilcoxon rank sum test is performed by some computer programs (7,9).

IMPORTANCE OF SAMPLE SIZE

In a comparative research study, the means or proportions of some characteristic in two or more comparison groups are measured. A statistical test is then applied to determine whether or not there is a significant difference between the means or proportions observed in the comparison groups. We will first consider the comparative type of study.

Statistical Concepts Series

Sample size is important primarily because of its effect on statistical power. Statistical power is the probability that a statistical test will indicate a significant difference when there truly is one. Statistical power is analogous to the sensitivity of a diagnostic test (10), and one could mentally substitute the word “sensitivity” for the word “power” during statistical discussions.

In a study comparing two groups of individuals, the power (sensitivity) of a statistical test must be sufficient to enable detection of a statistically significant difference between the two groups if a difference is truly present. This issue becomes important if the study results were to demonstrate no statistically significant difference. If such a negative result were to occur, there would be two possible interpretations. The first interpretation is that the results of the statistical test are correct and that there truly is no statistically significant difference (a true-negative result). The second interpretation is that the results of the statistical test are erroneous and that there is actually an underlying difference, but the study was not powerful enough (sensitive enough) to find the difference, yielding a false-negative result. In statistical terminology, a false-negative result is known as a type II error. An adequate sample size gives a statistical test enough power (sensitivity) so that the first interpretation (that the results are true-negative) is much more plausible than the second interpretation (that a type II error occurred) in the event no statistically significant difference is found in the study.

It is well known that many published clinical research studies possess low statistical power owing to inadequate sample size or other design issues (11,12). One could argue that it is as wasteful and inappropriate to conduct a study with inadequate power as it is to obtain a diagnostic test of insufficient sensitivity to rule out a disease.

PARAMETERS THAT DETERMINE APPROPRIATE SAMPLE SIZE

An appropriate sample size generally depends on five study design parameters: minimum expected difference (also known as the effect size), estimated measurement variability, desired statistical power, significance criterion, and whether a one- or two-tailed statistical analysis is planned.

Minimum Expected Difference

This parameter is the smallest measured difference between comparison groups that the investigator would like the study to detect. As the minimum expected difference is made smaller, the sample size needed to detect statistical significance increases. The setting of this parameter is subjective and is based on clinical judgment and experience with the problem being investigated. For example, suppose a study is designed to compare a standard diagnostic procedure of 80% accuracy with a new procedure of unknown but potentially higher accuracy. It would probably be clinically unimportant if the new procedure were only 81% accurate, but suppose the investigator believes that it would be a clinically important improvement if the new procedure were 90% accurate. Therefore, the investigator would choose a minimum expected difference of 10% (0.10). The results of pilot studies or a literature review can also guide the selection of a reasonable minimum difference.

Estimated Measurement Variability

This parameter is represented by the expected SD in the measurements made within each comparison group. As statistical variability increases, the sample size needed to detect the minimum difference increases. Ideally, the estimated measurement variability should be determined on the basis of preliminary data collected from a similar study population. A review of the literature can also provide estimates of this parameter. If preliminary data are not available, this parameter may have to be estimated on the basis of subjective experience, or a range of values may be assumed. A separate estimate of measurement variability is not required when the measurement being compared is a proportion (in contrast to a mean), because the SD is mathematically derived from the proportion.
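
The remark that the SD follows from the proportion can be made concrete with a short sketch. It assumes each measurement is a binary (0/1) outcome with success probability p, so that the SD of a single observation is the square root of p(1 − p). The function name `proportion_sd` is hypothetical and not part of the article.

```python
import math


def proportion_sd(p: float) -> float:
    """SD of one binary (0/1) observation whose success probability is p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    return math.sqrt(p * (1.0 - p))


# A proportion of 0.80 (for example, 80% accuracy) implies an SD near 0.40,
# so no separate variability estimate has to be supplied.
print(f"{proportion_sd(0.80):.2f}")
```

The SD is largest at p = 0.5 and shrinks toward 0 as p approaches 0 or 1, which is why sample size formulas for proportions depend on the magnitude of the proportion itself.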

Statistical Power

This parameter is the power that is desired from the study. As power is increased, sample size increases. While high power is always desirable, there is an obvious trade-off with the number of individuals that can feasibly be studied, given the usually fixed amount of time and resources available to conduct a study. In randomized controlled trials, the statistical power is customarily set to a number greater than or equal to 0.80, with many clinical trial experts now advocating a power of 0.90.

Significance Criterion

This parameter is the maximum P value for which a difference is to be considered statistically significant. As the significance criterion is decreased (made more strict), the sample size needed to detect the minimum difference increases. The significance criterion is customarily set to .05.

One- or Two-tailed Statistical Analysis

In a few cases, it may be known before the study that any difference between comparison groups is possible in only one direction. In such cases, use of a one-tailed statistical analysis, which would require a smaller sample size for detection of the minimum difference than would a two-tailed analysis, may be considered. The sample size of a one-tailed design with a given significance criterion (for example, α) is equal to the sample size of a two-tailed design with a significance criterion of 2α, all other parameters being equal. Because of this simple relationship and because truly appropriate one-tailed analyses are rare, a two-tailed analysis is assumed in the remainder of this article.
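
The equivalence between a one-tailed criterion of α and a two-tailed criterion of 2α can be checked numerically. The sketch below is an illustration rather than part of the article: it assumes the critical value is a standard normal quantile and computes it with Python's standard-library `statistics.NormalDist`. The helper name `z_crit` is hypothetical.

```python
from statistics import NormalDist


def z_crit(alpha: float, tails: int = 2) -> float:
    """Standard normal cutoff for a significance criterion alpha.

    tails=2 splits alpha across both tails; tails=1 puts it all in one tail.
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - alpha / tails)


one_tailed = z_crit(0.05, tails=1)      # one-tailed test at alpha = .05
two_tailed_2a = z_crit(0.10, tails=2)   # two-tailed test at 2 * alpha = .10
two_tailed = z_crit(0.05, tails=2)      # customary two-tailed test at .05

print(f"one-tailed, alpha = .05: {one_tailed:.3f}")
print(f"two-tailed, alpha = .10: {two_tailed_2a:.3f}")
print(f"two-tailed, alpha = .05: {two_tailed:.3f}")
```

The first two cutoffs agree (about 1.645), so any sample size formula that depends on the significance criterion only through this cutoff gives the same answer for both designs. The customary two-tailed criterion of .05 gives the larger cutoff of about 1.960.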

SAMPLE SIZES FOR COMPARATIVE RESEARCH STUDIES

With knowledge of the design parameters detailed in the previous section, the calculation of an appropriate sample size simply involves selecting an appropriate equation. For a study comparing two means, the equation for sample size (13) is

$$N = \frac{4\sigma^2 (z_{\mathrm{crit}} + z_{\mathrm{pwr}})^2}{D^2}, \qquad (1)$$

where N is the total sample size (the sum of the sizes of both comparison groups), σ is the assumed SD of each group (assumed to be equal for both groups), the z_crit value is that given in Table 1 for the desired significance criterion, the z_pwr value is that given in Table 2 for the desired statistical power, and D is the minimum expected difference between the two means. Both z_crit and z_pwr are cutoff points along the x axis of a standard normal probability distribution that demarcate probabilities matching the specified significance criterion and statistical power, respectively. The two groups that make up
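
Equation (1) can be evaluated directly. The following is a minimal sketch rather than the author's own implementation: in place of table lookups, it assumes z_crit and z_pwr are standard normal quantiles computed with `statistics.NormalDist`, and it assumes a two-tailed analysis. The function name and the example inputs (an SD of 10 and a minimum expected difference of 5, in arbitrary units) are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_two_means(sd: float, diff: float,
                          alpha: float = 0.05, power: float = 0.80) -> float:
    """Total N (both groups combined) from Equation (1), two-tailed."""
    if sd <= 0 or diff == 0:
        raise ValueError("sd must be positive and diff must be nonzero")
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)  # two-tailed cutoff
    z_pwr = std_normal.inv_cdf(power)               # cutoff for desired power
    return 4.0 * sd ** 2 * (z_crit + z_pwr) ** 2 / diff ** 2


# Hypothetical inputs: SD of 10 units, minimum expected difference of 5 units.
n_total = sample_size_two_means(sd=10.0, diff=5.0)
print(math.ceil(n_total))  # round up to whole individuals
```

With these illustrative inputs the formula gives roughly 125.6, which rounds up to 126 individuals in total. Halving the minimum expected difference would quadruple the result, because D enters the equation squared.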

310  Radiology  May 2003 Eng


tion (3). Like Equation (2), Equation (4) depends not only on the width of the expected CI but also on the magnitude of the proportion itself. Also like Equation (2), Equation (4) does not require an independent estimate of SD because it is calculated from p within the equation.

As an example, suppose an investigator would like to determine the accuracy of a diagnostic test with a 95% CI of ±10%. Suppose that, on the basis of results of preliminary studies, the estimated accuracy is 80%. With these assumptions, D = 0.20, p = 0.80, and z_crit = 1.960. Equation (4) yields a sample size of N = 61. Therefore, 61 patients should be examined in the study.
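
The arithmetic in this example can be checked with a few lines of code. Equation (4) is not shown in this part of the text, so the sketch assumes the usual normal-approximation form N = 4 z_crit² p(1 − p) / D², where D is the total width of the CI. This assumed form reproduces the stated result of 61. The function name is hypothetical.

```python
from statistics import NormalDist


def sample_size_proportion_ci(p: float, width: float,
                              confidence: float = 0.95) -> float:
    """N for estimating a proportion p with a CI of total width `width`.

    Assumed form of Equation (4): N = 4 * z_crit**2 * p * (1 - p) / width**2.
    """
    if not 0.0 < p < 1.0 or width <= 0.0:
        raise ValueError("p must be in (0, 1) and width must be positive")
    z_crit = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return 4.0 * z_crit ** 2 * p * (1.0 - p) / width ** 2


# Values from the example: estimated accuracy 0.80, total CI width 0.20.
n = sample_size_proportion_ci(p=0.80, width=0.20)
print(round(n))
```

The unrounded value is about 61.5. An investigator who prefers a conservative estimate might round up to 62 instead.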

MINIMIZING THE SAMPLE SIZE

Now that we understand how to calculate sample size, what if the sample size we calculate is too large to be feasibly studied? Browner et al (16) list a number of strategies for minimizing the sample size. These strategies are briefly discussed in the following paragraphs.

Use Continuous Measurements Instead of Categories

Because a radiologic diagnosis is often expressed in terms of a binary result, such as the presence or absence of a disease, it is natural to convert continuous measurements into categories. For example, the size of a lesion might be encoded as “small” or “large.” For a sample of fixed size, the use of the actual measurement rather than the proportion in each category yields more power. This is because statistical tests that incorporate the use of continuous values are mathematically more powerful than those used for proportions, given the same sample size.

Use More Precise Measurements

For studies in which Equation (1) or Equation (2) applies, any way to increase the precision (decrease the variability) of the measurement process should be sought. For some types of research, precision can be increased by simply repeating the measurement. More complex equations are necessary for studies involving repeated measurements in the same individuals (17), but the basic principles are similar.

Use Paired Measurements

Statistical tests like the paired t test are mathematically more powerful for a given sample size than are unpaired tests because in paired tests, each measurement is matched with its own control. For example, instead of comparing the average lesion size in a group of treated patients with that in a control group, measuring the change in lesion size in each patient after treatment allows each patient to serve as his or her own control and yields more statistical power. Equation (1) can still be used in this case. D represents the expected change in the measurement, and σ is the expected SD of this change. The additional power and reduction in sample size are due to the SD being smaller for changes within individuals than for overall differences between groups of individuals.
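
The gain from pairing can be illustrated by evaluating Equation (1) twice: once with a between-patient SD and once with the smaller SD of within-patient changes. This is a sketch under stated assumptions, not a result from the article. The function name and all numeric values (an SD of 10 between patients, an SD of 5 for within-patient change, and an expected difference of 5) are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def equation_1(sd: float, diff: float,
               alpha: float = 0.05, power: float = 0.80) -> float:
    """N from Equation (1): 4 * sd^2 * (z_crit + z_pwr)^2 / diff^2."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)  # two-tailed cutoff
    z_pwr = std_normal.inv_cdf(power)               # cutoff for desired power
    return 4.0 * sd ** 2 * (z_crit + z_pwr) ** 2 / diff ** 2


# Unpaired design: sd is the spread of lesion size between patients.
n_unpaired = equation_1(sd=10.0, diff=5.0)
# Paired design: sd is the spread of the change within each patient.
n_paired = equation_1(sd=5.0, diff=5.0)

print(math.ceil(n_unpaired), math.ceil(n_paired))
```

Because the SD enters Equation (1) squared, halving it cuts the computed N to one quarter. How much smaller the SD of the change really is depends on the measurement and would have to come from preliminary data.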

Use Unequal Group Sizes

Equations (1) and (2) involve the assumption that the comparison groups are equal in size. Although it is statistically most efficient if the two groups are equal in size, benefit is still gained by studying more individuals, even if the additional individuals all belong to one of the groups. For example, it may be feasible to recruit additional individuals into the control group even if it is difficult to recruit more individuals into the noncontrol group. More complex equations are necessary for calculating sample sizes when comparing means (13) and proportions (18) of unequal group sizes.

Expand the Minimum Expected Difference

Perhaps the minimum expected difference that has been specified is unnecessarily small, and a larger expected difference could be justified, especially if the planned study is a preliminary one. The results of a preliminary study could be used to justify a more ambitious follow-up study of a larger number of individuals and a smaller minimum difference.

DISCUSSION

The formulation of Equations (1–4) involves two statistical assumptions that should be kept in mind when these equations are applied to a particular study. First, it is assumed that the selection of individuals is random and unbiased. The decision to include an individual in the study cannot depend on whether or not that individual has the characteristic or outcome being studied. Second, in studies in which a mean is calculated from measurements of individuals, the measurements are assumed to be normally distributed. Both of these assumptions are required not only by the sample size calculation method, but also by the statistical tests themselves (such as the t test). The situations in which Equations (1–4) are appropriate all involve parametric statistics. Different methods for determining sample size are required for nonparametric statistics such as the Wilcoxon rank sum test.

Equations for calculating sample size, such as Equations (1) and (2), also provide a method for determining statistical power corresponding to a given sample size. To calculate power, solve for z_pwr in the equation corresponding to the design of the study. The power can then be determined by referring to Table 2. In this way, an “observed power” can be calculated after a study has been completed, where the observed difference is used in place of the minimum expected difference. This calculation is known as retrospective power analysis and is sometimes used to aid in the interpretation of the statistical results of a study. However, retrospective power analysis is controversial because it can be shown that observed power is completely determined by the P value and therefore cannot add any additional information to its interpretation (19). Power calculations are most appropriate when they incorporate a minimum difference that is stated prospectively.

The accuracy of sample size calculations obviously depends on the accuracy of the estimates of the parameters used in the calculations. Therefore, these calculations should always be considered estimates of an absolute minimum. It is usually prudent for the investigator to plan to include more than the minimum number of individuals in a study to compensate for loss during follow-up or other causes of attrition.

Sample size is best considered early in the planning of a study, when modifications in study design can still be made. Attention to sample size will hopefully result in a more meaningful study whose results will eventually receive a high priority for publication.
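
The power calculation described in the Discussion can be sketched in code. Solving Equation (1) for z_pwr gives z_pwr = D√N / (2σ) − z_crit, and the power is the standard normal probability below that cutoff. The sketch assumes a two-tailed analysis and computes the probability with `statistics.NormalDist` in place of a table lookup. The function name and example values are hypothetical.

```python
import math
from statistics import NormalDist


def power_two_means(n_total: float, sd: float, diff: float,
                    alpha: float = 0.05) -> float:
    """Power implied by Equation (1) for a given total sample size."""
    if n_total <= 0 or sd <= 0:
        raise ValueError("n_total and sd must be positive")
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)  # two-tailed cutoff
    # Equation (1) rearranged: z_pwr = D * sqrt(N) / (2 * sd) - z_crit
    z_pwr = abs(diff) * math.sqrt(n_total) / (2.0 * sd) - z_crit
    return std_normal.cdf(z_pwr)


# Hypothetical prospective inputs: 126 individuals in total, SD of 10,
# minimum expected difference of 5.
power = power_two_means(n_total=126, sd=10.0, diff=5.0)
print(f"{power:.2f}")
```

With these inputs the power is just above 0.80. In keeping with the caution about retrospective power analysis, `diff` here is meant to be a minimum difference stated before the study, not the difference observed afterward.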

References

  1. Pagano M, Gauvreau K. Principles of biostatistics. 2nd ed. Pacific Grove, Calif: Duxbury, 2000; 246–249, 330–331.
  2. Daniel WW. Biostatistics: a foundation for analysis in the health sciences. 7th ed. New York, NY: Wiley, 1999; 180–185, 268–
  3. Altman DG. Practical statistics for medical research. London, England: Chapman & Hall, 1991.
  4. Bond J. Power calculator. Available at: http://calculators.stat.ucla.edu/powercalc/. Accessed March 11, 2003.
  5. Uitenbroek DG. Sample size: SISA (simple interactive statistical analysis). Available at: http://home.clara.net/sisa/samsize.htm. Accessed March 3, 2003.
  6. Lenth R. Java applets for power and sample size. Available at: www.stat.uiowa.edu/~rlenth/Power/index.html. Accessed March 3, 2003.
  7. NCSS Statistical Software. PASS 2002. Available at: www.ncss.com/pass.html. Accessed March 3, 2003.
  8. SPSS. SamplePower. Available at: www.spss.com/SPSSBI/SamplePower/. Accessed March 3, 2003.
  9. Statistical Solutions. nQuery Advisor. Available at: www.statsolusa.com/nquery/nquery.htm. Accessed March 3, 2003.
  10. Browner WS, Newman TB. Are all significant P values created equal? The analogy between diagnostic tests and clinical research. JAMA 1987; 257:2459–2463.
  11. Moher D, Dulberg CS, Wells GA. Statistical power, sample size, and their reporting in randomized controlled trials. JAMA 1994; 272:122–124.
  12. Freiman JA, Chalmers TC, Smith H, Kuebler RR. The importance of beta, the type II error and sample size in the design and interpretation of the randomized control trial: survey of 71 “negative” trials. N Engl J Med 1978; 299:690–694.
  13. Rosner B. Fundamentals of biostatistics. 5th ed. Pacific Grove, Calif: Duxbury, 2000; 308.
  14. Feinstein AR. Principles of medical statistics. Boca Raton, Fla: CRC, 2002; 503.
  15. Snedecor GW, Cochran WG. Statistical methods. 8th ed. Ames, Iowa: Iowa State University Press, 1989; 52, 439.
  16. Browner WS, Newman TB, Cummings SR, Hulley SB. Estimating sample size and power. In: Hulley SB, Cummings SR, Browner WS, Grady D, Hearst N, Newman TB. Designing clinical research: an epidemiologic approach. 2nd ed. Philadelphia, Pa: Lippincott Williams & Wilkins, 2001; 65–84.
  17. Frison L, Pocock S. Repeated measurements in clinical trials: analysis using mean summary statistics and its implications for design. Stat Med 1992; 11:1685–
  18. Fleiss JL. Statistical methods for rates and proportions. 2nd ed. New York, NY: Wiley, 1981; 45.
  19. Hoenig JM, Heisey DM. The abuse of power: the pervasive fallacy of power calculations for data analysis. Am Stat 2001; 55:19–24.

Volume 227  Number 2 Sample Size Estimation  313
