Testing for Normality, Lecture notes of Statistics

where D is the test statistic, SS is the sum of squares of the data and n is the sample size, and i is the order or rank of observation x. The df for this test ...

Typology: Lecture notes

2021/2022

Uploaded on 09/27/2022

ahalya
ahalya 🇺🇸

4.9

(16)

257 documents

1 / 32

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Testing for Normality
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a
pf1b
pf1c
pf1d
pf1e
pf1f
pf20

Partial preview of the text

Download Testing for Normality and more Lecture notes Statistics in PDF only on Docsity!

Testing for Normality

For each mean and standard deviation combination a theoretical

normal distribution can be determined. This distribution is based

on the proportions shown below.

There are several methods of assessing whether data are

normally distributed or not. They fall into two broad categories:

graphical and statistical. The some common techniques are:

Graphical

  • Q-Q probability plots
  • Cumulative frequency (P-P) plots

Statistical

  • W/S test
  • Jarque-Bera test
  • Shapiro-Wilks test
  • Kolmogorov-Smirnov test
  • D’Agostino test

Q-Q plots display the observed values against normally distributed

data (represented by the line).

Normally distributed data fall along the line.

Tests of Normality

Age .110 1048 .000 .931 1048.

Statistic df Sig. Statistic df Sig.

Kolmogorov-Smirnova^ Shapiro-Wilk

a. Lilliefors Significance Correction

Tests of Normality

TOTAL_VALU .283 149 .000 .463 149.

Statistic df Sig. Statistic df Sig.

Kolmogorov-Smirnova^ Shapiro-Wilk

a. Lilliefors Significance Correction

Tests of Normality

Z100 .071 100 .200* .985 100.

Statistic df Sig. Statistic df Sig.

Kolmogorov-Smirnova^ Shapiro-Wilk

*. This is a lower bound of the true s ignificance. a. Lilliefors Significance Correction

Statistical tests for normality are more precise since actual

probabilities are calculated.

Tests for normality calculate the probability that the sample was

drawn from a normal population.

The hypotheses used are:

Ho: The sample data are not significantly different than a normal

population.

Ha: The sample data are significantly different than a normal

population.

Non-Normally Distributed Data

Average PM10 .142 72 .001 .841 72.

Statistic df Sig. Statistic df Sig.

Kolmogorov-Smirnov a^ Shapiro-Wilk

a. Lilliefors Significance Correction

Normally Distributed Data

As thma Cases .069 72 .200* .988 72.

Statistic df Sig. Statistic df Sig.

Kolmogorov-Smirnova^ Shapiro-Wilk

*. This is a lower bound of the true s ignificance. a. Lilliefors Significance Correction

In SPSS output above the probabilities are greater than 0.05 (the typical alpha

level), so we accept Ho … these data are not different from normal.

In the SPSS output above the probabilities are less than 0.05 (the typical alpha

level), so we reject Ho … these data are significantly different from normal.

Simple Tests for Normality

Range constant,
SD changes
Range changes,
SD constant

The W/S test uses a critical range. If the calculated

value falls within the range, then accept Ho. If the

calculated value falls outside the range then reject Ho.

Since 3.34 < q=4.05 < 4.71, we accept Ho.

Nurio 6.

Since we have a critical range, it is difficult to determine a

probability range for our results. Therefore we simply state our

alpha level.

The sample data set is not significantly different than normal (q 4.05,

p > 0.05).

D’Agostino Test

  • A very powerful test for departures from normality.
  • Based on the D statistic, which gives an upper and lower critical

value.

where D is the test statistic, SS is the sum of squares of the data

and n is the sample size, and i is the order or rank of observation

x. The df for this test is n (sample size).

  • First the data are ordered from smallest to largest or largest to

smallest.

Use the next lower

n on the table if the

sample size is NOT

listed.

This is the ‘middle’ of the data set.
This is the observation’s distance from the middle.
This is the observation, and is used to ‘weight’ the result
based on the size of the observation and its distance.

Breaking down the equations:

This represents which tail is more pronounced (- for left, + for right).
This adjusts for sample size like this:
This is the dataset’s total squared variation.
This transforms the squared values from SS.