Statistical Analysis Methods, Study notes of Mathematical Statistics

Alliance University Mathematical Statistics

These notes cover various statistical analysis methods, including descriptive statistics, percentiles, M-estimation, tests of normality, group statistics, the Levene test of homogeneity of variances, the detrended normal plot, and the boxplot. They also discuss the conditions for using these methods and their interpretations.

Typology: Study notes

2011/2012

Uploaded on 10/31/2012

sangawar 🇮🇳

EXAMINE

Univariate Statistics

Notation

The following notation is used throughout this chapter unless otherwise noted:

Let $y_1 < \cdots < y_m$ be the $m$ distinct ordered observations for the sample and $c_1, \ldots, c_m$ be the corresponding caseweights. Then

$$cc_i = \sum_{k=1}^{i} c_k$$

is the cumulative frequency up to and including $y_i$, and

$$W = cc_m = \sum_{k=1}^{m} c_k$$

is the total sum of weights.

Descriptive Statistics

Minimum and Maximum

$$\min = y_1, \qquad \max = y_m$$

Range

$$\text{range} = y_m - y_1$$

Mean ($\bar{y}$)

$$\bar{y} = \frac{1}{W} \sum_{i=1}^{m} c_i y_i$$

Confidence Interval for the Mean

$$\text{lower bound} = \bar{y} - t_{\alpha/2,\,W-1}\,SE, \qquad \text{upper bound} = \bar{y} + t_{\alpha/2,\,W-1}\,SE$$

where SE is the standard error.

Median

The median is the 50th percentile, which is calculated by the method requested. The default method is HAVERAGE.

Interquartile Range (IQR)

IQR = 75th percentile − 25th percentile, where the 75th and 25th percentiles are calculated by the method requested for percentiles.

Variance ($s^2$)

$$s^2 = \frac{1}{W-1} \sum_{i=1}^{m} c_i \left(y_i - \bar{y}\right)^2$$

Standard Deviation

$$s = \sqrt{s^2}$$
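The weighted mean, variance, and standard deviation can be sketched in Python as follows. This is a minimal illustration, assuming `y` and `c` are parallel lists of distinct values and caseweights and that the variance uses the divisor $W - 1$; the function name is hypothetical.

```python
import math

def weighted_summary(y, c):
    """Return (mean, variance, sd) for distinct values y with caseweights c."""
    W = sum(c)  # total sum of weights
    mean = sum(ci * yi for ci, yi in zip(c, y)) / W
    # Assumption: the variance divides by W - 1, so W must exceed 1.
    var = sum(ci * (yi - mean) ** 2 for ci, yi in zip(c, y)) / (W - 1)
    return mean, var, math.sqrt(var)

mean, var, sd = weighted_summary([1.0, 2.0, 4.0], [2, 1, 1])
```

Here the value 1.0 counts twice, so the result matches the unweighted statistics of the expanded sample 1, 1, 2, 4.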

5% Trimmed Mean ($T_{0.9}$)

$$T_{0.9} = \frac{1}{0.9W} \left\{ \left(cc_{k_1+1} - tc\right) y_{k_1+1} + \left(W - cc_{k_2-1} - tc\right) y_{k_2} + \sum_{i=k_1+2}^{k_2-1} c_i y_i \right\}$$

where $k_1$ and $k_2$ satisfy the following conditions

$$cc_{k_1} < tc \le cc_{k_1+1}, \qquad W - cc_{k_2} < tc \le W - cc_{k_2-1}$$

and

$$tc = 0.05W$$

Note: If $k_1 + 1 = k_2$, then $T_{0.9} = y_{k_2}$.
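The trimmed-mean definition can be sketched as below. This is an illustrative reading of the formula, not SPSS code: `y` must be sorted and distinct, `cc[0] = 0` is an added convention so that $k_1 = 0$ is allowed, and the function name is hypothetical.

```python
def trimmed_mean_5pct(y, c):
    """5% trimmed mean of sorted distinct values y with caseweights c."""
    m = len(y)
    W = sum(c)
    tc = 0.05 * W
    cc = [0.0]                      # cc[i] = c_1 + ... + c_i, with cc[0] = 0
    for ci in c:
        cc.append(cc[-1] + ci)
    # k1 satisfies cc[k1] < tc <= cc[k1 + 1]
    k1 = next(k for k in range(m) if cc[k] < tc <= cc[k + 1])
    # k2 satisfies W - cc[k2] < tc <= W - cc[k2 - 1]
    k2 = next(k for k in range(1, m + 1) if W - cc[k] < tc <= W - cc[k - 1])
    if k1 + 1 == k2:                # special case from the note
        return y[k2 - 1]
    total = (cc[k1 + 1] - tc) * y[k1] + (W - cc[k2 - 1] - tc) * y[k2 - 1]
    # interior terms i = k1 + 2, ..., k2 - 1 (1-based indices)
    total += sum(c[i - 1] * y[i - 1] for i in range(k1 + 2, k2))
    return total / (0.9 * W)
```

For ten equally weighted values 1 to 10, half a case is trimmed from each tail and the result is 5.5.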

Percentiles

There are five methods for computation of percentiles. Let

$$tc_1 = Wp, \qquad tc_2 = (W + 1)p$$

where $p$ is the requested percentile divided by 100, and $k_1$ and $k_2$ satisfy

$$cc_{k_1} \le tc_1 < cc_{k_1+1}, \qquad cc_{k_2} \le tc_2 < cc_{k_2+1}$$

Then,

$$g_1^* = tc_1 - cc_{k_1}, \qquad g_1 = \frac{g_1^*}{c_{k_1+1}}$$

$$g_2^* = tc_2 - cc_{k_2}, \qquad g_2 = \frac{g_2^*}{c_{k_2+1}}$$

Let x be the pth percentile; the five definitions are as follows:

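Waverage (Weighted Average at $y_{tc_1}$)

$$x = \begin{cases} y_{k_1+1} & \text{if } g_1^* \ge 1 \\ (1 - g_1^*)\,y_{k_1} + g_1^*\,y_{k_1+1} & \text{if } g_1^* < 1 \text{ and } c_{k_1+1} \ge 1 \\ (1 - g_1)\,y_{k_1} + g_1\,y_{k_1+1} & \text{if } g_1^* < 1 \text{ and } c_{k_1+1} < 1 \end{cases}$$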

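Round (Observation Closest to $tc_1$)

If $c_{k_1+1} \ge 1$, then

$$x = \begin{cases} y_{k_1} & \text{if } g_1^* < \tfrac{1}{2} \\ y_{k_1+1} & \text{if } g_1^* \ge \tfrac{1}{2} \end{cases}$$

If $c_{k_1+1} < 1$, then

$$x = \begin{cases} y_{k_1} & \text{if } g_1 < \tfrac{1}{2} \\ y_{k_1+1} & \text{if } g_1 \ge \tfrac{1}{2} \end{cases}$$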

Empirical (Empirical Distribution Function)

$$x = \begin{cases} y_{k_1} & \text{if } g_1^* = 0 \\ y_{k_1+1} & \text{if } g_1^* > 0 \end{cases}$$
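The Waverage, Round, and Empirical definitions all depend on $tc_1 = Wp$ and can be sketched together. This is a hedged illustration: the function name is hypothetical, and the boundary handling is an assumption the text does not state ($y_1$ stands in for the undefined $y_0$ when $k_1 = 0$, and $y_m$ is returned when $k_1 = m$).

```python
def percentile(y, c, p, method="waverage"):
    """p-th quantile (0 < p < 1) of sorted distinct y with caseweights c."""
    m = len(y)
    W = sum(c)
    cc = [0.0]                          # cc[i] = c_1 + ... + c_i, cc[0] = 0
    for ci in c:
        cc.append(cc[-1] + ci)
    tc1 = W * p
    k1 = max(k for k in range(m + 1) if cc[k] <= tc1)   # cc[k1] <= tc1 < cc[k1+1]
    if k1 == m:                         # assumption: no y_{m+1}, return y_m
        return y[-1]
    lo = y[k1 - 1] if k1 > 0 else y[0]  # y_{k1}; assumption: y_1 replaces y_0
    hi = y[k1]                          # y_{k1+1}
    g_star = tc1 - cc[k1]               # g1*
    g = g_star / c[k1]                  # g1 = g1* / c_{k1+1}
    if method == "waverage":
        if g_star >= 1:
            return hi
        frac = g_star if c[k1] >= 1 else g
        return (1 - frac) * lo + frac * hi
    if method == "round":
        frac = g_star if c[k1] >= 1 else g
        return lo if frac < 0.5 else hi
    if method == "empirical":
        return lo if g_star == 0 else hi
    raise ValueError("unknown method: " + method)
```

With four equally weighted values 1, 2, 3, 4 and $p = 0.625$, $tc_1 = 2.5$ falls halfway between the second and third values, so Waverage interpolates to 2.5 while Round and Empirical both select 3.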

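and

$$L_1 = d\,c^*, \qquad L_2 = \frac{W}{2} + \frac{c^*}{2}, \qquad L_3 = W + c^* - d\,c^*$$

Then for every $i$, $i = 1, 2, 3$, find $h_i$ such that

$$cc_{h_i} \le L_i < cc_{h_i+1}$$

and

$$Q_i = \begin{cases} (1 - a_i^*)\,y_{h_i} + a_i^*\,y_{h_i+1} & \text{if } a_i^* < 1 \text{ and } c_{h_i+1} \ge 1 \\ (1 - a_i)\,y_{h_i} + a_i\,y_{h_i+1} & \text{if } a_i^* < 1 \text{ and } c_{h_i+1} < 1 \\ y_{h_i+1} & \text{if } a_i^* \ge 1 \end{cases}$$

where

$$a_i^* = L_i - cc_{h_i}, \qquad a_i = \frac{a_i^*}{c_{h_i+1}}$$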

M-Estimation (Robust Location Estimation)

The M-estimator T of location is the solution of

$$\sum_{i=1}^{m} c_i\,\Psi\!\left(\frac{y_i - T}{s}\right) = 0$$

where $\Psi$ is an odd function and $s$ is a measure of the spread.

An alternative form of M-estimation is

$$\sum_{i=1}^{m} c_i \left(\frac{y_i - T}{s}\right) \omega\!\left(\frac{y_i - T}{s}\right) = 0$$

where

$$\omega(u) = \frac{\Psi(u)}{u}$$

After rearranging the above equation, we get

$$T = \frac{\sum_{i=1}^{m} c_i\,y_i\,\omega\!\left(\frac{y_i - T}{s}\right)}{\sum_{i=1}^{m} c_i\,\omega\!\left(\frac{y_i - T}{s}\right)}$$

Therefore, the algorithm to find M-estimators is defined iteratively by

$$T_{k+1} = \frac{\sum_{i=1}^{m} c_i\,y_i\,\omega\!\left(\frac{y_i - T_k}{s}\right)}{\sum_{i=1}^{m} c_i\,\omega\!\left(\frac{y_i - T_k}{s}\right)}$$

The algorithm stops when either

$$\left|T_{k+1} - T_k\right| \le \varepsilon\,\frac{T_{k+1} + T_k}{2}, \quad \text{where } \varepsilon = 0.005,$$

or the number of iterations exceeds 30.

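Andrew’s Wave ($c$), $c > 0$

$$\omega(u_i) = \begin{cases} \dfrac{c}{\pi u_i} \sin\!\left(\dfrac{\pi u_i}{c}\right) & \text{if } |u_i| \le c \\ 0 & \text{if } |u_i| > c \end{cases}$$

By default, $c = 1.34\pi$.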

Tukey’s Biweight ($c$)

$$\omega(u_i) = \begin{cases} \left(1 - \dfrac{u_i^2}{c^2}\right)^2 & \text{if } |u_i| \le c \\ 0 & \text{if } |u_i| > c \end{cases}$$

By default, $c = 4.685$.
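The iterative reweighting scheme with Tukey's biweight can be sketched as follows. This is a minimal illustration under stated assumptions: the spread `s` and the starting value `t0` are supplied by the caller because this excerpt does not define them, $u_i$ is taken to be $(y_i - T_k)/s$, and the function names are hypothetical.

```python
def biweight(u, c=4.685):
    """Tukey's biweight omega(u) with tuning constant c."""
    return (1 - (u / c) ** 2) ** 2 if abs(u) <= c else 0.0

def m_estimate(y, w, s, t0, omega=biweight, eps=0.005, max_iter=30):
    """Iterate T_{k+1} = sum(w*y*omega) / sum(w*omega) from the start t0."""
    t = t0
    for _ in range(max_iter):
        om = [omega((yi - t) / s) for yi in y]
        denom = sum(wi * oi for wi, oi in zip(w, om))
        if denom == 0:                  # every case has zero weight: stop
            break
        t_new = sum(wi * yi * oi for wi, yi, oi in zip(w, y, om)) / denom
        # stopping rule from the text: |T_{k+1} - T_k| <= eps * (T_{k+1} + T_k) / 2
        if abs(t_new - t) <= eps * (t_new + t) / 2:
            return t_new
        t = t_new
    return t                            # iteration limit reached

robust = m_estimate([1, 2, 3, 4, 100], [1, 1, 1, 1, 1], s=1.0, t0=3.0)
```

The outlier 100 gets weight zero, so the estimate settles near the centre of the remaining four values rather than near the mean of 22. Because the stopping rule is relative to $T_{k+1} + T_k$, it cannot be met when that sum is negative unless the step is exactly zero, and the loop then runs to the iteration limit.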

Tests of Normality

Shapiro-Wilk Statistic ( W )

Since the $W$ statistic is based on the order statistics of the sample, the caseweights have to be restricted to integers. Hence, before $W$ is calculated, all the caseweights are rounded to the closest integer and the series is expanded. Let $c_i^*$ be the closest integer to $c_i$; then

$$cc_i^* = \sum_{k=1}^{i} c_k^*, \qquad W_s = cc_m^* = \sum_{k=1}^{m} c_k^*$$

The original series $y = \{y_1, \ldots, y_m\}$ is expanded to

$$x = \{x_1, \ldots, x_{W_s}\}$$

where

$$x_{cc_{i-1}^* + 1} = \cdots = x_{cc_i^*} = y_i, \qquad i = 1, \ldots, m$$
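The rounding and expansion step can be sketched in a few lines. This is an illustration only: the function name is hypothetical, and rounding halves upward is an assumption, since the text says only "closest integer".

```python
import math

def expand_series(y, c):
    """Repeat each y_i by its caseweight rounded to the closest integer."""
    x = []
    for yi, ci in zip(y, c):
        c_star = int(math.floor(ci + 0.5))  # assumption: halves round up
        x.extend([yi] * c_star)
    return x

x = expand_series([1.0, 2.0, 5.0], [1.4, 2.6, 0.2])
```

A case whose weight rounds to zero, such as the value 5.0 above, drops out of the expanded series, so $W_s$ can differ from $W$.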

Then the W statistic is defined as

$$W = \frac{\left(\sum_{i=1}^{W_s} a_i x_i\right)^2}{\sum_{i=1}^{W_s} \left(x_i - \bar{x}\right)^2}$$

where $a_1, \ldots, a_{W_s}$ are the Shapiro-Wilk coefficients and $\Phi$ is the c.d.f. of a standard normal distribution. (The defining equations for the coefficients and for the significance of $W$ are not legible in this copy.)

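The following formula is used to estimate the critical value $D_c$ for probability 0.1.

$$D_c = \frac{-b - \sqrt{b^2 - 4ac}}{2a}$$

where, if $W \le 100$,

$$a = -7.01256\,(W + 2.78019)$$

$$b = 2.99587\,\sqrt{W + 2.78019}$$

$$c = 2.1804661 + \frac{0.974598}{\sqrt{W}} + \frac{1.67997}{W}$$

and, if $W > 100$,

$$a = -7.90289126054\,W^{0.98}, \qquad b = 3.180370175721\,W^{0.49}, \qquad c = 2.2947256$$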

The Lilliefors significance $p$ is calculated as follows:

If $D_a = D_c$, $p = 0.1$.

If $D_a > D_c$, $p = \exp\{a D_a^2 + b D_a + c - 2.3025851\}$.

If $D_{0.2} \le D_a < D_c$, $p$ is obtained by linear interpolation between $D_{0.2}$ and $D_c$, where $D_{0.2}$ is the critical value for probability 0.2.

If $D_a < D_{0.2}$, $p$ is reported as $> 0.2$.
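The critical value $D_c$ and the upper-tail significance can be sketched as below. This is a hedged illustration: the numeric constants are those of the Dallal and Wilkinson approximation as recalled here and should be checked against that reference, the interpolation branch is omitted because $D_{0.2}$ is not defined in this excerpt, and the function names are hypothetical.

```python
import math

def lilliefors_coeffs(W):
    """Coefficients a, b, c (constants are assumptions; verify before use)."""
    if W <= 100:
        a = -7.01256 * (W + 2.78019)
        b = 2.99587 * math.sqrt(W + 2.78019)
        c = 2.1804661 + 0.974598 / math.sqrt(W) + 1.67997 / W
    else:
        a = -7.90289126054 * W ** 0.98
        b = 3.180370175721 * W ** 0.49
        c = 2.2947256
    return a, b, c

def critical_value_10pct(W):
    """D_c: the positive root of a*D**2 + b*D + c = 0."""
    a, b, c = lilliefors_coeffs(W)
    return (-b - math.sqrt(b * b - 4 * a * c)) / (2 * a)

def p_upper_tail(Da, W):
    """Significance for Da >= D_c only; smaller Da needs D_0.2 interpolation."""
    a, b, c = lilliefors_coeffs(W)
    return math.exp(a * Da ** 2 + b * Da + c - 2.3025851)

Dc = critical_value_10pct(20)
```

Since $2.3025851 \approx \ln 10$, evaluating the exponential at $D_a = D_c$ returns 0.1, which is why $D_c$ is the root of $aD^2 + bD + c = 0$.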

2 This algorithm applies to SPSS 7.0 and later releases. To learn about algorithms for previous releases, call SPSS Technical Support.

Group Statistics

Assume that there are $k$ ($k \ge 2$) combinations of grouping factors. For every combination $i$, $i = 1, 2, \ldots, k$, let $\{y_{i1}, \ldots, y_{im_i}\}$ be the sample observations with the corresponding caseweights $\{c_{i1}, \ldots, c_{im_i}\}$.

Spread versus Level

If a transformation value, $a$, is given, the spread ($s$) and level ($l$) are defined based on the transformed data. Let $x$ be the transformed value of $y$; for every $i = 1, \ldots, k$, $j = 1, \ldots, m_i$,

$$x_{ij} = \begin{cases} \ln y_{ij} & \text{if } a = 0 \\ y_{ij}^{\,a} & \text{otherwise} \end{cases}$$

Then the spread ($s_i$) and the level ($l_i$) are respectively defined as the Interquartile Range and the median of $\{x_{i1}, \ldots, x_{im_i}\}$ with corresponding caseweights $\{c_{i1}, \ldots, c_{im_i}\}$. However, if $a$ is not specified, the spread and the level are natural logarithms of the Interquartile Range and of the median of the original data. Finally, the slope is the regression coefficient of $s$ on $l$, which is defined as

$$\frac{\sum_{i=1}^{k} \left(l_i - \bar{l}\right)\left(s_i - \bar{s}\right)}{\sum_{i=1}^{k} \left(l_i - \bar{l}\right)^2}$$
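The power transformation and the spread-versus-level slope can be sketched as follows. This is a minimal illustration: the function names are hypothetical, and the per-group levels and spreads are assumed to have been computed already as medians and interquartile ranges.

```python
import math

def transform(y, a):
    """x = ln(y) if a == 0, otherwise y**a."""
    return math.log(y) if a == 0 else y ** a

def spread_level_slope(levels, spreads):
    """Least-squares regression coefficient of spread s on level l."""
    k = len(levels)
    l_bar = sum(levels) / k
    s_bar = sum(spreads) / k
    num = sum((l - l_bar) * (s - s_bar) for l, s in zip(levels, spreads))
    den = sum((l - l_bar) ** 2 for l in levels)
    return num / den

slope = spread_level_slope([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

In this sketch a zero data value combined with a negative power raises `ZeroDivisionError`, which corresponds to one of the situations in which the transformation cannot be done.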

In some situations, the transformations cannot be done. The spread-versus-level plot and Levene statistic will not be produced if:

$a$ is a negative integer and at least one of the data values is 0

The significance of $L_a$ is calculated from the $F$ distribution with degrees of freedom $k - 1$ and $W - k$. Groups with zero variance are included in the test.

Robust Levene’s Test of Homogeneity of Variances

With the current version of Levene’s test $L_a$, the following can be considered as options in order to obtain robust Levene’s tests:

Levene’s test $L_b$ based on $z_{il}^{(b)} = |x_{il} - \tilde{x}_i|$, where $\tilde{x}_i$ is the median of the $x_{il}$’s for group $i$.

Median calculation is done by the method requested. The default method is HAVERAGE. Once the $\tilde{x}_i$’s and hence the $z_{il}^{(b)}$’s are calculated, apply the formula for $L_a$, shown in the section above, to obtain $L_b$ by replacing $z_{il}$, $\bar{z}_i$ and $\bar{z}$ with $z_{il}^{(b)}$, $\bar{z}_i^{(b)}$ and $\bar{z}^{(b)}$ respectively.

Two significances of $L_b$ are given. One is calculated from an $F$ distribution with degrees of freedom $k - 1$ and $W - k$. The other is calculated from an $F$ distribution with degrees of freedom $k - 1$ and $v$. The value of $v$ is given by:

$$v = \frac{\left(\sum_{i=1}^{k} u_i\right)^2}{\sum_{i=1}^{k} u_i^2 / v_i}$$

where

$$u_i = \sum_{l=1}^{m_i} c_{il} \left(z_{il}^{(b)} - \bar{z}_i^{(b)}\right)^2$$

in which

$$\bar{z}_i^{(b)} = \frac{\sum_{l=1}^{m_i} c_{il}\,z_{il}^{(b)}}{w_i}$$

and

$$v_i = w_i - 1.$$
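The degrees of freedom $v$ can be sketched as below. This is an illustration under assumptions: $w_i$ is taken to be the sum of caseweights in group $i$ (it is not defined in this excerpt), $v$ is read as $(\sum u_i)^2 / \sum (u_i^2 / v_i)$, and the absolute deviations `z` from each group median are assumed to be computed already; the function names are hypothetical.

```python
def group_terms(z, c):
    """Return (u_i, v_i) for one group's deviations z and caseweights c."""
    w = sum(c)                          # assumption: w_i = sum of group weights
    z_bar = sum(ci * zi for ci, zi in zip(c, z)) / w
    u = sum(ci * (zi - z_bar) ** 2 for ci, zi in zip(c, z))
    return u, w - 1

def adjusted_df(groups):
    """v = (sum of u_i)**2 / sum(u_i**2 / v_i) over (z, c) pairs."""
    terms = [group_terms(z, c) for z, c in groups]
    numerator = sum(u for u, _ in terms) ** 2
    denominator = sum(u * u / v for u, v in terms)
    return numerator / denominator

v = adjusted_df([([0.0, 2.0], [1, 1]), ([0.0, 2.0], [1, 1])])
```

With two identical groups, $v$ is the sum of the per-group values $v_i$, here $1 + 1 = 2$. Unequal $u_i$ pull $v$ below that sum.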

Levene’s test $L_c$ based on $z_{il}^{(c)} = |x_{il} - T_{i,0.9}|$, where $T_{i,0.9}$ is the 5% trimmed mean of the $x_{il}$’s for group $i$.

Once the $T_{i,0.9}$’s and hence the $z_{il}^{(c)}$’s are calculated, apply the formula of $L_a$ to obtain $L_c$ by replacing $z_{il}$, $\bar{z}_i$ and $\bar{z}$ with $z_{il}^{(c)}$, $\bar{z}_i^{(c)}$ and $\bar{z}^{(c)}$ respectively.

The significance of $L_c$ is calculated from an $F$ distribution with degrees of freedom $k - 1$ and $W - k$.

Plots

Normal Probability Plot (NPPLOT)

For every distinct observation $y_i$, $R_i$ is the rank (the mean of ranks is assigned to ties). The normal score $NS_i$ is calculated by

$$NS_i = \Phi^{-1}\!\left(\frac{R_i}{W + 1}\right)$$

where $\Phi^{-1}$ is the inverse of the standard normal cumulative distribution function. The NPPLOT is the plot of $(y_1, NS_1), \ldots, (y_m, NS_m)$.

Detrended Normal Plot

The detrended normal plot is the scatterplot of $(y_1, D_1), \ldots, (y_m, D_m)$, where $D_i$ is the difference between the Z-score and the normal score, which is defined by

$$D_i = Z_i - NS_i$$

and

$$Z_i = \frac{y_i - \bar{y}}{s}$$
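The normal scores and detrended values can be sketched with the standard library. This is an illustration under assumptions: the rank of a distinct value is taken as the mean rank of its tied cases, $cc_{i-1} + (c_i + 1)/2$, the normal score uses $R_i / (W + 1)$, and the Z-score uses the weighted mean and standard deviation; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def normal_scores(c):
    """Normal score for each distinct value, given its caseweights c."""
    W = sum(c)
    scores, before = [], 0.0
    for ci in c:
        rank = before + (ci + 1) / 2    # assumption: mean rank of tied cases
        scores.append(NormalDist().inv_cdf(rank / (W + 1)))
        before += ci
    return scores

def detrended(y, c):
    """D_i = Z_i - NS_i for sorted distinct values y with caseweights c."""
    W = sum(c)
    mean = sum(ci * yi for ci, yi in zip(c, y)) / W
    s = math.sqrt(sum(ci * (yi - mean) ** 2 for ci, yi in zip(c, y)) / (W - 1))
    return [(yi - mean) / s - ns for yi, ns in zip(y, normal_scores(c))]

ns = normal_scores([1, 1, 1])
d = detrended([1.0, 2.0, 3.0], [1, 1, 1])
```

Dividing by $W + 1$ keeps the largest rank strictly below 1, so the inverse normal c.d.f. stays finite for every observation.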

References

Dallal, G. E., and Wilkinson, L. 1986. An analytic approximation to the distribution of Lilliefors's test statistic for normality. The American Statistician, 40(4): 294–296 (Correction: 41: 248).

Frigge, M., Hoaglin, D. C., and Iglewicz, B. 1987. Some implementations for the boxplot. In: Computer Science and Statistics: Proceedings of the 19th Symposium on the Interface, R. M. Heiberger and M. Martin, eds. Alexandria, Va.: American Statistical Association.

Glaser, R. E. 1983. Levene’s robust test of homogeneity of variances. In: Encyclopedia of Statistical Sciences 4. New York: Wiley, 608–610.

Hoaglin, D. C., Mosteller, F., and Tukey, J. W. 1983. Understanding robust and exploratory data analysis. New York: John Wiley & Sons, Inc.

Hoaglin, D. C., Mosteller, F., and Tukey, J. W. 1985. Exploring data tables, trends, and shapes. New York: John Wiley & Sons, Inc.

Lilliefors, H. W. 1967. On the Kolmogorov-Smirnov tests for normality with mean and variance unknown. Journal of the American Statistical Association, 62: 399–402.

Loh, W. Y. 1987. Some modifications of Levene’s test of variance homogeneity. Journal of Statistical Computation and Simulation, 28: 213-

Tukey, J. W. 1977. Exploratory data analysis. Reading, Mass.: Addison-Wesley.

Velleman, P. F., and Hoaglin, D. C. 1981. Applications, basics, and computing of exploratory data analysis. Boston: Duxbury Press.
