SAMPLE distribution STATISTICS, Schemes and Mind Maps of Statistics

National University of Modern Languages (NUML)Statistics

detail SAMPLE distribution STATISTICS

Typology: Schemes and Mind Maps

2019/2020

Uploaded on 01/05/2022

maria-khan-15 🇵🇰



SAMPLE STATISTICS

A random sample of size $n$ from a distribution $f(x)$ is a set of $n$ random variables $x_1, x_2, \ldots, x_n$ which are independently and identically distributed with $x_i \sim f(x)$ for all $i$. Thus, the joint p.d.f. of the random sample is

$$f(x_1, x_2, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n) = \prod_{i=1}^{n} f(x_i).$$

A statistic is a function of the random variables of the sample, also known as the sample points. Examples are the sample mean $\bar{x} = \sum x_i / n$ and the sample variance $s^2 = \sum (x_i - \bar{x})^2 / n$.

A random sample may be regarded as a microcosm of the population from which it is drawn. Therefore, we might attempt to estimate the moments of the population's p.d.f. $f(x)$ by the corresponding moments of the sample statistics. To determine the worth of such estimates, we may determine their expected values and their variances. Beyond finding these simple measures, we might endeavour to find distributions of the statistics, which are described as their sampling distributions.

We can show, for example, that the mean $\bar{x}$ of a random sample is an unbiased estimate of the population moment $\mu = E(x)$, since

$$E(\bar{x}) = E\left(\frac{\sum x_i}{n}\right) = \frac{1}{n}\sum E(x_i) = \frac{n}{n}\mu = \mu.$$

Its variance is

$$V(\bar{x}) = V\left(\frac{\sum x_i}{n}\right) = \frac{1}{n^2}\sum V(x_i) = \frac{n}{n^2}\sigma^2 = \frac{\sigma^2}{n}.$$

Here, we have used the fact that the variance of a sum of independent random variables is the sum of their variances, since the covariances are all zero.

Observe that $V(\bar{x}) \to 0$ as $n \to \infty$. Since $E(\bar{x}) = \mu$, this implies that, as the sample size increases, the estimates become increasingly concentrated around the true population parameters. Such an estimate is said to be consistent.

The sample variance, however, does not provide an unbiased estimate of $\sigma^2 = V(x)$, since

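$$\begin{aligned}
E(s^2) &= E\left\{\frac{1}{n}\sum (x_i - \bar{x})^2\right\} = E\left[\frac{1}{n}\sum \{(x_i - \mu) + (\mu - \bar{x})\}^2\right] \\
&= E\left[\frac{1}{n}\sum \{(x_i - \mu)^2 + 2(x_i - \mu)(\mu - \bar{x}) + (\mu - \bar{x})^2\}\right] \\
&= V(x) - 2E\{(\bar{x} - \mu)^2\} + E\{(\bar{x} - \mu)^2\} = V(x) - V(\bar{x}).
\end{aligned}$$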

Here, we have used the result that

$$E\left\{\frac{1}{n}\sum (x_i - \mu)(\mu - \bar{x})\right\} = -E\{(\mu - \bar{x})^2\} = -V(\bar{x}).$$

It follows that

$$E(s^2) = V(x) - V(\bar{x}) = \sigma^2 - \frac{\sigma^2}{n} = \sigma^2\,\frac{(n-1)}{n}.$$

Therefore, $s^2$ is a biased estimator of the population variance and, for an unbiased estimate, we should use

$$\hat{\sigma}^2 = s^2\,\frac{n}{n-1} = \frac{\sum (x_i - \bar{x})^2}{n-1}.$$
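The bias factor $(n-1)/n$ can be checked exactly on a small case. The sketch below is an illustration only and is not part of the original notes: it assumes a hypothetical two-point population taking the values 0 and 1 with equal probability (so that $\sigma^2 = 1/4$), enumerates every equally likely sample of size $n = 3$, and averages $s^2$ over them with exact fractions. The names `biased_variance`, `population` and `expected_s2` are chosen for the example.

```python
from fractions import Fraction
from itertools import product


def biased_variance(sample):
    """Sample variance s^2 with divisor n, as defined in the text."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / n


# Hypothetical population: values 0 and 1, each with probability 1/2.
population = [Fraction(0), Fraction(1)]
sigma2 = Fraction(1, 4)  # population variance of this two-point distribution

n = 3
# All 2^n samples are equally likely under independent sampling.
samples = list(product(population, repeat=n))

# E(s^2) is the plain average of s^2 over the equally likely samples.
expected_s2 = sum(biased_variance(s) for s in samples) / len(samples)

print("E(s^2)             =", expected_s2)
print("sigma^2 (n-1)/n    =", sigma2 * Fraction(n - 1, n))
print("E(s^2) n/(n-1)     =", expected_s2 * Fraction(n, n - 1))
```

Under these assumptions the first two printed values agree, and rescaling by $n/(n-1)$ recovers $\sigma^2$.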

However, $s^2$ is still a consistent estimator, since $E(s^2) \to \sigma^2$ as $n \to \infty$ and also $V(s^2) \to 0$. The value of $V(s^2)$ depends on the form of the underlying population distribution.

It would help us to know exactly how the estimates are distributed. For this, we need some assumption about the functional form of the probability distribution of the population. The assumption that the population has a normal distribution is a conventional one, in which case the following theorem is of assistance:

Theorem. Let $x_1, x_2, \ldots, x_n$ be a random sample from the normal population $N(\mu, \sigma^2)$. Then $y = \sum a_i x_i$ is normally distributed with $E(y) = \sum a_i E(x_i) = \mu \sum a_i$ and $V(y) = \sum a_i^2 V(x_i) = \sigma^2 \sum a_i^2$.

In general, any linear function of a set of jointly normally distributed variables is itself normally distributed. Thus, for example, if $x_1, x_2, \ldots, x_n$ is a random sample from the normal population $N(\mu, \sigma^2)$, then $\bar{x} \sim N(\mu, \sigma^2/n)$.

The general result is best expressed in terms of matrices. Let $\mu = [\mu_1, \mu_2, \ldots, \mu_n]' = E(x)$ denote the vector of the expected values of the elements of $x = [x_1, x_2, \ldots, x_n]'$ and let $\Sigma = [\sigma_{ij};\ i, j = 1, 2, \ldots, n]$ denote the matrix of their variances and covariances. If $a = [a_1, a_2, \ldots, a_n]'$ is a constant vector of order $n$, then $a'x \sim N(a'\mu, a'\Sigma a)$ is a normally distributed random variable with a mean of

$$E(a'x) = a'\mu = \sum_i a_i \mu_i$$

and a variance of

$$V(a'x) = a'\Sigma a = \sum_i \sum_j a_i a_j \sigma_{ij} = \sum_i a_i^2 \sigma_{ii} + \sum_i \sum_{j \neq i} a_i a_j \sigma_{ij}.$$
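The two forms of the variance $a'\Sigma a$ can be compared numerically. The following is a minimal sketch, not taken from the notes: the covariance matrix `cov` and the vector `a` are hypothetical values chosen so that the arithmetic is easy to follow, and the function names are illustrative.

```python
def quadratic_form(a, cov):
    """a' Sigma a written as the double sum over i and j of a_i a_j sigma_ij."""
    n = len(a)
    return sum(a[i] * a[j] * cov[i][j] for i in range(n) for j in range(n))


def variance_plus_covariance_terms(a, cov):
    """The same quantity split into variance terms and covariance terms."""
    n = len(a)
    variances = sum(a[i] ** 2 * cov[i][i] for i in range(n))
    covariances = sum(
        a[i] * a[j] * cov[i][j] for i in range(n) for j in range(n) if j != i
    )
    return variances + covariances


# Hypothetical symmetric covariance matrix and weight vector.
cov = [
    [4.0, 1.0, 0.5],
    [1.0, 9.0, 2.0],
    [0.5, 2.0, 1.0],
]
a = [1.0, -2.0, 3.0]

print(quadratic_form(a, cov))                  # double-sum form
print(variance_plus_covariance_terms(a, cov))  # split form, same value

# Special case of a random sample: Sigma = sigma^2 I and a_i = 1/n for all i,
# which gives the variance of the sample mean, sigma^2 / n.
sigma2, n = 9.0, 4
identity_cov = [[sigma2 if i == j else 0.0 for j in range(n)] for i in range(n)]
mean_weights = [1.0 / n] * n
var_of_mean = quadratic_form(mean_weights, identity_cov)
print(var_of_mean, sigma2 / n)
```

With uncorrelated elements, the covariance terms vanish and only $\sum_i a_i^2 \sigma_{ii}$ remains, which is the case covered by the theorem on $y = \sum a_i x_i$.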

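An important case is when the vector $a = [a_1, a_2, \ldots, a_n]'$ becomes a vector of $n$ units, denoted by $\iota = [1, 1, \ldots, 1]'$ and described as the summation vector. Then, if $x = [x_1, x_2, \ldots, x_n]'$ is the vector of a random sample with $x_i \sim N(\mu, \sigma^2)$ for all $i$, there is $x \sim N(\mu\iota, \sigma^2 I_n)$, where $\mu\iota = [\mu, \mu, \ldots, \mu]'$ is a vector with $\mu$ repeated $n$ times and $I_n$ is an identity matrix of order $n$. Writing this explicitly, we have

$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \sim N\left( \begin{bmatrix} \mu \\ \mu \\ \vdots \\ \mu \end{bmatrix}, \begin{bmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{bmatrix} \right).$$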

where the statistics under (2) and (3) are independently distributed.

Definitions.

(1) If $u \sim \chi^2(m)$ and $v \sim \chi^2(n)$ are independent chi-square variates with $m$ and $n$ degrees of freedom respectively, then

$$F = \frac{u/m}{v/n} \sim F(m, n),$$

which is the ratio of the chi-squares divided by their respective degrees of freedom, has an $F$ distribution of $m$ and $n$ degrees of freedom, denoted by $F(m, n)$.

(2) If $x \sim N(0, 1)$ is a standard normal variate and if $v \sim \chi^2(n)$ is a chi-square variate of $n$ degrees of freedom, and if the two variates are distributed independently, then the ratio

$$t = \frac{x}{\sqrt{v/n}} \sim t(n)$$

has a $t$ distribution of $n$ degrees of freedom, denoted $t(n)$.

Notice that

$$t^2 = \frac{x^2}{v/n} \sim \frac{\chi^2(1)/1}{\chi^2(n)/n} = F(1, n).$$
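The construction of the $t$ variate can be mimicked by simulation. The sketch below is an illustration under assumptions that are not stated in this part of the notes: it builds the chi-square variate $v$ as a sum of $n$ squared independent standard normals, and it compares the average of $t^2$ with $n/(n-2)$, which is the standard value of the mean of an $F(1, n)$ variate for $n > 2$. The seed, the number of replications and the function name `t_variate` are arbitrary choices.

```python
import math
import random


def t_variate(n, rng):
    """Draw x / sqrt(v / n) with x ~ N(0, 1) and v ~ chi-square(n), independent."""
    x = rng.gauss(0.0, 1.0)
    # Assumption: v is generated as a sum of n squared standard normals.
    v = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))
    return x / math.sqrt(v / n)


rng = random.Random(12345)  # fixed seed so that the run is repeatable
n = 10
reps = 50_000

t_squared = [t_variate(n, rng) ** 2 for _ in range(reps)]
mean_t_squared = sum(t_squared) / reps

print("simulated mean of t^2:", mean_t_squared)
print("n / (n - 2):          ", n / (n - 2))
```

The two printed numbers should be close but not identical, since the first is a Monte Carlo estimate.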

CONFIDENCE INTERVALS

Consider a standard normal variate $z \sim N(0, 1)$. For any $Q \in (0, 1)$, we can find, from the tables in the back of the book, numbers $a, b$ such that $P(a \le z \le b) = Q$. The interval $[a, b]$ is called a $Q \times 100\%$ confidence interval for $z$. We can minimise the length of the interval by disposing it symmetrically about the expected value $E(z) = 0$, since $z \sim N(0, 1)$ is symmetrically distributed about its mean of zero.

We can easily construct confidence intervals for the parameters underlying our sample statistics. Since they are concerned with fixed parameters, such confidence statements differ in a subtle way from those regarding random variables.

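A confidence interval for the mean of the $N(\mu, \sigma^2)$ distribution. Let $x_1, x_2, \ldots, x_n$ be a random sample from a normal $N(\mu, \sigma^2)$ distribution. Then

$$\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{and} \quad \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$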

Therefore, we can find numbers ±β such that

$$P\left(-\beta \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le \beta\right) = Q.$$

But, the following events are equivalent:

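$$\begin{aligned}
\left(-\beta \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le \beta\right)
&\iff \left(-\beta\frac{\sigma}{\sqrt{n}} \le \bar{x} - \mu \le \beta\frac{\sigma}{\sqrt{n}}\right) \\
&\iff \left(-\beta\frac{\sigma}{\sqrt{n}} \le \mu - \bar{x} \le \beta\frac{\sigma}{\sqrt{n}}\right) \\
&\iff \left(\bar{x} - \beta\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + \beta\frac{\sigma}{\sqrt{n}}\right).
\end{aligned}$$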

Hence

$$P\left(\bar{x} - \beta\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + \beta\frac{\sigma}{\sqrt{n}}\right) = Q.$$

This says that the probability that the random interval $[\bar{x} - \beta\sigma/\sqrt{n},\ \bar{x} + \beta\sigma/\sqrt{n}]$ falls over the true value $\mu$ is $Q$. Equivalently, given a particular sample that has a mean value of $\bar{x}$, we are $Q \times 100\%$ confident that $\mu$ lies in the resulting interval.

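Example. Let $(1.2, 3.4, 0.6, 5.6)$ be a random sample from a normal $N(\mu, \sigma^2 = 9)$ distribution. Then $\bar{x} = 2.7$ and

$$\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{2.7 - \mu}{3/2} \sim N(0, 1).$$

Hence

$$P\left(-\beta \le \frac{2.7 - \mu}{3/2} \le \beta\right) = Q,$$

and, taking $\beta = 1.96$ for $Q = 0.95$, it follows that $(-0.24 \le \mu \le 5.64)$ is our 95% confidence interval.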
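The arithmetic of the example can be reproduced in a few lines. This is a minimal sketch rather than a prescribed method: it takes the critical value $\beta$ from the standard normal quantile function in Python's `statistics` module instead of from printed tables, and the function name `mean_confidence_interval` and its parameters are chosen for illustration. With $\bar{x} = 2.7$, $\sigma/\sqrt{n} = 1.5$ and $\beta \approx 1.96$, the lower endpoint is $2.7 - 2.94 = -0.24$.

```python
import math
from statistics import NormalDist, fmean


def mean_confidence_interval(sample, sigma, q):
    """Interval x_bar -/+ beta * sigma / sqrt(n) for a known sigma."""
    n = len(sample)
    x_bar = fmean(sample)
    # Symmetric interval: beta cuts off (1 - q) / 2 in each tail of N(0, 1).
    beta = NormalDist().inv_cdf(0.5 + q / 2.0)
    half_width = beta * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


lower, upper = mean_confidence_interval([1.2, 3.4, 0.6, 5.6], sigma=3.0, q=0.95)
print(round(lower, 2), round(upper, 2))  # approximately -0.24 and 5.64
```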

A confidence interval for $\mu$ when $\sigma^2$ is unknown. Usually, we have to estimate $\sigma^2$. The unbiased estimate of $\sigma^2$ is $\hat{\sigma}^2 = \sum (x_i - \bar{x})^2/(n - 1)$. With this estimate replacing $\sigma^2$, we have to replace the standard normal distribution, which is appropriate to $\sqrt{n}(\bar{x} - \mu)/\sigma$, by the $t(n - 1)$ distribution, which is appropriate to $\sqrt{n}(\bar{x} - \mu)/\hat{\sigma}$. To demonstrate this result, consider writing

$$\frac{\bar{x} - \mu}{\hat{\sigma}/\sqrt{n}} = \frac{\dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{\sum (x_i - \bar{x})^2}{\sigma^2 (n - 1)}}}$$

and observe that we can cancel the unknown value of $\sigma$ from the numerator and the denominator. Now, $\sum (x_i - \bar{x})^2/\sigma^2 \sim \chi^2(n - 1)$, so the denominator contains the root of a chi-square variate divided by its $n - 1$ degrees of freedom. The numerator contains a standard normal variate. That is to say, the statistic has the form of

$$\frac{N(0, 1)}{\sqrt{\dfrac{\chi^2(n - 1)}{n - 1}}} \sim t(n - 1).$$
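By analogy with the case of a known variance, the $t(n - 1)$ result leads to an interval of the form $\bar{x} \pm \beta\,\hat{\sigma}/\sqrt{n}$, where $\beta$ is now a critical value of the $t(n - 1)$ distribution. The sketch below is illustrative and makes two assumptions: the critical value is supplied by the reader from a $t$ table (the Python standard library has no $t$ quantile function), and the sample is the one from the earlier example, now treated as having unknown variance. The value 3.182 is the usual tabulated 97.5% point of $t(3)$.

```python
import math
from statistics import fmean, stdev


def t_confidence_interval(sample, t_critical):
    """Interval x_bar -/+ t_critical * sigma_hat / sqrt(n).

    t_critical must be looked up for n - 1 degrees of freedom.
    """
    n = len(sample)
    x_bar = fmean(sample)
    sigma_hat = stdev(sample)  # square root of the unbiased estimate (divisor n - 1)
    half_width = t_critical * sigma_hat / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


# n = 4, so n - 1 = 3 degrees of freedom; 3.182 is a table value (assumption).
lower, upper = t_confidence_interval([1.2, 3.4, 0.6, 5.6], t_critical=3.182)
print(round(lower, 3), round(upper, 3))
```

The resulting interval is wider than the one obtained with $\sigma^2 = 9$ known and $\beta = 1.96$, mainly because the $t(3)$ critical value is much larger than the normal one.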

which are independent variates with expectations equal to the numbers of their degrees of freedom. The sum of independent chi-squares is itself a chi-square with degrees of freedom equal to the sum of those of its constituent parts. Therefore,

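$$\frac{\sum (x_i - \bar{x})^2 + \sum (y_j - \bar{y})^2}{\sigma^2} \sim \chi^2(n + m - 2)$$

has an expected value of $n + m - 2$, whence

$$\hat{\sigma}^2 = \frac{\sum (x_i - \bar{x})^2 + \sum (y_j - \bar{y})^2}{n + m - 2}$$

is an unbiased estimate of the variance. If we use the estimate in place of the unknown value of $\sigma^2$, we get

$$\frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{\hat{\sigma}^2}{n} + \dfrac{\hat{\sigma}^2}{m}}}
= \frac{\dfrac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{\sigma^2}{n} + \dfrac{\sigma^2}{m}}}}{\sqrt{\dfrac{\sum (x_i - \bar{x})^2 + \sum (y_j - \bar{y})^2}{\sigma^2 (n + m - 2)}}}
\sim \frac{N(0, 1)}{\sqrt{\dfrac{\chi^2(n + m - 2)}{n + m - 2}}} = t(n + m - 2).$$

This is the basis for determining a confidence interval that uses an estimated variance in place of the unknown value.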
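The pooled estimate and the resulting $t$ ratio are easily computed. The following is a sketch under the assumption, implicit in the formulae above, that the two samples of sizes $n$ and $m$ share a common variance $\sigma^2$. The data are hypothetical and the function names are illustrative.

```python
import math
from statistics import fmean


def sum_of_squared_deviations(sample):
    """Sum of squared deviations of the sample about its own mean."""
    centre = fmean(sample)
    return sum((value - centre) ** 2 for value in sample)


def pooled_variance(x, y):
    """Unbiased estimate of the common variance, with divisor n + m - 2."""
    n, m = len(x), len(y)
    return (sum_of_squared_deviations(x) + sum_of_squared_deviations(y)) / (n + m - 2)


def two_sample_t_ratio(x, y, mean_difference=0.0):
    """((x_bar - y_bar) - (mu_x - mu_y)) / sqrt(s2/n + s2/m), on n + m - 2 d.f."""
    n, m = len(x), len(y)
    s2 = pooled_variance(x, y)
    return ((fmean(x) - fmean(y)) - mean_difference) / math.sqrt(s2 / n + s2 / m)


# Hypothetical samples, n = 3 and m = 4.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0, 8.0]

s2_hat = pooled_variance(x, y)
t_ratio = two_sample_t_ratio(x, y)
print(s2_hat, t_ratio)
```

Here `mean_difference` stands for the hypothesised value of $\mu_x - \mu_y$; the ratio is to be referred to the $t(n + m - 2)$ distribution.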

A confidence interval for the variance. If $x_i \sim N(\mu, \sigma^2);\ i = 1, \ldots, n$ is a random sample, then $\sum (x_i - \bar{x})^2/(n - 1)$ is an unbiased estimate of the variance and $\sum (x_i - \bar{x})^2/\sigma^2 \sim \chi^2(n - 1)$. Therefore, by looking in the back of the book at the appropriate chi-square table, we can find numbers $\alpha$ and $\beta$ such that

$$P\left(\alpha \le \frac{\sum (x_i - \bar{x})^2}{\sigma^2} \le \beta\right) = Q$$

for some chosen $Q \in (0, 1)$. From this, it follows that

$$P\left(\frac{1}{\alpha} \ge \frac{\sigma^2}{\sum (x_i - \bar{x})^2} \ge \frac{1}{\beta}\right) = Q \iff P\left(\frac{\sum (x_i - \bar{x})^2}{\beta} \le \sigma^2 \le \frac{\sum (x_i - \bar{x})^2}{\alpha}\right) = Q$$

and the latter provides a confidence interval for $\sigma^2$. We ought to choose $\alpha$ and $\beta$ so as to minimise the length of the interval $[\beta^{-1}, \alpha^{-1}]$. The chi-square distribution is asymmetric, so it is tedious to do so. The distribution becomes increasingly symmetric as the sample size $n$ increases, and so, for large values of $n$, we may choose $\alpha$ and $\beta$ to demarcate equal areas within the two tails of the distribution.
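The interval for $\sigma^2$ can be computed once $\alpha$ and $\beta$ have been read from a chi-square table. The sketch below is an illustration, not a definitive procedure: the critical values are passed in by the reader, and the equal-tail values 0.216 and 9.348 used in the example are the usual tabulated 2.5% and 97.5% points of $\chi^2(3)$, applied to the four-point sample of the earlier example. The function name is chosen for the example.

```python
from statistics import fmean


def variance_confidence_interval(sample, chi2_lower, chi2_upper):
    """Interval [SS / beta, SS / alpha] for sigma^2.

    chi2_lower (alpha) and chi2_upper (beta) are chi-square critical values
    for n - 1 degrees of freedom, supplied from a table.
    """
    centre = fmean(sample)
    sum_sq = sum((value - centre) ** 2 for value in sample)
    return sum_sq / chi2_upper, sum_sq / chi2_lower


# n = 4, so 3 degrees of freedom; equal-tail table values (assumption).
low, high = variance_confidence_interval(
    [1.2, 3.4, 0.6, 5.6], chi2_lower=0.216, chi2_upper=9.348
)
print(round(low, 3), round(high, 3))
```

With so small a sample the interval is very wide, which reflects the marked asymmetry of the chi-square distribution at low degrees of freedom.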

The confidence interval for the ratio of two variances. Imagine a treatment that affects the variance of a normal population. We might also wish to allow for the possibility that the mean is also affected. Let $x_i \sim N(\mu_x, \sigma_x^2);\ i = 1, \ldots, n$ be a random sample taken from the population before treatment and let $y_j \sim N(\mu_y, \sigma_y^2);\ j = 1, \ldots, m$ be a random sample taken after treatment. Then

$$\frac{\sum (x_i - \bar{x})^2}{\sigma_x^2} \sim \chi^2(n - 1) \quad \text{and} \quad \frac{\sum (y_j - \bar{y})^2}{\sigma_y^2} \sim \chi^2(m - 1)$$

are independent chi-square variates, and hence

$$F = \frac{\sum (x_i - \bar{x})^2 \big/ \{\sigma_x^2 (n - 1)\}}{\sum (y_j - \bar{y})^2 \big/ \{\sigma_y^2 (m - 1)\}} \sim F(n - 1, m - 1).$$

It is possible to find numbers $\alpha$ and $\beta$ such that $P(\alpha \le F \le \beta) = Q$, where $Q \in (0, 1)$ is some chosen probability value. Given such values, we may make the following probability statement:

$$P\left(\alpha\,\frac{\sum (y_j - \bar{y})^2 (n - 1)}{\sum (x_i - \bar{x})^2 (m - 1)} \le \frac{\sigma_y^2}{\sigma_x^2} \le \beta\,\frac{\sum (y_j - \bar{y})^2 (n - 1)}{\sum (x_i - \bar{x})^2 (m - 1)}\right) = Q.$$
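The probability statement translates directly into a computation. The sketch below is illustrative only: the critical values $\alpha$ and $\beta$ of the $F(n - 1, m - 1)$ distribution must be supplied by the reader from tables, the data are hypothetical, and the values 0.0255 and 16.04 used in the example are approximate equal-tail 2.5% and 97.5% points of $F(2, 3)$ that should be checked against a table before being relied upon.

```python
from statistics import fmean


def sum_of_squared_deviations(sample):
    """Sum of squared deviations of the sample about its own mean."""
    centre = fmean(sample)
    return sum((value - centre) ** 2 for value in sample)


def variance_ratio_interval(x, y, f_lower, f_upper):
    """Interval for sigma_y^2 / sigma_x^2.

    f_lower (alpha) and f_upper (beta) are critical values of F(n - 1, m - 1),
    where n = len(x) and m = len(y).
    """
    n, m = len(x), len(y)
    # Ratio of the unbiased variance estimates: [SS_y / (m - 1)] / [SS_x / (n - 1)].
    scale = (sum_of_squared_deviations(y) * (n - 1)) / (
        sum_of_squared_deviations(x) * (m - 1)
    )
    return f_lower * scale, f_upper * scale


# Hypothetical samples before (x) and after (y) treatment: n = 3, m = 4.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0, 8.0]

low, high = variance_ratio_interval(x, y, f_lower=0.0255, f_upper=16.04)
print(round(low, 3), round(high, 3))
```

The factor `scale` is simply the ratio of the two unbiased variance estimates, so the interval is obtained by multiplying that ratio by the two critical values.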
