Statistics: Populations, Samples, Models, and Data Analysis, Study notes of Mathematical Statistics

An introduction to statistics, including the concepts of populations, samples, models, and data analysis. It covers descriptive data analysis; parametric, nonparametric, and semi-parametric models; and statistics and their distributions, including limit-theorem approximations to those distributions and the distributions of order statistics.

Pre 2010

Uploaded on 09/02/2009

koofers-user-51g-1

Leture 17: Populations, samples, models, and statistis
One or a series of random experiments is performed, and some data from the experiment(s) are collected. (Planning experiments and collecting data are not discussed in the textbook.) Data analysis means extracting information from the data, interpreting the results, and drawing some conclusions.

A desriptive data analysis: summary measures of the data, suh as the mean, median,
range, standard deviation, et., and some graphial displays, suh as the histogram and
box-and-whisker diagram, et.
It is simple and requires almost no assumptions, but may not allow us to gain enough insight
into the problem.
We fous on more sophistiated methods of analyzing data:
statistial inferene
and
deision
theory
.
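As a small illustration of the descriptive summaries listed above, the sketch below computes them with Python's standard `statistics` module. The data values are hypothetical. The quartiles (the box of a box-and-whisker diagram) use the module's default quantile method, which is only one of several conventions.

```python
import statistics

# Hypothetical data set, for illustration only.
data = [2.1, 3.4, 1.9, 5.6, 3.3, 2.8, 4.0]

summary = {
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "range": max(data) - min(data),
    # Sample standard deviation (divisor n - 1).
    "std_dev": statistics.stdev(data),
    # Quartiles Q1, Q2, Q3: the box of a box-and-whisker diagram.
    "quartiles": statistics.quantiles(data, n=4),
}

for name, value in summary.items():
    print(name, value)
```

These numbers describe the observed data only. Nothing in the computation refers to the population that produced them, which is the gap that statistical inference fills.
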
The data set is a realization of a random element defined on a probability space $(\Omega, \mathcal{F}, P)$. Here $P$ is called the population. The data set, or the random element that produces the data, is called a sample from $P$. The size of the data set is called the sample size.

A population $P$ is known if and only if $P(A)$ is a known value for every event $A \in \mathcal{F}$. In a statistical problem, the population $P$ is at least partially unknown. We would like to deduce some properties of $P$ based on the available sample.

Examples 2.1-2.3
A statistical model (a set of assumptions) on the population $P$ in a given problem is often postulated to make the analysis possible or easy. Although testing the correctness of postulated models is part of statistical inference and decision theory, postulated models are often based on knowledge of the problem under consideration.

Denition 2.1.
A set of probability measures
P
on (
;
F
) indexed by a
parameter
2
is said to be a
parametri family
if and only if
R
d
for some xed positive integer
d
and
eah
P
is a
known
probability measure when
is known. The set is alled the
parameter
spae
and
d
is alled its
dimension
.
Parametri model: the population
P
is in a parametri family
P
=
f
P
:
2
g
P
=
f
P
:
2
g
is
identiable
if and only if
1
6
=
2
and
i
2
imply
P
1
6
=
P
2
.
In most ases an identiable parametri family an be obtained through reparameterization.
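A family of populations $\mathcal{P}$ is dominated by $\nu$ (a $\sigma$-finite measure) if $P \ll \nu$ (i.e., $P$ is absolutely continuous with respect to $\nu$) for all $P \in \mathcal{P}$. In that case, $\mathcal{P}$ can be identified by the family of densities $\{dP/d\nu : P \in \mathcal{P}\}$ or $\{dP_\theta/d\nu : \theta \in \Theta\}$.
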
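Here is a discrete example, assumed for illustration and not taken from the text. The Poisson family $\{P_\theta : \theta > 0\}$ on $\{0, 1, 2, \ldots\}$ is dominated by counting measure $\nu$, and $dP_\theta/d\nu(x) = e^{-\theta}\theta^x/x!$ is the probability mass function. The sketch evaluates this density and checks numerically that it integrates to 1 (for counting measure, integration is summation). The function name `poisson_density` is hypothetical.

```python
import math


def poisson_density(x: int, theta: float) -> float:
    """dP_theta/dnu at x, where nu is counting measure on {0, 1, 2, ...}."""
    if theta <= 0:
        raise ValueError("theta must lie in Theta = (0, infinity)")
    if x < 0:
        return 0.0
    return math.exp(-theta) * theta ** x / math.factorial(x)


# Integrating against counting measure is summation; the tail beyond 99
# is negligible for theta = 3, so the total should be numerically 1.
total = sum(poisson_density(x, 3.0) for x in range(100))
print(total)
```

Here $\Theta = (0, \infty) \subset \mathbb{R}$, so this is also a parametric family of dimension $d = 1$.
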
Parametri methods: methods designed for parametri mo dels

Example (The $k$-dimensional normal family).

$$\mathcal{P} = \{N_k(\mu, \Sigma) : \mu \in \mathbb{R}^k,\ \Sigma \in \mathcal{M}_k\},$$

where $\mathcal{M}_k$ is a collection of $k \times k$ symmetric positive definite matrices. This family is dominated by the Lebesgue measure on $\mathbb{R}^k$. When $k = 1$, $\mathcal{P} = \{N(\mu, \sigma^2) : \mu \in \mathbb{R},\ \sigma^2 > 0\}$.
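
Assume $\mathcal{M}_k$ contains all $k \times k$ symmetric positive definite matrices. Then $\theta = (\mu, \Sigma)$ has $k$ free entries in $\mu$ and $k(k+1)/2$ free entries in the symmetric $\Sigma$, so the dimension is $d = k + k(k+1)/2$. The sketch below computes this dimension and, for $k = 1$, the known Lebesgue density of $P_\theta$ once $\theta = (\mu, \sigma^2)$ is given. Both function names are hypothetical.

```python
import math


def normal_family_dimension(k: int) -> int:
    """d = k (entries of mu) + k(k+1)/2 (free entries of symmetric Sigma)."""
    return k + k * (k + 1) // 2


def normal_pdf(x: float, mu: float, sigma2: float) -> float:
    """Lebesgue density of N(mu, sigma2): the k = 1 member indexed by theta = (mu, sigma2)."""
    if sigma2 <= 0:
        raise ValueError("sigma2 must be positive (theta must lie in Theta)")
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


print(normal_family_dimension(1))  # 2
print(normal_family_dimension(3))  # 9
print(round(normal_pdf(0.0, 0.0, 1.0), 6))  # 0.398942
```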

Nonparametric family: $\mathcal{P}$ is not parametric according to Definition 2.1. Nonparametric model: the population $P$ is in a given nonparametric family.

Examples of nonparametric families on $(\mathbb{R}^k, \mathcal{B}^k)$:
(1) The joint c.d.f.'s are continuous.
(2) The joint c.d.f.'s have finite moments of order $\le$ a fixed integer.
(3) The joint c.d.f.'s have p.d.f.'s (e.g., Lebesgue p.d.f.'s).
(4) $k = 1$ and the c.d.f.'s are symmetric.
(5) The family of all probability measures on $(\mathbb{R}^k, \mathcal{B}^k)$.

Nonparametric methods: methods designed for nonparametric models

Semi-parametric models and methods

Statistics and their distributions

Our data set is a realization of a sample (random vector) $X$ from an unknown population $P$. A statistic $T(X)$ is a measurable function $T$ of $X$; $T(X)$ is a known value whenever $X$ is known. Statistical analyses are based on various statistics, for various purposes. $X$ itself is a statistic, but a trivial one. The range of a nontrivial statistic $T(X)$ is usually simpler than that of $X$. For example, $X$ may be a random $n$-vector and $T(X)$ may be a random $p$-vector with $p$ much smaller than $n$. We always have $\sigma(T(X)) \subset \sigma(X)$, and the two $\sigma$-fields are the same if $T$ is one-to-one. Usually $\sigma(T(X))$ simplifies $\sigma(X)$, i.e., a statistic provides a "reduction" of the $\sigma$-field.
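
The following minimal sketch shows a nontrivial statistic, using a simulated standard normal sample as a stand-in for real data (an assumption for illustration only). $T$ maps the $n$-vector $X$ to the $p = 2$ vector (sample mean, sample variance). Because $T$ ignores the order of the observations, different sample vectors give the same value of $T$, so $T$ is not one-to-one.

```python
import random
import statistics


def T(x: list[float]) -> tuple[float, float]:
    """Statistic mapping an n-vector to a 2-vector: (sample mean, sample variance)."""
    return statistics.mean(x), statistics.variance(x)


rng = random.Random(0)
# Simulated stand-in for a data set: X is a random n-vector with n = 1000.
x = [rng.gauss(0.0, 1.0) for _ in range(1000)]
t = T(x)
print(len(x), "->", len(t))  # 1000 -> 2

# T is not one-to-one: reordering the observations leaves its value unchanged.
t_reordered = T(sorted(x))
```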

The "information" within the statistic $T(X)$ concerning the unknown distribution of $X$ is contained in the $\sigma$-field $\sigma(T(X))$. Suppose $S$ is any other statistic for which $\sigma(S(X)) = \sigma(T(X))$. Then, by Lemma 1.2, $S$ is a measurable function of $T$, and $T$ is a measurable function of $S$. Thus, once the value of $S$ (or $T$) is known, so is the value of $T$ (or $S$). The information is contained not in the particular values of a statistic but in the $\sigma$-field it generates. Values of a statistic may be important for other reasons.
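
Here is a toy instance of two statistics that generate the same $\sigma$-field, assuming $n$ is fixed and known. Take the sample mean $T$ and the sample total $S = nT$. Each is a one-to-one function of the other, so either value determines the other. The sketch uses exact `fractions.Fraction` arithmetic and hypothetical data.

```python
from fractions import Fraction


def T(x):
    """Sample mean."""
    return sum(x) / len(x)


def S(x):
    """Sample total."""
    return sum(x)


x = [Fraction(3), Fraction(1, 2), Fraction(7, 4)]  # hypothetical data
n = len(x)
# S = n * T and T = S / n: knowing either value recovers the other.
assert S(x) == n * T(x)
assert T(x) == S(x) / n
print(S(x), T(x))  # 21/4 7/4
```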

A statistic $T(X)$ is a random element. If the distribution of $X$ is unknown, then the distribution of $T$ may also be unknown, although $T$ is a known function. Finding the form of the distribution of $T$ is one of the major problems in statistical inference and decision theory.
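
Once a population is assumed, the distribution of $T$ can at least be approximated by simulation. The sketch below assumes, purely for illustration, that $X_1, \ldots, X_n$ are i.i.d. Exponential(1) and that $T$ is the sample mean. For this assumed $P$, $T$ has exactly a gamma distribution with shape $n$ and scale $1/n$, with mean 1 and variance $1/n$, so the simulated moments can be compared against known values. The function name is hypothetical.

```python
import random
import statistics


def simulate_sample_mean(n: int, reps: int, seed: int = 1) -> list[float]:
    """Monte Carlo draws of T(X) = mean of n i.i.d. Exponential(1) observations.

    The Exponential(1) population is an assumption for illustration only.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]


draws = simulate_sample_mean(n=5, reps=20_000)
# For this assumed P, T is Gamma(shape n, scale 1/n): mean 1, variance 1/n.
print(statistics.fmean(draws), statistics.variance(draws))
```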

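Here $Y_i = (X_i - \mu, (X_i - \mu)^2)$, where $X_1, \ldots, X_n$ are i.i.d. with mean $\mu$, variance $\sigma^2$, and $E X_1^4 < \infty$. Then $Y_1, \ldots, Y_n$ are i.i.d. random 2-vectors with $E Y_1 = (0, \sigma^2)$ and variance-covariance matrix

$$\Sigma = \begin{pmatrix} \sigma^2 & E(X_1 - \mu)^3 \\ E(X_1 - \mu)^3 & E(X_1 - \mu)^4 - \sigma^4 \end{pmatrix}.$$

Note that $\bar{Y} = n^{-1}\sum_{i=1}^n Y_i = (\bar{X} - \mu, \tilde{S}^2)$, where $\tilde{S}^2 = n^{-1}\sum_{i=1}^n (X_i - \mu)^2$.

Applying the CLT (Corollary 1.2) to the $Y_i$'s, we obtain

$$\sqrt{n}\,(\bar{X} - \mu,\ \tilde{S}^2 - \sigma^2) \to_d N_2(0, \Sigma).$$

The sample variance $S^2 = (n-1)^{-1}\sum_{i=1}^n (X_i - \bar{X})^2$ satisfies

$$S^2 = \frac{n}{n-1}\left[\tilde{S}^2 - (\bar{X} - \mu)^2\right].$$

Also, $\bar{X} \to_{a.s.} \mu$ (the SLLN), so $\sqrt{n}\,(\bar{X} - \mu)^2 = \sqrt{n}\,(\bar{X} - \mu) \cdot (\bar{X} - \mu) \to_p 0$. An application of Slutsky's theorem then leads to

$$\sqrt{n}\,(\bar{X} - \mu,\ S^2 - \sigma^2) \to_d N_2(0, \Sigma).$$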
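
The step from $\tilde{S}^2$ to $S^2$ rests on the algebraic identity $\sum_{i=1}^n (X_i - \bar{X})^2 = \sum_{i=1}^n (X_i - \mu)^2 - n(\bar{X} - \mu)^2$. A quick exact check with `fractions.Fraction` follows. The data, the value of $\mu$, and the function name are hypothetical; the identity holds for any choice.

```python
from fractions import Fraction


def variance_identity_sides(x, mu):
    """Return (S^2, n/(n-1) * [S~^2 - (Xbar - mu)^2]) computed in exact arithmetic."""
    n = len(x)
    xbar = sum(x) / n
    s_tilde2 = sum((xi - mu) ** 2 for xi in x) / n    # centred at the true mean mu
    s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)  # usual sample variance
    rhs = Fraction(n, n - 1) * (s_tilde2 - (xbar - mu) ** 2)
    return s2, rhs


x = [Fraction(v) for v in (1, 4, 2, 7)]
s2, rhs = variance_identity_sides(x, Fraction(3))
print(s2, rhs)  # 7 7
```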

Example 2.9 (Order statistics). Let $X = (X_1, \ldots, X_n)$ with i.i.d. random components. Let $X_{(i)}$ be the $i$th smallest value of $X_1, \ldots, X_n$. The statistics $X_{(1)}, \ldots, X_{(n)}$ are called the order statistics. In addition to the sample mean and variance, the order statistics form a very useful set of statistics. Suppose that $X_i$ has a c.d.f. $F$ having a Lebesgue p.d.f. $f$. Then the joint Lebesgue p.d.f. of $X_{(1)}, \ldots, X_{(n)}$ is

$$g(x_1, x_2, \ldots, x_n) = \begin{cases} n!\, f(x_1) f(x_2) \cdots f(x_n), & x_1 < x_2 < \cdots < x_n, \\ 0, & \text{otherwise.} \end{cases}$$

The joint Lebesgue p.d.f. of $X_{(i)}$ and $X_{(j)}$, $1 \le i < j \le n$, is

$$g_{i,j}(x, y) = \begin{cases} \dfrac{n!\,[F(x)]^{i-1}[F(y) - F(x)]^{j-i-1}[1 - F(y)]^{n-j} f(x) f(y)}{(i-1)!\,(j-i-1)!\,(n-j)!}, & x < y, \\ 0, & \text{otherwise,} \end{cases}$$

and the Lebesgue p.d.f. of $X_{(i)}$ is

$$g_i(x) = \frac{n!}{(i-1)!\,(n-i)!}\,[F(x)]^{i-1}[1 - F(x)]^{n-i} f(x).$$
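
The marginal formula for $g_i$ can be checked numerically under an assumed population. The sketch takes $X_i \sim$ Uniform(0, 1), so $F(x) = x$ and $f(x) = 1$ on $(0, 1)$, an assumption for illustration only. In that case $g_i$ is the Beta$(i, n - i + 1)$ density, which integrates to 1 and has mean $i/(n+1)$. The sketch checks both facts with a midpoint rule. All function names are hypothetical.

```python
import math


def order_statistics(x):
    """X_(1) <= ... <= X_(n): the sorted sample."""
    return sorted(x)


def order_stat_pdf(x, i, n, F, f):
    """Lebesgue p.d.f. g_i of X_(i) for i.i.d. observations with c.d.f. F and p.d.f. f."""
    c = math.factorial(n) / (math.factorial(i - 1) * math.factorial(n - i))
    return c * F(x) ** (i - 1) * (1 - F(x)) ** (n - i) * f(x)


def uniform_cdf(x):
    return min(max(x, 0.0), 1.0)


def uniform_pdf(x):
    return 1.0 if 0.0 < x < 1.0 else 0.0


def integrate_midpoint(h, a=0.0, b=1.0, m=10_000):
    """Midpoint-rule approximation of the integral of h over (a, b)."""
    width = (b - a) / m
    return width * sum(h(a + (k + 0.5) * width) for k in range(m))


n, i = 5, 2
total = integrate_midpoint(lambda x: order_stat_pdf(x, i, n, uniform_cdf, uniform_pdf))
mean = integrate_midpoint(lambda x: x * order_stat_pdf(x, i, n, uniform_cdf, uniform_pdf))
print(order_statistics([3.0, 1.0, 2.0]))  # [1.0, 2.0, 3.0]
print(total, mean, i / (n + 1))
```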