


An introduction to statistics, including the concepts of populations, samples, models, and data analysis. It covers various statistical methods such as descriptive statistics, parametric and nonparametric families, and semi-parametric models. The document also discusses the importance of understanding the distribution of data and transformations to find approximations of distributions.



Lecture 17: Populations, samples, models, and statistics
One or a series of random experiments is performed. Some data from the experiment(s) are collected. Planning experiments and collecting data (not discussed in the textbook). Data analysis: extract information from the data, interpret the results, and draw some conclusions.
A descriptive data analysis: summary measures of the data, such as the mean, median, range, standard deviation, etc., and some graphical displays, such as the histogram and box-and-whisker diagram, etc.
It is simple and requires almost no assumptions, but may not allow us to gain enough insight into the problem. We focus on more sophisticated methods of analyzing data: statistical inference and decision theory.
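As a small illustration of the descriptive summaries listed above, the following sketch computes the mean, median, range, and standard deviation with Python's standard library. The data values and the helper name `describe` are hypothetical and chosen only for this example.

```python
import statistics

def describe(data):
    """Return basic descriptive summaries of a list of numbers."""
    ordered = sorted(data)
    return {
        "n": len(ordered),
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "range": ordered[-1] - ordered[0],
        "sd": statistics.stdev(ordered),  # sample standard deviation, divisor n - 1
    }

sample = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # hypothetical data set
summary = describe(sample)
print(summary)
```

None of these summaries requires a model for the population, which is why a descriptive analysis needs almost no assumptions.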
The data set is a realization of a random element defined on a probability space $(\Omega, \mathcal{F}, P)$; $P$ is called the population. The data set or the random element that produces the data is called a sample from $P$. The size of the data set is called the sample size. A population $P$ is known if and only if $P(A)$ is a known value for every event $A \in \mathcal{F}$. In a statistical problem, the population $P$ is at least partially unknown. We would like to deduce some properties of $P$ based on the available sample.
Examples 2.1-2.
A statistical model (a set of assumptions) on the population $P$ in a given problem is often postulated to make the analysis possible or easy. Although testing the correctness of postulated models is part of statistical inference and decision theory, postulated models are often based on knowledge of the problem under consideration.
Definition 2.1. A set of probability measures $P_\theta$ on $(\Omega, \mathcal{F})$ indexed by a parameter $\theta \in \Theta$ is said to be a parametric family if and only if $\Theta \subset \mathcal{R}^d$ for some fixed positive integer $d$ and each $P_\theta$ is a known probability measure when $\theta$ is known. The set $\Theta$ is called the parameter space and $d$ is called its dimension.
Parametric model: the population $P$ is in a parametric family $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$. $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ is identifiable if and only if $\theta_1 \neq \theta_2$ and $\theta_i \in \Theta$ imply $P_{\theta_1} \neq P_{\theta_2}$. In most cases an identifiable parametric family can be obtained through reparameterization.
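To make identifiability concrete, here is a minimal sketch under an assumed, illustrative parameterization that is not taken from the notes: the normal family indexed by $(\mu, s)$ with $s \neq 0$ playing the role of a signed standard deviation. The parameters $(\mu, s)$ and $(\mu, -s)$ give the same distribution, so the family is not identifiable; reparameterizing by $(\mu, \sigma^2)$ with $\sigma^2 = s^2$ removes the redundancy. The function names are hypothetical, and comparing densities on a grid is only a numerical check, not a proof.

```python
import math

def normal_pdf_signed(x, mu, s):
    """Density of N(mu, s^2); it depends on s only through s**2."""
    var = s * s
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def normal_pdf_var(x, mu, var):
    """Density of N(mu, var), parameterized by the variance var > 0."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

grid = [k / 10 for k in range(-50, 51)]

# theta_1 = (0, 1.5) and theta_2 = (0, -1.5) differ, yet the densities agree.
same_density = all(
    math.isclose(normal_pdf_signed(x, 0.0, 1.5), normal_pdf_signed(x, 0.0, -1.5))
    for x in grid
)

# Under the (mu, sigma^2) parameterization, two distinct parameter values
# give visibly different densities (checked here for one pair only).
densities_differ = any(
    not math.isclose(normal_pdf_var(x, 0.0, 1.0), normal_pdf_var(x, 0.0, 2.25))
    for x in grid
)

print(same_density, densities_differ)
```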
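A family of populations $\mathcal{P}$ is dominated by $\nu$ (a $\sigma$-finite measure) if $P \ll \nu$ for all $P \in \mathcal{P}$. $\mathcal{P}$ can be identified by the family of densities $\{ \frac{dP}{d\nu} : P \in \mathcal{P} \}$ or $\{ \frac{dP_\theta}{d\nu} : \theta \in \Theta \}$.
Parametric methods: methods designed for parametric models.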
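Example (The $k$-dimensional normal family).
$$\mathcal{P} = \{ N_k(\mu, \Sigma) : \mu \in \mathcal{R}^k, \ \Sigma \in \mathcal{M}_k \},$$
where $\mathcal{M}_k$ is a collection of $k \times k$ symmetric positive definite matrices. This family is dominated by the Lebesgue measure on $\mathcal{R}^k$. When $k = 1$, $\mathcal{P} = \{ N(\mu, \sigma^2) : \mu \in \mathcal{R}, \ \sigma^2 > 0 \}$.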
Nonparametric family: $\mathcal{P}$ is not parametric according to Definition 2.1. A nonparametric model: the population $P$ is in a given nonparametric family.
Examples of nonparametric family on $(\mathcal{R}^k, \mathcal{B}^k)$: (1) The joint c.d.f.'s are continuous. (2) The joint c.d.f.'s have finite moments of order $\le$ a fixed integer. (3) The joint c.d.f.'s have p.d.f.'s (e.g., Lebesgue p.d.f.'s). (4) $k = 1$ and the c.d.f.'s are symmetric. (5) The family of all probability measures on $(\mathcal{R}^k, \mathcal{B}^k)$.
Nonparametric methods: methods designed for nonparametric models.
Semi-parametric models and methods
Statistics and their distributions
Our data set is a realization of a sample (random vector) $X$ from an unknown population $P$. Statistic $T(X)$: a measurable function $T$ of $X$; $T(X)$ is a known value whenever $X$ is known. Statistical analyses are based on various statistics, for various purposes. $X$ itself is a statistic, but it is a trivial statistic. The range of a nontrivial statistic $T(X)$ is usually simpler than that of $X$. For example, $X$ may be a random $n$-vector and $T(X)$ may be a random $p$-vector with $p$ much smaller than $n$. $\sigma(T(X)) \subset \sigma(X)$, and the two $\sigma$-fields are the same if and only if $T$ is one-to-one. Usually $\sigma(T(X))$ simplifies $\sigma(X)$, i.e., a statistic provides a "reduction" of the $\sigma$-field.
The "information" within the statistic $T(X)$ concerning the unknown distribution of $X$ is contained in the $\sigma$-field $\sigma(T(X))$. Let $S$ be any other statistic for which $\sigma(S(X)) = \sigma(T(X))$. Then, by Lemma 1.2, $S$ is a measurable function of $T$, and $T$ is a measurable function of $S$. Thus, once the value of $S$ (or $T$) is known, so is the value of $T$ (or $S$). It is not the particular values of a statistic that contain the information, but the generated $\sigma$-field of the statistic. Values of a statistic may be important for other reasons.
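The reduction of the $\sigma$-field can be seen on a finite sample space, where the $\sigma$-field generated by a statistic consists of all unions of its level sets, so comparing the partitions into level sets amounts to comparing the generated $\sigma$-fields. The sketch below is an illustrative assumption rather than part of the notes: it takes $X = (X_1, X_2, X_3)$ with binary components and compares the partitions induced by $X$ itself, by $T(X) = \sum_i X_i$, by a one-to-one function of $T$, and by a function of $T$ that is not one-to-one. The names `partition_by`, `T`, `S`, and `U` are hypothetical.

```python
from itertools import product

def partition_by(stat, outcomes):
    """Group outcomes into level sets of `stat`; return a set of frozensets."""
    blocks = {}
    for x in outcomes:
        blocks.setdefault(stat(x), []).append(x)
    return {frozenset(block) for block in blocks.values()}

outcomes = list(product([0, 1], repeat=3))  # all possible values of X

def identity(x):   # the trivial statistic X itself
    return x

def T(x):          # number of ones in the sample
    return sum(x)

def S(x):          # a one-to-one function of T
    return 2 * sum(x) + 1

def U(x):          # a function of T that is not one-to-one
    return sum(x) % 2

print(len(partition_by(identity, outcomes)))  # 8 level sets
print(len(partition_by(T, outcomes)))         # 4 level sets
print(partition_by(S, outcomes) == partition_by(T, outcomes))  # True
print(len(partition_by(U, outcomes)))         # 2 level sets
```

Here `S` and `T` take different values but induce the same partition, which mirrors the remark that the information lies in the generated $\sigma$-field and not in the particular values of the statistic.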
A statistic $T(X)$ is a random element. If the distribution of $X$ is unknown, then the distribution of $T$ may also be unknown, although $T$ is a known function. Finding the form of the distribution of $T$ is one of the major problems in statistical inference and decision theory.
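With $Y_i = (X_i - \mu, (X_i - \mu)^2)$, $i = 1, \ldots, n$, and assuming $E X_1^4 < \infty$, $Y_1, \ldots, Y_n$ are i.i.d. random 2-vectors with $E Y_1 = (0, \sigma^2)$ and variance-covariance matrix
$$\Sigma = \begin{pmatrix} \sigma^2 & E(X_1 - \mu)^3 \\ E(X_1 - \mu)^3 & E(X_1 - \mu)^4 - \sigma^4 \end{pmatrix}.$$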
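Note that $\bar{Y} = n^{-1} \sum_{i=1}^n Y_i = (\bar{X} - \mu, \tilde{S}^2)$, where $\tilde{S}^2 = n^{-1} \sum_{i=1}^n (X_i - \mu)^2$.
Applying the CLT (Corollary 1.2) to $Y_i$'s, we obtain that
$$\sqrt{n}\,(\bar{X} - \mu, \ \tilde{S}^2 - \sigma^2) \to_d N_2(0, \Sigma).$$
Since
$$S^2 = \frac{n}{n-1} \left[ \tilde{S}^2 - (\bar{X} - \mu)^2 \right]$$
and $\bar{X} \to_{a.s.} \mu$ (the SLLN), an application of Slutsky's theorem leads to
$$\sqrt{n}\,(\bar{X} - \mu, \ S^2 - \sigma^2) \to_d N_2(0, \Sigma).$$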
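The limit above can be checked by simulation. The sketch below rests on assumptions that are not in the notes: the $X_i$ are taken to be Exp(1), so $\mu = 1$ and $\sigma^2 = 1$, and $\Sigma$ is taken to be the variance-covariance matrix of $(X_1 - \mu, (X_1 - \mu)^2)$, which for Exp(1) has entries $\sigma^2 = 1$, $E(X_1 - \mu)^3 = 2$, and $E(X_1 - \mu)^4 - \sigma^4 = 8$. The sample size, number of replications, seed, and function name are arbitrary illustrative choices.

```python
import math
import random

def scaled_errors(n, rng):
    """Draw X_1..X_n i.i.d. Exp(1); return sqrt(n) * (Xbar - mu, S^2 - sigma^2)."""
    xs = [rng.expovariate(1.0) for _ in range(n)]
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)  # sample variance S^2
    root_n = math.sqrt(n)
    return root_n * (xbar - 1.0), root_n * (s2 - 1.0)

rng = random.Random(0)
n, reps = 200, 3000
draws = [scaled_errors(n, rng) for _ in range(reps)]

a = [d[0] for d in draws]
b = [d[1] for d in draws]
mean_a = sum(a) / reps
mean_b = sum(b) / reps

# Empirical variance-covariance matrix of the scaled errors.
var_a = sum((u - mean_a) ** 2 for u in a) / reps
var_b = sum((v - mean_b) ** 2 for v in b) / reps
cov_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / reps

# These should be roughly 1, 2, and 8 under the assumptions stated above.
print(var_a, cov_ab, var_b)
```

The agreement is only approximate because both $n$ and the number of replications are finite.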
Example 2.9 (Order statistics). Let $X = (X_1, \ldots, X_n)$ with i.i.d. random components. Let $X_{(i)}$ be the $i$th smallest value of $X_1, \ldots, X_n$. The statistics $X_{(1)}, \ldots, X_{(n)}$ are called the order statistics. The order statistics are a set of very useful statistics in addition to the sample mean and variance. Suppose that $X_i$ has a c.d.f. $F$ having a Lebesgue p.d.f. $f$. Then the joint Lebesgue p.d.f. of $X_{(1)}, \ldots, X_{(n)}$ is
$$g(x_1, x_2, \ldots, x_n) = \begin{cases} n! \, f(x_1) f(x_2) \cdots f(x_n) & x_1 < x_2 < \cdots < x_n \\ 0 & \text{otherwise.} \end{cases}$$
The joint Lebesgue p.d.f. of $X_{(i)}$ and $X_{(j)}$, $1 \le i < j \le n$, is
$$g_{i,j}(x, y) = \begin{cases} \dfrac{n! \, [F(x)]^{i-1} [F(y) - F(x)]^{j-i-1} [1 - F(y)]^{n-j} f(x) f(y)}{(i-1)! \, (j-i-1)! \, (n-j)!} & x < y \\ 0 & \text{otherwise} \end{cases}$$
and the Lebesgue p.d.f. of $X_{(i)}$ is
$$g_i(x) = \frac{n!}{(i-1)! \, (n-i)!} [F(x)]^{i-1} [1 - F(x)]^{n-i} f(x).$$
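As a numerical sanity check of the p.d.f. of $X_{(i)}$, the sketch below evaluates $g_i$ for the Uniform(0, 1) distribution, an illustrative choice of $F$ that is not prescribed by the notes. In that case $X_{(i)}$ has the Beta$(i, n - i + 1)$ distribution, so $g_i$ should integrate to 1 and have mean $i/(n+1)$. The helper names and the midpoint-rule integrator are hypothetical.

```python
import math

def order_stat_pdf(x, i, n, cdf, pdf):
    """Lebesgue p.d.f. of the i-th order statistic of n i.i.d. draws."""
    coef = math.factorial(n) / (math.factorial(i - 1) * math.factorial(n - i))
    fx = cdf(x)
    return coef * fx ** (i - 1) * (1.0 - fx) ** (n - i) * pdf(x)

def unif_cdf(x):
    """C.d.f. of Uniform(0, 1)."""
    return min(max(x, 0.0), 1.0)

def unif_pdf(x):
    """Lebesgue p.d.f. of Uniform(0, 1)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def midpoint_integral(g, a, b, m=20000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / m
    return h * sum(g(a + (k + 0.5) * h) for k in range(m))

n, i = 5, 2
total = midpoint_integral(lambda x: order_stat_pdf(x, i, n, unif_cdf, unif_pdf), 0.0, 1.0)
mean = midpoint_integral(lambda x: x * order_stat_pdf(x, i, n, unif_cdf, unif_pdf), 0.0, 1.0)

# total should be close to 1 and mean close to i / (n + 1) = 1/3.
print(total, mean)
```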