LECTURE NOTES ON
STATISTICAL INFERENCE
KRZYSZTOF PODGÓRSKI
Department of Mathematics and Statistics
University of Limerick, Ireland
November 23, 2009

Contents

  • 1 Introduction
    • 1.1 Models of Randomness and Statistical Inference
    • 1.2 Motivating Example
      • 1.2.1 Probability vs. likelihood
      • 1.2.2 More data
    • 1.3 Likelihood and theory of statistics
    • 1.4 Computationally intensive methods of statistics
      • 1.4.1 Monte Carlo methods – studying statistical methods using computer generated random samples
      • 1.4.2 Bootstrap – performing statistical inference using computers
  • 2 Review of Probability
    • 2.1 Expectation and Variance
    • 2.2 Distribution of a Function of a Random Variable
    • 2.3 Transforms Method – Characteristic, Probability Generating and Moment Generating Functions
    • 2.4 Random Vectors
      • 2.4.1 Sums of Independent Random Variables
      • 2.4.2 Covariance and Correlation
      • 2.4.3 The Bivariate Change of Variables Formula
    • 2.5 Discrete Random Variables
      • 2.5.1 Bernoulli Distribution
      • 2.5.2 Binomial Distribution
      • 2.5.3 Negative Binomial and Geometric Distribution
      • 2.5.4 Hypergeometric Distribution
      • 2.5.5 Poisson Distribution
      • 2.5.6 Discrete Uniform Distribution
      • 2.5.7 The Multinomial Distribution
    • 2.6 Continuous Random Variables
      • 2.6.1 Uniform Distribution
      • 2.6.2 Exponential Distribution
      • 2.6.3 Gamma Distribution
      • 2.6.4 Gaussian (Normal) Distribution
      • 2.6.5 Weibull Distribution
      • 2.6.6 Beta Distribution
      • 2.6.7 Chi-square Distribution
      • 2.6.8 The Bivariate Normal Distribution
      • 2.6.9 The Multivariate Normal Distribution
    • 2.7 Distributions – further properties
      • 2.7.1 Sum of Independent Random Variables – special cases
      • 2.7.2 Common Distributions – Summarizing Tables
  • 3 Likelihood
    • 3.1 Maximum Likelihood Estimation
    • 3.2 Multi-parameter Estimation
    • 3.3 The Invariance Principle
  • 4 Estimation
    • 4.1 General properties of estimators
    • 4.2 Minimum-Variance Unbiased Estimation
    • 4.3 Optimality Properties of the MLE
  • 5 The Theory of Confidence Intervals
    • 5.1 Exact Confidence Intervals
    • 5.2 Pivotal Quantities for Use with Normal Data
    • 5.3 Approximate Confidence Intervals
  • 6 The Theory of Hypothesis Testing
    • 6.1 Introduction
    • 6.2 Hypothesis Testing for Normal Data
    • 6.3 Generally Applicable Test Procedures
    • 6.4 The Neyman-Pearson Lemma
    • 6.5 Goodness of Fit Tests
    • 6.6 The χ^2 Test for Contingency Tables

Chapter 1

Introduction

Everything existing in the universe is the fruit of chance. Democritus, the 5th Century BC

1.1 Models of Randomness and Statistical Inference

Statistics is a discipline that provides a methodology for making inference, from real random data, on the parameters of the probabilistic models that are believed to generate such data. The position of statistics in relation to real-world data and the corresponding mathematical models of probability theory is presented in the following diagram. The following is a short list of the many phenomena to which randomness is attributed.

  • Games of chance
    • Tossing a coin
    • Rolling a die
    • Playing Poker
  • Natural Sciences
    • Physics (notably Quantum Physics)
    • Genetics
    • Climate
  • Engineering
    • Risk and safety analysis
    • Ocean engineering
  • Economics and Social Sciences
    • Currency exchange rates
    • Stock market fluctuations
    • Insurance claims
    • Polls and election results
  • etc.

1.2 Motivating Example

Let X denote the number of particles that will be emitted from a radioactive source in the next one minute period. We know that X will turn out to be equal to one of the non-negative integers but, apart from that, we know nothing about which of the possible values are more or less likely to occur. The quantity X is said to be a random variable. Suppose we are told that the random variable X has a Poisson distribution with parameter θ = 2. Then, if x is some non-negative integer, we know that the probability that the random variable X takes the value x is given by the formula

$$P(X = x) = \frac{\theta^x \exp(-\theta)}{x!},$$

where θ = 2. So, for instance, the probability that X takes the value x = 4 is

$$P(X = 4) = \frac{2^4 \exp(-2)}{4!} = 0.0902.$$
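The arithmetic above can be checked with a few lines of code. The following is a minimal sketch in Python (the notes themselves do not prescribe a language here); the function name `poisson_pmf` is an illustrative choice and not part of the notes.

```python
import math

def poisson_pmf(x: int, theta: float) -> float:
    """Return P(X = x) for a Poisson random variable with parameter theta."""
    if x < 0:
        return 0.0
    return theta ** x * math.exp(-theta) / math.factorial(x)

# Probability that X = 4 when theta = 2, as in the worked example.
p4 = poisson_pmf(4, 2.0)
print(round(p4, 4))
```

Rounded to four decimal places, the printed value agrees with the figure 0.0902 quoted in the text.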

We have here a probability model for the random variable X. Note that we are using upper case letters for random variables and lower case letters for the values taken by random variables. We shall persist with this convention throughout the course.

Let us still assume that the random variable X has a Poisson distribution with parameter θ but where θ is some unspecified positive number. Then, if x is some non-negative integer, we know that the probability that the random variable X takes the value x is given by the formula

$$P(X = x \mid \theta) = \frac{\theta^x \exp(-\theta)}{x!}, \tag{1.1}$$

for $\theta \in \mathbb{R}^+$. However, we cannot calculate probabilities such as the probability that X takes the value x = 4 without knowing the value of θ.

Suppose that, in order to learn something about the value of θ, we decide to measure the value of X for each of the next 5 one-minute time periods. Let us use the notation $X_1$ to denote the number of particles emitted in the first period, $X_2$ to denote the number emitted in the second period and so forth. We shall end up with data consisting of a random vector $\mathbf{X} = (X_1, X_2, \ldots, X_5)$. Consider $\mathbf{x} = (x_1, x_2, x_3, x_4, x_5) = (2, 1, 0, 3, 4)$. Then $\mathbf{x}$ is a possible value for the random vector $\mathbf{X}$. We know that the probability that $X_1$ takes the value $x_1 = 2$ is given by the formula

$$P(X_1 = 2 \mid \theta) = \frac{\theta^2 \exp(-\theta)}{2!}$$

and similarly that the probability that $X_2$ takes the value $x_2 = 1$ is given by

$$P(X_2 = 1 \mid \theta) = \frac{\theta \exp(-\theta)}{1!}$$

and so on. However, what about the probability that $\mathbf{X}$ takes the value $\mathbf{x}$? In order for this probability to be specified we need to know something about the joint distribution of the random variables $X_1, X_2, \ldots, X_5$. A simple assumption to make is that the random variables $X_1, X_2, \ldots, X_5$ are mutually independent. (Note that this assumption may not be correct since $X_2$ may tend to be more similar to $X_1$ than it would be to $X_5$.) However, with this assumption we can say that the probability that $\mathbf{X}$ takes the value $\mathbf{x}$

Figure 1.2: Probability mass function for Poisson model with θ = 2 (horizontal axis: number of particles; vertical axis: probability).

that such a model is correct. However, we have arbitrarily set θ = 2 and this is more questionable. How can we know that it is a correct value of the parameter? Let us analyze this issue in detail.

If x is some non-negative integer, we know that the probability that the random variable X takes the value x is given by the formula

$$P(X = x \mid \theta) = \frac{\theta^x e^{-\theta}}{x!},$$

for θ > 0. But without knowing the true value of θ, we cannot calculate probabilities such as the probability that X takes the value x = 1.

Suppose that, in order to learn something about the value of θ, an experiment is performed and a value of X = 5 is recorded. Let us take a look at the probability mass function for θ = 2 in Figure 1.2. What is the probability that X takes the value 2? Do we like what we see? Why? Would you bet on 1 or 2 in the next experiment?

We certainly have some serious doubt about our choice of θ = 2, which was arbitrary anyway. One can consider, for example, θ = 7 as an alternative to θ = 2. Here are graphs of the pmf for the two cases.

Figure 1.3: The probability mass function for Poisson model with θ = 2 vs. the one with θ = 7 (horizontal axes: number of particles; vertical axes: probability).

Which of the two choices do we like? Since it was more probable to get X = 5 under the assumption θ = 7 than when θ = 2, we say θ = 7 is more likely to produce X = 5 than θ = 2. Based on this observation we can develop a general strategy for choosing θ.

Let us summarize our position. So far we know (or assume) that the radioactive emission follows a Poisson model with some unknown θ > 0, and that the value x = 5 has been observed once. Our goal is somehow to utilize this knowledge. First, we note that the Poisson model is in fact not only a function of x but also of θ

$$p(x \mid \theta) = \frac{\theta^x e^{-\theta}}{x!}.$$

Let us plug in the observed x = 5, so that we get a function of θ that is called the likelihood function

$$l(\theta) = \frac{\theta^5 e^{-\theta}}{5!}.$$

The graph of it is presented in the next figure. Can you locate on this graph the values of the probabilities that were used to choose θ = 7 over θ = 2? What value of θ appears to be the most preferable if the same argument is extended to all possible values of θ? We observe that the value θ = 5 is most likely to produce the value x = 5. As a result of our likelihood approach, we have used the data x = 5 and the Poisson model to make an inference: an example of statistical inference.
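The comparison of θ = 2 with θ = 7, and the search for the most preferable θ, can also be carried out numerically. The sketch below is only an illustration in Python: the name `likelihood`, the grid range (0, 15] and the step 0.01 are assumptions made for this example, not choices taken from the notes.

```python
import math

def likelihood(theta: float, x: int = 5) -> float:
    """Poisson likelihood theta^x e^(-theta) / x! for a single observation x."""
    return theta ** x * math.exp(-theta) / math.factorial(x)

# The two candidate parameter values compared in the text.
l2 = likelihood(2.0)
l7 = likelihood(7.0)

# Crude grid search over theta in (0, 15]; the step 0.01 is an arbitrary choice.
grid = [k / 100 for k in range(1, 1501)]
theta_hat = max(grid, key=likelihood)

print(f"l(2) = {l2:.4f}, l(7) = {l7:.4f}, grid maximizer = {theta_hat:.2f}")
```

A grid search is used here only because it mirrors reading the maximum off a graph; it does not replace the mathematical argument asked for in the exercise that follows.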

Exercise 1. For the general Poisson model

$$p(x \mid \theta) = l(\theta \mid x) = \frac{\theta^x e^{-\theta}}{x!},$$

  1. for a given θ > 0 find the most probable value of the observation x.
  2. for a given observation x find the most likely value of θ.

Give a mathematical argument for your claims.

1.2.2 More data

Suppose that we perform another measurement of the number of emitted particles. Let us use the notation $X_1$ to denote the number of particles emitted in the first period and $X_2$ to denote the number emitted in the second period. We shall end up with data consisting of a random vector $\mathbf{X} = (X_1, X_2)$. The second measurement yielded $x_2 = 2$, so that $\mathbf{x} = (x_1, x_2) = (5, 2)$. We know that the probability that $X_1$ takes the value $x_1 = 5$ is given by the formula

$$P(X_1 = 5 \mid \theta) = \frac{\theta^5 e^{-\theta}}{5!}$$

and similarly that the probability that $X_2$ takes the value $x_2 = 2$ is given by

$$P(X_2 = 2 \mid \theta) = \frac{\theta^2 e^{-\theta}}{2!}.$$

However, what about the probability that $\mathbf{X}$ takes the value $\mathbf{x} = (5, 2)$? In order for this probability to be specified we need to know something about the joint distribution of the random variables $X_1, X_2$. A simple assumption to make is that the random variables $X_1, X_2$ are mutually independent. In such a case the probability that $\mathbf{X}$ takes the value $\mathbf{x} = (x_1, x_2)$ is given by

$$P(\mathbf{X} = (x_1, x_2) \mid \theta) = \frac{\theta^{x_1} e^{-\theta}}{x_1!} \cdot \frac{\theta^{x_2} e^{-\theta}}{x_2!} = e^{-2\theta} \frac{\theta^{x_1 + x_2}}{x_1! \, x_2!}.$$

After a little algebra we easily find the likelihood function of observing $\mathbf{X} = (5, 2)$ as

$$l(\theta \mid (5, 2)) = e^{-2\theta} \frac{\theta^7}{240}$$

Figure 1.5: Likelihood of observing (5, 2) (top) vs. the one of observing 5 (bottom); horizontal axes: theta, vertical axes: likelihood.

and its graph is presented in Figure 1.5 in comparison with the previous likelihood for a single observation. Two important effects of adding extra information should be noted:

  • We observe that the location of the maximum shifted from 5 to 3.5 (the average of the two observations) compared to the single observation.
  • We also note that the range of likely values for θ has diminished.

Let us suppose that eventually we decide to measure three more values of X. Let us use the vector notation $\mathbf{X} = (X_1, X_2, \ldots, X_5)$ to denote observable random
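The two effects of adding the second observation (the shift of the maximum and the narrowing of the range of likely values) can be checked numerically. The following Python sketch is an illustration under stated assumptions: the helper names, the grid on (0, 15], and the rule "likelihood at least 10% of its maximum" used to measure the range of likely values are all choices made for this example and do not come from the notes.

```python
import math

def joint_likelihood(theta, xs):
    """Likelihood of independent Poisson observations xs at parameter theta."""
    value = 1.0
    for x in xs:
        value *= theta ** x * math.exp(-theta) / math.factorial(x)
    return value

def grid_maximizer(xs, upper=15.0, steps=1500):
    """Return the grid point in (0, upper] with the largest likelihood."""
    grid = [upper * k / steps for k in range(1, steps + 1)]
    return max(grid, key=lambda t: joint_likelihood(t, xs))

def likely_range(xs, fraction=0.1, upper=15.0, steps=1500):
    """Width of the grid region where the likelihood is at least
    `fraction` of its maximum (an arbitrary, illustrative cutoff)."""
    grid = [upper * k / steps for k in range(1, steps + 1)]
    values = [joint_likelihood(t, xs) for t in grid]
    cutoff = fraction * max(values)
    kept = [t for t, v in zip(grid, values) if v >= cutoff]
    return kept[-1] - kept[0]

for data in ([5], [5, 2]):
    print(data, "maximizer:", grid_maximizer(data),
          "width of likely range:", round(likely_range(data), 2))
```

The same functions accept longer samples, so they could be reused once the three further observations mentioned above are available.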

1.3 Likelihood and theory of statistics

The strategy of making statistical inference based on the likelihood function, as described above, is the recurrent theme in mathematical statistics and thus in our lecture. Using mathematical arguments, we will compare various strategies for inferring about the parameters, and often we will demonstrate that the likelihood-based methods are optimal. Likelihood will also show its strength as a criterion for deciding between various claims about the parameters of the model, which is the leading story of hypothesis testing.

In modern days, the role of computers in statistical methodology has increased. New computationally intensive methods of data exploration have become one of the central areas of modern statistics. Even there, methods that refer to likelihood play dominant roles, in particular in Bayesian methodology.

Despite this extensive penetration of statistical methodology by likelihood techniques, statistics can by no means be reduced to the analysis of likelihood. In every area of statistics there are important aspects that require reaching beyond likelihood; in many cases, likelihood is not even a focus of study and development. The purpose of this course is to present both the importance of the likelihood approach across statistics and topics for which likelihood plays a secondary role, if any.

1.4 Computationally intensive methods of statistics

The second part of our presentation of modern statistical inference is devoted to computationally intensive statistical methods. The area of data exploration is rapidly growing in importance due to

  • common access to inexpensive but advanced computing tools,
  • the emergence of new challenges associated with massive, high-dimensional data that go far beyond the assumptions on which traditional methods of statistics have been based.

In this introduction we give two examples that illustrate the power of modern computers and computing software both in the analysis of statistical models and in performing actual statistical inference. We start with analyzing the performance of a statistical procedure using random sample generation.

1.4.1 Monte Carlo methods – studying statistical methods using computer generated random samples

Randomness can be used to study properties of a mathematical model. The model itself may be probabilistic or not, but here we focus on the probabilistic ones. Essentially, the Monte Carlo approach is based on repeated simulation of random samples corresponding to the model and on observing the behavior of the objects of interest. An example of a Monte Carlo method is approximating the area of a circle by randomly tossing a point (typically computer generated) onto a sheet of paper on which the circle is drawn. The percentage of points that fall inside the circle represents (approximately) the percentage of the area covered by the circle, as illustrated in Figure 1.6.
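The circle example can be sketched in a few lines. The code below is a minimal illustration in Python (the exercises in these notes ask for R; Python is used here only for the sketch). The "paper" is assumed to be the square [-1, 1] × [-1, 1] with the unit circle drawn on it, and the number of points and the seed are arbitrary choices.

```python
import random

def fraction_inside_circle(n_points: int, seed: int = 0) -> float:
    """Estimate the fraction of the square [-1, 1]^2 covered by the unit circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_points):
        # Toss a point uniformly at random onto the square.
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    return inside / n_points

fraction = fraction_inside_circle(100_000)
area_estimate = 4.0 * fraction  # the square has area 4
print(f"fraction inside: {fraction:.4f}, estimated circle area: {area_estimate:.4f}")
```

Multiplying the observed fraction by the known area of the square turns the percentage of points into an area estimate, which can be compared with the exact area π of the unit circle.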

Exercise 4. Write R code that explores the area of an ellipsoid using the Monte Carlo method.

Below we present an application of the Monte Carlo approach to studying fitting methods for the Poisson model.

Deciding for Poisson model

Recall that the Poisson model is given by

$$P(X = x \mid \theta) = \frac{\theta^x e^{-\theta}}{x!}.$$

It is relatively easy to demonstrate that the mean value of this distribution is equal to θ and that the variance is also equal to θ (so the standard deviation is $\sqrt{\theta}$).

Exercise 5. Present a formal argument showing that for a Poisson random variable X with parameter θ, EX = θ and VarX = θ.

Thus for a sample of observations $\mathbf{x} = (x_1, \ldots, x_n)$ it is reasonable to consider

Figure 1.7: Monte Carlo results of comparing estimation of θ = 4 by the sample mean (left panel, "Histogram of means") vs. estimation using the sample variance (right panel, "Histogram of vars"); vertical axes: frequency.

estimates performs better. The resulting histograms of the values of the estimators are presented in Figure 1.7. It is quite clear from the graphs that the estimator based on the mean is better than the one based on the variance.
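A Monte Carlo comparison of this kind can be sketched as follows. This is a hedged illustration in Python, not the code behind the figure: the sample size n = 20, the 1000 replications and the seed are assumptions (the values used for the figure are not given in this part of the notes), and Poisson values are drawn with Knuth's multiplication method because the Python standard library has no Poisson sampler.

```python
import math
import random
import statistics

def poisson_sample(theta: float, rng: random.Random) -> int:
    """Draw one Poisson(theta) value using Knuth's multiplication method."""
    limit = math.exp(-theta)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def compare_estimators(theta=4.0, n=20, replications=1000, seed=1):
    """Simulate many samples; estimate theta by the sample mean and by the
    sample variance of each sample, and return both lists of estimates."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(replications):
        sample = [poisson_sample(theta, rng) for _ in range(n)]
        means.append(statistics.mean(sample))
        variances.append(statistics.variance(sample))
    return means, variances

means, variances = compare_estimators()
print("spread of mean-based estimates:    ", round(statistics.stdev(means), 3))
print("spread of variance-based estimates:", round(statistics.stdev(variances), 3))
```

Both lists of estimates are centred near θ = 4, so comparing their spreads is a simple numerical counterpart of comparing the widths of the two histograms.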

1.4.2 Bootstrap – performing statistical inference using computers

Bootstrap (resampling) methods are one example of Monte Carlo based statistical analysis. The methodology can be summarized as follows:

  • Collect a statistical sample, i.e. the same type of data as in classical statistics.
  • Use properly chosen Monte Carlo based resampling from the data, using a random number generator (RNG), to create so-called bootstrap samples.
  • Analyze the bootstrap samples to draw conclusions about the random mechanism that produced the original statistical data.

This way randomness is used to analyze statistical samples that, by the way, are also a result of randomness. An example illustrating the approach is presented next.

Estimating nitrate ion concentration

Nitrate ion concentration measurements in a certain chemical lab have been collected and their results are given in the following table. The goal is to estimate, based on these values, the actual nitrate ion concentration.

0.51 0.51 0.51 0.50 0.51 0.49 0.52 0.53 0.50 0.
0.51 0.52 0.53 0.48 0.49 0.50 0.52 0.49 0.49 0.
0.49 0.48 0.46 0.49 0.49 0.48 0.49 0.49 0.51 0.
0.51 0.51 0.51 0.48 0.50 0.47 0.50 0.51 0.49 0.
0.51 0.50 0.50 0.53 0.52 0.52 0.50 0.50 0.51 0.

Table 1.1: Results of 50 determinations of nitrate ion concentration in μg per ml.

The overall mean of all observations is 0.4998. It is natural to ask what the error of this determination of the nitrate concentration is. If we repeated our experiment of collecting 50 samples of nitrate concentrations many times, we would see the range of error that is made. However, this would be a waste of resources and not a viable method at all. Instead, we resample 'new' data from our data and use the new samples so obtained to assess the error, comparing the obtained means (bootstrap means) with the original one. The differences between these represent the bootstrap "estimation" errors; their distribution is viewed as a good representation of the distribution of the true error. In Figure ??, we see the bootstrap counterpart of the distribution of the estimation error. Based on this we can safely say that the nitrate concentration is 0.4998 ± 0.005.
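The resampling procedure described above can be sketched as follows. This is an illustration in Python under loudly stated assumptions: the last entry of each row of Table 1.1 is truncated in this copy, so the sketch uses only the 45 fully legible values and its numbers will not match the 0.4998 mean of all 50 determinations; the number of bootstrap replications, the seed and the use of the central 95% of the errors are likewise choices made for this example.

```python
import random
import statistics

# The 45 fully legible values from Table 1.1; the truncated last entry of
# each row is left out (an assumption of this sketch, not the full data set).
legible = [
    0.51, 0.51, 0.51, 0.50, 0.51, 0.49, 0.52, 0.53, 0.50,
    0.51, 0.52, 0.53, 0.48, 0.49, 0.50, 0.52, 0.49, 0.49,
    0.49, 0.48, 0.46, 0.49, 0.49, 0.48, 0.49, 0.49, 0.51,
    0.51, 0.51, 0.51, 0.48, 0.50, 0.47, 0.50, 0.51, 0.49,
    0.51, 0.50, 0.50, 0.53, 0.52, 0.52, 0.50, 0.50, 0.51,
]

def bootstrap_errors(data, replications=2000, seed=0):
    """Return sorted differences between bootstrap means and the sample mean."""
    rng = random.Random(seed)
    center = statistics.mean(data)
    errors = []
    for _ in range(replications):
        # Resample with replacement, keeping the original sample size.
        resample = rng.choices(data, k=len(data))
        errors.append(statistics.mean(resample) - center)
    return sorted(errors)

errors = bootstrap_errors(legible)
lower = errors[int(0.025 * len(errors))]
upper = errors[int(0.975 * len(errors)) - 1]
print(f"mean of legible values: {statistics.mean(legible):.4f}")
print(f"central 95% of bootstrap errors: [{lower:.4f}, {upper:.4f}]")
```

The interval of bootstrap errors plays the role of the "± error" attached to the estimated concentration in the text.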

Exercise 6. Consider a sample of the daily numbers of buyers in a furniture store

8, 5, 2, 3, 1, 3, 9, 5, 5, 2, 3, 3, 8, 4, 7, 11, 7, 5, 12, 5

Consider the two estimators of θ for a Poisson distribution as discussed in the previous section. Describe formally the procedure (in steps) of obtaining a bootstrap confidence