LECTURE NOTES ON

STATISTICAL INFERENCE

KRZYSZTOF PODGÓRSKI

Department of Mathematics and Statistics

University of Limerick, Ireland

November 23, 2009


1 Introduction
- 1.1 Models of Randomness and Statistical Inference
- 1.2 Motivating Example
  - 1.2.1 Probability vs. likelihood
  - 1.2.2 More data
- 1.3 Likelihood and theory of statistics
- 1.4 Computationally intensive methods of statistics
  - 1.4.1 Monte Carlo methods – studying statistical methods using computer generated random samples
  - 1.4.2 Bootstrap – performing statistical inference using computers
2 Review of Probability
- 2.1 Expectation and Variance
- 2.2 Distribution of a Function of a Random Variable
- 2.3 Transforms Method: Characteristic, Probability Generating and Moment Generating Functions
- 2.4 Random Vectors
  - 2.4.1 Sums of Independent Random Variables
  - 2.4.2 Covariance and Correlation
  - 2.4.3 The Bivariate Change of Variables Formula
- 2.5 Discrete Random Variables
  - 2.5.1 Bernoulli Distribution
  - 2.5.2 Binomial Distribution
  - 2.5.3 Negative Binomial and Geometric Distribution
  - 2.5.4 Hypergeometric Distribution
  - 2.5.5 Poisson Distribution
  - 2.5.6 Discrete Uniform Distribution
  - 2.5.7 The Multinomial Distribution
- 2.6 Continuous Random Variables
  - 2.6.1 Uniform Distribution
  - 2.6.2 Exponential Distribution
  - 2.6.3 Gamma Distribution
  - 2.6.4 Gaussian (Normal) Distribution
  - 2.6.5 Weibull Distribution
  - 2.6.6 Beta Distribution
  - 2.6.7 Chi-square Distribution
  - 2.6.8 The Bivariate Normal Distribution
  - 2.6.9 The Multivariate Normal Distribution
- 2.7 Distributions – further properties
  - 2.7.1 Sum of Independent Random Variables – special cases
  - 2.7.2 Common Distributions – Summarizing Tables
3 Likelihood
- 3.1 Maximum Likelihood Estimation
- 3.2 Multi-parameter Estimation
- 3.3 The Invariance Principle
4 Estimation
- 4.1 General properties of estimators
- 4.2 Minimum-Variance Unbiased Estimation
- 4.3 Optimality Properties of the MLE
5 The Theory of Confidence Intervals
- 5.1 Exact Confidence Intervals
- 5.2 Pivotal Quantities for Use with Normal Data
- 5.3 Approximate Confidence Intervals
6 The Theory of Hypothesis Testing
- 6.1 Introduction
- 6.2 Hypothesis Testing for Normal Data
- 6.3 Generally Applicable Test Procedures
- 6.4 The Neyman-Pearson Lemma
- 6.5 Goodness of Fit Tests
- 6.6 The χ^2 Test for Contingency Tables

Chapter 1 Introduction

Everything existing in the universe is the fruit of chance. (Democritus, 5th century BC)

1.1 Models of Randomness and Statistical Inference

Statistics is a discipline that provides a methodology for making inferences from real random data about the parameters of the probabilistic models that are believed to generate such data. The position of statistics in relation to real-world data and the corresponding mathematical models of probability theory is presented in the following diagram. The following is a list of a few of the many phenomena to which randomness is attributed.

Games of chance
- Tossing a coin
- Rolling a die
- Playing Poker

Natural Sciences
- Physics (notably Quantum Physics)
- Genetics
- Climate

Engineering
- Risk and safety analysis
- Ocean engineering

Economics and Social Sciences
- Currency exchange rates
- Stock market fluctuations
- Insurance claims
- Polls and election results

etc.

1.2 Motivating Example

Let X denote the number of particles that will be emitted from a radioactive source in the next one minute period. We know that X will turn out to be equal to one of the non-negative integers but, apart from that, we know nothing about which of the possible values are more or less likely to occur. The quantity X is said to be a random variable. Suppose we are told that the random variable X has a Poisson distribution with parameter θ = 2. Then, if x is some non-negative integer, we know that the probability that the random variable X takes the value x is given by the formula

$$P(X = x) = \frac{\theta^{x} \exp(-\theta)}{x!}$$

where θ = 2. So, for instance, the probability that X takes the value x = 4 is

$$P(X = 4) = \frac{2^{4} \exp(-2)}{4!} = 0.0902.$$
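The value above can be checked numerically. The following is a minimal sketch in Python; the function name `poisson_pmf` is an illustrative choice and not part of the notes.

```python
import math


def poisson_pmf(x: int, theta: float) -> float:
    """Return P(X = x) for a Poisson random variable with parameter theta."""
    if x < 0:
        return 0.0  # the Poisson distribution lives on the non-negative integers
    return theta ** x * math.exp(-theta) / math.factorial(x)


# Probability of observing exactly 4 particles when theta = 2.
p4 = poisson_pmf(4, 2.0)
print(round(p4, 4))  # 0.0902
```

Summing `poisson_pmf(x, 2.0)` over a long enough range of x gives a total close to 1, which is a quick sanity check that the formula defines a probability distribution.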

We have here a probability model for the random variable X. Note that we are using upper case letters for random variables and lower case letters for the values taken by random variables. We shall persist with this convention throughout the course.

Let us still assume that the random variable X has a Poisson distribution with parameter θ, but where θ is some unspecified positive number. Then, if x is some non-negative integer, we know that the probability that the random variable X takes the value x is given by the formula

$$P(X = x \mid \theta) = \frac{\theta^{x} \exp(-\theta)}{x!}, \tag{1.1}$$

for θ ∈ R+. However, we cannot calculate probabilities such as the probability that X takes the value x = 4 without knowing the value of θ. Suppose that, in order to learn something about the value of θ, we decide to measure the value of X for each of the next 5 one-minute time periods. Let us use the notation X_1 to denote the number of particles emitted in the first period, X_2 to denote the number emitted in the second period, and so forth. We shall end up with data consisting of a random vector X = (X_1, X_2, ..., X_5). Consider x = (x_1, x_2, x_3, x_4, x_5) = (2, 1, 0, 3, 4). Then x is a possible value for the random vector X. We know that the probability that X_1 takes the value x_1 = 2 is given by the formula

$$P(X_1 = 2 \mid \theta) = \frac{\theta^{2} \exp(-\theta)}{2!}$$

and similarly that the probability that X_2 takes the value x_2 = 1 is given by

$$P(X_2 = 1 \mid \theta) = \frac{\theta \exp(-\theta)}{1!}$$

and so on. However, what about the probability that X takes the value x? In order for this probability to be specified we need to know something about the joint distribution of the random variables X_1, X_2, ..., X_5. A simple assumption to make is that the random variables X_1, X_2, ..., X_5 are mutually independent. (Note that this assumption may not be correct, since X_2 may tend to be more similar to X_1 than it would be to X_5.) However, with this assumption we can say that the probability that X takes the value x

Figure 1.2: Probability mass function for Poisson model with θ = 2 (horizontal axis: number of particles; vertical axis: probability).

that such a model is correct. However, we have arbitrarily set θ = 2 and this is more questionable. How can we know that it is a correct value of the parameter? Let us analyze this issue in detail. If x is some non-negative integer, we know that the probability that the random variable X takes the value x is given by the formula

$$P(X = x \mid \theta) = \frac{\theta^{x} e^{-\theta}}{x!},$$

for θ > 0. But without knowing the true value of θ, we cannot calculate probabilities such as the probability that X takes the value x = 1. Suppose that, in order to learn something about the value of θ, an experiment is performed and a value of X = 5 is recorded. Let us take a look at the probability mass function for θ = 2 in Figure 1.2. What is the probability that X takes the value 2? Do we like what we see? Why? Would you bet on 1 or 2 in the next experiment?

We certainly have some serious doubts about our choice of θ = 2, which was arbitrary anyway. One can consider, for example, θ = 7 as an alternative to θ = 2. Here are graphs of the pmf for the two cases.

Figure 1.3: The probability mass function for Poisson model with θ = 2 vs. the one with θ = 7 (horizontal axis: number of particles; vertical axis: probability).

Which of the two choices do we like? Since it was more probable to get X = 5 under the assumption θ = 7 than when θ = 2, we say θ = 7 is more likely to produce X = 5 than θ = 2. Based on this observation we can develop a general strategy for choosing θ.

Let us summarize our position. So far we know (or assume) that the radioactive emission follows a Poisson model with some unknown θ > 0, and that the value x = 5 has been observed once. Our goal is somehow to utilize this knowledge. First, we note that the Poisson model is in fact not only a function of x but also of θ:

$$p(x \mid \theta) = \frac{\theta^{x} e^{-\theta}}{x!}.$$

Let us plug in the observed x = 5, so that we get a function of θ that is called the likelihood function:

$$l(\theta) = \frac{\theta^{5} e^{-\theta}}{5!}.$$

Its graph is presented in the next figure. Can you locate on this graph the values of the probabilities that were used to choose θ = 7 over θ = 2? What value of θ appears to be the most preferable if the same argument is extended to all possible values of θ? We observe that the value θ = 5 is most likely to produce the value x = 5. As a result of our likelihood approach, we have used the data x = 5 and the Poisson model to make an inference, which is an example of statistical inference.
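The comparison of θ = 2 with θ = 7, and the search over all values of θ, can also be sketched numerically. The snippet below is only an illustration: the function name `likelihood`, the grid from 0.01 to 15 and the grid step 0.01 are assumptions made here, not choices taken from the notes.

```python
import math


def likelihood(theta: float, x: int = 5) -> float:
    """Poisson probability of the observed x, viewed as a function of theta."""
    return theta ** x * math.exp(-theta) / math.factorial(x)


# The two candidate values of theta discussed in the text.
l2 = likelihood(2.0)
l7 = likelihood(7.0)
print("l(2) =", l2)
print("l(7) =", l7)

# Extend the same argument to a whole grid of candidate values (assumed grid).
grid = [i / 100 for i in range(1, 1501)]  # 0.01, 0.02, ..., 15.00
theta_hat = max(grid, key=likelihood)
print("grid value with the largest likelihood:", theta_hat)
```

A grid search of this kind only suggests where the maximum lies; it does not replace the mathematical argument asked for in the exercise that follows.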

Exercise 1. For the general Poisson model

$$p(x \mid \theta) = l(\theta \mid x) = \frac{\theta^{x} e^{-\theta}}{x!},$$

- for a given θ > 0, find the most probable value of the observation x;
- for a given observation x, find the most likely value of θ.

Give a mathematical argument for your claims.

1.2.2 More data

Suppose that we perform another measurement of the number of emitted particles. Let us use the notation X_1 to denote the number of particles emitted in the first period and X_2 to denote the number emitted in the second period. We shall end up with data consisting of a random vector X = (X_1, X_2). The second measurement yielded x_2 = 2, so that x = (x_1, x_2) = (5, 2). We know that the probability that X_1 takes the value x_1 = 5 is given by the formula

$$P(X_1 = 5 \mid \theta) = \frac{\theta^{5} e^{-\theta}}{5!}$$

and similarly that the probability that X_2 takes the value x_2 = 2 is given by

$$P(X_2 = 2 \mid \theta) = \frac{\theta^{2} e^{-\theta}}{2!}.$$

However, what about the probability that X takes the value x = (5, 2)? In order for this probability to be specified we need to know something about the joint distribution of the random variables X_1, X_2. A simple assumption to make is that the random variables X_1, X_2 are mutually independent. In such a case the probability that X takes the value x = (x_1, x_2) is given by

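$$P(X = (x_1, x_2) \mid \theta) = \frac{\theta^{x_1} e^{-\theta}}{x_1!} \cdot \frac{\theta^{x_2} e^{-\theta}}{x_2!} = \frac{e^{-2\theta}\, \theta^{x_1 + x_2}}{x_1!\, x_2!}.$$

After a little algebra we easily find the likelihood function of observing X = (5, 2) as

$$l(\theta \mid (5, 2)) = \frac{e^{-2\theta}\, \theta^{7}}{240}$$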

and its graph is presented in Figure 1.5 in comparison with the previous likelihood for a single observation.

Figure 1.5: Likelihood of observing (5, 2) (top) vs. the one of observing 5 (bottom) (horizontal axis: theta; vertical axis: likelihood).

Two important effects of adding extra information should be noted:

- We observe that the location of the maximum shifted from 5 to 3.5 compared to the single observation.
- We also note that the range of likely values for θ has diminished.

Let us suppose that eventually we decide to measure three more values of X. Let us use the vector notation X = (X_1, X_2, ..., X_5) to denote observable random
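Returning to the two-observation likelihood l(θ|(5, 2)) = e^{−2θ} θ^7 / 240, the location of its maximum can be worked out by a short calculation on the logarithmic scale. This is a sketch of one standard argument, not necessarily the one intended in the notes; the symbol θ̂ for the maximizing value is notation introduced here.

```latex
\begin{align*}
\log l(\theta \mid (5,2)) &= -2\theta + 7\log\theta - \log 240, \\
\frac{d}{d\theta}\,\log l(\theta \mid (5,2)) &= -2 + \frac{7}{\theta} = 0
  \quad\Longrightarrow\quad \hat\theta = \frac{7}{2} = 3.5, \\
\frac{d^{2}}{d\theta^{2}}\,\log l(\theta \mid (5,2)) &= -\frac{7}{\theta^{2}} < 0 .
\end{align*}
```

The second derivative is negative for every θ > 0, so the stationary point is a maximum. Note that 3.5 is the average of the two observations, (5 + 2)/2.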

1.3 Likelihood and theory of statistics

The strategy of making statistical inference based on the likelihood function, as described above, is a recurrent theme in mathematical statistics and thus in our lectures. Using mathematical arguments, we will compare various strategies for inferring about the parameters, and often we will demonstrate that the likelihood-based methods are optimal. Likelihood will also show its strength as a criterion for deciding between various claims about the parameters of the model, which is the leading story of hypothesis testing.

In modern days, the role of computers in statistical methodology has increased. New computationally intensive methods of data exploration have become one of the central areas of modern statistics. Even there, methods that refer to likelihood play dominant roles, in particular in Bayesian methodology.

Despite this extensive penetration of statistical methodology by likelihood techniques, statistics can by no means be reduced to the analysis of likelihood. In every area of statistics there are important aspects that require reaching beyond likelihood; in many cases, likelihood is not even a focus of study and development. The purpose of this course is to present both the importance of the likelihood approach across statistics and topics for which likelihood plays a secondary role, if any.

1.4 Computationally intensive methods of statistics

The second part of our presentation of modern statistical inference is devoted to computationally intensive statistical methods. The area of data exploration is rapidly growing in importance due to

- common access to inexpensive but advanced computing tools,
- the emergence of new challenges associated with massive, high-dimensional data that go far beyond the assumptions on which traditional methods of statistics have been based.

In this introduction we give two examples that illustrate the power of modern computers and computing software, both in the analysis of statistical models and in performing actual statistical inference. We start by analyzing the performance of a statistical procedure using random sample generation.

1.4.1 Monte Carlo methods – studying statistical methods using computer generated random samples

Randomness can be used to study properties of a mathematical model. The model itself may be probabilistic or not, but here we focus on probabilistic ones. Essentially, the Monte Carlo approach is based on repeated simulation of random samples corresponding to the model and on observing the behavior of the objects of interest. An example of a Monte Carlo method is approximating the area of a circle by randomly tossing points (typically computer generated) onto a sheet of paper on which the circle is drawn. The percentage of points that fall inside the circle represents (approximately) the percentage of the area covered by the circle, as illustrated in Figure 1.6.
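The circle example can be sketched in a few lines of code. The version below is written in Python rather than R and rests on assumptions made only for illustration: the "paper" is the square that just encloses the circle, the radius is 1, and 100,000 points are tossed with a fixed seed.

```python
import random


def estimate_circle_area(radius: float, n_points: int, seed: int = 0) -> float:
    """Estimate the area of a circle by tossing random points onto the
    enclosing square [-radius, radius] x [-radius, radius]."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    inside = 0
    for _ in range(n_points):
        x = rng.uniform(-radius, radius)
        y = rng.uniform(-radius, radius)
        if x * x + y * y <= radius * radius:
            inside += 1
    square_area = (2 * radius) ** 2
    # The fraction of points inside the circle approximates area ratio.
    return square_area * inside / n_points


area = estimate_circle_area(1.0, 100_000)
print("estimated area of the unit circle:", area)
```

For the unit circle the estimate should land close to π, and the accuracy improves only slowly as the number of points grows, which is typical of Monte Carlo methods.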

Exercise 4. Write R code that explores the area of an ellipsoid using the Monte Carlo method.

Below we present an application of the Monte Carlo approach to studying fitting methods for the Poisson model.

Deciding for Poisson model

Recall that the Poisson model is given by

$$P(X = x \mid \theta) = \frac{\theta^{x} e^{-\theta}}{x!}.$$

It is relatively easy to demonstrate that the mean value of this distribution is equal to θ and that the variance is also equal to θ.

Exercise 5. Present a formal argument showing that for a Poisson random variable X with parameter θ, EX = θ and VarX = θ.

Thus for a sample of observations x = (x_1, ..., x_n) it is reasonable to consider

Figure 1.7: Monte Carlo results of comparing estimation of θ = 4 by the sample mean (left, panel "Histogram of means") vs. estimation using the sample variance (right, panel "Histogram of vars"); vertical axis: frequency.

estimates performs better. The resulting histograms of the values of the estimators are presented in Figure 1.8. It is quite clear from the graphs that the estimator based on the mean is better than the one based on the variance.
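A comparison of this kind can be sketched as follows. Only θ = 4 comes from the notes; the sample size n = 20, the 1000 replications, the seed and all function names are assumptions chosen for illustration, and the Poisson sampler uses Knuth's multiplication method, which may differ from the generator used for the figures.

```python
import math
import random
import statistics


def poisson_sample(theta: float, rng: random.Random) -> int:
    """Draw one Poisson(theta) value with Knuth's multiplication method."""
    limit = math.exp(-theta)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def compare_estimators(theta=4.0, n=20, n_rep=1000, seed=1):
    """Simulate n_rep samples of size n and estimate theta from each one
    by the sample mean and by the sample variance."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(n_rep):
        sample = [poisson_sample(theta, rng) for _ in range(n)]
        means.append(statistics.mean(sample))
        variances.append(statistics.variance(sample))
    return means, variances


means, variances = compare_estimators()
print("mean-based estimates:     centre %.3f, spread %.3f"
      % (statistics.mean(means), statistics.stdev(means)))
print("variance-based estimates: centre %.3f, spread %.3f"
      % (statistics.mean(variances), statistics.stdev(variances)))
```

Both sets of estimates are centred near θ = 4, but the variance-based estimates are spread out much more widely, which is what the two histograms show.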

1.4.2 Bootstrap – performing statistical inference using computers

Bootstrap (resampling) methods are one example of Monte Carlo based statistical analysis. The methodology can be summarized as follows:

- Collect a statistical sample, i.e. the same type of data as in classical statistics.
- Use properly chosen Monte Carlo based resampling from the data, using a random number generator (RNG), to create so-called bootstrap samples.
- Analyze the bootstrap samples to draw conclusions about the random mechanism that produced the original statistical data.

This way randomness is used to analyze statistical samples that, by the way, are also a result of randomness. An example illustrating the approach is presented next.

Estimating nitrate ion concentration

Nitrate ion concentration measurements in a certain chemical lab have been collected and their results are given in Table 1.1.

0.51 0.51 0.51 0.50 0.51 0.49 0.52 0.53 0.50 0.
0.51 0.52 0.53 0.48 0.49 0.50 0.52 0.49 0.49 0.
0.49 0.48 0.46 0.49 0.49 0.48 0.49 0.49 0.51 0.
0.51 0.51 0.51 0.48 0.50 0.47 0.50 0.51 0.49 0.
0.51 0.50 0.50 0.53 0.52 0.52 0.50 0.50 0.51 0.

Table 1.1: Results of 50 determinations of nitrate ion concentration in μg per ml.

The goal is to estimate, based on these values, the actual nitrate ion concentration. The overall mean of all observations is 0.4998. It is natural to ask what the error of this determination of the nitrate concentration is. If we repeated our experiment of collecting 50 samples of nitrate concentrations many times, we would see the range of the error that is made. However, it would be a waste of resources and not a viable method at all. Instead, we resample 'new' data from our data, use the new samples so obtained to assess the error, and compare the obtained means (bootstrap means) with the original one. The differences between these represent the bootstrap "estimation" errors; their distribution is viewed as a good representation of the distribution of the true error. In Figure ??, we see the bootstrap counterpart of the distribution of the estimation error. Based on this we can safely say that the nitrate concentration is 49.99 ± 0.005.
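The resampling procedure described above can be sketched as follows. Two caveats apply: the last value in each row of Table 1.1 is cut off in this copy, so the sketch uses only the 45 fully legible values and its mean will not reproduce 0.4998 exactly; and the number of bootstrap samples (2000), the seed and the percentile summary are assumptions chosen here, not details taken from the notes.

```python
import random
import statistics

# ASSUMPTION: only the 45 values that are fully legible in Table 1.1;
# the truncated last entry of each row is omitted rather than guessed.
data = [
    0.51, 0.51, 0.51, 0.50, 0.51, 0.49, 0.52, 0.53, 0.50,
    0.51, 0.52, 0.53, 0.48, 0.49, 0.50, 0.52, 0.49, 0.49,
    0.49, 0.48, 0.46, 0.49, 0.49, 0.48, 0.49, 0.49, 0.51,
    0.51, 0.51, 0.51, 0.48, 0.50, 0.47, 0.50, 0.51, 0.49,
    0.51, 0.50, 0.50, 0.53, 0.52, 0.52, 0.50, 0.50, 0.51,
]


def bootstrap_mean_errors(sample, n_boot=2000, seed=0):
    """Resample with replacement and return the original mean together
    with the differences (bootstrap mean - original mean)."""
    rng = random.Random(seed)
    original_mean = statistics.fmean(sample)
    errors = []
    for _ in range(n_boot):
        resample = rng.choices(sample, k=len(sample))  # same size as the data
        errors.append(statistics.fmean(resample) - original_mean)
    return original_mean, errors


original_mean, errors = bootstrap_mean_errors(data)
errors.sort()
lower = errors[int(0.025 * len(errors))]       # approx. 2.5% point
upper = errors[int(0.975 * len(errors)) - 1]   # approx. 97.5% point
print("mean of the legible values:", round(original_mean, 4))
print("central 95%% of bootstrap errors: [%.4f, %.4f]" % (lower, upper))
```

The sorted bootstrap errors play the role of the histogram of estimation errors: the range that contains most of them gives a rough "±" figure to attach to the sample mean.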

Exercise 6. Consider a sample of the daily number of buyers in a furniture store:

8, 5, 2, 3, 1, 3, 9, 5, 5, 2, 3, 3, 8, 4, 7, 11, 7, 5, 12, 5

Consider the two estimators of θ for a Poisson distribution as discussed in the previous section. Describe formally the procedure (in steps) of obtaining a bootstrap confidence
