Corporate Statistics II, Topic 2 (2021/2022)
G.A.D.E., Prof. Julio Hernández
Topic 2: INTRODUCTION TO STATISTICAL INFERENCE
2.1. KEY ISSUES
A) Objective: to study one or more variables in a population
B) Method:
1. Sampling selection:
i. Simple random sampling (s.r.s.)
ii. Stratified sampling
iii. Cluster sampling and other techniques
2. We assume that the variable follows a certain probability distribution
3. The population parameters are estimated from the sample (parametric statistics). We then study:
i. Properties of point estimators
ii. Methods for obtaining point estimators (method of moments and maximum likelihood)
iii. Estimation procedures:
o Point estimation
o Confidence interval estimation
4. Hypothesis testing is applied
5. Procedures for model criticism (nonparametric statistics):
i. Goodness-of-fit tests
ii. Tests of independence (runs test, autocorrelation test)

iii. Nonparametric tests for paired samples (sign test and Wilcoxon signed-rank test) and for independent random samples (Mann-Whitney U test and Wilcoxon rank-sum test).

Example:
A) Variable to study: the income earned by an individual in Madrid
B) Method:

  1. A s.r.s. of n=1000 people must be selected
  2. The variable is assumed to follow a N(μ;σ) distribution.
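  3. Parameters μ and σ² are estimated:
     i. Maximum likelihood estimators are obtained: $\mu^*_{ML} = \bar{x}$ and $\sigma^{2*}_{ML} = s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$
     ii. In large samples, ML estimators are approximately unbiased, efficient and normally distributed random variables.
     iii. Estimation process:
        a. Point estimation: the sample mean and sample variance computed from the sample actually drawn.
        b. Interval estimation: confidence intervals for the parameters at a chosen confidence level.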
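As a minimal sketch of steps 3.i and 3.iii, the Python below computes the ML estimates $\mu^*_{ML} = \bar{x}$ and $\sigma^{2*}_{ML} = s^2$ for a normal sample. The interval formula in these notes could not be recovered. As an assumption, the sketch therefore uses the large-sample normal interval $\bar{x} \pm z\, s_1/\sqrt{n}$, where $s_1$ is the bias-corrected standard deviation. The function names are hypothetical. The incomes are simulated placeholders, not data about Madrid.

```python
import math
import random
from statistics import NormalDist, fmean


def ml_normal(sample):
    """ML estimates for a N(mu; sigma) sample: (mean, variance with divisor n)."""
    n = len(sample)
    mu_ml = fmean(sample)
    sigma2_ml = sum((x - mu_ml) ** 2 for x in sample) / n  # s^2, biased for finite n
    return mu_ml, sigma2_ml


def large_sample_ci_mean(sample, confidence=0.95):
    """Approximate CI for mu: xbar +/- z * s1 / sqrt(n) (assumes large n)."""
    n = len(sample)
    mu_ml, sigma2_ml = ml_normal(sample)
    s1 = math.sqrt(sigma2_ml * n / (n - 1))  # bias-corrected standard deviation
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * s1 / math.sqrt(n)
    return mu_ml - half_width, mu_ml + half_width


# Simulated placeholder incomes (hypothetical parameters, not real data).
random.seed(1)
incomes = [random.gauss(25_000, 8_000) for _ in range(1000)]
mu_hat, var_hat = ml_normal(incomes)
low, high = large_sample_ci_mean(incomes)
print(f"mu* = {mu_hat:.0f}, sigma2* = {var_hat:.0f}, 95% CI = ({low:.0f}, {high:.0f})")
```

With n = 1000, the difference between the normal quantile and the corresponding t quantile is negligible, which is why the sketch uses the standard library's `NormalDist`.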
  • The effectiveness of inference depends on the sampling process.
  • Ideal situation: the sample is the population on a small scale.
  • What if there are deviations from this ideal scenario? The process is under control as long as those deviations:
    i. are not systematic, and
    ii. are due to randomness.

SAMPLING PROCEDURES

A) Non-probabilistic sampling¹:
o The selection system is subjective.
o Examples: convenience sampling, judgement sampling, snowball sampling.
o It may be useful:
a. In qualitative, pilot or exploratory studies
b. When the researcher does not wish to make inferences about the population
B) Probabilistic sampling:
o It is based on randomness.
o Hence, sampling becomes a random phenomenon.
o Selecting each element of the population is an event with a certain probability of happening.
o And so is obtaining a given sample.
o It is an objective (scientific) method that allows the sampling error to be measured and controlled.

¹ https://explorable.com/non-probability-sampling

PROBABILISTIC SAMPLING TECHNIQUES

B.1) Simple random sampling (s.r.s.)²: each element in the population has the same probability of being selected.
a. On average, this makes the sample resemble the structure of the population.
b. It is appropriate when the population is homogeneous with respect to the variable under study.
c. It is the easiest sampling method.
d. Two requirements: all the members of the population must be listed, and a random selection tool has to be used.
e. Two kinds of sampling:
i. WITH REPLACEMENT:
  1. The same element can appear more than once in the sample (up to n times).
  2. It simplifies the calculations, because the elements in the sample are INDEPENDENT.
  3. Appropriate for infinite populations.
ii. WITHOUT REPLACEMENT:
  1. Appropriate for finite populations when the sample size exceeds 5% of the population size.
  2. In these cases the estimation is more accurate, but the calculations are more complicated because the observations are not independent.

² http://www.statcan.gc.ca/edu/power-pouvoir/ch13/prob/5214899-eng.htm
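The two kinds of s.r.s. can be sketched with Python's standard `random` module. `Random.choices` draws with replacement, so the draws are independent. `Random.sample` draws without replacement. The helper names and the population of 100 listed units are hypothetical. The 5% threshold is the rule of thumb stated above. The last helper estimates how often a given element enters a sample drawn without replacement; under s.r.s. this should be close to n/N.

```python
import random


def srs_with_replacement(population, n, rng):
    """Each draw is independent, so the same element may appear several times."""
    return rng.choices(population, k=n)


def srs_without_replacement(population, n, rng):
    """Once an element is drawn it cannot reappear: draws are not independent."""
    return rng.sample(population, n)


def sampling_fraction_exceeds(n, N, threshold=0.05):
    """Rule of thumb from the notes: sample without replacement if n > 5% of N."""
    return n / N > threshold


def inclusion_frequency(population, n, element, reps, rng):
    """Share of s.r.s. (without replacement) that contain `element`; near n/N."""
    hits = sum(element in srs_without_replacement(population, n, rng) for _ in range(reps))
    return hits / reps


rng = random.Random(42)
population = list(range(1, 101))  # hypothetical population of N = 100 listed units
print(srs_with_replacement(population, 10, rng))
print(srs_without_replacement(population, 10, rng))
print(sampling_fraction_exceeds(10, len(population)))  # True: 10% > 5%
print(inclusion_frequency(population, 10, 1, 20_000, rng))  # close to 10/100
```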

B.2) Stratified sampling:
a. Strata are internally homogeneous but heterogeneous with respect to one another.
b. Once the strata are defined, a sample is taken from each of them. Although any sampling technique can be used, s.r.s. is the usual choice.
c. The number of observations selected from each stratum can be proportional either to its size or to its variability.
d. We can draw inferences about specific subgroups of the population.
e. For the same precision, this technique requires a smaller sample size than s.r.s.
B.3) Cluster sampling: the population can be classified into clusters, which are groups of heterogeneous elements, so each cluster shows a variability similar to that of the population analyzed.
a. A number of clusters are then selected at random.
b. After that, one may either survey all the units in the selected clusters or take a s.r.s. within them.
c. It is cheaper than a s.r.s. over the whole population.
B.4) Systematic sampling: it is a variant of s.r.s. The elements in the population must be listed.
a. Let k be the integer nearest to $N/n$, where N is the population size and n the sample size.

b. An element is then chosen at random among the first k elements of the list. Its position, $x_1$, gives the first observation in the sample.
c. The second observation is the element in position $x_1 + k$.
d. The third one is in position $x_1 + 2k$, and so on until the sample is complete, the last one being in position $x_1 + (n-1)k$.

Final considerations:
o If there is no information about the variable under study in the population, we apply s.r.s.
o Otherwise, the population is divided into groups, either homogeneous (strata) or heterogeneous (clusters), and s.r.s. is then applied.
o Whenever a sampling procedure is implemented, the technique employed must be clearly explained.
o There are no good or bad samples, just good or bad sampling procedures.
o The bigger the sample size, the better the estimation. However, beyond a certain sample size the gain in precision from additional observations becomes small.
o The costs involved must be taken into account.
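Two of the procedures above can be sketched as follows: allocating a stratified sample in proportion to stratum size, and systematic selection. These are assumptions not specified in the notes. Allocations are rounded with the largest-remainder method. In addition, k is rounded down (k = N // n) rather than to the nearest integer, so that the last position $x_1 + (n-1)k$ never runs past the end of the list. The function names are hypothetical.

```python
import random


def proportional_allocation(stratum_sizes, n):
    """Split n across strata proportionally to their sizes (largest-remainder rounding)."""
    N = sum(stratum_sizes)
    exact = [n * size / N for size in stratum_sizes]
    alloc = [int(e) for e in exact]  # round each share down first
    remaining = n - sum(alloc)
    # Give the leftover units to the strata with the largest fractional parts.
    order = sorted(range(len(exact)), key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in order[:remaining]:
        alloc[h] += 1
    return alloc


def systematic_sample(population, n, rng):
    """Systematic sampling from a listed population (assumes 1 <= n <= N)."""
    N = len(population)
    k = N // n  # rounded down so that x1 + (n - 1) * k <= N
    x1 = rng.randint(1, k)  # 1-based position chosen at random among the first k
    positions = [x1 + j * k for j in range(n)]
    return [population[p - 1] for p in positions]


rng = random.Random(3)
print(proportional_allocation([500, 300, 200], 50))  # [25, 15, 10]
print(systematic_sample(list(range(1, 101)), 10, rng))
```

Allocation proportional to variability instead of size would only change the weights passed in.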

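SAMPLING DISTRIBUTIONS OF ESTIMATORS (s.r.s.)

ONE POPULATION

Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

  1. General case: ξ is any RV with mean μ and variance σ²:
     $E(\bar{x}) = \mu, \qquad Var(\bar{x}) = \frac{\sigma^2}{n}$
     If n is small, the sampling distribution of $\bar{x}$ depends on the distribution of the population.
  2. If n is large (n ≥ 30), the CLT applies: $\bar{x} \overset{a}{\sim} N\left(\mu; \frac{\sigma}{\sqrt{n}}\right)$
  3. ξ follows a N(μ;σ); then, by the additive property (whatever the value of n): $\bar{x} \sim N\left(\mu; \frac{\sigma}{\sqrt{n}}\right)$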
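  4. ξ follows a N(μ;σ) with σ² unknown. In this case we use the bias-corrected sample standard deviation $s_1$ instead of σ:
     $\frac{\bar{x} - \mu}{s_1/\sqrt{n}} \sim t_{n-1}, \qquad s_1^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
     or, alternatively, using the sample standard deviation s:
     $\frac{\bar{x} - \mu}{s/\sqrt{n-1}} \sim t_{n-1}$

Sample variance: $s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2$

With mean: $E(s^2) = \frac{n-1}{n}\sigma^2$

And variance (general case): $Var(s^2) = \frac{\mu_4 - \sigma^4}{n} - \frac{2(\mu_4 - 2\sigma^4)}{n^2} + \frac{\mu_4 - 3\sigma^4}{n^3}$, where $\mu_r = E[(\xi - \mu)^r]$ is the r-th central moment of ξ.

And variance (only when ξ ~ N(μ;σ)): $Var(s^2) = \frac{2(n-1)\sigma^4}{n^2}$

Moreover (key issue), if ξ ~ N(μ;σ): $\frac{n s^2}{\sigma^2} \sim \chi^2_{n-1}$, hence: $s^2 \sim \frac{\sigma^2}{n}\chi^2_{n-1}$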
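The moment formulas for $\bar{x}$ and $s^2$ can be checked numerically. This minimal sketch makes several checks. It confirms that the general expression for $Var(s^2)$ reduces to $2(n-1)\sigma^4/n^2$ when $\mu_4 = 3\sigma^4$, which is the case for a normal population. It confirms that $s_1/\sqrt{n} = s/\sqrt{n-1}$, so both t statistics coincide. It also compares simulated moments of $\bar{x}$ and $s^2$ with μ, $\sigma^2/n$, $(n-1)\sigma^2/n$ and $2(n-1)\sigma^4/n^2$. The function names and the simulation settings (μ = 10, σ = 2, n = 10, 20 000 replications) are arbitrary choices for illustration.

```python
import math
import random
from statistics import fmean, pvariance


def var_s2_general(mu4, sigma2, n):
    """Var(s^2) for the divisor-n sample variance, given the 4th central moment mu4."""
    s4 = sigma2 ** 2
    return (mu4 - s4) / n - 2 * (mu4 - 2 * s4) / n**2 + (mu4 - 3 * s4) / n**3


def var_s2_normal(sigma2, n):
    """Var(s^2) when the population is N(mu; sigma)."""
    return 2 * (n - 1) * sigma2**2 / n**2


def simulate(mu, sigma, n, reps, rng):
    """Draw `reps` samples of size n from N(mu; sigma); return lists of xbar and s^2."""
    xbars, s2s = [], []
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(xs)
        xbars.append(xbar)
        s2s.append(sum((x - xbar) ** 2 for x in xs) / n)
    return xbars, s2s


mu, sigma, n = 10.0, 2.0, 10
xbars, s2s = simulate(mu, sigma, n, 20_000, random.Random(11))
print("E(xbar):", fmean(xbars), "theory:", mu)
print("Var(xbar):", pvariance(xbars), "theory:", sigma**2 / n)
print("E(s^2):", fmean(s2s), "theory:", (n - 1) / n * sigma**2)
print("Var(s^2):", pvariance(s2s), "theory:", var_s2_normal(sigma**2, n))

# The same t statistic results whether we use s1/sqrt(n) or s/sqrt(n - 1).
xs = [9.1, 11.4, 10.2, 8.7, 12.0]
m = len(xs)
ss = sum((x - fmean(xs)) ** 2 for x in xs)
s, s1 = math.sqrt(ss / m), math.sqrt(ss / (m - 1))
print(math.isclose(s1 / math.sqrt(m), s / math.sqrt(m - 1)))  # True
```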

TWO POPULATIONS

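A) Two normal populations (independent samples). Two independent samples are taken, one from each of the two variables to be compared:
$(x_1, x_2, \ldots, x_n)$, with $x_i \sim N(\mu_x; \sigma_x)$, and $(y_1, y_2, \ldots, y_m)$, with $y_j \sim N(\mu_y; \sigma_y)$

Difference between two sample means

Case 1) The population variances are known:
$\bar{x} - \bar{y} \sim N\left(\mu_x - \mu_y; \sqrt{\frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{m}}\right)$

Case 2) The population variances are unknown but equal:
$\frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{s^* \sqrt{\frac{1}{n} + \frac{1}{m}}} \sim t_{n+m-2}$, with $s^{*2} = \frac{(n-1)s_{1x}^2 + (m-1)s_{1y}^2}{n+m-2}$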

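or: $s^{*2} = \frac{n s_x^2 + m s_y^2}{n+m-2}$

Case 3) The population variances are unknown and different:
$\frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{\frac{s_{1x}^2}{n} + \frac{s_{1y}^2}{m}}} \overset{a}{\sim} t_{n+m-2-\Delta}$, where $\Delta = \frac{\left[(m-1)\frac{s_{1x}^2}{n} - (n-1)\frac{s_{1y}^2}{m}\right]^2}{(m-1)\frac{s_{1x}^4}{n^2} + (n-1)\frac{s_{1y}^4}{m^2}}$

Quotient between sample variances
$\frac{s_{1x}^2/\sigma_x^2}{s_{1y}^2/\sigma_y^2} \sim F_{(n-1),(m-1)}$, or, equivalently: $\frac{n s_x^2/[(n-1)\sigma_x^2]}{m s_y^2/[(m-1)\sigma_y^2]} \sim F_{(n-1),(m-1)}$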
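The statistics for two independent normal samples can be sketched as below. The helper names are hypothetical and the data are illustrative. Besides computing the statistics, the sketch checks two identities that follow from the formulas. First, the two expressions for $s^{*2}$ agree. Second, the Case 3 degrees of freedom $n+m-2-\Delta$ (with Δ computed as in `df_case3`) equal the Welch-Satterthwaite value $\frac{(a+b)^2}{a^2/(n-1) + b^2/(m-1)}$, where $a = s_{1x}^2/n$ and $b = s_{1y}^2/m$.

```python
import math
from statistics import fmean


def biased_var(xs):
    """s^2: sample variance with divisor n."""
    mean = fmean(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def corrected_var(xs):
    """s_1^2: bias-corrected sample variance with divisor n - 1."""
    n = len(xs)
    return biased_var(xs) * n / (n - 1)


def pooled_var(xs, ys):
    """s*^2 from the bias-corrected variances (Case 2)."""
    n, m = len(xs), len(ys)
    return ((n - 1) * corrected_var(xs) + (m - 1) * corrected_var(ys)) / (n + m - 2)


def pooled_var_alt(xs, ys):
    """Equivalent form of s*^2 built from the divisor-n variances."""
    n, m = len(xs), len(ys)
    return (n * biased_var(xs) + m * biased_var(ys)) / (n + m - 2)


def t_equal_variances(xs, ys, diff0=0.0):
    """Case 2 statistic: t with n + m - 2 d.f. when mu_x - mu_y = diff0."""
    n, m = len(xs), len(ys)
    s_star = math.sqrt(pooled_var(xs, ys))
    return (fmean(xs) - fmean(ys) - diff0) / (s_star * math.sqrt(1 / n + 1 / m))


def df_case3(xs, ys):
    """Degrees of freedom n + m - 2 - Delta for Case 3 (unequal variances)."""
    n, m = len(xs), len(ys)
    a, b = corrected_var(xs) / n, corrected_var(ys) / m
    delta = ((m - 1) * a - (n - 1) * b) ** 2 / ((m - 1) * a**2 + (n - 1) * b**2)
    return n + m - 2 - delta


def welch_satterthwaite_df(xs, ys):
    """Welch-Satterthwaite degrees of freedom, for comparison with df_case3."""
    n, m = len(xs), len(ys)
    a, b = corrected_var(xs) / n, corrected_var(ys) / m
    return (a + b) ** 2 / (a**2 / (n - 1) + b**2 / (m - 1))


def variance_ratio(xs, ys, sigma2_ratio=1.0):
    """(s_1x^2 / sigma_x^2) / (s_1y^2 / sigma_y^2): F with (n - 1, m - 1) d.f.

    sigma2_ratio is sigma_x^2 / sigma_y^2 under the hypothesis being tested.
    """
    return corrected_var(xs) / corrected_var(ys) / sigma2_ratio


xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative data
ys = [2.0, 4.0, 6.0, 8.0]
print(pooled_var(xs, ys), pooled_var_alt(xs, ys))  # both about 30/7
print(t_equal_variances(xs, ys))
print(df_case3(xs, ys), welch_satterthwaite_df(xs, ys))  # equal up to rounding
print(variance_ratio(xs, ys))  # 0.375
```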

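or: $\frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{s_w/\sqrt{n-1}} \sim t_{n-1}$, where $s_w$ is the sample standard deviation of the paired differences $w_i = x_i - y_i$

C) Two non-normal populations (large independent samples)

Difference between two sample means

General case: two independent samples with large n and m. The difference between the two sample means is approximately normally distributed, with mean $\mu_x - \mu_y$ and variance equal to the sum of the estimated variances of the respective sample means:
$\bar{x} - \bar{y} \overset{a}{\sim} N\left(\mu_x - \mu_y; \sqrt{\frac{s_{1x}^2}{n} + \frac{s_{1y}^2}{m}}\right)$

Particular case: two independent samples with large n and m, drawn from a $B(1;p_1)$ and a $B(1;p_2)$. The difference between the two sample means (the sample proportions $\hat{p}_1$ and $\hat{p}_2$) is approximately distributed as:
$\hat{p}_1 - \hat{p}_2 \overset{a}{\sim} N\left(p_1 - p_2; \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n} + \frac{\hat{p}_2(1-\hat{p}_2)}{m}}\right)$
Then, after standardizing:
$\frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n} + \frac{\hat{p}_2(1-\hat{p}_2)}{m}}} \overset{a}{\sim} N(0;1)$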
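For two large Bernoulli samples, the standardized difference of sample proportions can be computed as below. As in the general case, this sketch assumes that the estimated proportions are plugged into the variance. `two_proportion_z` and `diff_proportions_ci` are hypothetical names. The counts (60 of 100 versus 45 of 100) are illustrative only.

```python
import math
from statistics import NormalDist


def two_proportion_z(successes1, n, successes2, m, diff0=0.0):
    """Standardized difference of sample proportions, approx N(0; 1) for large n and m."""
    p1, p2 = successes1 / n, successes2 / m
    se = math.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / m)  # estimated standard error
    return ((p1 - p2) - diff0) / se


def diff_proportions_ci(successes1, n, successes2, m, confidence=0.95):
    """Large-sample interval for p1 - p2 built from the same normal approximation."""
    p1, p2 = successes1 / n, successes2 / m
    se = math.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / m)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return (p1 - p2) - z * se, (p1 - p2) + z * se


z = two_proportion_z(60, 100, 45, 100)  # hypothesis p1 - p2 = 0
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.3f}, two-sided p-value = {p_value:.4f}")
print(diff_proportions_ci(60, 100, 45, 100))
```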