
Topic 5 Statistics, Statistics notes

Course: Statistics II, Professor: Jorge Herrera, Degree: Law + Business Administration and Management, University: UC3M

Type: Notes

2016/2017

Uploaded on 12/11/2017 by johncarvajal

Statistics II
Exercises Chapter 5

  1. Consider the four datasets provided in the slides for Chapter 5 (Section 5.1).

(a) Check that all four datasets generate exactly the same LS linear regression equation.
(b) Apply to dataset #1 the methods for detecting specification error discussed in class and comment on the results.
(c) Apply to dataset #2 the methods for detecting specification error discussed in class and comment on the results.
(d) Apply to dataset #3 the methods for detecting specification error discussed in class and comment on the results.
(e) Find the outlier in dataset #3. Obtain the LS regression line after removing this data point and comment on the results.
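The four datasets are only given in the Chapter 5 slides, so the following Python sketch works on any pair of lists. It assumes, as one common rule of thumb that may differ from the methods used in class, that points with a standardized residual $|e_i/s| > 2$ deserve attention. The helper names `ls_line`, `standardized_residuals` and `drop_largest_residual` are hypothetical, and the example run reuses the Exercise 3 data only to show the calls.

```python
import math

def ls_line(x, y):
    """Least-squares intercept and slope (b0, b1) of y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def standardized_residuals(x, y):
    """Residuals e_i divided by s, where s^2 = RSS / (n - 2)."""
    b0, b1 = ls_line(x, y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in resid) / (len(x) - 2))
    return [e / s for e in resid]

def drop_largest_residual(x, y):
    """Remove the point with the largest |standardized residual| and refit."""
    z = standardized_residuals(x, y)
    worst = max(range(len(z)), key=lambda i: abs(z[i]))
    x_kept = x[:worst] + x[worst + 1:]
    y_kept = y[:worst] + y[worst + 1:]
    return worst, ls_line(x_kept, y_kept)

# Illustration only: the Exercise 3 data, not one of the four Chapter 5 datasets
x = [5, 7, 4, 2, 4, 8]
y = [22, 20, 15, 9, 17, 25]
z = standardized_residuals(x, y)
flagged = [i for i, zi in enumerate(z) if abs(zi) > 2]  # rule-of-thumb threshold
print("line:", ls_line(x, y))
print("standardized residuals:", [round(zi, 3) for zi in z])
print("flagged:", flagged)
print("refit without largest residual:", drop_largest_residual(x, y))
```

For part (a), calling `ls_line` on each of the four datasets and comparing the returned pairs is enough; for part (e) the same refit can be applied to dataset #3 once its outlier is identified.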

  2. Using a sample of 30 observations, the following linear regression model was estimated: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$, with $\hat{\beta}_0 = 10.1$ and $\hat{\beta}_1 = 8.4$. The sum of squared deviations of the fitted values from the mean of the response variable (the variation explained by the model) is $\sum_i (\hat{y}_i - \bar{y})^2 = 128$, while the sum of squared residuals is $\sum_i e_i^2 = 286$.

(a) Compute the coefficient of determination and interpret it.
(b) What can you say about the correlation coefficient between the values $x_i$ and $y_i$?
(c) Build the corresponding ANOVA table using these data.
(d) Test, at the 5% significance level, the hypothesis that the response variable $y$ does not depend on $x$. Repeat this test at the 1% significance level.
(e) Provide an unbiased estimator for the variance of the error term.
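The arithmetic for (a) to (e) can be checked with a short Python sketch. With one regressor the residual degrees of freedom are $n - 2 = 28$; the critical values 4.20 and 7.64 of $F_{1,28}$ at the 5% and 1% levels are hard-coded from standard F tables, an assumption made to keep the sketch free of dependencies.

```python
import math

# Summary figures given in Exercise 2
n, k = 30, 1
beta1_hat = 8.4
ess = 128.0   # sum of (y_hat_i - y_bar)^2, explained by the model
rss = 286.0   # sum of squared residuals e_i^2

tss = ess + rss
r2 = ess / tss
# In simple regression |r| = sqrt(R^2) and r has the sign of the slope estimate
r = math.copysign(math.sqrt(r2), beta1_hat)

df_model, df_resid = k, n - k - 1
ms_model = ess / df_model
s2 = rss / df_resid            # unbiased estimator of Var(u)
f_stat = ms_model / s2

# Critical values of F(1, 28) from standard tables (assumed, not computed)
f_crit = {0.05: 4.20, 0.01: 7.64}

print(f"R^2 = {r2:.4f}, r = {r:.4f}")
print(f"Model     SS={ess:7.1f} df={df_model:2d} MS={ms_model:8.3f} F={f_stat:.3f}")
print(f"Residual  SS={rss:7.1f} df={df_resid:2d} MS={s2:8.3f}")
print(f"Total     SS={tss:7.1f} df={n - 1:2d}")
for alpha, crit in f_crit.items():
    print(f"alpha = {alpha}: reject H0 (beta1 = 0)? {f_stat > crit}")
```

With $R^2 = 128/414 \approx 0.309$ and $F \approx 12.53$, $H_0$ is rejected at both levels, and $s^2 = 286/28 \approx 10.21$.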

  3. The manager of a car dealership is interested in finding the statistical relationship between the number of salesmen employed during weekends and the corresponding total number of cars sold. The following data were obtained over 6 consecutive weekends:

| Weekend | $x_i$ (# of salesmen) | $y_i$ (# of cars sold) |
|---|---|---|
| 1 | 5 | 22 |
| 2 | 7 | 20 |
| 3 | 4 | 15 |
| 4 | 2 | 9 |
| 5 | 4 | 17 |
| 6 | 8 | 25 |

(a) Find the LS regression line for $y$ (# of cars sold) as a function of $x$ (# of salesmen).
(b) Build the corresponding ANOVA table and check the decomposition TSS = ESS + RSS.
(c) Compute the coefficient of determination and interpret it.
(d) Use the ANOVA table to test, at the 1% and 5% significance levels, the hypothesis that the number of salesmen employed during the weekend does not affect the total number of cars sold.
(e) Test the same hypothesis as in the preceding question, but using the procedure studied in Chapter 4. Check that the corresponding test statistic $T$ and the test statistic $F$ used before satisfy $F = T^2$.
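A plain-Python sketch of (a) to (e) on the six weekends: it computes the LS line, the three sums of squares, $R^2$, the ANOVA $F$ and the Chapter 4 $t$ statistic for $H_0\colon \beta_1 = 0$, so that $F = T^2$ can be checked numerically.

```python
import math

# Data from Exercise 3: salesmen (x) and cars sold (y) over 6 weekends
x = [5, 7, 4, 2, 4, 8]
y = [22, 20, 15, 9, 17, 25]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)
ess = sum((fi - y_bar) ** 2 for fi in fitted)
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

df_resid = n - 2
s2 = rss / df_resid
f_stat = ess / s2                   # the model has 1 degree of freedom
t_stat = b1 / math.sqrt(s2 / sxx)   # t statistic for H0: beta1 = 0

print(f"y_hat = {b0:.4f} + {b1:.4f} x")
print(f"TSS = {tss:.4f}, ESS + RSS = {ess + rss:.4f}")
print(f"R^2 = {ess / tss:.4f}")
print(f"F = {f_stat:.4f}, T^2 = {t_stat ** 2:.4f}")
```

By hand the same data give $\hat{y} = 19/3 + (7/3)x \approx 6.33 + 2.33x$, TSS = 160, ESS $\approx 130.67$, RSS $\approx 29.33$, $R^2 \approx 0.817$ and $F = T^2 \approx 17.82$. Against tabled $F_{1,4}$ critical values of about 7.71 (5%) and 21.20 (1%), $H_0$ would be rejected at 5% but not at 1%.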

  4. Using the data transformations seen in class, linearize the following nonlinear relationships:

(a) $y = \ln(5x)$.
(b) y = 23 8 x.
(c) $y = 1/(4 - x)$.
(d) y = (^54) x.
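For items (a) and (c), one possible linearization is sketched below; the transformed variables $x^*$ and $y^*$ are notation introduced here for illustration.

```latex
\begin{aligned}
\text{(a)}\quad & y = \ln(5x) = \ln 5 + \ln x
  \;\Rightarrow\; y = \beta_0 + \beta_1 x^*, \quad x^* = \ln x,\ \beta_0 = \ln 5,\ \beta_1 = 1, \\
\text{(c)}\quad & y = \frac{1}{4 - x}
  \;\Rightarrow\; y^* = \frac{1}{y} = 4 - x = \beta_0 + \beta_1 x, \quad \beta_0 = 4,\ \beta_1 = -1.
\end{aligned}
```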

  5. Assume we have obtained the following measurements of a response variable $y$ as a function of the explanatory variable $x$:

| $x_i$ | $y_i$ |
|---|---|
| 1 | 5.… |
| 2 | 7.… |
| 3 | 9.… |
| 4 | 10.… |
| 5 | 11.… |
| 6 | 12.… |

(a) Draw the scatterplot $(x_i, y_i)$. Do you think a linear relationship can adequately describe this dataset?
(b) Assuming an adequate model is of the form $y = a x^{b} u$, apply the appropriate transformations to the variables $x$ and $y$, and obtain point estimates of the parameters $a$ and $b$ from an LS linear regression model between the transformed variables.
(c) Build the ANOVA table for the transformed variables. Obtain and interpret the corresponding coefficient of determination.
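Taking logarithms turns $y = a x^{b} u$ into $\ln y = \ln a + b \ln x + \ln u$, a simple linear regression of $\ln y$ on $\ln x$. A minimal Python sketch (the function name `fit_power_model` is hypothetical) that returns $\hat{a} = e^{\widehat{\ln a}}$, $\hat{b}$ and the sums of squares of the transformed regression; it is checked on synthetic values generated exactly from $y = 3x^{0.5}$, not on the exercise data.

```python
import math

def fit_power_model(x, y):
    """LS fit of ln y = ln a + b ln x; returns (a_hat, b_hat, ESS, RSS, TSS)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((v - mx) ** 2 for v in lx)
    sxy = sum((u - mx) * (w - my) for u, w in zip(lx, ly))
    b_hat = sxy / sxx
    ln_a_hat = my - b_hat * mx
    fitted = [ln_a_hat + b_hat * v for v in lx]
    # Sums of squares of the transformed variables, for the ANOVA table in (c)
    tss = sum((w - my) ** 2 for w in ly)
    rss = sum((w - f) ** 2 for w, f in zip(ly, fitted))
    return math.exp(ln_a_hat), b_hat, tss - rss, rss, tss

# Synthetic check: points lying exactly on y = 3 * x**0.5
xs = [1, 2, 3, 4, 5, 6]
ys = [3 * v ** 0.5 for v in xs]
a_hat, b_hat, ess, rss, tss = fit_power_model(xs, ys)
print(a_hat, b_hat, ess / tss)  # R^2 of the transformed regression
```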

  6. For dataset #1 from Exercise 1, obtain the LS estimators of the linear regression coefficients using vector-matrix notation.
  7. Using $n = 34$ observations, the following multiple linear regression model was estimated: $\hat{y} = 2.50 + 6.8x_1 + 6.9x_2 - 7.2x_3$. The standard errors of the regression coefficient estimates for the explanatory variables are $s(\hat{\beta}_1) = 3.1$, $s(\hat{\beta}_2) = 3.7$ and $s(\hat{\beta}_3) = 3.2$. The corresponding coefficient of determination is $R^2 = 0.85$.

(a) Obtain 95% confidence intervals for the regression coefficients corresponding to each explanatory variable.
(b) For each explanatory variable, test at the 5% significance level the hypothesis that the response does not depend on that variable.
(c) Is there evidence, at the 1% significance level, that for each explanatory variable the corresponding regression coefficient is positive?
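With $n = 34$ and $k = 3$ there are $n - k - 1 = 30$ residual degrees of freedom. The Python sketch below hard-codes the tabled Student $t_{30}$ quantiles 2.042 (0.975) and 2.457 (0.99) rather than computing them.

```python
# Exercise 7: n = 34 observations, k = 3 explanatory variables
n, k = 34, 3
df = n - k - 1  # 30 residual degrees of freedom

# (estimate, standard error) for each slope coefficient
estimates = {"beta1": (6.8, 3.1), "beta2": (6.9, 3.7), "beta3": (-7.2, 3.2)}

T_0975 = 2.042  # t_30 quantile for 95% intervals and two-sided 5% tests (from tables)
T_099 = 2.457   # t_30 quantile for the one-sided 1% test (from tables)

results = {}
for name, (b, se) in estimates.items():
    t = b / se
    results[name] = {
        "ci95": (b - T_0975 * se, b + T_0975 * se),
        "t": t,
        "reject_zero_5pct": abs(t) > T_0975,  # (b) H0: beta_j = 0
        "positive_1pct": t > T_099,           # (c) H0: beta_j <= 0 vs H1: beta_j > 0
    }
    print(name, results[name])
```

The interval for $\beta_2$ contains 0, matching the non-rejection in (b), and no $t$ ratio exceeds 2.457, so none of the slopes is shown to be positive at the 1% level.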

  8. Assume you have obtained estimates of the regression coefficients for the model $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + u_i$. Using each of the following partial ANOVA tables, test, at the 5% significance level, the null hypothesis that the response does not depend on any of the explanatory variables:

(a)

| Source of variation | SS | D.F. | Mean square | F ratio |
|---|---|---|---|---|
| Model | 4500 | 3 | | |
| Residuals | 500 | 26 | | |
| Total | | | | |

(b)

| Source of variation | SS | D.F. | Mean square | F ratio |
|---|---|---|---|---|
| Model | 9780 | 6 | | |
| Residuals | 2100 | 32 | | |
| Total | | | | |

(c)

| Source of variation | SS | D.F. | Mean square | F ratio |
|---|---|---|---|---|
| Model | 46000 | 8 | | |
| Residuals | 25000 | 27 | | |
| Total | | | | |
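Each table can be completed from MS = SS / D.F. and $F = \text{MS}_{\text{model}} / \text{MS}_{\text{residual}}$, which is then compared with the upper 5% point of $F_{k,\,n-k-1}$. The critical values in this Python sketch (about 2.98, 2.40 and 2.31) are approximate table readings, stated as an assumption.

```python
# Partial ANOVA tables from Exercise 8: (model SS, model D.F., residual SS, residual D.F.)
tables = {
    "a": (4500, 3, 500, 26),
    "b": (9780, 6, 2100, 32),
    "c": (46000, 8, 25000, 27),
}
# Approximate upper 5% points of F(k, n - k - 1), read from F tables (assumed)
F_CRIT_5 = {"a": 2.98, "b": 2.40, "c": 2.31}

def complete_anova(ss_model, df_model, ss_resid, df_resid):
    """Fill in the missing cells of a partial ANOVA table."""
    ms_model = ss_model / df_model
    ms_resid = ss_resid / df_resid
    return {
        "total_ss": ss_model + ss_resid,
        "total_df": df_model + df_resid,
        "ms_model": ms_model,
        "ms_resid": ms_resid,
        "F": ms_model / ms_resid,
    }

results = {label: complete_anova(*row) for label, row in tables.items()}
for label, res in results.items():
    print(label, res, "reject H0 at 5%:", res["F"] > F_CRIT_5[label])
```

The total degrees of freedom are $n - 1$, so the three samples have $n = 30$, $39$ and $36$ observations; the $F$ ratios (78.0, about 24.84 and 6.21) all exceed their 5% critical values.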

  9. For 10 single-family houses we obtained their price (in M€), built area (in m²), size of the surrounding plot (in ha) and number of bathrooms:

(d) For a simple linear regression model $y_i = b_0 + b_1 x_i + u_i$, where the usual hypotheses are assumed to hold, assume also that we are given a simple random sample of size $n$, composed of pairs $(y_1, x_1), \dots, (y_n, x_n)$. The variance of $\hat{b}_1$ can be written in two different ways: the first is $\mathrm{Var}(\hat{b}_1) = \dfrac{\sigma^2}{n S_x^2}$, where $S_x^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$; the second is based on the matrix representation of the model, as the corresponding element of the matrix $\sigma^2 (X'X)^{-1}$. Prove that both representations are equivalent.
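One way to organize the proof, taking the design matrix $X$ with a first column of ones and a second column holding the $x_i$: compute $X'X$, invert the $2 \times 2$ matrix, and read off the element in position $(2,2)$, which corresponds to $\hat{b}_1$. The determinant step uses $\sum_i x_i = n\bar{x}$ and $\sum_i x_i^2 - n\bar{x}^2 = \sum_i (x_i - \bar{x})^2$.

```latex
\begin{aligned}
X &= \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix},
\qquad
X'X = \begin{pmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix}, \\
\det(X'X) &= n\sum_i x_i^2 - \Big(\sum_i x_i\Big)^2 = n\sum_i (x_i - \bar{x})^2 = n^2 S_x^2, \\
(X'X)^{-1} &= \frac{1}{n^2 S_x^2}
  \begin{pmatrix} \sum_i x_i^2 & -\sum_i x_i \\ -\sum_i x_i & n \end{pmatrix}, \\
\big[\sigma^2 (X'X)^{-1}\big]_{22} &= \frac{\sigma^2\, n}{n^2 S_x^2} = \frac{\sigma^2}{n S_x^2} = \mathrm{Var}(\hat{b}_1).
\end{aligned}
```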

  1. The Tourism Department of the Madrid regional government wishes to analyze the relationship between yearly tourism promotion budgets and education and infrastructure budgets, for towns in the region with fewer than 10,000 inhabitants. Twenty towns were randomly selected for this study, and the following variables were measured:
    • $y$ = yearly tourism promotion budget (in millions of euros).
    • $x_1$ = yearly education budget (in millions of euros).
    • $x_2$ = yearly infrastructure budget (in millions of euros).

From this study the following information was obtained:

$(X^{T}X)^{-1} = \ldots, \qquad X^{T}Y = \ldots$

We also know that the residual sum of squares is

$\text{RSS} = \sum_{i=1}^{20} (y_i - \hat{y}_i)^2 = 0.\ldots$

and the total sum of squares is

$\text{TSS} = \sum_{i=1}^{20} (y_i - \bar{y})^2 = 0.\ldots$

For the multiple linear regression model:

$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + u_i, \quad i = 1, \dots, 20,$

you are asked to:

(a) Estimate and interpret the coefficients of the linear regression model.
(b) Build the ANOVA table for this regression and conduct a global significance test.
(c) Conduct significance tests for each of the parameters in the model, stating in each case the null and alternative hypotheses. How do these results on the significance of the parameters compare with those of the preceding question?
(d) Compute a forecast of the yearly tourism promotion budget for a town with a yearly education budget of 1.3 million euros and an infrastructure budget of 1.2 million euros.
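Parts (a) to (d) only need $(X^{T}X)^{-1}$, $X^{T}Y$, RSS and TSS. A minimal Python sketch with hypothetical helper names (`regression_from_summaries`, `forecast`) that works directly from those summaries: $\hat{\beta} = (X^{T}X)^{-1}X^{T}Y$, $s^2 = \text{RSS}/(n-k-1)$, $s(\hat{\beta}_j) = \sqrt{s^2 c_{jj}}$ with $c_{jj}$ the $j$-th diagonal element of $(X^{T}X)^{-1}$, and the global $F = (\text{ESS}/k)/s^2$. For the tourism model one would pass the $3 \times 3$ matrix and 3-vector from the exercise with $n = 20$, and call `forecast(beta, [1.3, 1.2])` for (d). The check at the end reuses the single-regressor Exercise 3 data, for which $X^{T}X = \begin{pmatrix} 6 & 30 \\ 30 & 174 \end{pmatrix}$ and $X^{T}Y = (108, 596)^{T}$.

```python
import math

def regression_from_summaries(xtx_inv, xty, rss, tss, n):
    """Coefficients, global F and t ratios from (X'X)^{-1}, X'Y, RSS and TSS."""
    p = len(xty)   # number of parameters, intercept included
    k = p - 1      # number of explanatory variables
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    s2 = rss / (n - k - 1)               # unbiased estimate of the error variance
    f_stat = ((tss - rss) / k) / s2      # global significance test
    se = [math.sqrt(s2 * xtx_inv[i][i]) for i in range(p)]
    t_ratios = [b / s for b, s in zip(beta, se)]
    return beta, f_stat, t_ratios

def forecast(beta, x_new):
    """Point forecast beta0 + sum(beta_j * x_j); x_new excludes the intercept."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x_new))

# Check on the Exercise 3 data (one regressor), where X'X = [[6, 30], [30, 174]]
xtx_inv = [[174 / 144, -30 / 144], [-30 / 144, 6 / 144]]
xty = [108, 596]
beta, f_stat, t_ratios = regression_from_summaries(xtx_inv, xty, rss=88 / 3, tss=160, n=6)
print(beta, f_stat, t_ratios, forecast(beta, [5]))
```

On the Exercise 3 data it reproduces $\hat{\beta} \approx (6.33, 2.33)$ and $F = T^2 \approx 17.82$, and the forecast at $x = 5$ is 18.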

  1. You are given a sample of 20 observations $\{x, z, y\}$ of three variables $X$, $Y$ and $Z$. For this sample, $\sum_{i=1}^{20} y_i^2 = 10.08$ and $\bar{y} = 0.\ldots$

You have computed the LS estimates for the regression model $y = \beta_0 + \beta_1 x + \beta_2 z + u$. The values obtained are:

$\hat{\beta}_0 = 0.065$, $\hat{\beta}_1 = -0.358$, $\hat{\beta}_2 = 0.104$, $s(\hat{\beta}_1) = 0.152$, $s(\hat{\beta}_2) = 0.028$, $\sum_{i=1}^{20} e_i^2 = 2.\ldots$

If we accept that the assumptions of the linear regression model hold, answer the following questions:

(a) Complete the ANOVA table for this multiple linear regression model.

(b) Compute the coefficient of multiple determination for the model and comment on its value.

(c) Test, at the 1% significance level, whether the model is globally significant in explaining the values of $Y$ as a linear function of $X$ and $Z$.

(d) Test, at the 5% significance level, whether there is sufficient evidence to conclude that increases in the value of the variable $X$ imply decreases in the value of the variable $Y$ (while $Z$ remains constant).
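For (d), the alternative of interest is $H_1\colon \beta_1 < 0$ against $H_0\colon \beta_1 \ge 0$, tested with $t = \hat{\beta}_1 / s(\hat{\beta}_1)$ on $n - k - 1 = 17$ degrees of freedom. In this Python sketch the quantile $t_{17,\,0.95} \approx 1.740$ is hard-coded from tables, and the hypothetical helpers `total_ss` and `anova_f` show how TSS $= \sum_i y_i^2 - n\bar{y}^2$ and the global $F$ for (a) to (c) would be obtained once $\bar{y}$ and $\sum_i e_i^2$ are plugged in.

```python
# Exercise figures: n = 20 observations, k = 2 explanatory variables (X and Z)
n, k = 20, 2
df_resid = n - k - 1  # 17

beta1_hat, se_beta1 = -0.358, 0.152

def total_ss(sum_y2, y_bar, n):
    """TSS = sum(y_i^2) - n * y_bar^2."""
    return sum_y2 - n * y_bar ** 2

def anova_f(tss, rss, n, k):
    """Global F = (ESS / k) / (RSS / (n - k - 1)), with ESS = TSS - RSS."""
    return ((tss - rss) / k) / (rss / (n - k - 1))

# (d) H0: beta1 >= 0 vs H1: beta1 < 0, one-sided test at the 5% level
t1 = beta1_hat / se_beta1
T_17_095 = 1.740  # 95% quantile of Student t with 17 df, from tables
reject_h0 = t1 < -T_17_095
print(f"t = {t1:.3f}; reject H0 at 5%: {reject_h0}")
```

Here $t \approx -2.36 < -1.740$, so the data support a negative effect of $X$ on $Y$ with $Z$ held fixed.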