Applied Actuarial Statistics: Sample Final Examination - MATH 238 - Prof. Emiliano Valdez, Exams of Mathematics

A sample final examination for the math 238 - applied actuarial statistics course. The examination covers topics such as regression analysis, population density, and sales data. Summary statistics, regression models, and graphs.

Typology: Exams

Pre 2010

Uploaded on 02/25/2010

koofers-user-tqp
koofers-user-tqp 🇺🇸

10 documents

1 / 11

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
MATH 238 - Applied Actuarial Statistics
Sample Final Examination - Time Allowed: 2 hours
Question no. 1: [25 points]
In a study of population density around a large urban area, a random sample of 10 residential areas was
selected. For each area, the distance (in km) from the city center, called distance, and the population
density, called popdens, were recorded. Some summary statistics of the data are provided below:
> summary(density)
popdens distance
Min. : 5.0 Min. : 0.400
1st Qu.: 17.0 1st Qu.: 3.450
Median : 43.0 Median : 5.600
Mean : 60.0 Mean : 5.710
3rd Qu.: 94.5 3rd Qu.: 7.975
Max. :149.0 Max. :11.700
> sd(popdens)
[1] 53.96913
> sd(distance)
[1] 3.664075
The following figures provide relationships of popdens and distance, together with possible combinations
of their log-transformations.
0 2 4 6 8 10 12
0 20 40 60 80 100 120 140
popdens vs distance
distance
popdens
−1.0 −0.5 0.0 0.5 1.0 1.5 2.0 2.5
0 20 40 60 80 100 120 140
popdens vs log(distance)
log(distance)
popdens
0 2 4 6 8 10 12
1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
log(popdens) vs distance
distance
log(popdens)
−1.0 −0.5 0.0 0.5 1.0 1.5 2.0 2.5
1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
log(popdens) vs log(distance)
log(distance)
log(popdens)
pf3
pf4
pf5
pf8
pf9
pfa

Partial preview of the text

Download Applied Actuarial Statistics: Sample Final Examination - MATH 238 - Prof. Emiliano Valdez and more Exams Mathematics in PDF only on Docsity!

MATH 238 - Applied Actuarial Statistics Sample Final Examination - Time Allowed: 2 hours

Question no. 1: [25 points]

In a study of population density around a large urban area, a random sample of 10 residential areas was selected. For each area, the distance (in km) from the city center, called distance, and the population density, called popdens, were recorded. Some summary statistics of the data are provided below:

summary(density) popdens distance Min. : 5.0 Min. : 0. 1st Qu.: 17.0 1st Qu.: 3. Median : 43.0 Median : 5. Mean : 60.0 Mean : 5. 3rd Qu.: 94.5 3rd Qu.: 7. Max. :149.0 Max. :11.

sd(popdens) [1] 53. sd(distance) [1] 3.

The following figures provide relationships of popdens and distance, together with possible combinations of their log-transformations.

l l

l

l

l

l l l l (^) l 0 2 4 6 8 10 12

0

20

40

60

80

100

120

140

popdens vs distance

distance

popdens

l l

l

l

l

l ll l (^) l −1.0 −0.5 0.0 0.5 1.0 1.5 2.0 2.

0

20

40

60

80

100

120

140

popdens vs log(distance)

log(distance)

popdens

l (^) l l

l

l l l l l l

0 2 4 6 8 10 12

1.^ 2.^ 2.^ 3.^ 3.^ 4.^ 4.^

log(popdens) vs distance

distance

log(popdens)

l (^) l l

l

l l l l l l

−1.0 −0.5 0.0 0.5 1.0 1.5 2.0 2.

1.^ 2.^ 2.^ 3.^ 3.^ 4.^ 4.^

log(popdens) vs log(distance)

log(distance)

log(popdens)

You are considering three regression models (the estimated models immediately follow):

Model 1: popdens = β 0 + βdistancedistance + ε.

Model 2: popdens = β 0 + βlog(distance)log(distance) + ε.

Model 3: log(popdens) = β 0 + βdistancedistance + ε.

lm1 <- lm(popdens~distance) lm2 <- lm(popdens~log(distance)) lm3 <- lm(log(popdens)~distance) summary(lm1)

Call: lm(formula = popdens ~ distance)

Residuals: Min 1Q Median 3Q Max -30.8889 -9.9273 0.2970 12.5559 28.

Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 139.699 11.119 12.564 1.51e-06 *** distance -13.958 1.663 -8.392 3.09e-05 ***


Signif. codes: 0 ’’ 0.001 ’’ 0.01 ’’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 18.28 on 8 degrees of freedom Multiple R-Squared: 0.898, Adjusted R-squared: 0. F-statistic: 70.42 on 1 and 8 DF, p-value: 3.091e-

summary(lm2)

Call: lm(formula = popdens ~ log(distance))

Residuals: Min 1Q Median 3Q Max -21.954 -10.437 -6.297 11.212 29.

Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 126.990 9.147 13.883 7.01e-07 *** log(distance) -47.980 5.293 -9.065 1.76e-05 ***


Signif. codes: 0 ’’ 0.001 ’’ 0.01 ’’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 17.05 on 8 degrees of freedom Multiple R-Squared: 0.9113, Adjusted R-squared: 0.

xtabs(~region+ptype) ptype region A B east 3 3 north 3 3 south 3 3 west 3 3 xtabs(sales~region+ptype)/xtabs(~region+ptype) ptype region A B east 9.165 9. north 8.580 6. south 9.035 7. west 9.360 10.

(a) First consider the effect of region. Comment on what you observe from the boxplot below:

l

l

l

l

east north south west

6

7

8

9

10

11

region

sales

(b) Denote by salesij the amount of sales by agent j in region i and consider the model

salesij = μ + τ (^) i + εij , for i = east, north,south,west, j = 1, 2 , 3.

Perform the test H 0 : τ (^) east = τ (^) north = τ (^) south = τ (^) wast = 0 and give your conclusions.

lm1 <- lm(sales~region) anova(lm1) Analysis of Variance Table

Response: sales Df Sum Sq Mean Sq F value Pr(>F) region 3 25.4435 8.4812 9.4609 0.0004257 *** Residuals 20 17.9288 0.

Signif. codes: 0 ’’ 0.001 ’’ 0.01 ’’ 0.05 ’.’ 0.1 ’ ’ 1

summary(lm1)

Call: lm(formula = sales ~ region)

Residuals: Min 1Q Median 3Q Max -1.36500 -0.83688 -0.08125 0.84500 1.

Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 9.4900 0.3865 24.552 < 2e-16 *** regionnorth -2.0800 0.5466 -3.805 0.00111 ** regionsouth -1.3650 0.5466 -2.497 0.02137 * regionwest 0.4875 0.5466 0.892 0.


Signif. codes: 0 ’’ 0.001 ’’ 0.01 ’’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 0.9468 on 20 degrees of freedom Multiple R-Squared: 0.5866, Adjusted R-squared: 0. F-statistic: 9.461 on 3 and 20 DF, p-value: 0.

(c) Calculate a 95% confidence interval for the difference in sales between regions east and west, i.e. τ (^) east −τ (^) west. Give a verbal interpretation of this interval. Use Tukey’s studentized range distribution.

(d) Now consider the effects of both region and policy type with the model expressed as

salesijk = μ + τ (^) i + βj + εijk, for i = east,north,south,west, j = A,B, k = 1, 2 , 3.

Perform the F test and give your conclusions.

lm2 <- lm(sales~region+ptype) anova(lm2) Analysis of Variance Table

Response: sales Df Sum Sq Mean Sq F value Pr(>F) region 3 25.4435 8.4812 10.0790 0.0003435 *** ptype 1 1.9409 1.9409 2.3065 0. Residuals 19 15.9879 0.


Signif. codes: 0 ’’ 0.001 ’’ 0.01 ’’ 0.05 ’.’ 0.1 ’ ’ 1

summary(lm2)

Call: lm(formula = sales ~ region + ptype)

Question no. 4: [20 points]

The semi-annual profits of an insurance company (in $millions) are depicted in the figure below for the past 10 years. These profit values have been adjusted to present them at today’s values.

profits.ts Time Series: Start = c(1996, 1) End = c(2005, 2) Frequency = 2 [1] 695 521 782 608 826 695 869 739 913 739 913 782 956 782 956 [16] 782 1000 826 1043 869

l

l

l

l

l

l

l

l

l

l

l

l

l

l

l

l

l

l

l

l

Insurance Semi−annual Profits for 1996−

Time

profits (in millions)

1996 1998 2000 2002 2004 2006

600

700

800

900

1000

(a) Describe the three types of component into which a time series can be decomposed. Define an additive model for a time series with these components, and explain its multiplicative model variation together with how you might be able to convert to an additive model.

(b) Describe the main features of the profits data depicted in the figure above. Discuss stationarity and say whether or not the data is considered a stationary process.

(c) The decomposition done in R is displayed in the figure below and the output. Explain the results of this decomposition.

plot(decompose(profits.ts)) decompose(profits.ts) $seasonal Time Series: Start = c(1996, 1) End = c(2005, 2) Frequency = 2 [1] 87.71875 -87.71875 87.71875 -87.71875 87.71875 -87.71875 87. [8] -87.71875 87.71875 -87.71875 87.71875 -87.71875 87.71875 -87. [15] 87.71875 -87.71875 87.71875 -87.71875 87.71875 -87.

$trend Time Series: Start = c(1996, 1) End = c(2005, 2) Frequency = 2 [1] NA 629.75 673.25 706.00 738.75 771.25 793.00 815.00 826.00 826. [11] 836.75 858.25 869.00 869.00 869.00 880.00 902.00 923.75 945.25 NA

$random Time Series: Start = c(1996, 1) End = c(2005, 2) Frequency = 2 [1] NA -21.03125 21.03125 -10.28125 -0.46875 11.46875 -11. [8] 11.71875 -0.71875 0.71875 -11.46875 11.46875 -0.71875 0. [15] -0.71875 -10.28125 10.28125 -10.03125 10.03125 NA

$figure [1] 87.71875 -87.

$type [1] "additive"

attr(,"class") [1] "decomposed.ts"

600

800

1000

observed

650

750

850

950

trend

0

50

seasonal

0

10

20

1996 1998 2000 2002 2004

random

Time

Decomposition of additive time series

l l l

l l

l

l

l l

l

l

l

l (^) ll

l

l

l

l l

l

l l

2000 4000 6000 8000 10000

1000

2000

3000

4000

5000

6000

miles

claims

B

A

(a) There are 3 types of observations considered unusual: leverage points, outliers, and influential. Explain each of these types and say which type are the points labeled A and B in the figure.

(b) A regression analysis using the full data set was performed with claims as the response and miles as the predictor. The output below is produced from R. Explain why the slope coefficient is negative.

lm1 <- lm(claims~miles) summary(lm1)

Call: lm(formula = claims ~ miles)

Residuals: Min 1Q Median 3Q Max -1502.3 -770.6 192.1 507.3 1821.

Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 4756.8315 970.6979 4.900 7.59e-05 *** miles -0.3620 0.1188 -3.047 0.00613 **


Signif. codes: 0 ’’ 0.001 ’’ 0.01 ’’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 946.6 on 21 degrees of freedom Multiple R-Squared: 0.3065, Adjusted R-squared: 0. F-statistic: 9.282 on 1 and 21 DF, p-value: 0.

(c) After removing observation points A and B, a further regression analysis was performed and output is produced below. Observe the marked change in the results. Explain why the removal of the two points causes these dramatic changes.

lm2 <- lm(claims~miles,data=ins.claims[-c(6,10),]) summary(lm2)

Call: lm(formula = claims ~ miles, data = ins.claims[-c(6, 10), ])

Residuals: Min 1Q Median 3Q Max -604.30 -315.05 32.59 209.36 1075.

Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -2945.2307 926.0968 -3.180 0.00493 ** miles 0.5355 0.1091 4.909 9.74e-05 ***


Signif. codes: 0 ’’ 0.001 ’’ 0.01 ’’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 412.7 on 19 degrees of freedom Multiple R-Squared: 0.5592, Adjusted R-squared: 0. F-statistic: 24.1 on 1 and 19 DF, p-value: 9.735e-

(d) How would you advise your client who wanted to use either of the two regression equations as a summary of the relation between claims and miles traveled?