






Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
A sample final examination for the math 238 - applied actuarial statistics course. The examination covers topics such as regression analysis, population density, and sales data. Summary statistics, regression models, and graphs.
Typology: Exams
1 / 11
This page cannot be seen from the preview
Don't miss anything!







MATH 238 - Applied Actuarial Statistics Sample Final Examination - Time Allowed: 2 hours
Question no. 1: [25 points]
In a study of population density around a large urban area, a random sample of 10 residential areas was selected. For each area, the distance (in km) from the city center, called distance, and the population density, called popdens, were recorded. Some summary statistics of the data are provided below:
summary(density) popdens distance Min. : 5.0 Min. : 0. 1st Qu.: 17.0 1st Qu.: 3. Median : 43.0 Median : 5. Mean : 60.0 Mean : 5. 3rd Qu.: 94.5 3rd Qu.: 7. Max. :149.0 Max. :11.
sd(popdens) [1] 53. sd(distance) [1] 3.
The following figures provide relationships of popdens and distance, together with possible combinations of their log-transformations.
l l
l
l
l
l l l l (^) l 0 2 4 6 8 10 12
0
20
40
60
80
100
120
140
popdens vs distance
distance
popdens
l l
l
l
l
l ll l (^) l −1.0 −0.5 0.0 0.5 1.0 1.5 2.0 2.
0
20
40
60
80
100
120
140
popdens vs log(distance)
log(distance)
popdens
l (^) l l
l
l l l l l l
0 2 4 6 8 10 12
1.^ 2.^ 2.^ 3.^ 3.^ 4.^ 4.^
log(popdens) vs distance
distance
log(popdens)
l (^) l l
l
l l l l l l
−1.0 −0.5 0.0 0.5 1.0 1.5 2.0 2.
1.^ 2.^ 2.^ 3.^ 3.^ 4.^ 4.^
log(popdens) vs log(distance)
log(distance)
log(popdens)
You are considering three regression models (the estimated models immediately follow):
Model 1: popdens = β 0 + βdistancedistance + ε.
Model 2: popdens = β 0 + βlog(distance)log(distance) + ε.
Model 3: log(popdens) = β 0 + βdistancedistance + ε.
lm1 <- lm(popdens~distance) lm2 <- lm(popdens~log(distance)) lm3 <- lm(log(popdens)~distance) summary(lm1)
Call: lm(formula = popdens ~ distance)
Residuals: Min 1Q Median 3Q Max -30.8889 -9.9273 0.2970 12.5559 28.
Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 139.699 11.119 12.564 1.51e-06 *** distance -13.958 1.663 -8.392 3.09e-05 ***
Signif. codes: 0 ’’ 0.001 ’’ 0.01 ’’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 18.28 on 8 degrees of freedom Multiple R-Squared: 0.898, Adjusted R-squared: 0. F-statistic: 70.42 on 1 and 8 DF, p-value: 3.091e-
summary(lm2)
Call: lm(formula = popdens ~ log(distance))
Residuals: Min 1Q Median 3Q Max -21.954 -10.437 -6.297 11.212 29.
Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 126.990 9.147 13.883 7.01e-07 *** log(distance) -47.980 5.293 -9.065 1.76e-05 ***
Signif. codes: 0 ’’ 0.001 ’’ 0.01 ’’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 17.05 on 8 degrees of freedom Multiple R-Squared: 0.9113, Adjusted R-squared: 0.
xtabs(~region+ptype) ptype region A B east 3 3 north 3 3 south 3 3 west 3 3 xtabs(sales~region+ptype)/xtabs(~region+ptype) ptype region A B east 9.165 9. north 8.580 6. south 9.035 7. west 9.360 10.
(a) First consider the effect of region. Comment on what you observe from the boxplot below:
l
l
l
l
east north south west
6
7
8
9
10
11
region
sales
(b) Denote by salesij the amount of sales by agent j in region i and consider the model
salesij = μ + τ (^) i + εij , for i = east, north,south,west, j = 1, 2 , 3.
Perform the test H 0 : τ (^) east = τ (^) north = τ (^) south = τ (^) wast = 0 and give your conclusions.
lm1 <- lm(sales~region) anova(lm1) Analysis of Variance Table
Response: sales Df Sum Sq Mean Sq F value Pr(>F) region 3 25.4435 8.4812 9.4609 0.0004257 *** Residuals 20 17.9288 0.
Signif. codes: 0 ’’ 0.001 ’’ 0.01 ’’ 0.05 ’.’ 0.1 ’ ’ 1
summary(lm1)
Call: lm(formula = sales ~ region)
Residuals: Min 1Q Median 3Q Max -1.36500 -0.83688 -0.08125 0.84500 1.
Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 9.4900 0.3865 24.552 < 2e-16 *** regionnorth -2.0800 0.5466 -3.805 0.00111 ** regionsouth -1.3650 0.5466 -2.497 0.02137 * regionwest 0.4875 0.5466 0.892 0.
Signif. codes: 0 ’’ 0.001 ’’ 0.01 ’’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 0.9468 on 20 degrees of freedom Multiple R-Squared: 0.5866, Adjusted R-squared: 0. F-statistic: 9.461 on 3 and 20 DF, p-value: 0.
(c) Calculate a 95% confidence interval for the difference in sales between regions east and west, i.e. τ (^) east −τ (^) west. Give a verbal interpretation of this interval. Use Tukey’s studentized range distribution.
(d) Now consider the effects of both region and policy type with the model expressed as
salesijk = μ + τ (^) i + βj + εijk, for i = east,north,south,west, j = A,B, k = 1, 2 , 3.
Perform the F test and give your conclusions.
lm2 <- lm(sales~region+ptype) anova(lm2) Analysis of Variance Table
Response: sales Df Sum Sq Mean Sq F value Pr(>F) region 3 25.4435 8.4812 10.0790 0.0003435 *** ptype 1 1.9409 1.9409 2.3065 0. Residuals 19 15.9879 0.
Signif. codes: 0 ’’ 0.001 ’’ 0.01 ’’ 0.05 ’.’ 0.1 ’ ’ 1
summary(lm2)
Call: lm(formula = sales ~ region + ptype)
Question no. 4: [20 points]
The semi-annual profits of an insurance company (in $millions) are depicted in the figure below for the past 10 years. These profit values have been adjusted to present them at today’s values.
profits.ts Time Series: Start = c(1996, 1) End = c(2005, 2) Frequency = 2 [1] 695 521 782 608 826 695 869 739 913 739 913 782 956 782 956 [16] 782 1000 826 1043 869
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
Insurance Semi−annual Profits for 1996−
Time
profits (in millions)
1996 1998 2000 2002 2004 2006
600
700
800
900
1000
(a) Describe the three types of component into which a time series can be decomposed. Define an additive model for a time series with these components, and explain its multiplicative model variation together with how you might be able to convert to an additive model.
(b) Describe the main features of the profits data depicted in the figure above. Discuss stationarity and say whether or not the data is considered a stationary process.
(c) The decomposition done in R is displayed in the figure below and the output. Explain the results of this decomposition.
plot(decompose(profits.ts)) decompose(profits.ts) $seasonal Time Series: Start = c(1996, 1) End = c(2005, 2) Frequency = 2 [1] 87.71875 -87.71875 87.71875 -87.71875 87.71875 -87.71875 87. [8] -87.71875 87.71875 -87.71875 87.71875 -87.71875 87.71875 -87. [15] 87.71875 -87.71875 87.71875 -87.71875 87.71875 -87.
$trend Time Series: Start = c(1996, 1) End = c(2005, 2) Frequency = 2 [1] NA 629.75 673.25 706.00 738.75 771.25 793.00 815.00 826.00 826. [11] 836.75 858.25 869.00 869.00 869.00 880.00 902.00 923.75 945.25 NA
$random Time Series: Start = c(1996, 1) End = c(2005, 2) Frequency = 2 [1] NA -21.03125 21.03125 -10.28125 -0.46875 11.46875 -11. [8] 11.71875 -0.71875 0.71875 -11.46875 11.46875 -0.71875 0. [15] -0.71875 -10.28125 10.28125 -10.03125 10.03125 NA
$figure [1] 87.71875 -87.
$type [1] "additive"
attr(,"class") [1] "decomposed.ts"
600
800
1000
observed
650
750
850
950
trend
−
0
50
seasonal
−
0
10
20
1996 1998 2000 2002 2004
random
Time
Decomposition of additive time series
l l l
l l
l
l
l l
l
l
l
l (^) ll
l
l
l
l l
l
l l
2000 4000 6000 8000 10000
1000
2000
3000
4000
5000
6000
miles
claims
B
A
(a) There are 3 types of observations considered unusual: leverage points, outliers, and influential. Explain each of these types and say which type are the points labeled A and B in the figure.
(b) A regression analysis using the full data set was performed with claims as the response and miles as the predictor. The output below is produced from R. Explain why the slope coefficient is negative.
lm1 <- lm(claims~miles) summary(lm1)
Call: lm(formula = claims ~ miles)
Residuals: Min 1Q Median 3Q Max -1502.3 -770.6 192.1 507.3 1821.
Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 4756.8315 970.6979 4.900 7.59e-05 *** miles -0.3620 0.1188 -3.047 0.00613 **
Signif. codes: 0 ’’ 0.001 ’’ 0.01 ’’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 946.6 on 21 degrees of freedom Multiple R-Squared: 0.3065, Adjusted R-squared: 0. F-statistic: 9.282 on 1 and 21 DF, p-value: 0.
(c) After removing observation points A and B, a further regression analysis was performed and output is produced below. Observe the marked change in the results. Explain why the removal of the two points causes these dramatic changes.
lm2 <- lm(claims~miles,data=ins.claims[-c(6,10),]) summary(lm2)
Call: lm(formula = claims ~ miles, data = ins.claims[-c(6, 10), ])
Residuals: Min 1Q Median 3Q Max -604.30 -315.05 32.59 209.36 1075.
Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -2945.2307 926.0968 -3.180 0.00493 ** miles 0.5355 0.1091 4.909 9.74e-05 ***
Signif. codes: 0 ’’ 0.001 ’’ 0.01 ’’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 412.7 on 19 degrees of freedom Multiple R-Squared: 0.5592, Adjusted R-squared: 0. F-statistic: 24.1 on 1 and 19 DF, p-value: 9.735e-
(d) How would you advise your client who wanted to use either of the two regression equations as a summary of the relation between claims and miles traveled?