






Solutions to exercise 1 in the applied actuarial statistics course offered at the university of x in the fall of 2008. The solutions cover various statistical concepts such as expanding a square, finding percentiles, and investigating the normality of data. The document also includes R code for generating histograms and Q-Q plots.
Typology: Assignments







Problem 1:
(a) By expanding the square, we get
$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} \left( Y_i^2 - 2 Y_i \bar{Y} + \bar{Y}^2 \right) = \sum_{i=1}^{n} Y_i^2 - 2 \sum_{i=1}^{n} Y_i \bar{Y} + \sum_{i=1}^{n} \bar{Y}^2 .$$
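(b) Because $\bar{Y}$ is a constant with respect to the summation and $n\bar{Y} = \sum_{i=1}^{n} Y_i$, we have
$$\sum_{i=1}^{n} Y_i \bar{Y} = \sum_{i=1}^{n} \bar{Y} Y_i = \bar{Y} \sum_{i=1}^{n} Y_i = \bar{Y}(n\bar{Y}) = n\bar{Y}^2 .$$
Similarly, we have
$$\sum_{i=1}^{n} \bar{Y}^2 = n\bar{Y}^2 .$$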
(c) Putting together the results of parts (a) and (b), we have
$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} Y_i^2 - 2n\bar{Y}^2 + n\bar{Y}^2 = \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2 ,$$
from which the desired result follows.
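The identity derived in Problem 1 can be checked numerically. The sketch below uses a small hypothetical data vector `y` (not from the exercise) and compares the sum of squared deviations with the shortcut form; dividing either side by n − 1 gives the sample variance.

```r
# Numerical check of sum((Y_i - Ybar)^2) = sum(Y_i^2) - n * Ybar^2.
# The data vector y is hypothetical, chosen only for illustration.
y <- c(2.5, 4.0, 7.5, 1.0, 9.0, 6.0)
n <- length(y)
ybar <- mean(y)

lhs <- sum((y - ybar)^2)       # sum of squared deviations
rhs <- sum(y^2) - n * ybar^2   # shortcut form from part (c)

# The two sides agree up to floating-point rounding.
identity_holds <- isTRUE(all.equal(lhs, rhs))

# Dividing by n - 1 recovers the sample variance computed by var().
variance_matches <- isTRUE(all.equal(rhs / (n - 1), var(y)))

print(c(identity_holds, variance_matches))
```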
Problem 2:
(a) With α = 75 and n = 30 observations, the 75th percentile is approximately the value of the ((n − 1)α/100 + 1)-th observation, or equivalently the ((30 − 1) × 75/100 + 1) = 22.75-th observation. This requires interpolating between the 22nd and 23rd observations. We have
$$Y_{(22.75)} = 0.25\,Y_{(22)} + 0.75\,Y_{(23)} = 0.25(319.9) + 0.75(324.5) = 323.35 .$$
This value says that approximately 75% of the observations are below 323.35.
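The interpolation rule in part (a) can be sketched as a small R function. The helper name `percentile_interp` and the demonstration vector `y_demo` are assumptions made for illustration; only the 22nd and 23rd ordered values of the exercise data are quoted in the solution. The position rule (n − 1)α/100 + 1 is the same one used by R's `quantile()` with `type = 7`, so the two should agree.

```r
# Percentile by linear interpolation at position (n - 1) * alpha / 100 + 1.
# percentile_interp is a hypothetical helper name, not part of the original solution.
percentile_interp <- function(y, alpha) {
  y <- sort(y)
  n <- length(y)
  pos <- (n - 1) * alpha / 100 + 1   # 22.75 when n = 30 and alpha = 75
  lo <- floor(pos)
  hi <- ceiling(pos)
  frac <- pos - lo                   # weight on the upper neighbour
  (1 - frac) * y[lo] + frac * y[hi]
}

# Using only the two ordered values quoted in the solution.
p75 <- 0.25 * 319.9 + 0.75 * 324.5

# Illustrative data (not the exercise data), compared against quantile().
y_demo <- (1:30)^2
p75_demo <- percentile_interp(y_demo, 75)
p75_ref <- unname(quantile(y_demo, 0.75, type = 7))

print(c(p75, p75_demo, p75_ref))
```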
(b) Since $\dfrac{359.9 - \bar{Y}}{s_Y} = \dfrac{359.9 - 278.6}{53.8656} = 1.51$, the value 359.9 is 1.51 standard deviations above the mean.
(c) The required probability is given by
(d) First, note that $\sum_{i=1}^{30} Y_i = n\bar{Y} = 30(278.6) = 8358$. Furthermore, we have $\sum_{i=1}^{30} Y_i^2 - n\bar{Y}^2 = (n-1)s_Y^2 = (30-1)(53.8656)^2 = 84143.58$. Thus, we have $\sum_{i=1}^{30} Y_i^2 = n\bar{Y}^2 + (n-1)s_Y^2 = 30(278.6)^2 + 84143.58 \approx 2412682$.
(e) Omitting the two largest observations (each equal to 359.9), we have $\sum_{i=1}^{28} Y_i = \sum_{i=1}^{30} Y_i - 2 \times 359.9 = 7638.2$ and $\sum_{i=1}^{28} Y_i^2 = \sum_{i=1}^{30} Y_i^2 - 2 \times 359.9^2 = 2153626$.
(f) Thus, $\bar{Y}_{\text{new}} = 7638.2/28 = 272.7929$ and $s_{Y,\text{new}}^2 = \frac{1}{27}\left( \sum_{i=1}^{28} Y_i^2 - 28\,\bar{Y}_{\text{new}}^2 \right) = 50.91022^2$. The percentage change in the mean is (278.6 − 272.7929)/278.6 = 2.1% and in the standard deviation is (53.8656 − 50.91022)/53.8656 = 5.5%. This says that omitting the two largest observations decreases the mean by 2.1% and the standard deviation by 5.5%.
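The arithmetic in parts (d) through (f) can be reproduced from the summary statistics alone. The sketch below uses only the values quoted in the solution (n = 30, mean 278.6, standard deviation 53.8656, and two largest observations of 359.9 each); the variable names are chosen for illustration.

```r
# Summary statistics quoted in the solution.
n <- 30
ybar <- 278.6
s <- 53.8656
y_max <- 359.9

sum_y  <- n * ybar                     # 8358
sum_y2 <- n * ybar^2 + (n - 1) * s^2   # sum of squares, from the Problem 1 identity

# Drop the two largest observations (both equal to 359.9).
n_new      <- n - 2
sum_y_new  <- sum_y - 2 * y_max        # 7638.2
sum_y2_new <- sum_y2 - 2 * y_max^2

ybar_new <- sum_y_new / n_new
s_new    <- sqrt((sum_y2_new - n_new * ybar_new^2) / (n_new - 1))

# Percentage decreases relative to the original mean and standard deviation.
pct_mean <- 100 * (ybar - ybar_new) / ybar
pct_sd   <- 100 * (s - s_new) / s

print(round(c(ybar_new, s_new, pct_mean, pct_sd), 4))
```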
Problem 3:
(a) For the players’ salaries, the summary statistics are provided below:
> NFLSAL.1990 <- read.csv("C:/.../Math238-Fall2007/Exercises- /R-data-analysis/NFLSAL-1990.csv")
> attach(NFLSAL.1990)
> names(NFLSAL.1990)
[1] "nflsal90"
> summary(nflsal90)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  75000  165500  280000  353800  447500 1500000
> sd(nflsal90)
[1] 265297.
> quantile(nflsal90,c(0.25,0.75))
   25%    75%
165500 447500
(b) To investigate whether the players' salaries follow a Normal distribution, we can do three things: (1) draw a histogram with a superimposed Normal density, (2) draw a Normal Q-Q plot, and (3) approximate the percentages of observations within 1, 2 and 3 standard deviations of the mean. The histogram and Q-Q plot are shown below (after the corresponding R commands).
> hist(nflsal90,br=25,freq=FALSE,xlab="",ylab="",
+      main="Histogram of 1990 NFL Players' Salaries",cex.main=1.5)
> mean.sal <- mean(nflsal90)
> sd.sal <- sd(nflsal90)
> curve(dnorm(x,mean=mean.sal,sd=sd.sal),from=0,to=1500000,add=TRUE,col="blue")
Note that none of these provides strong support for the claim that players' salaries follow a Normal distribution.
(c) Now consider the natural logarithm of the players’ salaries. The summary statistics are below:
> log.nflsal90 <- log(nflsal90)
> summary(log.nflsal90)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  11.23   12.02   12.54   12.54   13.01     14.
> sd(log.nflsal90)
[1] 0.
> quantile(log.nflsal90,c(0.25,0.75))
     25%      75%
12.01671      13.
(d) To investigate whether a Normal distribution is suitable for the logged salaries, we follow the same procedure as in part (b).
> hist(log.nflsal90,br=25,freq=FALSE,xlab="",ylab="",
+      main="Histogram of the log of 1990 NFL Players' Salaries",cex.main=1.3)
> mean.logsal <- mean(log.nflsal90)
> sd.logsal <- sd(log.nflsal90)
> curve(dnorm(x,mean=mean.logsal,sd=sd.logsal),from=11,to=14.5,add=TRUE,col="blue")
> qqnorm(log.nflsal90,cex.lab=1.4,cex.axis=1.5,cex.main=1.5,cex=0.75)
> qqline(log.nflsal90,col="blue")
[Figure: Histogram of the log of 1990 NFL Players' Salaries, followed by a Normal Q-Q Plot (Sample Quantiles against Theoretical Quantiles).]
> count.log1sd <- sum(log.nflsal90<(mean.logsal+sd.logsal))
+                 -sum(log.nflsal90<(mean.logsal-sd.logsal))
> pct.log1sd <- count.log1sd/length(log.nflsal90)
> print("within 1 sd")
[1] "within 1 sd"
> print(c(count.log1sd, pct.log1sd))
[1] 130 0.
> count.log2sd <- sum(log.nflsal90<(mean.logsal+2*sd.logsal))
+                 -sum(log.nflsal90<(mean.logsal-2*sd.logsal))
> pct.log2sd <- count.log2sd/length(log.nflsal90)
> print("within 2 sd")
[1] "within 2 sd"
> print(c(count.log2sd, pct.log2sd))
[1] 191 0.
> count.log3sd <- sum(log.nflsal90<(mean.logsal+3*sd.logsal))
+                 -sum(log.nflsal90<(mean.logsal-3*sd.logsal))
> pct.log3sd <- count.log3sd/length(log.nflsal90)
> print("within 3 sd")
[1] "within 3 sd"
> print(c(count.log3sd, pct.log3sd))
[1] 198 1
These results provide some evidence that the log of the players' salaries follows a Normal distribution.
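The within-k-standard-deviations check can be written more compactly and compared with the proportions a Normal distribution would give. This is a minimal sketch: the helper name `within_k_sd` is hypothetical, and `x_demo` is simulated data standing in for the logged salaries, which are not reproduced here. Unlike the transcript above, the helper uses a strict inequality on both sides, which only matters if an observation falls exactly on a boundary.

```r
# Share of observations within k sample standard deviations of the sample mean.
# within_k_sd is a hypothetical helper name; x_demo is simulated, not the NFL data.
within_k_sd <- function(x, k) {
  mean(abs(x - mean(x)) < k * sd(x))
}

set.seed(1)
x_demo <- rnorm(1000)

observed <- sapply(1:3, function(k) within_k_sd(x_demo, k))
# Proportions under an exact Normal distribution (roughly 68%, 95%, 99.7%).
expected <- pnorm(1:3) - pnorm(-(1:3))

print(rbind(observed, expected))
```

Observed proportions close to the Normal ones are consistent with Normality, although they do not establish it.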
(c) A 95% prediction interval for an additional observation is given by
$$\bar{Y} \pm 1.96\, s_Y .$$
But because claims cannot be negative, the 95% prediction interval should be truncated to (0, 12866.45).
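The truncation at zero can be sketched as follows. The values of `ybar` and `s` below are hypothetical placeholders, since the sample mean and standard deviation for this problem are not shown above; only the form of the calculation is illustrated.

```r
# Approximate 95% prediction interval, truncated at zero for nonnegative claims.
# ybar and s are hypothetical placeholders, not the values from the exercise.
ybar <- 5000
s <- 4000
z <- qnorm(0.975)   # about 1.96

raw <- c(lower = ybar - z * s, upper = ybar + z * s)

# Claims cannot be negative, so a negative lower limit is replaced by 0.
truncated <- c(lower = max(0, raw[["lower"]]), upper = raw[["upper"]])

print(raw)
print(truncated)
```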
Problem 5:
(a) The summary statistics are printed below:
> hur.loss <- read.csv("C:/.../HurricaneLosses.csv")
> hur.loss
   Year   Loss
1  1977   2000
2  1971   1380
3  1971   2000
4  1964   2000
5  1968   2580
6  1971   4730
7  1956   3700
8  1961   4250
9  1966   5400
10 1955   4500
11 1958   5000
12 1974  14720
13 1959   7900
14 1971  13500
15 1976  22697
16 1964  12000
17 1949   8300
18 1959  13000
19 1950  10450
20 1954  12500
21 1973  32300
22 1980  57911
23 1964  23000
24 1955  25200
25 1967  34800
26 1957  32200
27 1979 122070
28 1975 119189
29 1972  97853
30 1964  67200
31 1960  91000
32 1961 100000
33 1969 165300
34 1954 122050
35 1954 129700
36 1970 309950
37 1979 752510
38 1965 500000
> attach(hur.loss)
> names(hur.loss)
[1] "Year" "Loss"
> summary(Loss)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1380    5100   18710   77230   96140  752500
> sd(Loss)
[1] 148485.
> quantile(Loss,.95)
95%
(b) The histogram below, with a Normal density superimposed, together with the Normal Q-Q plot, clearly indicates that the Loss variable is too highly skewed for a Normal distribution to fit well.
> hist(Loss,br=25,cex=1.4,freq=FALSE)
> curve(dnorm(x,mean=mean(Loss),sd=sd(Loss)),from=0,to=800000,add=TRUE,col="blue")
> qqnorm(Loss,cex.lab=1.4,cex.axis=1.5,cex.main=1.5,cex=0.75)
> qqline(Loss,col="blue")
[Figure: Histogram of Loss (Density against Loss), followed by a Normal Q-Q Plot (Sample Quantiles against Theoretical Quantiles).]
(a) The summary statistics of the logarithm of the Loss are printed below:
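A minimal sketch of how these statistics could be computed, assuming the 38 Loss values from the data listing in Problem 5(a). The variable name `log.Loss` is an assumption that follows the naming used for the logged salaries; no output values are quoted here.

```r
# Loss values copied from the hur.loss listing in Problem 5(a).
Loss <- c(2000, 1380, 2000, 2000, 2580, 4730, 3700, 4250, 5400, 4500,
          5000, 14720, 7900, 13500, 22697, 12000, 8300, 13000, 10450, 12500,
          32300, 57911, 23000, 25200, 34800, 32200, 122070, 119189, 97853, 67200,
          91000, 100000, 165300, 122050, 129700, 309950, 752510, 500000)

# Natural logarithm of the losses (log.Loss is an assumed name).
log.Loss <- log(Loss)

print(summary(log.Loss))
print(sd(log.Loss))
print(quantile(log.Loss, c(0.25, 0.75)))
```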
possible inflationary values. The second may be fixed by looking at the time trend of the data, which, at this point, we do not have the tools to analyze.