












































Chapter 2. Basic forecasting methods. Holt-Winters' method, additive seasonality. The equations are $L_t = \alpha(Y_t - S_{t-s}) + (1-\alpha)(L_{t-1} + b_{t-1})$, ...













































“All models are wrong, but some models are useful.” – George E. P. Box (1919–2013)
that it is freely accessible to read online (at https://www.otexts.org/fpp/). A few hard copies of the book can be found at the Hartley Library. Other interesting references include:
Instructor Dr Alain Zemkoho School of Mathematics Building 54, Room 10027 [email protected]
PhD Teaching Assistants: Zulkipli Hafizah Binti, Fulin Xie and Yuan Zhang. (They will join me during the workshops to help with your questions on the exercises. Please do not contact them for any assistance related to the coursework assignment.)
2 Chapter 1. Introduction and basic tools
This module will be concerned with such simple time series models, as well as some explanatory modelling (see chapter on regression). The methods that we will discuss are all based on extrapolation into the future of patterns shown in the past. Confidence in such forecasts is therefore based on confidence that such patterns will, in future, remain stable.
We begin with the preliminaries to forecasting that enable you to begin to find the best forecasting model to use with a particular time series. Before we start with the basic tools that we will be using in the course, let us recall that our focus will mostly be on two types of data (time series data and cross-sectional data) that we are now going to formally define.
Definition 1.0.1 Time series data are data from a unit (or a group of units) observed in several successive periods, whereas cross-sectional data are data from units observed at the same time or in the same time period. The latter may be single observations from a sample survey or from all units in a population.
Though our main focus here will be on the first type, the second will also be of great use, especially in Chapter 3. Examples of time series are discussed in Demo 1.1, while an example of a cross-sectional dataset is given in Demo 1.3.
This course is a practical introduction to the skills of forecasting. To give experience in encountering a variety of time series, real-life datasets are used, both during the course and for the final coursework.
1.1.1 Graphical summaries
Time plots
The essence of forecasting models is that patterns are projected forward in time, while random effects are discarded. The first thing to do when forecasting is therefore to make a time plot and look for patterns. The following may be observed:
Demo 1.1: Using the data sets from the Chapter 1 data folder on Blackboard, make time plots for the corresponding series: Australian monthly electricity (TimePlotTransformElec.xls), US treasury bills (TimePlotUstreas.xls), Australian clay brick (TimePlotBricks.xls).
1.1 Graphical and numerical summaries 3
Seasonal plots
If an initial inspection of a time plot leads you to suspect that seasonality may exist, then several methods are available to demonstrate it and to give evidence for its existence. A seasonal plot is one where the time series is cut into regular periods and the time plots of each period are overlaid on top of one another. It is an effective means of demonstrating seasonality, for example to a client who is not overly technically minded.
Demo 1.2: Produce the time and seasonal plots for the Australian beer production data provided in the file with name TimeSeasonalPlotBeer.xls.
If seasonal plots lead you to think that seasonality may exist, then further technical evidence can be provided by autocovariance and autocorrelations. These concepts are introduced in the next section. Before going to that, we provide another class of plots that is useful in analyzing bivariate-type data sets (i.e., data sets based on two variables).
Scatterplots
The graphs discussed so far are useful for time series data. Scatterplots are most useful for exploring relationships between variables in cross-sectional data. The automobile data (19 Japanese cars, see “Data” sheet in BivariateJapaneseCars.xls) are not a time series, so time or seasonal plots are inappropriate for these data. However, these data are well suited to a scatterplot (see “BivariatePlot” sheet) such as that of price against mileage. In the figure we have plotted the variable we wish to forecast (price) against one of the explanatory variables (mileage). Each point on the graph represents one type of vehicle. The plot shows the relationship between price and mileage: vehicles with high mileage per gallon are generally cheaper than less fuel-efficient vehicles. (Both price and fuel-efficiency are related to the vehicle and engine size.) Vehicles with low mileage per gallon are generally priced over a range from around 12,000 to 25,000. The scatterplot helps us visualize the relationship and suggests that a forecasting model must include mileage as an explanatory variable.
When there are several potential explanatory variables, it is useful to plot each variable against each other variable to analyze correlations (see definition below) between them. These plots can be arranged in a scatterplot matrix; see, e.g., the 4th question of the Exercises of Chapter 3 that will be discussed at the corresponding workshop.
Demo 1.3: Produce the scatterplot mapping the price of the 19 Japanese cars against their mileage (BivariateJapaneseCars.xls).
1.1.2 Numerical data summaries
Covariance and correlation
The most commonly used statistics for bivariate data (i.e. when there are two variables) are the covariance and the correlation coefficient. If we have n pairs of observations $(X_i, Y_i)$ on two variables X and Y, then the formulas are respectively
$$\mathrm{Cov}_{XY} = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})$$
and
$$r_{XY} = \frac{\mathrm{Cov}_{XY}}{S_X S_Y} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}}.$$
Here, $\bar{X}$ is the mean and
$$S_X = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
is the (sample) standard deviation. The correlation coefficient $r_{XY}$ is a standardised version of the covariance and its value is always between −1 and 1.
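The two formulas above can be checked numerically with a short sketch. The code below uses only the Python standard library; the function names and the small `mileage`/`price` sample are hypothetical illustrations, not the Japanese cars data set from the demo.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Sample covariance with the n - 1 divisor, as in Cov_XY."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def std_dev(xs):
    """Sample standard deviation S_X (square root of Cov_XX)."""
    return math.sqrt(covariance(xs, xs))

def correlation(xs, ys):
    """Correlation coefficient r_XY = Cov_XY / (S_X * S_Y)."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))

# Hypothetical data: higher mileage goes with lower price.
mileage = [20.0, 25.0, 30.0, 35.0, 40.0]
price = [24.0, 20.0, 17.0, 13.0, 10.0]

cov = covariance(mileage, price)
r = correlation(mileage, price)
print(f"covariance = {cov:.2f}, correlation = {r:.4f}")
```

Because the correlation divides the covariance by both standard deviations, it does not depend on the units of either variable, which is what makes it easier to interpret than the covariance itself.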
1.2 Decomposition 5
1.2.1 Trend estimation using moving averages
The first step in the decomposition is to produce an estimate, T̂t , of the trend-cycle. (Here and in what follows we use a circumflex to denote an estimate.) This is carried out by a smoothing technique known as moving averages. The basic idea is that values of observations which are close together in time will have trend-cycle components that are similar in value. Ignoring the seasonal component for the moment, the value of the trend-cycle component at some particular time point can then be obtained by taking an average of a set of observations about this time point. Because the values that are averaged depend on the time point, this is called a moving average. There are many different forms that a moving average can take. Many have been constructed using ad-hoc arguments and reasoning. All boil down to being special cases of what is called a k-point weighted moving average:
$$M_t = \sum_{j=-m}^{m} a_j Y_{t+j},$$
where $m = (k-1)/2$ is called the half-width, and the $a_j$ are called the weights. Note that in this definition k must be an odd number. The simplest versions are the cases where all the weights are the same. This is then called a simple moving average of order k. For example, if k = 3, then
$$M_t = (Y_{t-1} + Y_t + Y_{t+1})/3.$$
If the weights are symmetrically balanced about the centre value (i.e. about j = 0 in the sum), then this is called a centred moving average. Simple moving averages involving an even number of terms can be used, but are then not centred about an integer t. This can be redressed by averaging a second time, this time averaging the moving averages themselves. Thus, for example, if
$$M_{2.5} = (Y_1 + Y_2 + Y_3 + Y_4)/4 \quad\text{and}\quad M_{3.5} = (Y_2 + Y_3 + Y_4 + Y_5)/4$$
are two consecutive 4-point moving averages, then we can centre them by taking their average
$$(M_{2.5} + M_{3.5})/2 = (Y_1 + 2Y_2 + 2Y_3 + 2Y_4 + Y_5)/8.$$
This example is called a 2×4 MA. It is simply a 5-point weighted moving average, with end weights each 1/8, and with the other three weights being 1/4. If applied to quarterly data, this 2×4 MA would give equal weight to all four quarters, as the first and last values would apply to the same quarter (but in different years). Thus this smoother would smooth out quarterly seasonal variation. Similarly, a 2×12 MA would smooth out seasonal variation in monthly data. Question: What are the weights of a 2×12 MA smoother?
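The construction of the 2×4 MA as an average of two consecutive 4-point moving averages can be sketched in a few lines. The function names below are hypothetical, and the sketch simply leaves the ends of the series empty where the window does not fit; it does not attempt any rescaling of weights at the ends.

```python
def centred_2xk_weights(k):
    """Weights of a 2 x k MA (k even): the average of two consecutive
    simple k-point moving averages, shifted by one time step."""
    simple = [1.0 / k] * k
    first = simple + [0.0]    # k-point MA over points 1..k
    second = [0.0] + simple   # k-point MA over points 2..k+1
    return [(a + b) / 2 for a, b in zip(first, second)]

def weighted_ma(series, weights):
    """Apply a centred weighted moving average with an odd number of weights.
    Positions where the full window is not available are left as None."""
    m = (len(weights) - 1) // 2   # half-width
    smoothed = [None] * len(series)
    for t in range(m, len(series) - m):
        smoothed[t] = sum(w * series[t - m + j] for j, w in enumerate(weights))
    return smoothed

weights_2x4 = centred_2xk_weights(4)
print(weights_2x4)

# A straight line is reproduced exactly by any symmetric MA whose weights sum to 1.
line = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(weighted_ma(line, weights_2x4))
```

Calling `centred_2xk_weights` with other even values of k gives the weights of the corresponding 2×k smoother in the same way.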
Demo 1.6: Fit 7MA and 2x12MA to the housing sales data (AddDecompHsales.xls).
There is a problem applying a moving average at the two ends of a time series when we run out of observations to calculate the complete summation. When fewer than k observations are available the weights are usually rescaled so that they sum to unity. An effect of a moving average is that it will underestimate trends at the ends of a time series. This means that the methods discussed so far are generally unsatisfactory for forecasting purposes when a trend is present.
1.2.2 Additive decomposition
We now consider what might be called classical decomposition. These are methods developed in the 1920s which form the basis of typical existing decomposition methods. We consider first the additive case, where (1.3) takes the form
$$Y_t = S_t + T_t + E_t. \qquad (1.4)$$
We assume throughout that the seasonal period is 12. The classical decomposition takes four steps:
Step 1: Compute the centred 2×12 MA. Denote this series by $M_t$ (corresponding to $\hat{T}_t$ in (1.4)). This series estimates the trend-cycle.
Step 2: De-trend the original series by subtraction: $D_t = Y_t - M_t = S_t + E_t$.
Step 3: Calculate a seasonal index for each month j by taking the average of all the de-trended values available for that month:
$$\hat{S}_j = \frac{1}{n_j}\sum_{k=1}^{n_j} D_{j+12(k-1)}.$$
In this formula, it is assumed that there are $n_j$ values available for month j, so that the summation is over these $n_j$ values.
Step 4: The estimated irregularity is obtained by subtraction of the seasonal component from the de-trended series:
$$\hat{E}_t = D_t - \hat{S}_{j(t)}.$$
Here $\hat{S}_{j(t)}$ denotes the seasonal index for the month corresponding to observation $Y_t$.
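The four steps above can be sketched as follows for monthly data. This is a minimal illustration under stated assumptions: the function names are hypothetical, the 2×12 MA is left undefined (None) for the first and last six months, and the test series is synthetic (a linear trend plus a fixed seasonal pattern that sums to zero, with no irregular component).

```python
PERIOD = 12

def centred_2x12_ma(y):
    """Step 1: centred 2x12 MA (end weights 1/24, the other eleven 1/12)."""
    weights = [1 / 24] + [1 / 12] * 11 + [1 / 24]
    half = PERIOD // 2
    trend = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half : t + half + 1]
        trend[t] = sum(w * v for w, v in zip(weights, window))
    return trend

def additive_decomposition(y):
    trend = centred_2x12_ma(y)
    # Step 2: de-trend by subtraction, D_t = Y_t - M_t.
    detrended = [None if m is None else v - m for v, m in zip(y, trend)]
    # Step 3: seasonal index = average of the available D values for each month.
    seasonal = []
    for j in range(PERIOD):
        values = [d for d in detrended[j::PERIOD] if d is not None]
        seasonal.append(sum(values) / len(values))
    # Step 4: irregular component, E_t = D_t - S_j(t).
    irregular = [None if d is None else d - seasonal[t % PERIOD]
                 for t, d in enumerate(detrended)]
    return trend, seasonal, irregular

# Synthetic four-year series: linear trend plus a zero-sum seasonal pattern.
pattern = [-5, -3, 0, 2, 4, 6, 5, 3, 0, -2, -4, -6]
y = [100 + 2 * t + pattern[t % PERIOD] for t in range(48)]

trend, seasonal, irregular = additive_decomposition(y)
print([round(s, 6) for s in seasonal])
```

On this synthetic series the recovered seasonal indices coincide with `pattern`, because the 2×12 MA reproduces a linear trend exactly and averages a full year of the seasonal pattern to zero. Real data will of course leave a non-zero irregular component.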
1.2.3 Multiplicative decomposition
For the multiplicative model $Y_t = S_t \times T_t \times E_t$, the method is called the ratio of actual to moving averages. There are again four steps:
Step 1: Compute the centred 2×12 MA. Denote this series by $M_t$ (as in the previous case, this corresponds to $\hat{T}_t$). This step is exactly the same as in the additive model case.
Step 2: Calculate $R_t$, the ratio of actual to moving averages:
$$R_t = \frac{Y_t}{M_t}.$$
Step 3: Calculate a seasonal index for each month j by taking the average of all the ratios available for that month:
$$\hat{S}_j = \frac{1}{n_j}\sum_{k=1}^{n_j} R_{j+12(k-1)}.$$
This step is exactly the same as in the additive case except that D is replaced by R.
Step 4: Calculate the error using
$$\hat{E}_t = \frac{R_t}{\hat{S}_{j(t)}} = \frac{Y_t}{M_t\,\hat{S}_{j(t)}},$$
where, as in the additive case, $\hat{S}_{j(t)}$ is the seasonal index for the month corresponding to observation $Y_t$.
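The ratio of actual to moving averages method can be sketched in the same style. Again this is only an illustration: the function names are hypothetical, the moving average is undefined for the first and last six months, and the synthetic series has a constant level of 200 multiplied by seasonal factors that average to one.

```python
PERIOD = 12

def centred_2x12_ma(y):
    """Step 1: centred 2x12 MA (end weights 1/24, the other eleven 1/12)."""
    weights = [1 / 24] + [1 / 12] * 11 + [1 / 24]
    half = PERIOD // 2
    trend = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half : t + half + 1]
        trend[t] = sum(w * v for w, v in zip(weights, window))
    return trend

def multiplicative_decomposition(y):
    trend = centred_2x12_ma(y)
    # Step 2: ratio of actual to moving average, R_t = Y_t / M_t.
    ratios = [None if m is None else v / m for v, m in zip(y, trend)]
    # Step 3: seasonal index = average of the available ratios for each month.
    seasonal = []
    for j in range(PERIOD):
        values = [r for r in ratios[j::PERIOD] if r is not None]
        seasonal.append(sum(values) / len(values))
    # Step 4: irregular component, E_t = R_t / S_j(t).
    irregular = [None if r is None else r / seasonal[t % PERIOD]
                 for t, r in enumerate(ratios)]
    return trend, seasonal, irregular

# Synthetic four-year series: constant level times seasonal factors averaging 1.
factors = [0.8, 0.9, 1.0, 1.1, 1.2, 1.0, 0.9, 0.8, 1.0, 1.1, 1.2, 1.0]
y_mult = [200 * factors[t % PERIOD] for t in range(48)]

trend_m, seasonal_m, irregular_m = multiplicative_decomposition(y_mult)
print([round(s, 6) for s in seasonal_m])
```

In the multiplicative case an irregular component close to 1 (rather than 0) indicates that trend and seasonality explain the observation well.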
1.3.2 Missing and erroneous data
Real-life data is liable to contain human errors, most of which cannot be known with certainty by the forecaster. However, some clear outliers may be considered with high probability to be erroneous. For example, a missing or extra numeral will produce a resulting number that is ten times smaller or larger than neighbouring entries in a time series, and could seriously disrupt a forecast. If the source of the data can be referred to, it might then be possible to correct the error: if not, an estimate should be made. If it is considered necessary to add an estimated value where missing or erroneous data is present, then this has to be carried out with due regard to the time series in question, and clear justification given. A local average value might be appropriate, but seasonality might also need to be considered.
1.3.3 Transformations
Sometimes a systematic adjustment of the data will lead to a simpler analysis: mathematical transforms may be applied in certain cases. There are two ideas that are helpful in selecting an appropriate transform. First, it is usually easier to analyse a time series if the underlying mean varies in a linear way with time. Thus if the behaviour of the actual data has the form
$$Y_t = a t^p + e_t,$$
where a and p are constants and $e_t$ is a random 'error', then the transform
$$W_t = (Y_t)^{1/p} = (a t^p + e_t)^{1/p} = bt + \delta_t,$$
where $b = a^{1/p}$, makes $W_t$ look more 'linear' than $Y_t$. Note that the transformed 'error', $\delta_t$, will depend in a complicated way on $e_t$, a, p and t. However, in many situations the behaviour of $\delta_t$ will remain 'random' looking and be no more difficult to interpret than the initial error $e_t$. The above is known as a power transform. Another useful transform is the logarithmic transform:
$$W_t = \log_e(Y_t).$$
This can only be used if $Y_t > 0$, as the logarithm is undefined at zero and complex valued for a negative quantity.
The second idea is that the random errors are most easily handled if their variability is not time dependent but remains essentially constant. A good transformation should therefore be variance stabilizing, producing errors that have a constant variance. For example, if
$$Y_t = a(t + e_t)^p,$$
where the $e_t$ have a constant variance, then the power transform
$$W_t = (Y_t)^{1/p} = a^{1/p}(t + e_t) = bt + \delta_t,$$
where $b = a^{1/p}$ and $\delta_t = b e_t$, will not only linearise the trend, but will also be variance stabilizing, as $\delta_t$ will have constant variance.
Finally, note that, though we analyse the transformed data, we are actually interested in the original sequence. So it is necessary to back transform results into the original units. Thus, for example in the last case, we might analyse the $W_t$ and estimate b by, say, $\hat{b}$, but we would back transform to estimate a by
$$\hat{a} = \hat{b}^p.$$
1.4 Exercises 9
An important but somewhat difficult technical issue is that such transforms can destroy desirable properties like unbiasedness. A well known case concerns a random sample $X_1, X_2, \ldots, X_n$ of size n. Here, the sample variance given by the formula
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$
is known to be an unbiased estimator for the variance. However, s, the obvious estimator for the standard deviation, is not unbiased. When n is large this bias is, however, small.
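The variance-stabilizing power transform and the back transformation described in this section can be illustrated with a small sketch. Several things here are assumptions made only for the illustration: the values a = 3 and p = 2, the small alternating 'errors', and the choice of least squares through the origin to estimate b (the notes do not say how b should be estimated).

```python
# Hypothetical model parameters and errors, chosen only for illustration.
a_true, p = 3.0, 2.0
times = list(range(1, 11))
errors = [0.1 if t % 2 == 1 else -0.1 for t in times]

# Data generated as Y_t = a (t + e_t)^p.
y_raw = [a_true * (t + e) ** p for t, e in zip(times, errors)]

# Power transform: W_t = Y_t^(1/p) = b t + delta_t with b = a^(1/p).
w = [value ** (1.0 / p) for value in y_raw]

# Assumed estimator: least squares through the origin for W_t = b t + delta_t.
b_hat = sum(t * wt for t, wt in zip(times, w)) / sum(t * t for t in times)

# Back transform to the original units: a_hat = b_hat^p.
a_hat = b_hat ** p
print(f"b_hat = {b_hat:.4f}, a_hat = {a_hat:.4f}")
```

The estimate of a is close to, but not exactly, the value used to generate the data, which is a small reminder that back-transformed estimates inherit the errors of the transformed analysis.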
Demo 1.9: Plot the Australian monthly electricity (TimePlotTransformElec.xls) data using the square root and the (natural) log transforms.
1.3.4 Calendar adjustments
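If data is for calendar months, then account might have to be taken of the length of a month. The difference between the longest and shortest months is about (31 − 28)/30 = 10%. The adjustment needed is
$$W_t = \frac{\text{no. of days in an average month}}{\text{no. of days in month } t} \times Y_t = \frac{365.25/12}{\text{no. of days in month } t} \times Y_t.$$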
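A month-length adjustment of this kind can be sketched with the standard library `calendar` module. The sketch assumes that the adjustment rescales each monthly value by the ratio of the average month length (365.25/12 days) to the actual number of days in that month; the function name and the sample values are hypothetical.

```python
import calendar

AVERAGE_MONTH_DAYS = 365.25 / 12   # about 30.44 days

def calendar_adjust(value, year, month):
    """Rescale a monthly total to what it would be in an average-length month."""
    days_in_month = calendar.monthrange(year, month)[1]
    return value * AVERAGE_MONTH_DAYS / days_in_month

# Hypothetical totals: 10 units per day in January (31 days) and February (28 days).
january = calendar_adjust(310.0, 2021, 1)
february = calendar_adjust(280.0, 2021, 2)
print(january, february)
```

After adjustment the two months, which had the same daily rate, have the same value, so the difference caused purely by month length disappears.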
Demo 1.10: Make separate time series plots of Yt and Wt for the data on the monthly milk production (CalAdjustMilk.xls) per cow.
The data files and demonstration spreadsheets are available on the Blackboard site (see Course Content). Note that in all spreadsheets, allowance will need to be made for the length of the dataset, in all calculations and plots.
Exercise 1.1: Make time plots of the building materials (buildingMaterials.xls) and cement production (cementProduction.xls) data sets. Give a title to the graph, put dates on the X-axis and make sure the Y-axis is appropriately labelled (the files include details of the units involved, i.e. thousands of tonnes). You can adjust the colour scheme of the graphs to improve readability. What trends/cycles or seasonality do you observe?
Exercise 1.2: Produce seasonal plots for buildingMaterials.xls and cementProduction.xls. What seasonality do you observe?
Exercise 1.3: Calculate the correlation coefficient for the two datasets buildingMaterials.xls and cementProduction.xls over a suitable time period, using the Excel Worksheet function. Can a strong linear relationship (positive or negative) be observed?
Exercise 1.4: A VBA macro is used in ACFAusBeer.xlsm to calculate the autocorrelation function (ACF). The macro takes as input the column of n observations, and outputs the ACF up to lag m = n/4. Make a copy of this spreadsheet and use it to calculate the ACF of buildingMaterials.xls.
Exercise 1.5: Fit 7MA and 2x12MA moving averages to buildingMaterials.xls. Which is the more appropriate for smoothing the data in this case?
This chapter introduces basic forecasting methods based on averaging techniques. The basic forecasting methods to be considered are conventionally regarded as being divided in two groups: (i) averaging methods and (ii) exponential smoothing methods. Though it is convenient to follow this convention, it is important to realise at the outset that this distinction is artificial in that all the methods in this chapter are based on averages. They are thus all similar to the moving averages considered in the last chapter. The difference is that the averages are used here for forecasting rather than for describing past data. This point of potential confusion is made worse by the use of the name “exponential smoothing” for the second group. These methods are also based on weighted averages, where the weights decay in an exponential way from the most recent to the most distant data point. The term smoothing is being used simply to indicate that this weighted average smoothes the data irregularities. Thus, though the term smoothing here is used in the same sense as previously, the smoothing is being carried out in a different context from that used in the previous chapter. Before we start with the discussion of the methods, it is crucial to first present some tools that can be used to evaluate and compare the performance of forecasting techniques.
2.1.1 Measures of Accuracy
The forecaster needs to choose the best model to use for forecasting any particular time series. We discuss here different measures for comparing different forecasting models on the basis of forecasting errors. Let $F_t$ be the forecast value and $Y_t$ be the actual observation at time t. Then the forecast error at time t is defined as
$$e_t = Y_t - F_t. \qquad (2.1)$$
Usually $F_t$ is calculated from previous values of $Y_t$ right up to and including the immediately preceding value $Y_{t-1}$. Thus $F_t$ predicts just one step ahead. In this case $F_t$ is called the one-step forecast and $e_t$ is called the one-step forecast error. Usually we assess error not from one such $e_t$ but from n values. Three measures of error are:
12 Chapter 2. Basic forecasting methods
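(i) the mean error
$$\mathrm{ME} = \frac{1}{n}\sum_{t=1}^{n} e_t,$$
(ii) the mean absolute error
$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n} |e_t|,$$
(iii) and the mean square error
$$\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n} e_t^2.$$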
The mean error is not very useful. It tends to be near zero as positive and negative errors tend to cancel. It is only of use in detecting systematic under- or over-forecasting. The mean square error is a squared quantity, so be careful not to compare it directly with the MAE. Its square root is usually similar to the MAE. The relative or percentage error is defined as
$$\mathrm{PE}_t = \frac{Y_t - F_t}{Y_t},$$
while the mean percentage error is
$$\mathrm{MPE} = \frac{1}{n}\sum_{t=1}^{n} \mathrm{PE}_t$$
and the mean absolute percentage error is
$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n} |\mathrm{PE}_t|.$$
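The five error measures can be computed together in one pass, as in the sketch below. The function name and the small `actual`/`forecast` sample are hypothetical; the percentage errors follow the definition above, so they are expressed as fractions rather than multiplied by 100.

```python
def accuracy_measures(actual, forecast):
    """Return ME, MAE, MSE, MPE and MAPE for paired actual and forecast values."""
    n = len(actual)
    errors = [y - f for y, f in zip(actual, forecast)]     # e_t = Y_t - F_t
    pct_errors = [e / y for e, y in zip(errors, actual)]   # PE_t = e_t / Y_t
    return {
        "ME": sum(errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": sum(e * e for e in errors) / n,
        "MPE": sum(pct_errors) / n,
        "MAPE": sum(abs(pe) for pe in pct_errors) / n,
    }

# Hypothetical observations and one-step forecasts.
actual = [100.0, 110.0, 120.0, 130.0]
forecast = [90.0, 115.0, 125.0, 120.0]

measures = accuracy_measures(actual, forecast)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

In this sample the errors are 10, −5, −5 and 10, so the positive and negative errors partly cancel in the ME (2.5) but not in the MAE (7.5), which illustrates why the ME mainly detects systematic bias.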
We illustrate these error measures in the following demonstration, which is based on two simple (or naïve) forecasts.
Demo 2.1: Set up forecasts NF1 and NF2 for the Australian beer data (NaiveFAusBeer.xls). Calculate the ME, MAE, MSE, MPE, MAPE for the Australian beer series data using NF1 and NF2. NF1 is defined by
$$F_{t+1} = Y_t.$$
This simply takes the present Y value to be the forecast for the next period. The second naive forecast, NF2, which is defined next, takes into account some seasonal adjustment such that
$$F_{t+1} = Y_t - S_t + S_{(t-12)+1}.$$
Here the present Y value is adjusted by subtracting $S_t$, the current period's seasonality, and adding $S_{(t-12)+1}$, the next period's seasonality (as estimated one year earlier). The seasonality is obtained as
$$S_t = \frac{1}{m+1}\left(m S_{t-12} + Y_t\right) \quad\text{with } S_t = Y_t \text{ for } t = 1, \ldots, 12,$$
where m is the number of complete years of data available. We also set $F_{t+1} = Y_t$ for $t = 1, \ldots, 12$.
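The two naive forecasts can be sketched as follows for monthly data. This is an illustration under one explicit assumption: m is taken to be the number of complete years observed before time t, so that the seasonality estimate is a running average of all values seen so far for the same month. The function names and the synthetic series are hypothetical, and list indices are 0-based, so index t corresponds to time t + 1 in the notation of the notes.

```python
def nf1(y):
    """NF1: the forecast for the next period is the present value."""
    return [None] + list(y[:-1])

def seasonal_estimates(y, period=12):
    """Running seasonality S_t = (m * S_{t-12} + Y_t) / (m + 1), S_t = Y_t in year 1."""
    s = []
    for t, value in enumerate(y):
        m = t // period   # assumed: complete years of data before this observation
        if m == 0:
            s.append(value)
        else:
            s.append((m * s[t - period] + value) / (m + 1))
    return s

def nf2(y, period=12):
    """NF2: F_{t+1} = Y_t - S_t + S_{(t-12)+1}, with F_{t+1} = Y_t in the first year."""
    s = seasonal_estimates(y, period)
    forecasts = [None] * len(y)
    for t in range(len(y) - 1):
        if t < period:
            forecasts[t + 1] = y[t]
        else:
            forecasts[t + 1] = y[t] - s[t] + s[t + 1 - period]
    return forecasts

# Synthetic, purely seasonal series: the same 12 values repeated for three years.
one_year = [50, 48, 55, 60, 66, 72, 75, 74, 68, 61, 54, 52]
series = [float(v) for v in one_year * 3]

f1 = nf1(series)
f2 = nf2(series)
print(f1[13], f2[13], series[13])
```

For a series that repeats exactly from year to year, NF2 reproduces the observations from the fourteenth month onwards, whereas NF1 is always one period behind. On real data such as the beer series, the two forecasts would be compared using the error measures of Section 2.1.1.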