
2021/2022

Uploaded on 08/01/2022

MATH6011: Forecasting
All models are wrong, but some models are useful. George E. P. Box (1919–2013)

that it is freely accessible to read online (at https://www.otexts.org/fpp/). A few hard copies of the book can be found at the Hartley Library. Other interesting references include:

  1. Anderson, R.A., Sweeney, D.J. and Williams, T.A. 1994. An Introduction to Management Science. 7th Edn, West Publishing Co.;
  2. Draper, N.R. and Smith, H. 1981. Applied Regression Analysis, 2nd Ed. New York: Wiley;
  3. Gilchrist, W.G. 1976. Statistical Forecasting, New York: Wiley;
  4. Janert, P.K. 2011. Data Analysis with Open Source Tools. Sebastopol: O’Reilly;
  5. Wetherill, G.B. 1981. Intermediate Statistical Methods. London: Chapman and Hall.

All notes and spreadsheets used in the module are available on the course Blackboard site under Course Documents, where the spreadsheets are grouped by chapter and workshop, respectively. The notes are meant to be worked through, and each chapter is accompanied by a number of demos associated with the spreadsheets, illustrating the topic or method being discussed. They are an essential part of the text and must be carefully studied, possibly before the lectures. In the spreadsheets, the following convention for cells is used:
  • Cells with a Yellow background - Headings, incidental Information;
  • Cells with a Green background - Input information used in calculations on that sheet;
  • Cells with a Blue background - Calculations and results that you should be producing.

Exercises included at the end of each chapter correspond to the worksheet for the workshop of the corresponding week. They will be worked through during the workshops (computer labs) that follow the Friday lecture. The workshop exercises follow the same patterns as the demos, and use the same data sets in some cases, in order to give you the opportunity to get more familiar with the related material, as the focus at lectures will be more on the mathematical aspects of the models.

Assessment: The assessment of the module is 100% by a single coursework assignment. You will be given the assignment and related instructions in the second week. Further details on the submission and other key dates of the module activities are given in the table on the next page.

Feedback: A key opportunity to get feedback on your progress in the module will be during the weekly workshops. To benefit the most from the workshops, you are strongly encouraged to work on the problem sheets before coming to the workshop. This will help us assess where you are struggling and provide immediate help. It is also the best way for you to get well prepared for the exercises in your coursework assignment. You are also encouraged to come to my office hours to discuss any particular aspect of the lectures/material you might be struggling to understand. No appointment is needed to come to my office hour. I will be able to provide some brief feedback by email (efforts will be made to reply by the next working day after reception) if you have any quick questions. I have also arranged three voluntary sessions (assignment surgeries), prior to the coursework submission deadline, where you could ask questions and get feedback on the module and the coursework preparations; see next page for the dates. The final feedback on your performance on the coursework will be provided within 4 weeks after the submission deadline.

Acknowledgements: Dr Honora Smith and Prof Russell Cheng are gratefully acknowledged for the development of previous drafts of the course notes and the related material.

Instructor Dr Alain Zemkoho School of Mathematics Building 54, Room 10027 [email protected]

PhD Teaching Assistants: Zulkipli Hafizah Binti, Fulin Xie and Yuan Zhang. They will join me during the workshops to help with your questions on the exercises. Please do not contact them for any assistance related to the coursework assignment.

Contents

  • 1 Introduction and basic tools
  • 1.1 Graphical and numerical summaries
  • 1.1.1 Graphical summaries
  • 1.1.2 Numerical data summaries
  • 1.2 Decomposition
  • 1.2.1 Trend estimation using moving averages
  • 1.2.2 Additive decomposition
  • 1.2.3 Multiplicative decomposition
  • 1.2.4 Decomposition and forecasting
  • 1.3 Data preparation
  • 1.3.1 Length of the time series
  • 1.3.2 Missing and erroneous data
  • 1.3.3 Transformations
  • 1.3.4 Calendar adjustments
  • 1.4 Exercises
  • 2 Basic forecasting methods
  • 2.1 Accuracy analysis
  • 2.1.1 Measures of Accuracy
  • 2.1.2 ACF of forecast error
  • 2.1.3 Prediction interval
  • 2.2 Averaging methods
  • 2.3 Exponential smoothing methods
  • 2.3.1 Single exponential smoothing
  • 2.3.2 Holt’s linear exponential smoothing
  • 2.3.3 Holt-Winter’s method
  • 2.3.4 Excel implementation of the exponential smoothing method
  • 2.4 Exercises
  • 3 Forecasting using regression
  • 3.1 The model and key statistics
  • 3.1.1 Model description
  • 3.1.2 Computing the coefficients of the regression model
  • 3.1.3 Key statistics for model evaluation
  • 3.2 Selection of explanatory variables
  • 3.2.1 Adding variables to the initial model
  • 3.2.2 Time related explanatory variables
  • 3.2.3 Subset selection
  • 3.3 Multiple linear regression for forecasting
  • 3.3.1 Assumptions made and the validity of tests and forecasts
  • 3.3.2 Multicollinearity
  • 3.3.3 Using Regression.xlsm to forecast
  • 3.4 Exercises
  • 4 The ARIMA method
  • 4.1 Preliminary analysis
  • 4.1.1 The partial autocorrelation function
  • 4.1.2 A white noise model
  • 4.1.3 Stationarity
  • 4.2 ARIMA models
  • 4.2.1 Autoregression (AR) models
  • 4.2.2 Moving Average (MA) models
  • 4.2.3 ARIMA (p, d, q) models
  • 4.2.4 ARIMA(p, d, q)(P, D, Q)s models
  • 4.3 Model selection and forecasting
  • 4.3.1 Phase 1: Identification
  • 4.3.2 Phase 2: Parameters estimation and testing
  • 4.3.3 Phase 3: Forecasting using the ARIMA model
  • 4.3.4 Using ARIMA.xlsm to forecast
  • 4.4 Exercises

Chapter 1. Introduction and basic tools

This module will be concerned with such simple time series models, as well as some explanatory modelling (see chapter on regression). The methods that we will discuss are all based on extrapolation into the future of patterns shown in the past. Confidence in such forecasts is therefore based on confidence that such patterns will, in future, remain stable.

We begin with the preliminaries to forecasting that enable you to begin to find the best forecasting model to use with a particular time series. Before we start with the basic tools that we will be using in the course, let us recall that our focus will mostly be on two types of data (time series data and cross-sectional data) that we are now going to formally define.

Definition 1.0.1 Time series data are data from a unit (or a group of units) observed in several successive periods, whereas cross-sectional data are data from units observed at the same time or in the same time period. The latter may be single observations from a sample survey or from all units in a population.

Though our main focus here will be on the first type, the second will also be of great use, especially in Chapter 3. Examples of time series are discussed in Demo 1.1, while an example of a cross-sectional dataset is given in Demo 1.3.

1.1 Graphical and numerical summaries

This course is a practical introduction to the skills of forecasting. To give experience in encountering a variety of time series, real-life datasets are used, both during the course and for the final coursework.

1.1.1 Graphical summaries

Time plots

The essence of forecasting models is that patterns are projected forward in time, while random effects are discarded. The first thing to do when forecasting is therefore to make a time plot and look for patterns. The following may be observed:

  • A trend, which is a long term increase or decrease in the variable of interest.
  • A seasonal/periodic pattern appears when a time series is affected by seasonal factors such as time of the year or the day of the week.
  • A cyclical pattern, which is one where there are rises and falls but not of regular period, generally thought of as longer in time, e.g., several years.

It should be noted that combinations of the above three types of pattern occur frequently. We will treat trend and cycles together in the following analysis, and will often refer to trend-cycles.

A time series is said to be stationary if the distribution of the fluctuations is not time dependent. In particular, both the variability about the mean and the mean itself must be independent of time. A stationary time series therefore has no trend, cycle or seasonality and no patterns that can be used for forecasting.

Demo 1.1: Using the data sets from the Chapter 1 data folder on Blackboard, make time plots for the corresponding series: Australian monthly electricity (TimePlotTransformElec.xls), US treasury bills (TimePlotUstreas.xls), Australian clay brick (TimePlotBricks.xls).

  • The first example, Australian monthly electricity production, displays a clear trend and seasonality. Note that both the seasonal variability as well as the mean show a trend.
  • The US treasury bill contracts data shows a trend, but there is less certainty as to whether this trend will continue.
  • The data on Australian clay brick production contains occasional large fluctuations which are difficult to explain, and hence predict, without knowing the underlying causes.

Seasonal plots

If an initial inspection of a time plot leads you to suspect that seasonality may exist, then several methods are available to demonstrate and give evidence for its existence. A seasonal plot is one where the time series is cut into regular periods and the time plots of each period are overlaid on top of one another. It is an effective means of demonstrating seasonality, for example to a client who is not overly technically minded.

Demo 1.2: Produce the time and seasonal plots for the Australian beer production data provided in the file with name TimeSeasonalPlotBeer.xls.

If seasonal plots lead you to think that seasonality may exist, then further technical evidence can be provided by autocovariance and autocorrelations. These concepts are introduced in the next section. Before going to that, we provide another class of plots that is useful in analyzing bivariate-type data sets (i.e., data sets based on two variables).

Scatterplots

The graphs discussed so far are useful for time series data. Scatterplots are most useful for exploring relationships between variables in cross-sectional data. The automobile data (19 Japanese cars, see “Data” sheet in BivariateJapaneseCars.xls) are not a time series, which makes time or seasonal plots inappropriate for these data. However, these data are well suited to a scatterplot (see “BivariatePlot” sheet) such as that of price against mileage. In the figure we have plotted the variable we wish to forecast (price) against one of the explanatory variables (mileage). Each point on the graph represents one type of vehicle. The plot shows the relationship between price and mileage: vehicles with high mileage per gallon are generally cheaper than less fuel-efficient vehicles. (Both price and fuel-efficiency are related to the vehicle and engine size.) Vehicles with low mileage per gallon are generally priced over a range from around 12,000 to 25,000. The scatterplot helps us visualize the relationship and suggests that a forecasting model must include mileage as an explanatory variable.

When there are several potential explanatory variables, it is useful to plot each variable against each other variable to analyze correlations (see definition below) between them. These plots can be arranged in a scatterplot matrix; see, e.g., the 4th question of the Exercises of Chapter 3 that will be discussed at the corresponding workshop.

Demo 1.3: Produce the scatterplot mapping the price of the 19 Japanese cars against their mileage (BivariateJapaneseCars.xls).

1.1.2 Numerical data summaries

Covariance and correlation

The most commonly used statistics for bivariate data (i.e. when there are two variables) are the covariance and the correlation coefficient. If we have n pairs of observations (Xi, Yi) on two variables X and Y, then the formulas are respectively

$$\mathrm{Cov}_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$$

and

$$r_{XY} = \frac{\mathrm{Cov}_{XY}}{S_X S_Y} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}.$$

Here, $\bar{X}$ is the mean and

$$S_X = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2}$$

is the (sample) standard deviation. The correlation coefficient $r_{XY}$ is a standardised version of the covariance and its value is always between −1 and 1.
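As a rough illustration of the two formulas, the sketch below computes the sample covariance and the correlation coefficient directly from their definitions. The function names and the small mileage/price arrays are invented for this example and are not taken from BivariateJapaneseCars.xls.

```python
import numpy as np

def covariance(x, y):
    """Sample covariance with the 1/(n-1) factor used in the notes."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

def correlation(x, y):
    """Correlation coefficient r_XY = Cov_XY / (S_X * S_Y)."""
    s_x = np.sqrt(covariance(x, x))  # sample standard deviation of X
    s_y = np.sqrt(covariance(y, y))  # sample standard deviation of Y
    return covariance(x, y) / (s_x * s_y)

# Made-up illustrative data (not the course data set).
mileage = [19, 23, 27, 31, 35]
price = [24.0, 18.5, 15.0, 11.5, 9.0]

print("Cov:", covariance(mileage, price))
print("r:  ", correlation(mileage, price))
```

The result can be cross-checked against `np.corrcoef(mileage, price)[0, 1]`, which uses the same definition.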

1.2 Decomposition

1.2.1 Trend estimation using moving averages

The first step in the decomposition is to produce an estimate, T̂t , of the trend-cycle. (Here and in what follows we use a circumflex to denote an estimate.) This is carried out by a smoothing technique known as moving averages. The basic idea is that values of observations which are close together in time will have trend-cycle components that are similar in value. Ignoring the seasonal component for the moment, the value of the trend-cycle component at some particular time point can then be obtained by taking an average of a set of observations about this time point. Because the values that are averaged depend on the time point, this is called a moving average. There are many different forms that a moving average can take. Many have been constructed using ad-hoc arguments and reasoning. All boil down to being special cases of what is called a k-point weighted moving average:

$$M_t = \sum_{j=-m}^{m} a_j Y_{t+j},$$

where m = (k−1)/2 is called the half-width, and the $a_j$ are called the weights. Note that in this definition k must be an odd number. The simplest versions are the cases where all the weights are the same. This is then called a simple moving average of order k. For example, if k = 3, then

$$M_t = (Y_{t-1} + Y_t + Y_{t+1})/3.$$

If the weights are symmetrically balanced about the centre value (i.e. about j = 0 in the sum), then this is called a centred moving average. Simple moving averages involving an even number of terms can be used, but are then not centred about an integer t. This can be redressed by averaging a second time, this time averaging the moving averages themselves. Thus, for example, if

$$M_{2.5} = (Y_1 + Y_2 + Y_3 + Y_4)/4 \quad \text{and} \quad M_{3.5} = (Y_2 + Y_3 + Y_4 + Y_5)/4$$

are two consecutive 4-point moving averages, then we can centre them by taking their average

$$(M_{2.5} + M_{3.5})/2 = (Y_1 + 2Y_2 + 2Y_3 + 2Y_4 + Y_5)/8.$$

This example is called a 2×4 MA. It is simply a 5-point weighted moving average, with end weights each 1/8, and with the other three weights being 1/4. If applied to quarterly data, this 2×4 MA would give equal weight to all four quarters, as the first and last values would apply to the same quarter (but in different years). Thus this smoother would smooth out quarterly seasonal variation. Similarly, a 2×12 MA would smooth out seasonal variation in monthly data. Question: What are the weights of a 2×12 MA smoother?
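The construction of the 2×4 MA can be sketched in a few lines of Python. The helper names below are hypothetical; the weights are obtained by convolving a simple k-point average with a 2-point average, and positions without a full window are left as NaN rather than rescaled.

```python
import numpy as np

def two_by_k_weights(k):
    """Weights of a 2xk MA: the average of two consecutive simple k-point MAs."""
    simple = np.full(k, 1.0 / k)
    return np.convolve(simple, [0.5, 0.5])

def centred_ma(y, weights):
    """Apply a centred weighted MA (odd number of weights).

    End points without a complete window are returned as NaN.
    """
    y = np.asarray(y, dtype=float)
    m = (len(weights) - 1) // 2          # half-width
    out = np.full(len(y), np.nan)
    # np.convolve flips its second argument, so reverse the weights first.
    out[m:len(y) - m] = np.convolve(y, weights[::-1], mode="valid")
    return out

w = two_by_k_weights(4)
print(w)                                  # end weights 1/8, inner weights 1/4
print(centred_ma([1, 2, 3, 4, 5, 6], w))  # NaN at the two ends
```

Calling `two_by_k_weights` with another even order gives the weights of the corresponding 2×k smoother in the same way.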

Demo 1.6: Fit 7MA and 2x12MA to the housing sales data (AddDecompHsales.xls).

There is a problem applying a moving average at the two ends of a time series when we run out of observations to calculate the complete summation. When fewer than k observations are available the weights are usually rescaled so that they sum to unity. An effect of a moving average is that it will underestimate trends at the ends of a time series. This means that the methods discussed so far are generally unsatisfactory for forecasting purposes when a trend is present.

1.2.2 Additive decomposition

We now consider what might be called classical decomposition. These are methods developed in the 1920’s which form the basis of typical existing decomposition methods. We consider first the additive case, where (1.3) takes the form

$$Y_t = S_t + T_t + E_t. \qquad (1.4)$$

We assume throughout that the seasonal period is 12. The classical decomposition takes four steps:

Step 1: Compute the centred 2x12 MA. Denote this series by Mt (corresponding to T̂t in (1.4)). This series estimates the trend-cycle.

Step 2: De-trend the original series by subtraction:

$$D_t = Y_t - M_t = S_t + E_t.$$

Step 3: Calculate a seasonal index for each month by taking the average of all the values each month, j:

$$\hat{S}_j = \frac{1}{n_j} \sum_{k=1}^{n_j} D_{j+12(k-1)}.$$

In this formula, it is assumed that there are $n_j$ values available for month j, so that the summation is over these $n_j$ values.

Step 4: The estimated irregularity is obtained by subtraction of the seasonal component from the de-trended series:

$$\hat{E}_t = D_t - \hat{S}_{j(t)}.$$

Here $\hat{S}_{j(t)}$ denotes the seasonal index for the month corresponding to observation Yt.
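The four steps can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the series is assumed to start in the first month of the seasonal cycle, trend values at the ends are left as NaN instead of being rescaled, and the test series is synthetic.

```python
import numpy as np

def additive_decomposition(y, period=12):
    """Classical additive decomposition; returns (trend, seasonal_index, irregular)."""
    y = np.asarray(y, dtype=float)
    n = len(y)

    # Step 1: centred 2 x period MA as the trend-cycle estimate M_t.
    w = np.convolve(np.full(period, 1.0 / period), [0.5, 0.5])
    m = period // 2
    trend = np.full(n, np.nan)
    trend[m:n - m] = np.convolve(y, w[::-1], mode="valid")

    # Step 2: de-trend by subtraction, D_t = Y_t - M_t.
    detrended = y - trend

    # Step 3: seasonal index = average of the available D values per month.
    seasonal_index = np.array(
        [np.nanmean(detrended[j::period]) for j in range(period)]
    )

    # Step 4: irregular component, E_t = D_t - S_j(t).
    irregular = detrended - seasonal_index[np.arange(n) % period]
    return trend, seasonal_index, irregular

# Synthetic series: linear trend plus a zero-mean monthly pattern.
t = np.arange(48)
pattern = np.sin(2 * np.pi * np.arange(12) / 12)
y = 10 + 0.5 * t + np.tile(pattern, 4)

trend, seasonal_index, irregular = additive_decomposition(y)
print(np.round(seasonal_index, 3))
```

For this noise-free series the estimated seasonal indices coincide with the pattern that was put in, and the irregular component is zero wherever the trend is defined.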

1.2.3 Multiplicative decomposition

For the multiplicative model Yt = St × Tt × Et , the method is called the ratio of actual to moving averages. There are again four steps:

Step 1: Compute the centred 2x12 MA. Denote this series by Mt (as in the previous case, this corresponds to T̂t ). This step is exactly the same as in the additive model case.

Step 2: Calculate Rt , the ratio of actual to moving averages:

$$R_t = \frac{Y_t}{M_t}.$$

Step 3: Calculate a seasonal index for each month by taking the average of all the values each month, j:

$$\hat{S}_j = \frac{1}{n_j} \sum_{k=1}^{n_j} R_{j+12(k-1)}.$$

This step is exactly the same as in the additive case except that D is replaced by R.

Step 4: Calculate the error using

$$\hat{E}_t = \frac{R_t}{\hat{S}_t} = \frac{Y_t}{M_t \hat{S}_t}.$$

1.3.2 Missing and erroneous data

Real-life data is liable to contain human errors, most of which cannot be known with certainty by the forecaster. However, some clear outliers may be considered with high probability to be erroneous. For example, a missing or extra numeral will produce a resulting number that is ten times smaller or larger than neighbouring entries in a time series, and could seriously disrupt a forecast. If the source of the data can be referred to, it might then be possible to correct the error: if not, an estimate should be made. If it is considered necessary to add an estimated value where missing or erroneous data is present, then this has to be carried out with due regard to the time series in question, and clear justification given. A local average value might be appropriate, but seasonality might also need to be considered.

1.3.3 Transformations

Sometimes a systematic adjustment of the data will lead to a simpler analysis: mathematical transforms may be applied in certain cases. There are two ideas that are helpful in selecting an appropriate transform. First, it is usually easier to analyse a time series if the underlying mean varies in a linear way with time. Thus if the behaviour of the actual data has the form

$$Y_t = a t^p + e_t,$$

where a and p are constants and $e_t$ is a random ’error’, then the transform

$$W_t = (Y_t)^{1/p} = (a t^p + e_t)^{1/p} = bt + \delta_t,$$

where $b = a^{1/p}$, makes Wt look more ’linear’ than Yt. Note that the transformed ’error’, $\delta_t$, will depend in a complicated way on $e_t$, a, p and t. However, in many situations the behaviour of $\delta_t$ will remain ’random’ looking and be no more difficult to interpret than the initial error $e_t$. The above is known as a power transform. Another useful transform is the logarithmic transform:

$$W_t = \log_e(Y_t).$$

This can only be used if Yt > 0, as the logarithm of a negative quantity is complex valued. The second idea is that the random errors are most easily handled if their variability is not time dependent but remains essentially constant. A good transformation should therefore be variance stabilizing, producing errors that have a constant variance. For example, if

$$Y_t = a (t + e_t)^p,$$

where the $e_t$ have a constant variance, then the power transform

$$W_t = (Y_t)^{1/p} = a^{1/p} (t + e_t) = bt + \delta_t,$$

where $b = a^{1/p}$ and $\delta_t = b e_t$, will not only linearise the trend, but will also be variance stabilizing, as $\delta_t$ will have constant variance. Finally, note that, though we analyse the transformed data, we are really interested in the original sequence. So it is necessary to back transform results into the original units. Thus, for example in the last case, we might analyse the Wt and estimate b by, say, $\hat{b}$, but we would back transform to estimate a by

$$\hat{a} = \hat{b}^p.$$
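A small numerical sketch of the last case may help. The data are simulated, the function name is hypothetical, and estimating b by a least-squares line through the origin is an assumption made for this illustration; the notes do not prescribe a particular estimator.

```python
import numpy as np

def power_transform_fit(t, y, p):
    """Transform W_t = Y_t**(1/p), fit W_t = b*t, and back-transform to a."""
    t = np.asarray(t, dtype=float)
    w = np.asarray(y, dtype=float) ** (1.0 / p)
    b_hat = np.sum(t * w) / np.sum(t * t)  # least-squares slope through the origin
    a_hat = b_hat ** p                     # back transform: a = b**p
    return w, b_hat, a_hat

# Simulated data following Y_t = a * (t + e_t)**p with constant-variance e_t.
rng = np.random.default_rng(0)
t = np.arange(1, 51)
a, p = 2.0, 3.0
e = rng.normal(scale=0.2, size=t.size)
y = a * (t + e) ** p

w, b_hat, a_hat = power_transform_fit(t, y, p)
print("b_hat:", b_hat, " true b:", a ** (1 / p))
print("a_hat:", a_hat, " true a:", a)
```

The transformed series `w` is linear in t with errors of constant spread, whereas the spread of the errors in `y` grows with t.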

An important but somewhat difficult technical issue is that such transforms can destroy desirable properties like unbiasedness. A well known case concerns a random sample $X_1, X_2, \dots, X_n$ of size n. Here, the sample variance given by the formula

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$$

is known to be an unbiased estimator for the variance. However, s, the obvious estimator for the standard deviation, is not unbiased. When n is large this bias is, however, small.

Demo 1.9: Plot the Australian monthly electricity (TimePlotTransformElec.xls) data using the square root and the (natural) log transforms.

1.3.4 Calendar adjustments

If data is for calendar months, then account might have to be taken of the length of a month. The difference between the longest and shortest months is about (31 − 28)/30 = 10%. The adjustment needed is

$$W_t = \frac{\text{no. of days in an average month}}{\text{no. of days in month } t} \times Y_t = \frac{365.25/12}{\text{no. of days in month } t} \times Y_t.$$
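The adjustment is easy to sketch in Python using the standard library calendar module. The function name, the start date and the two sample values below are invented for illustration; they are not the milk production data.

```python
import calendar

def calendar_adjust(values, start_year, start_month):
    """Scale monthly values to an average month of 365.25/12 days."""
    avg_days = 365.25 / 12
    adjusted = []
    year, month = start_year, start_month
    for y in values:
        days = calendar.monthrange(year, month)[1]  # days in this month
        adjusted.append(y * avg_days / days)
        month += 1
        if month > 12:
            month, year = 1, year + 1
    return adjusted

# Hypothetical totals for January and February 2021 (10 units per day).
print(calendar_adjust([310, 280], 2021, 1))
```

Both months have the same daily rate here, so after adjustment the two values are equal even though the raw totals differ.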

Demo 1.10: Make separate time series plots of Yt and Wt for the data on the monthly milk production (CalAdjustMilk.xls) per cow.

1.4 Exercises

The data files and demonstration spreadsheets are available on the Blackboard site (see Course Content). Note that in all spreadsheets, allowance will need to be made for the length of the dataset, in all calculations and plots.

Exercise 1.1: Make time plots of the building materials (buildingMaterials.xls) and cement production (cementProduction.xls) data sets. Give a title to the graph, dates on the X-axis and make sure the Y-axis is appropriately labelled (the files include details of the units involved, i.e. thousands of tonnes). You can adjust the colour scheme of the graphs to improve readability. What trends/cycles or seasonality do you observe?

Exercise 1.2: Produce seasonal plots for buildingMaterials.xls and cementProduction.xls. What seasonality do you observe?

Exercise 1.3: Calculate the correlation coefficient for the two datasets buildingMaterials.xls and cementProduction.xls over a suitable time period, using the Excel Worksheet function. Can a strong linear relationship (positive or negative) be observed?

Exercise 1.4: A VBA macro is used in ACFAusBeer.xlsm to calculate the autocorrelation function (ACF). The macro has as input the column of n observations, and outputs the ACF up to lag m = n/4. Make a copy of this spreadsheet and use it to calculate the ACF of buildingMaterials.xls.

Exercise 1.5: Fit 7MA and 2x12MA moving averages to buildingMaterials.xls. Which is the more appropriate for smoothing the data in this case?

2. Basic forecasting methods

This chapter introduces basic forecasting methods based on averaging techniques. The basic forecasting methods to be considered are conventionally regarded as being divided in two groups: (i) averaging methods and (ii) exponential smoothing methods. Though it is convenient to follow this convention, it is important to realise at the outset that this distinction is artificial in that all the methods in this chapter are based on averages. They are thus all similar to the moving averages considered in the last chapter. The difference is that the averages are used here for forecasting rather than for describing past data. This point of potential confusion is made worse by the use of the name “exponential smoothing” for the second group. These methods are also based on weighted averages, where the weights decay in an exponential way from the most recent to the most distant data point. The term smoothing is being used simply to indicate that this weighted average smoothes the data irregularities. Thus, though the term smoothing here is used in the same sense as previously, the smoothing is being carried out in a different context from that used in the previous chapter. Before we start with the discussion of the methods, it is crucial to first present some tools that can be used to evaluate and compare the performance of forecasting techniques.

2.1 Accuracy analysis

2.1.1 Measures of Accuracy

The forecaster needs to choose the best model to use for forecasting any particular time series. We discuss here different measures for comparing different forecasting models on the basis of forecasting errors. Let Ft be the forecast value and Yt be the actual observation at time t. Then the forecast error at time t is defined as

$$e_t = Y_t - F_t. \qquad (2.1)$$

Usually Ft is calculated from previous values of Yt right up to and including the immediate preceding value $Y_{t-1}$. Thus Ft predicts just one step ahead. In this case Ft is called the one-step forecast and et is called the one-step forecast error. Usually we assess error not from one such et but from n values. Three measures of error are:

(i) the mean error

$$\mathrm{ME} = \frac{1}{n} \sum_{t=1}^{n} e_t,$$

(ii) the mean absolute error

$$\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} |e_t|,$$

(iii) and the mean square error

$$\mathrm{MSE} = \frac{1}{n} \sum_{t=1}^{n} e_t^2.$$

The mean error is not very useful. It tends to be near zero as positive and negative errors tend to cancel. It is only of use in detecting systematic under or over forecasting. The mean square error is a squared quantity so be careful and do not directly compare it with the MAE. Its square root is usually similar to the MAE. The relative or percentage error is defined as

$$\mathrm{PE}_t = \frac{Y_t - F_t}{Y_t} \times 100,$$

while the mean percentage error is

$$\mathrm{MPE} = \frac{1}{n} \sum_{t=1}^{n} \mathrm{PE}_t$$

and the mean absolute percentage error is

$$\mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} |\mathrm{PE}_t|.$$
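The five measures can be computed together, as in the following sketch. The function name and the two-point example are illustrative only; the percentage measures assume that no actual value Yt is zero.

```python
import numpy as np

def accuracy_measures(actual, forecast):
    """Return ME, MAE, MSE, MPE and MAPE for paired actuals and forecasts."""
    y = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    e = y - f               # forecast errors e_t = Y_t - F_t
    pe = 100.0 * e / y      # percentage errors (requires Y_t != 0)
    return {
        "ME": e.mean(),
        "MAE": np.abs(e).mean(),
        "MSE": (e ** 2).mean(),
        "MPE": pe.mean(),
        "MAPE": np.abs(pe).mean(),
    }

# Tiny made-up example: errors are +10 and -20.
print(accuracy_measures([100, 200], [90, 220]))
```

In this example the percentage errors are +10 and −10, so the MPE is zero while the MAPE is 10, which shows how signed measures can hide errors that cancel.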

We illustrate these error measures in the following demonstration, which is based on two simple (or naïve) forecasts.

Demo 2.1: Set up forecasts NF1 and NF2 for the Australian beer data (NaiveFAusBeer.xls). Calculate the ME, MAE, MSE, MPE, MAPE for the Australian beer series data using NF1 and NF2. NF1 is defined by

$$F_{t+1} = Y_t.$$

This simply takes the present Y value to be the forecast for the next period. The second naive forecast, NF2, takes into account some seasonal adjustment:

$$F_{t+1} = Y_t - S_t + S_{(t-12)+1}.$$

Here the present Y value is adjusted by subtracting $S_i$, the current period’s seasonality, and adding $S_{i+1}$, the next period’s seasonality. The seasonality is obtained as

$$S_t = \frac{1}{m+1} \left( m S_{t-12} + Y_t \right) \quad \text{with } S_t = Y_t \text{ for } t = 1, \dots, 12,$$

where m is the number of complete years of data available. We also set $F_{t+1} = Y_t$ for t = 1, ..., 12.
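The two naive forecasts can be sketched as below. This is one possible reading of the definitions rather than the spreadsheet implementation: m is taken to be the number of complete years observed before time t, so that the seasonality becomes a running mean of past values for the same month. The function name and the toy series are hypothetical.

```python
import numpy as np

def naive_forecasts(y, period=12):
    """Return (nf1, nf2); entry i is the forecast for observation i (0-based)."""
    y = np.asarray(y, dtype=float)
    n = len(y)

    # Seasonality: S_t = Y_t in the first year, then (m*S_{t-period} + Y_t)/(m+1).
    s = np.empty(n)
    for i in range(n):
        m = i // period  # assumed: complete years available before time i
        s[i] = y[i] if m == 0 else (m * s[i - period] + y[i]) / (m + 1)

    nf1 = np.full(n, np.nan)
    nf2 = np.full(n, np.nan)
    nf1[1:] = y[:-1]     # NF1: next forecast equals the present value
    for i in range(n - 1):
        if i < period:
            nf2[i + 1] = y[i]  # first year: same as NF1
        else:
            nf2[i + 1] = y[i] - s[i] + s[i - period + 1]
    return nf1, nf2

# Toy series with a period of 4 that repeats exactly.
y = np.tile([1.0, 3.0, 2.0, 5.0], 3)
nf1, nf2 = naive_forecasts(y, period=4)
print(nf1)
print(nf2)
```

For a series that repeats exactly, NF2 reproduces the data once a full year of seasonality estimates is available, whereas NF1 always lags by one period. The resulting forecasts can be passed to any routine that computes ME, MAE, MSE, MPE and MAPE.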