

































Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
The estimation of beta using rolling OLS and GARCH methods over daily and monthly data for various utilities. The author examines whether beta is constant or changes over time by comparing estimates from daily and monthly data. The document also explores the use of realised beta and AR1 forecasts of future beta based on realised beta estimates.
Typology: Summaries
1 / 41
This page cannot be seen from the preview
Don't miss anything!


































This document describes some issues with the current approach to estimating the CAPM beta as a method to calculate appropriate discount rates to use to value risky investments and suggests a way forward through the use of multi- variate modelling. We take the use of the single factor (beta) CAPM as the correct model and explore statistical issues around the estimation of beta. The existing approach typically uses rolling least squares estimation over some win- dow to produce a current estimate of beta which is then treated as the estimate of beta going forward. Whilst this procedure is computationally straightforward it raises a number of statistical and conceptual issues. In particular the impor- tance of developing models that account for time variation in both variances and covariances of asset returns, and the requirement for an estimate of beta that is suitable for an investment horizon that may stretch to several years. We argue that if data is to be used to inform decision making then this has to be done in a way which respects the statistical framework.
The standard economic approach to modelling the required expected return on an asset is via the Capital Asset Pricing Model (CAPM) which captures the relationship between risk and expected return in the pricing of risky investments. The CAPM states that for an asset i the expected return E (Ri) over some horizon should satisfy
E (Ri) − Rf = βi (E (RM ) − Rf ) (1)
where Rf is the risk free rate over the same horizon, RM is the corresponding
market return and βi = Covariance V ariance((RRiM,R )M )is the (market) beta of asset i. RM is usually taken as the return on a broad index of assets and Rf usually proxied by return on Govt discount bills. One can interpret the CAPM as measuring the
∗Faculty of Economics, University of Cambridge. Any opinions expressed in this paper are those of the author alone. Please do not quote without permission.
required return on an asset as depending on the amount of risk that asset holds, given by βi, times the market price of holding risk, given by the expected market excess return (E (RM ) − Rf ). The CAPM captures the idea that expected (excess) returns are lower for assets where their actual returns covary less with the market (excess return). We accept a lower expected return on these assets because when market returns drop the individual asset return does not drop so much. We can rewrite the CAPM in terms of actual returns as
Ri − Rf = βi (RM − Rf ) + i
where captures the dierences between expected returns and actual returns. This now looks like a regression equation (with zero intercept). The excess re- turn on any asset has two components. One is a linear function of the excess return on the market, this is usually called the non-diversiable or systematic risk and holding this risk is what requires compensation in the market. The other is a random component which is usually called non-systematic or diversi- able risk.
Ri − Rf = αi ︸︷︷︸ zero
βi (RM − Rf ) ︸ ︷︷ ︸ nondiversiable
i ︸︷︷︸ idiosyncratic
In a time series context we dene at time t the returns Rit = E (Rit)+ηit and RM t = E (RM t) + ξM t where ηit and ξM t are then random variables capturing the dierence of outcomes from expectations. Under rational expectations these will be zero mean conditional on the information set available. Then we have the following relation between the observed returnsRit and RM t (we include a constant which the CAPM implies should be zero).
Rit − Rf t = αi + βi (RM t − Rf t) + ηit − βiξit (2) = αi + βi (RM t − Rf t) + uit (3)
In practical terms one wishes to obtain estimates of the unknown parameter βi. The long established technique is to note that if we have a time series t = 1 ,... , T of observations on Rit and RM t (or on the excess returns) then we can write a regression equation
Rit = ai + biRM t + vit (4)
the least squares estimator of b is given mechanically by
ˆbi =
Cov (Ri, RM ) V ar̂ (RM )
ΣTt=
Rit − R¯i
RM t − R¯M
ΣTt=
RM t − R¯M
where ̂ denotes sample estimates and R¯i = (^1) T ΣTt=1Rit and R¯M = (^) T^1 ΣTt=1RM t are
the sample averages. Since the least squares formula for ˆbi in (4) coincides with
forecasts or forecasting beta over longer horizons in this framework the same estimate should be used. If beta is time varying then estimation can still be done by OLS but other techniques may be more appropriate depending (partly) on the reasons for which we wish to estimate beta and the way in which beta varies. We may be inter- ested in short horizon (conditional) E (βt+h | Ωt) where h may be a few days or weeks and Ωt represents the information set available at time t. Or we may be interested in much longer run (unconditional) E (β) or indeed something in between. The appropriate modelling strategy in the rst case may not be the same as in the second. In particular to forecast over longer horizons it will be usually be necessary to specify some model for the time evolution of beta. It is important to see in forecasting some model of the time evolution of beta is always assumed. For example if we have a current estimate of βˆt and use this to forecast forward βt+h we are implicitly assuming E (βt+h | Ωt) = βt for all h. If beta varies only slowly (relative to the data sampling frequency) then beta in the immediate future may be well approximated by the current estimate, in which case OLS on the most recent data may still be the most useful strategy and this is very much the standard approach where a recent window (of say ve years of monthly data) is used to obtain a current estimate of beta. But this current estimate may be a poor guide if beta reverts to some longer run level from its current levels and we are interested in longer horizons. For example if we assume that βt evolves as an AR1 around some long run level β∗^ we might write a simple AR1 model
(βt − β∗) = λ (βt− 1 − β∗) + ηt
and if we wish an estimate of the average β over the next h periods. Then it is easy to show β˜t,t+h = (1 − θh) β∗^ + θh βˆt
where β˜t,t+h is the average expected β over the period t → t + h (so β˜t,t+h = 1 h
∑h τ =1 E^ (βt+τ^ |^ βt)) and^ θh^ =^
λ( 1 −λh) h(1−λ). The forecast average over the period is thus a weighted average of current the estimate of βt and the long run (equi- librium) value β∗^ (which again we may have to estimate). Note as well one implication is that if we estimate or require beta over quarterly data we have roughly 66 days daily beta. Some sort of mean reverting behaviour in the process for βt could make the former estimate quite dierent from the latter.
3 OLS estimation
Given a time series of observations Ri,t and RM,t over an interval t = 1,... , T the least squares estimator is given by
ˆbi = Σ
T t=
Rit − R¯i
RM t − R¯M
ΣTt=
RM t − R¯M
If the underlying population parameters (ie Cov (Ri, RM )and V ar (RM )) are constant then under fairly mild conditions the sample estimates will converge to their population values in probability and so we can expect ˆbi to give consistent estimation of βi. If this is the belief then the appropriate estimation strategy should be to use all available observations on Ri and RM to obtain the most precise estimates of Cov (Ri, RM )and V ar (RM ) and the estimate of βi simply can be obtained from a regression of Ri on RM over the full sample. This can then be used whether we require an estimate of beta for short or long horizon forecasts. There is a very large literature using the simple OLS approach to estimation of beta originating with the work of Fama-Macbeth. In most of this literature there is at least an implicit acknowledgment that beta for individual stocks may vary over time. Much of the discussion focusses on the choice of sample period and observation frequency within that window (and the trade os between these two), for example for a monthly forecast of beta one might use the previous ve years of monthly data in estimation). If beta is believed to vary only slowly over the estimation interval and the requirement is for an estimate of beta over a comparable short future horizon this approach has much to recommend it.
It seems almost universally accepted that beta varies over time (and this is the usual justication for truncating the observation window for estimation). The hope presumably is that if Cov (Ri, RM )and V ar (RM ) are approximately constant over some interval t = 1,... , T (at levels σiM and σ^2 M , say) then the least squares estimator in a regression of Ri,t on RM,t will be approximately σ σiM 2 M over that interval and this may provide a suitable estimate for βi for subsequent use. That is, the average value of beta over the recent past can be used to estimate the expected value of beta over some comparable future interval. In this context the following questions arise:
We address each in turn.
3.1.1 Is there evidence that beta changes over time?
Figure 1 below shows estimates of beta from a rolling OLS of window size approx two years using daily data for SVT,NG and UU with ASX as the market proxy
random variation in the rolling estimates around the true (constant) value of βi for each company, whatever the estimation frequency. This is dicult to reconcile with the Figures. Indeed we see large short term uctuations where the addition of a single day or a month can generate substantial movement in the estimated β, and possibly a more longer term lower frequency variation. Since βi = Cov V ar(R(iR,RMM ) ) time variation in β can only occur if Cov (Ri, RM ) and/or V ar (RM ) vary over time. We can get a proxy for V ar (RM ) at daily frequency by graphing the squares of the daily return as below. Under ho- moscedasticity each of these would be an unbiased estimate of the (constant) variance of returns (ie we should see a more or less horizontal line).
.
.
.
.
.
.
.
.
.
.
00 02 04 06 08 10 12 14 16
ASX_RET^
Figure 3: Squared daily stock market returns
The visual impression of time variation is conrmed by a statistical test (a test for no relation between variance at time t and the previous 20 values has χ^220 = 1100. 8 which has a p-value around 1. 8 × 10 −^220 ). With such overwhelming evidence of time variation in V ar (RM ) the only way to obtain constant βi would be if Cov (Ri, RM ) had time variation that precisely mimics that of V ar (RM ). This seems unlikely. Squared returns for Severn Trent, National Grid and United Utilities are graphed below. Again test for homoscedasticity is overwhelmingly rejected in each case (χ^220 = 410, 712 and 940 respectively)
.
.
.
.
.
.
.
.
00 02 04 06 08 10 12 14 16
SVT_RET^
.
.
.
.
.
.
.
.
00 02 04 06 08 10 12 14 16
NG_RET^
.
.
.
.
.
.
.
.
00 02 04 06 08 10 12 14 16
UU_RET^
Figure 4: Squared daily returns for utilities
With such evidence of persistent time variation in the variances of these series it is extremely dicult to argue that β should be treated as a constant, except perhaps in the very short run. To make this argument would require that beta is allowed to be time varying but in such a way that it varies only very slowly over the estimation window, so that forecasting using this value over a similar window is a reasonable approximation - that is if beta is approximately constant at monthly intervals then one might try to use a history of monthly observations to forecast one or two months ahead. The problem of course is to obtain a sucient history of data for estimation would then require using several years of monthly data where the assumption of only slow variation might be dicult to defend. The alternative approach of estimating on a relatively shorter interval of, say, daily data and using this estimate to project beta forward over say a one month horizon is likely to be preferable and this seems to be the main justication for the current approach in nancial econometrics. However if one wishes to produce beta estimates for horizons further than days or even months the issue of time variation in the future as well as the past has to be acknowledged. One interesting feature of Figures 1 and 2 is that there appears to be high frequency movements (in beta) that are averaged out in the monthly regressions. This is at least suggestive of time variation in beta driven by a combination of temporary and more persistent shocks. The component GARCH model of Engle and Lee (1999) is one approach that tries to capture this short and longer horizon decay of shocks and provides more exible specication than the GARCH(1,1) models described below though at the cost of some computational complexity (the component GARCH is a restricted GARCH(2,2)).
3.1.2 Does this invalidate OLS as an estimation method?
If we accept that βi is time varying the immediate question is what precisely the OLS estimator in the CAPM regression then measures (note that there is nothing to stop us performing such a regression). Let Ri,t+1 be the one period conditional return on asset i from t to t+1 and RM,t+1 the corresponding excess market return. Express the time varying CAPM as (this follows Bollerslev and Zhang, 2003)
Ri,t+1 = αt+1|t + βi,t+1|tRM,t+1 + ηi,t+1 t = 0, 1 ,... , T − 1 (6)
Realised beta We can make the assumption that returns actually evolve in continuous time but can only be observed at discrete intervals (such as daily, weekly, monthly or quarterly). The process generating the returns will have underlying variance and covariances that are continuous (and possibly time varying) processes. By implication there is also a beta that evolves continuously in time. The realisa- tion of the returns, variances and covariances at the observed frequency will be given by cumulating the returns variances and covariances from the underlying continuous time processes. In these circumstances an alternative approach is to use the realised beta methodology. To estimate say the variance of a return in a particular quarter we recognise that this can be approximated by the cumulated variances of minute- by-minute or even second-by second returns with the approximation improving the higher the frequency we can sample the data within that quarter. As the sampling frequency increases in the limit we obtain the realised variance measure of Barndor-Neilsen and Shephard (2002) (see Anderson Bollerslev Diebold and Wu (2006) for discussion). This can also be done to construct realised covariance and hence we can obtain realised beta at say quarterly frequency using higher frequency such as daily (or even intra-daily though remember the caveats above) data. That is to obtain a direct estimate of βi,t+1|twhere t → t + 1 is say one quarter we use the higher frequency (such as daily) data within the sampling period t → t + 1. If we label higher frequency returns Ri,tj and RM,tj for j = 1,... , N during a period t (so j represents for example days within a month) then an estimate of the time t conditional beta is obtained as the ratio of the realised covariance and realised volatility
β^ ˆi,t+1|t =
j=1 Ri,tj RM,tj ∑N j=1 R 2 M,tj
which is simply obtained by a regression at the higher frequency within period t of asset returns on market returns. This formulation allows for continuous evolution of the underlying parameter and produces a consistent estimate of the underlying ratio between the integrated stock and market return covariance and the integrated stock market variance over the sampling period. Having obtained these realised beta estimates Bollerslev and Zhang suggest two ways of using them to estimate future beta, rstly by just holding the current estimate forward, and secondly by producing a time series of estimates of conditional betas at the lower frequency and then using a rolling window to produce (say) AR1 forecasts of the future beta by simply estimating an autoregression for the time series of realised betas. In principle this realised β method could be used to produce estimates at much greater horizons. If one species a sampling period of say 500 observations (two years) then the OLS estimates within that interval provide an estimate of the current 500 day conditional β. This could be projected forward at its current level (imposing a random walk model on future beta) as the forecast beta. Or
one could specify some model such as an AR1 for the conditional beta. However even with a total of 16 years of data there are only 8 non-overlapping two year blocks of daily data on which to calculate conditional betas. To estimate an AR1 model on so few observations and project forward for ve or even ten years is likely to introduce substantial noise given known problems with small sample autoregressive estimation. A strategy of reducing the sampling frequency for the CAPM to say quarterly and then use the daily data to produce conditional beta estimates at this frequency would allow a longer time series (of estimated conditional betas) for estimating a forecasting model at the cost of fewer daily observations being used to estimate each conditional beta. Bollerslev and Zhang report some evidence in favour of this high frequency plus AR approach to forecasting β at least at short forecast horizons. This methodology blends a non-parametric approach to estimating the conditional betas with a parametric forecasting model. Note also that a simple OLS over the full sample can be interpreted as a measure of the realised beta over that interval. Section 9 below gives some estimates.
3.1.3 What alternative estimation procedures might be appropriate?
If beta is time varying then OLS might still be used as discussed above. An alternative approach is to make parametric assumptions about the time evo- lution of either βi,t or of the underlying variance and covariances at say daily frequency. There is a long established literature in econometrics for modelling time variation of second moments of nancial time series through models of conditional heteroscedasticity. For estimating beta the implication would be that we specify (and estimate) some model that allows the (conditional) distri- bution of the random vector (Ri, RM ) to evolve over time. As time evolves the joint distribution then implies a new value for βi,t. We discuss these models of autoregressive conditional heteroscedasticity (ARCH) below in Section 4.
Least squares estimation of the CAPM model raises some issues
We start by revisiting univariate models with conditional heteroscedasticity.
It is relatively straightforward to test whether the residuals from a regression display time varying heteroscedasticity without actually having to estimate the ARCH specication. The squared residuals from an OLS regression are re- gressed on p of their own lagged values ie
e^2 t = a 0 + a 1 e^2 t− 1 + ... + ape^2 t−p + vt
and T times the R^2 from this regression is a χ^2 with p degrees of freedom under the null that εt is i.i.d. N (0, σ^2 ). The test is routinely implemented by most time series regression packages. This is the test reported above.
If ARCH eects are detected then estimation proceeds by maximum likelihood. If we assume the εt are normally distributed then the conditional distribution of yt is normal with mean α and variance ht. The log-likelihood can easily be derived and maximised numerically. One usually needs quite high frequency data (e.g. daily) to be able to identify ARCH eects well. Of course there is nothing to restrict us to Gaussian distributions. In fact even though ARCH models with Gaussian errors have fatter tails than the normal, many (particularly nancial) series that one might wish to consider seem to have even fatter tails. One common assumption is that εt has a t- distribution with ν degrees of freedom with ν a parameter that then enters the likelihood function and is estimated along with everything else - this will tend to generate even more kurtosis in the distribution of the ARCH variable and (the hope would be) a better t to the data. One could easily make further distributional assumptions.
A large number of extensions to the basic ARCH model have been proposed, and many can easily be implemented in standard regression packages.
Perhaps the most widely used extension is to the generalised form of ARCH given by the following equation
ht = a 0 + a 1 ε^2 t− 1 + a 2 ε^2 t− 2 + ... + apε^2 t−p + β 1 ht− 1 + β 2 ht− 2 + ... + βr ht−r
giving a GARCH(r, p) model. This allows for greater serial dependence in the variance term. Estimation is by maximum likelihood, though it is some- what complicated by the need to deal with presample values of the ht. Note GARCH(0, p) is ARCH(p). In many nancial applications the standard model
overwhelmingly used by practitioners is the GARCH(1,1) model which is seen to capture many of the features of high frequency returns. Much research has been devoted to models that (it is argued) better capture the heteroscedastic- ity in the data, for example the class of exponential-GARCH models capture asymmetries in the eect of shocks on the volatility. But for the purposes of modelling joint distributions the degree of tractability oered by plain GARCH models is attractive.
In our setting we wish to model both the return on asset i, Ri, and the market return RM. Both series display time varying heteroscedasticity so we specify a joint process as
RM,t = αM + uM,t Ri,t = αi + ui,t
so
E
RM,t Ri,t
| Ωt− 1
αM αi
V ar
RM,t Ri,t
| Ωt− 1
= V ar
uM,t ui,t
| Ωt− 1
σ 112 ,t σ 12 ,t σ 21 ,t σ^222 ,t
with a short run (conditional)β then given as
βi,t =
σ 12 ,t σ 112 ,t
We then model the joint distribution in (9) directly by specifying an ARCH or GARCH model for the second moments. Once we have the parameters of that distribution we can infer βi,t directly. This avoids the problems discussed above of estimating β from a regression where the time varying nature of β means both that there is unmodelled heteroscedasticity and that the least squares assump- tion of constant coecients is violated. Instead we model the heteroscedasticity in the variances and covariances directly and calculate β from the implied es- timates. By specifying GARCH models for σ^211 ,t, σ^222 ,t and σ 12 ,t we capture a structure where there can be short run (transitory) movements in βi,t around some longer run equilibrium value, that is a large shock to returns in the current period changes the conditional variance and covariance of returns in the next period. This causes β to change in the short run and also aects the probabil- ity distribution of shocks next period, and the realisation then transmits this heteroscedasticity forward. We observe clusters of increased volatility revert- ing to some longer run value before another shock sets o the process again. The sequence of β's, implied by this structure will display possibly persistent movements about some longer run average value. A variety of dierent specications have been proposed for the conditional variance process. Two practical diculties with these models are rst to ensure
If we are purely interested in the long run value a number of possible estimators suggest themselves. One is to use the estimated coecients directly
β^ ˆLR =
mˆ 21 /
1 − ˆa 11 ˆa 22 − ˆb 11 ˆb 22
m ˆ 11 /
1 − ˆa^211 − ˆb^211
An alternative is to use the estimated values ˆσ^2 M,t and ˆσiM,t. The time series average of each of these should converge to their long run values ie V ar (RM ) and Cov (Ri, RM ) respectively. So one can also estimate beta as
β^ ˆavs =
1 T
σˆiM,t 1 T
ˆσ M,t^2
Finally one can calculate short run conditional betas as
β^ ˆSR,t = ˆσiM,t ˆσ^2 M,t
One could then estimate the long run beta as a simple average of these short runs ie
β^ ˆSR =^1 T
t
β^ ˆSR,t =^1 T
t
ˆσiM,t/σˆ^2 M,t
This gives three methods to estimate beta from the GARCH -BEKK speci- cation. Firstly note that for βˆLR even if we have unbiased estimates of the coef- cients (m 11 , m 12 , m 22 , a 11 , a 22 , b 11 , b 22 ) we do not get unbiased estimates of Cov (Ri, RM ) and V ar (RM ) by plugging in these estimates. To see this note
E
m ˆ 21 /
1 − ˆa 11 ˆa 22 − ˆb 11 ˆb 22
6 =E ( ˆm 21 ) /
1 − E (ˆa 11 ) E (ˆa 22 ) − E
ˆb 11
ˆb 22
=m 21 / (1 − a 11 a 22 − b 11 b 22 ) = Cov (Ri, RM )
and similarly for the denominator. But we have
plim
βˆLR
= β
as long as we have consistent estimates of (m 11 , m 12 , m 22 , a 11 , a 22 , b 11 , b 22 ) and a^211 + b^211 6 = 1 and a 11 a 22 − b 11 b 22 6 = 1. If the GARCH specication implies stationary processes for the conditional moments σiM,t and σ^2 M,t then time series averages of the estimated processes will converge in probability to the unconditional values. So again we will have
plim
βˆavs
= β
though if we are fairly close to (integrated) I-GARCH this convergence may be quite slow Finally note also that for βˆSR,t since
ˆσiM,t σ ˆ^2 M,t
E (ˆσiM,t) E
σ ˆ^2 M,t
σiM σ M^2
= β
an average of the short run βˆSR will typically not be unbiased for β.
5 Simulations
To illustrate the eect of ARCH or GARCH models compared to a situation of IID shocks we simulate three models for returns. Firstly we make 3000 observations as . RET (^1) t = ut
where utis IID N
0 , σ^2
and σ^2 is chosen to match the daily volatility of ASX Secondly we specify RET (^2) t = ut
where ut ∼
htt with t ∼ IN (0, 1) and ht = 0.00008+0. 4 u^2 t− 1 is an ARCH(1) model with coecients chosen to match the estimated ARCH(1) values from ASX Finally we specify a GARCH(1,1) model
RET (^3) t = ut
where ut ∼
htt with t ∼ IN (0, 1) and ht = 0 .0000017 + 0. 87 ht− 1 +
.
.
.
.
.
.
500 1000 1500 2000 2500 3000
RET1^
.
.
.
.
.
.
500 1000 1500 2000 2500 3000
RET2^
.
.
.
.
.
.
500 1000 1500 2000 2500 3000
RET3^
Figure 5: Squared returns for IID, ARCH(1) and GARCH(1,1) processes
.
.
.
.
.
.
.
500 1000 1500 2000 2500 3000 3500
RET^
Figure 6: Squared returns for simulated BEKK-GARCH process
Again we see that the BEKK-GARCH model is capable of generating the clusters of volatility seen in the actual data. Using our 4000 simulated returns we mimic the current procedure by esti- mating a rolling regression using a window of 500 observations of the simulated stock return on the simulated market return. This gives 3500 estimates of (daily) β. These are graphed below
500 1000 1500 2000 2500 3000 3500
BETA
Figure 7: Rolling OLS β estimates from joint BEKK-GARCH process (500 obs window)
The average of the rolling β estimates is 0.61 so in this simulation this average is about 20% too high. It is important to see what this means. In this simulation the true long run value of β is 0.5. In a least squares regression on a randomly chosen window of 500 observations one can expect to see an estimated β of about 0.6. The estimated βs from the rolling regressions are also distributed quite widely around the true value with values as large as 0.9 and as small as 0.3. If one were interested in the short run conditional β (say for portfolio allocation decisions) then the rolling regression may provide a reasonable estimate of the recent average value, but (except by chance) the estimate from any particular regression (or even the average over all regressions) is not a good guide to the true long run value of β (in this case 0.5). To get a better picture of the eect of the rolling regression approach we repeat this simulation exercise 2000 times. Figure 8 (below) then shows the histogram of the averaged rolling βs from these repetitions (ie in each repetition we estimate 3500 rolling OLS regressions and take the average beta and graph the histogram of the resulting 2000 averages). Relative to the true long run parameter the entire distribution is shifted to the right so both mean and median overstate the true value.
0
50
100
150
200
250
300
0.3 0.4 0.5 0.6 0.7 0.
Figure 8: Histogram of average rolling β estimates (2000 repetitions)
Each repetition of 3500 rolling OLS with window 500 from simulated multi-GARCH model. True long run β = 0. 5.
The results presented are for a single simulation model. Unfortunately there are too many dimensions to do an encompassing simulation (that is to vary the original estimation length and parameter values, the simulated data series length, the rolling OLS window length). We have however repeated this exercise using various dierent sample periods to estimate the coecients in Table 1, and these are then used to simulate dierent lengths of articial returns data and