
Basic Financial Econometrics

Alois Geyer
Vienna University of Economics and Business
alois.geyer@wu.ac.at
http://www.wu.ac.at/~geyer

this version:

June 24, 2021

preliminary and incomplete

© Alois Geyer 2021. Some rights reserved.

This document is subject to the following Creative-Commons-License: http://creativecommons.org/licenses/by-nc-nd/2.0/at/deed.enUS

Contents

1 Financial Regression Analysis . . . . 1
  1.1 Regression analysis . . . . 1
    1.1.1 Least squares estimation . . . . 2
    1.1.2 Implications . . . . 3
    1.1.3 Interpretation . . . . 4
  1.2 Finite sample properties of least squares estimates . . . . 6
    1.2.1 Assumptions . . . . 8
    1.2.2 Properties . . . . 11
    1.2.3 Testing hypotheses . . . . 13
    1.2.4 Example 6: CAPM, beta-factors and multi-factor models . . . . 15
    1.2.5 Example 7: Interest rate parity . . . . 19
    1.2.6 Prediction . . . . 21
  1.3 Large sample properties of least squares estimates . . . . 22
    1.3.1 Consistency . . . . 23
    1.3.2 Asymptotic normality . . . . 25
    1.3.3 Time series data . . . . 26
  1.4 Maximum likelihood estimation . . . . 28
  1.5 LM, LR and Wald tests . . . . 31
  1.6 Specifications . . . . 33
    1.6.1 Log and other transformations . . . . 33
    1.6.2 Dummy variables . . . . 34
    1.6.3 Interactions . . . . 35
    1.6.4 Difference-in-differences . . . . 36
    1.6.5 Example 11: Hedonic price functions . . . . 37
    1.6.6 Example 12: House price changes induced by siting decisions . . . . 38
    1.6.7 Omitted and irrelevant regressors . . . . 39
    1.6.8 Selection of regressors . . . . 41
  1.7 Regression diagnostics . . . . 43
    1.7.1 Non-normality . . . . 43
    1.7.2 Heteroscedasticity . . . . 44
    1.7.3 Autocorrelation . . . . 46
  1.8 Generalized least squares . . . . 51
    1.8.1 Heteroscedasticity . . . . 51
    1.8.2 Autocorrelation . . . . 52
    1.8.3 Example 19: Long-horizon return regressions . . . . 55
  1.9 Endogeneity and instrumental variable estimation . . . . 57
    1.9.1 Endogeneity . . . . 57
    1.9.2 Instrumental variable estimation . . . . 59
    1.9.3 Selection of instruments and tests . . . . 62
    1.9.4 Example 21: Consumption based asset pricing . . . . 65
  1.10 Generalized method of moments . . . . 69
    1.10.1 OLS, IV and GMM . . . . 71
    1.10.2 Asset pricing and GMM . . . . 72
    1.10.3 Estimation and inference . . . . 74
    1.10.4 Example 24: Models for the short-term interest rate . . . . 77
  1.11 Models with binary dependent variables . . . . 78
  1.12 Sample selection . . . . 82
  1.13 Duration models . . . . 84

2 Time Series Analysis . . . . 87
  2.1 Financial time series . . . . 87
    2.1.1 Descriptive statistics of returns . . . . 88
    2.1.2 Return distributions . . . . 91
    2.1.3 Abnormal returns and event studies . . . . 94
    2.1.4 Autocorrelation analysis of financial returns . . . . 97
    2.1.5 Stochastic process terminology . . . . 100
  2.2 ARMA models . . . . 101
    2.2.1 AR models . . . . 101
    2.2.2 MA models . . . . 104
    2.2.3 ARMA models . . . . 105
    2.2.4 Estimating ARMA models . . . . 106
    2.2.5 Diagnostic checking of ARMA models . . . . 107
    2.2.6 Example 35: ARMA models for FTSE and AMEX returns . . . . 108
    2.2.7 Forecasting with ARMA models . . . . 110
    2.2.8 Properties of ARMA forecast errors . . . . 112
  2.3 Non-stationary models . . . . 115
    2.3.1 Random-walk and ARIMA models . . . . 115
    2.3.2 Forecasting prices from returns . . . . 118
    2.3.3 Unit-root tests . . . . 119
  2.4 Diffusion models in discrete time . . . . 124
    2.4.1 Discrete time approximation . . . . 126
    2.4.2 Estimating parameters . . . . 126
    2.4.3 Probability statements about future prices . . . . 129
  2.5 GARCH models . . . . 131
    2.5.1 Estimating and diagnostic checking of GARCH models . . . . 133
    2.5.2 Example 49: ARMA-GARCH models for IBM and FTSE returns . . . . 133
    2.5.3 Forecasting with GARCH models . . . . 135
    2.5.4 Special GARCH models . . . . 136

3 Vector time series models . . . . 138
  3.1 Vector-autoregressive models . . . . 138
    3.1.1 Formulation of VAR models . . . . 138
    3.1.2 Estimating and forecasting VAR models . . . . 140
  3.2 Cointegration and error correction models . . . . 143
    3.2.1 Cointegration . . . . 143
    3.2.2 Error correction model . . . . 143
    3.2.3 Example 53: The expectation hypothesis of the term structure . . . . 145
    3.2.4 The Engle-Granger procedure . . . . 146
    3.2.5 The Johansen procedure . . . . 150
    3.2.6 Cointegration among more than two series . . . . 155
  3.3 State space modeling and the Kalman filter . . . . 157
    3.3.1 The state space formulation . . . . 157
    3.3.2 The Kalman filter . . . . 158
    3.3.3 Example 60: The Cox-Ingersoll-Ross model of the term structure . . . . 159

Bibliography . . . . 162
I am grateful to many PhD students of the VGSF program, as well as doctoral and master students at WU for valuable comments which have helped to improve these lecture notes.


1 Financial Regression Analysis

1.1 Regression analysis

We start by reviewing key aspects of regression analysis. Its purpose is to relate a dependent variable $y$ to one or more variables $X$ which are assumed to affect $y$. The relation is specified in terms of a systematic part which determines the expected value of $y$ and a random part. For example, the systematic part could be a (theoretically derived) valuation relationship. The random part represents unsystematic deviations between observations and expectations (e.g. deviations from equilibrium). The relation between $y$ and $X$ depends on unknown parameters which are used in the function that relates $X$ to the expectation of $y$.

Assumption AL (linearity): We consider the linear regression equation
$$ y = X\beta + \varepsilon. $$
$y$ is the $n\times 1$ vector $(y_1,\ldots,y_n)'$ of observations of the dependent (or endogenous) variable, $\varepsilon$ is the vector of errors (also called residuals, disturbances, innovations or shocks), $\beta$ is the $K\times 1$ vector of parameters, and the $n\times K$ matrix $X$ of regressors (also called explanatory variables or covariates) is defined as follows:
$$ X = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix}. $$
$k$ is the number of regressors and $K=k+1$ is the dimension of $\beta=(\beta_0,\beta_1,\ldots,\beta_k)'$, where $\beta_0$ is the constant term or intercept. A single row $i$ of $X$ will be denoted by the $K\times 1$ column vector $x_i$. For a single observation the model equation is written as
$$ y_i = x_i'\beta + \varepsilon_i \qquad (i=1,\ldots,n). $$
We will frequently (mainly in the context of model specification and interpretation) use formulations like
$$ y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon, $$
where the symbols $y$, $x_i$ and $\varepsilon$ represent the variables in question. It is understood that such equations also hold for a single observation.


1.1.1 Least squares estimation

A main purpose of regression analysis is to draw conclusions about the population using a sample. The regression equation $y=X\beta+\varepsilon$ is assumed to hold in the population. The sample estimate of $\beta$ is denoted by $b$ and the estimate of $\varepsilon$ by $e$. According to the least squares (LS) criterion, $b$ should be chosen such that the sum of squared errors SSE is minimized:
$$ \mathrm{SSE}(b) = \sum_{i=1}^n e_i^2 = \sum_{i=1}^n (y_i - x_i'b)^2 = (y-Xb)'(y-Xb) \to \min. $$
A necessary condition for a minimum is derived from
$$ \mathrm{SSE}(b) = y'y - 2b'X'y + b'X'Xb, $$
and is given by
$$ \frac{\partial\,\mathrm{SSE}(b)}{\partial b} = 0:\quad -2X'y + 2X'Xb = 0. $$
Assumption AR (rank): We assume that $X$ has full rank equal to $K$ (i.e. the columns of $X$ are linearly independent). If $X$ has full rank, $X'X$ is positive definite and the ordinary least squares (OLS) estimates $b$ are given by
$$ b = (X'X)^{-1}X'y. \tag{1} $$
The solution is a minimum since
$$ \frac{\partial^2\,\mathrm{SSE}(b)}{\partial b^2} = 2X'X $$
is positive definite by assumption AR. It is useful to express $X'y$ and $X'X$ in terms of the sums
$$ X'y = \sum_{i=1}^n x_i y_i \qquad X'X = \sum_{i=1}^n x_i x_i' $$
to point out that the estimate is related to the covariance between the dependent variable and the regressors, and the covariance among regressors. In the special case of the simple regression model $y=b_0+b_1x+e$ with a single regressor the estimates $b_1$ and $b_0$ are given by
$$ b_1 = \frac{s_{yx}}{s_x^2} = r_{yx}\frac{s_y}{s_x} \qquad b_0 = \bar{y} - b_1\bar{x}, $$
where $s_{yx}$ ($r_{yx}$) is the sample covariance (correlation) between $y$ and $x$, $s_y$ and $s_x$ are the sample standard deviations of $y$ and $x$, and $\bar{y}$ and $\bar{x}$ are their sample means.
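To make the algebra concrete, the normal equations can be evaluated directly. The following R sketch (simulated data; all names and numbers are illustrative, not taken from the text) computes $b=(X'X)^{-1}X'y$ and compares it to R's built-in lm():

  set.seed(1)
  n  <- 100
  x1 <- rnorm(n); x2 <- rnorm(n)
  y  <- 1 + 0.5*x1 - 0.3*x2 + rnorm(n)   # true beta = (1, 0.5, -0.3)'
  X  <- cbind(1, x1, x2)                 # n x K regressor matrix incl. the constant
  b  <- solve(t(X) %*% X, t(X) %*% y)    # b = (X'X)^{-1} X'y, equation (1)
  cbind(b, coef(lm(y ~ x1 + x2)))        # agrees with lm() up to rounding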


1.1.2 Implications

By the first order condition the OLS estimates satisfy the normal equation
$$ (X'X)b - X'y = X'(y - Xb) = X'e = 0, \tag{2} $$
which implies that each column of $X$ is uncorrelated with (orthogonal to) $e$. If the first column of $X$ is a column of ones denoted by $\iota$, LS estimation has the following implications:

1. The residuals have zero mean since $\iota'e=0$ (from the normal equation).
2. This implies that the mean of the fitted values $\hat{y}_i=x_i'b$ is equal to the sample mean: $\frac{1}{n}\sum_{i=1}^n \hat{y}_i = \bar{y}$.
3. The fitted values are equal to the mean of $y$ if the regression equation is evaluated for the means of $X$: $\bar{y} = b_0 + \sum_{j=1}^k \bar{x}_j b_j$.
4. The fitted values and the residuals are orthogonal: $\hat{y}'e = 0$.
5. The slope in a regression of $y$ on $e$ is always equal to one and the constant is equal to $\bar{y}$.¹

The goodness of fit of a regression model can be measured by the coefficient of determination $R^2$ defined as
$$ R^2 = 1 - \frac{e'e}{(y-\bar{y})'(y-\bar{y})} = 1 - \frac{(y-\hat{y})'(y-\hat{y})}{(y-\bar{y})'(y-\bar{y})} = \frac{(\hat{y}-\bar{y})'(\hat{y}-\bar{y})}{(y-\bar{y})'(y-\bar{y})}. $$
This is the so-called centered version of $R^2$ which lies between 0 and 1 if the model contains an intercept. It is equal to the squared correlation between $y$ and $\hat{y}$. The three terms in the expression
$$ (y-\bar{y})'(y-\bar{y}) = (\hat{y}-\bar{y})'(\hat{y}-\bar{y}) + (y-\hat{y})'(y-\hat{y}) $$
are called the total sum of squares (SST), the sum of squares from the regression (SSR), and the sum of squared errors (SSE). Based on this relation $R^2$ is frequently interpreted as the percentage of $y$'s variance 'explained' by the regression. If the model does not contain an intercept, the centered $R^2$ may become negative. In that case the uncentered $R^2$ can be used:
$$ \text{uncentered } R^2 = 1 - \frac{e'e}{y'y} = \frac{\hat{y}'\hat{y}}{y'y}. $$
$R^2$ is zero if all regression coefficients except for the constant are zero ($b=(b_0\ 0)'$ and $\hat{y}=b_0=\bar{y}$). In this case the regression is a horizontal line. If $R^2=1$ all observations are located on the regression line (or hyperplane) (i.e. $\hat{y}_i=y_i$). $R^2$ is (only) a measure for the goodness of the linear approximation implied by the regression. Many other, more relevant aspects of a model's quality are not taken into account by $R^2$. Such aspects will become more apparent as we proceed.

¹ By implication 3 the constant must be equal to $\bar{y}$ since the mean of $e$ is zero. The slope is given by $(e'e)^{-1}e'\tilde{y}$, where $\tilde{y}=y-\bar{y}$. The slope is equal to one since $e'\tilde{y}=e'e$. The latter identity holds since in the original regression $e'y=e'Xb+e'e$ and $e'X=0'$. Finally, $e'y=e'\tilde{y}$ since $e'\iota=0$.
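The five implications and the centered $R^2$ are easy to verify numerically. A short R sketch, continuing the simulated example above:

  fit  <- lm(y ~ x1 + x2)
  e    <- resid(fit); yhat <- fitted(fit)
  mean(e)                             # implication 1: residuals have zero mean
  c(mean(yhat), mean(y))              # implication 2: mean of fitted values = ybar
  sum(yhat * e)                       # implication 4: fitted values orthogonal to e
  coef(lm(y ~ e))                     # implication 5: intercept ybar, slope 1
  1 - sum(e^2)/sum((y - mean(y))^2)   # centered R^2, equals summary(fit)$r.squared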

1.1.3 Interpretation

The coefficients $b$ can be interpreted on the basis of the fitted values²
$$ \hat{y} = b_0 + x_1 b_1 + \cdots + x_k b_k. $$
$b_j$ is the change in $\hat{y}$ (or, the expected change in $y$) if $x_j$ changes by one unit ceteris paribus (c.p.), i.e. holding all other regressors fixed. In general the change in the expected value is
$$ \Delta\hat{y} = \Delta x_1 b_1 + \cdots + \Delta x_k b_k, $$
which implies that the effects of simultaneously changing several regressors can be added up.

This interpretation is based on the Frisch-Waugh theorem. Suppose we partition the regressors in two groups $X_1$ and $X_2$, and regress $y$ on $X_1$ to save the residuals $e_1$. Next we regress each column of $X_2$ on $X_1$ and save the residuals of these regressions in the matrix $E_2$. According to the Frisch-Waugh theorem the coefficients from the regression of $e_1$ on $E_2$ are equal to the subset of coefficients from the regression of $y$ on $X$ that corresponds to $X_2$. In more general terms, the theorem implies that partial effects can be obtained directly from a multiple regression. It is not necessary to first construct orthogonal variables. To illustrate the theorem we consider the regression
$$ y = b_0 + b_1 x_1 + b_2 x_2 + e. $$
To obtain the coefficient of $x_2$ such that the effect of $x_1$ (and the intercept) is held constant, we first run the two simple regressions
$$ y = c_y + b_{y1}x_1 + e_{y1} \qquad x_2 = c_{x2} + b_{21}x_1 + e_{21}. $$
$e_{y1}$ and $e_{21}$ represent those parts of $y$ and $x_2$ which do not depend on $x_1$. Subsequently, we run a regression using these residuals to obtain the coefficient $b_2$:
$$ (y - c_y - b_{y1}x_1) = b_2(x_2 - c_{x2} - b_{21}x_1) + u \qquad e_{y1} = b_2 e_{21} + u. $$
In general, this procedure is also referred to as 'controlling for' or 'partialling out' the effect of $X_1$. Simply speaking, if we want to isolate the effects of $X_2$ on $y$ we have to 'remove' the effects of $X_1$ from the entire regression equation.³ However, according to the Frisch-Waugh theorem it is not necessary to run this sequence of regressions in practice. Running a (multiple) regression of $y$ on all regressors $X$ 'automatically' controls for the effects of each regressor on all other regressors. A special case is an orthogonal regression, where all regressors are uncorrelated (i.e. $X'X$ is a diagonal matrix). In this case the coefficients from the multiple regression are identical to those obtained from $K$ simple regressions using one column of $X$ at a time.

Example 1: We use the real investment data from Table 3.1 in Greene (2003) to estimate a multiple regression model. The dependent variable is real investment (in trillion US$; denoted by $y$). The explanatory variables are real GNP (in trillion US$; $g$), the (nominal) interest rate $r$ and the inflation rate $i$ (both measured as percentages). The (rounded) estimated coefficients are
$$ b = (-0.0726\ \ 0.236\ \ -0.00356\ \ -0.000276)', $$
where the first element is the constant term. The coefficient $-0.00356$ can be interpreted as follows: if the interest rate goes up by one percentage point and the other regressors do not change, real investment is expected to drop by about 3.56 billion US$. SST=0.0164, SSR=0.0127 and SSE=0.00364. The corresponding $R^2$ equals 0.78 (SSR/SST), which means that about 78% of the variance in real investment can be explained by the regressors. Further details can be found in the file investment.xls.

Exercise 1: Use the quarterly data in Table F5.1 from Greene's website http://pages.stern.nyu.edu/~wgreene/Text/tables/tablelist5.htm (see file Table F5.1.xls) to estimate a regression of real investment (realinvs) on a constant, real GDP, the nominal interest rate (tbilrate; 90 day treasury bill rate) and the inflation rate (infl). Check the validity of the five OLS implications mentioned on p. 3. Apply the Frisch-Waugh theorem and show how the coefficients of the constant term and tbilrate can be obtained by controlling for the effects of the nominal interest rate and inflation.

³ As a matter of fact, the effects of $X_1$, or any other set of regressors we want to control for, need not be removed from $y$. It can be shown that the coefficients associated with $X_2$ can also be obtained from a regression of $y$ on $E_2$. Because of implication 5 the covariance between $e_1$ and the columns of $E_2$ is identical to the covariance between $y$ and $E_2$.

² An analogous interpretation holds for $\beta$ in the population.
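A minimal R sketch of the Frisch-Waugh theorem, again using the simulated variables from above: partialling out $x_1$ (and the constant) reproduces the coefficient of $x_2$ from the multiple regression.

  e_y1 <- resid(lm(y ~ x1))    # part of y free of x1 (and the constant)
  e_21 <- resid(lm(x2 ~ x1))   # part of x2 free of x1 (and the constant)
  coef(lm(e_y1 ~ e_21 - 1))    # b2 from the residual regression
  coef(lm(y ~ x1 + x2))["x2"]  # identical coefficient from the multiple regression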


1.2 Finite sample properties of least squares estimates

Review 1:⁴ For any constants $a$ and $b$ and random variables $Y$ and $X$ the following relations hold:
$$ \mathrm{E}[a+Y] = a + \mathrm{E}[Y] \qquad \mathrm{E}[aY] = a\,\mathrm{E}[Y] \qquad \mathrm{V}[a+Y] = \mathrm{V}[Y] \qquad \mathrm{V}[aY] = a^2\mathrm{V}[Y], $$
$$ \mathrm{E}[aX+bY] = a\,\mathrm{E}[X] + b\,\mathrm{E}[Y] \qquad \mathrm{V}[aX+bY] = a^2\mathrm{V}[X] + b^2\mathrm{V}[Y] + 2ab\,\mathrm{cov}[X,Y]. $$
Jensen's inequality: $\mathrm{E}[f(X)] \ge f(\mathrm{E}[X])$ for any convex function $f(X)$.

For a constant $a$ and random variables $W$, $X$, $Y$, $Z$ the following relations hold:
if $Y=aZ$: $\mathrm{cov}[X,Y] = a\,\mathrm{cov}[X,Z]$;
if $Y=W+Z$: $\mathrm{cov}[X,Y] = \mathrm{cov}[X,W] + \mathrm{cov}[X,Z]$;
$\mathrm{cov}[X,Y] = \mathrm{E}[XY] - \mathrm{E}[X]\mathrm{E}[Y]$; $\mathrm{cov}[Y,a] = 0$.

If $X$ is an $n\times 1$ vector of random variables, $\mathrm{V}[X]=\mathrm{cov}[X]=\Sigma=\mathrm{E}[(X-\mathrm{E}[X])(X-\mathrm{E}[X])']$ is an $n\times n$ matrix. Its diagonal elements are the variances of the elements of $X$. Using $\mu=\mathrm{E}[X]$ we can write $\Sigma=\mathrm{E}[XX']-\mu\mu'$. If $b$ is an $n\times 1$ vector and $A$ is an $m\times n$ matrix of constants, the following relations hold:
$$ \mathrm{E}[b'X] = b'\mu \qquad \mathrm{V}[b'X] = b'\Sigma b \qquad \mathrm{E}[AX] = A\mu \qquad \mathrm{V}[AX] = A\Sigma A'. $$

Review 2: The conditional and unconditional moments of two random variables $Y$ and $X$ are related as follows:
Law of iterated expectations:⁵ $\mathrm{E}[Y] = \mathrm{E}_X[\mathrm{E}[Y|X]]$.
Functions of the conditioning variable:⁶ $\mathrm{E}[f(X)Y|X] = f(X)\,\mathrm{E}[Y|X]$.
If $\mathrm{E}[Y|X]$ is a linear function of $X$: $\mathrm{E}[Y|X] = \mathrm{E}[Y] + \frac{\mathrm{cov}[Y,X]}{\mathrm{V}[X]}(X-\mathrm{E}[X])$.
Variance decomposition: $\mathrm{V}[Y] = \mathrm{E}_X[\mathrm{V}[Y|X]] + \mathrm{V}_X[\mathrm{E}[Y|X]]$.
Conditional variance: $\mathrm{V}[Y|X] = \mathrm{E}[(Y-\mathrm{E}[Y|X])^2|X] = \mathrm{E}[Y^2|X] - (\mathrm{E}[Y|X])^2$.

Review 3:⁷ A set of $n$ observations $y_i$ ($i$=1,...,$n$) of a random variable $Y$ is a random sample if the observations are drawn independently from the same population with probability density $f(y_i;\theta)$. A random sample is said to be independent, identically distributed (i.i.d.), which is denoted by $y_i\sim$ i.i.d. A cross section is a sample of several units (e.g. firms or households) observed at a specific point in time (or time interval). A time series is a chronologically ordered sequence of data usually observed at regular time intervals (e.g. days or months). Panel data is constructed by stacking time series of several cross sections (e.g. monthly consumption and income of several households).

We consider a parameter $\theta$ and its estimator $\hat\theta$ derived from a random sample of size $n$. Estimators are rules for calculating estimates from a sample. For simplicity $\hat\theta$ both denotes the estimated value from a specific sample and the estimator (the function used to derive the estimate). $\hat\theta$ is a random variable since it depends on the (random) sample. The sampling distribution describes the probability distribution of $\hat\theta$ across possible samples.

Unbiasedness: $\hat\theta$ is unbiased if $\mathrm{E}[\hat\theta]=\theta$. The expectation is formed with respect to the sampling distribution of $\hat\theta$. The bias is $\mathrm{E}[\hat\theta]-\theta$. Examples: The sample mean and the sample median are unbiased estimators. The unadjusted sample variance
$$ \tilde{s}^2 = \frac{1}{n}\sum_{i=1}^n (y_i-\bar{y})^2 $$
is a biased estimator of $\sigma^2$, whereas $s^2=n\tilde{s}^2/(n-1)$ is unbiased.

Mean squared error: The mean squared error (MSE) of $\hat\theta$ is the sum of the variance and the squared bias:
$$ \mathrm{MSE}[\hat\theta] = \mathrm{E}[(\hat\theta-\theta)^2] = \mathrm{V}[\hat\theta] + (\mathrm{E}[\hat\theta]-\theta)^2. $$
Example: The MSE of the unbiased estimator $s^2$ is larger than the MSE of $\tilde{s}^2$.

Efficiency: $\hat\theta$ is efficient if it is unbiased, and its sampling variance is lower than the variance of any other⁸ unbiased estimator $\hat\theta'$: $\mathrm{V}[\hat\theta] \le \mathrm{V}[\hat\theta']$.

⁴ Most of this section is based on Greene (2003), sections 2.3 and 4.3 to 4.7, and Hayashi (2000), sections 1.1 and 1.3.
⁵ $\mathrm{E}[Y|X]$ is a function of $X$. The notation $\mathrm{E}_X$ indicates expectation over values of $X$.
⁶ See equation 7-60 in Papoulis (1984, p. 165).
⁷ Greene (2003), sections C.1 to C.5.
⁸ Or compared to another estimator.
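The bias of the unadjusted sample variance can be illustrated by simulation. A small R sketch (the sample size and population variance are arbitrary choices):

  set.seed(2)
  n    <- 25; sig2 <- 4
  est  <- replicate(10000, {
    y <- rnorm(n, sd = sqrt(sig2))
    c(biased   = mean((y - mean(y))^2),  # divides by n
      unbiased = var(y))                 # divides by n - 1
  })
  rowMeans(est)   # the biased version averages about (n-1)/n * sigma^2 = 3.84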

1.2.1 Assumptions

The sample estimates $b$ and $e$ can be used to draw conclusions about the population. An important question relates to the finite sample properties of the OLS estimates. Exact (or finite sample) inference, as opposed to asymptotic (large sample) inference, is valid for any sample size $n$ and is based on further assumptions (in addition to AL and AR) mentioned and discussed below. To derive the finite sample properties of the OLS estimate we rewrite $b$ in (1) as follows:
$$ b = (X'X)^{-1}X'(X\beta+\varepsilon) = \beta + (X'X)^{-1}X'\varepsilon = \beta + H\varepsilon. \tag{3} $$
We consider the statistical properties of $b$ (in particular $\mathrm{E}[b]$, $\mathrm{V}[b]$, and its distribution). This is equivalent to investigating the sampling error $b-\beta$. From (see Review 2)
$$ \mathrm{E}[b] = \beta + \mathrm{E}\left[(X'X)^{-1}X'\varepsilon\right] = \beta + (X'X)^{-1}\mathrm{E}[X'\varepsilon] \tag{4} $$
we see that the properties of $b$ depend on the properties of $X$, $\varepsilon$, and their relation. In the so-called classical regression model, $X$ is assumed to be non-stochastic. This means that $X$ can be chosen (like in an experimental situation), or is fixed in repeated samples. Neither case holds in typical financial empirical studies. We will treat $X$ as random, and the finite sample properties derived below are considered to be conditional on the sample $X$ (although we will not always indicate this explicitly). This does not preclude the possibility that $X$ contains constants (e.g. dummy variables). The important requirement (assumption) is that $X$ and $\varepsilon$ are generated by mechanisms that are completely unrelated.

Assumption AX (strict exogeneity): The conditional expectation of each $\varepsilon_i$ conditional on all observations and variables in $X$ is zero:
$$ \mathrm{E}[\varepsilon|X] = 0 \qquad \mathrm{E}[\varepsilon_i|x_1,\ldots,x_n] = 0 \quad (i=1,\ldots,n). $$
According to this assumption, $X$ cannot be used to obtain information about $\varepsilon$. If AX is satisfied, the following properties hold:

1. (unconditional mean): $\mathrm{E}[\mathrm{E}[\varepsilon|X]]=\mathrm{E}[\varepsilon]=0$.
2. (conditional expectation): $\mathrm{E}[y|X]=\hat{y}=X\beta$.
3. Regressors and disturbances are orthogonal:
$$ \mathrm{E}[x_{il}\varepsilon_j] = 0 \quad (i,j=1,\ldots,n;\ l=1,\ldots,K), $$
since $\mathrm{E}[x_{il}\varepsilon_j]=\mathrm{E}[\mathrm{E}[x_{il}\varepsilon_j|x_{il}]]=\mathrm{E}[x_{il}\mathrm{E}[\varepsilon_j|x_{il}]]=0$. This implies that regressors are orthogonal to the disturbances from the same and all other observations. Orthogonality with respect to the same observations is expressed by
$$ \mathrm{E}[X'\varepsilon] = 0. $$
Orthogonality is equivalent to zero correlation between $X$ and $\varepsilon$:
$$ \mathrm{cov}[X,\varepsilon] = \mathrm{E}[X'\varepsilon] - \mathrm{E}[X]\mathrm{E}[\varepsilon] = 0. $$
Note that this orthogonality must not be confused with orthogonality between $X$ and the residuals $e$ from LS estimation (see section 1.1.2). There it is a consequence of choosing $b$ such that the sum of squared errors is minimized. Here it is an assumption that refers to the unknown $\varepsilon$.
4. $y_i\varepsilon_i = x_i'\beta\varepsilon_i + \varepsilon_i^2 \Rightarrow \mathrm{E}[y_i\varepsilon_i]=\mathrm{E}[\varepsilon_i^2]=\mathrm{V}[\varepsilon_i]$.

If AX holds, the explanatory variables are (strictly) exogenous. The term endogeneity (i.e. one or all explanatory variables are endogenous) is used if AX does not hold (broadly speaking, if $X$ and $\varepsilon$ are correlated). Note that sometimes, instead of assuming AX to hold, the assumptions $\mathrm{E}[\varepsilon]=0$ or $\mathrm{E}[X'\varepsilon]=0$ are made instead.

For example, AX is violated when a regressor, in fact, is determined on the basis of the dependent variable $y$. This is the case in any situation where $y$ and $X$ (at least one of its columns) are determined simultaneously. A classic example are regressions attempting to analyze the effect of the number of policemen on the crime rate. These are bound to fail whenever the police force is driven by the number of crimes committed. Solutions to this kind of problem are discussed in section 1.9.1. Another example are regressions relating the performance of funds to their size. It is conceivable that an unobserved variable like the skill of fund managers affects size and performance. If that is the case, AX is violated.

Another important case where AX does not hold is a model where the lagged dependent variable is used as a regressor:
$$ y_t = \gamma y_{t-1} + x_t'\beta + \varepsilon_t \qquad y_{t+1} = \gamma y_t + x_{t+1}'\beta + \varepsilon_{t+1} \qquad y_{t+2} = \ldots $$
AX requires the disturbance $\varepsilon_t$ to be uncorrelated with regressors from any other observation, e.g. with $y_t$ from the equation for $t+1$. AX is violated because $\mathrm{E}[y_t\varepsilon_t]\ne 0$. There are two main reasons for adding $y_{t-1}$ to a regression: (a) to account for autocorrelated residuals (see section 1.7.3), and (b) to account for potentially missing regressors (see section 1.6.7 for a detailed treatment of the omitted variable bias). The effect of omitted regressors is captured by $\varepsilon_t$ which affects $y_t$. In a time series context one can assume (or hope) that $y_{t-1}$ partly reflects that missing information, in particular with rather frequently observed data. Hence, we are faced with a situation where the bias from adding the lagged dependent variable may be accepted to avoid the bias from omitted regressors.⁹

Predictive regressions are obtained when a predictor $x_t$ enters only with a lag:
$$ y_t = \beta_0 + \beta_1 x_{t-1} + \varepsilon_t. $$
For dependent variables like asset returns (i.e. $y_t=\ln p_t/p_{t-1}$) a typically used predictor is the dividend-price ratio (i.e. $x_t=\ln d_t/p_{t-1}$). Stambaugh (1999) argues that, despite $\mathrm{E}[\varepsilon_t|x_{t-1}]=0$, in a predictive regression $\mathrm{E}[\varepsilon_t|x_t]\ne 0$, and thus AX is violated. To understand this reasoning, we consider
$$ \underbrace{\ln p_t - \ln p_{t-1}}_{y_t} = \beta_1(\underbrace{\ln d_{t-1} - \ln p_{t-1}}_{x_{t-1}}) + \varepsilon_t, $$
$$ \underbrace{\ln p_{t+1} - \ln p_t}_{y_{t+1}} = \beta_1(\underbrace{\ln d_t - \ln p_t}_{x_t}) + \varepsilon_{t+1}, \quad \ldots, $$
where $\beta_0=0$ for simplicity. Disturbances $\varepsilon_t$ affect the price in $t$ (and, for given $p_{t-1}$, the return during the period $t-1$ to $t$). Thus, they are correlated with $p_t$, and hence with the regressor in the equation for $t+1$. Although the mechanism appears similar to the case of a lagged dependent variable, here the correlation between the disturbances and very specifically defined predictors $x_t$ is the source of violation of AX. Stambaugh (1999) shows that this leads to a finite-sample bias (see below) in the estimated parameter $b_1$, irrespective of $\beta_1$ (e.g. even if $\beta_1=0$).

Assumption AH (homoscedasticity; uncorrelatedness): This assumption covers two aspects. It states that the (conditional) variance of the disturbances is constant across observations (assuming that AX holds):
$$ \mathrm{V}[\varepsilon_i|X] = \mathrm{E}[\varepsilon_i^2|X] - (\mathrm{E}[\varepsilon_i|X])^2 = \mathrm{E}[\varepsilon_i^2|X] = \sigma^2 \quad \forall i. $$
The errors are said to be heteroscedastic if their variance is not constant. The second aspect of AH relates to the (conditional) covariance of $\varepsilon$ which is assumed to be zero:
$$ \mathrm{cov}[\varepsilon_i,\varepsilon_j|X] = 0 \quad \forall i\ne j \qquad \mathrm{E}[\varepsilon\varepsilon'|X] = \mathrm{V}[\varepsilon|X] = \sigma^2 I. $$
This aspect of AH implies that the errors from different observations are not correlated. In a time series context this correlation is called serial or autocorrelation.

Assumption AN (normality): Assumptions AX and AH imply that the mean and variance of $\varepsilon|X$ are $0$ and $\sigma^2 I$. Adding the assumption of normality we have
$$ \varepsilon|X \sim \mathrm{N}(0,\sigma^2 I). $$
Since $X$ plays no role in the distribution of $\varepsilon$, we have $\varepsilon\sim\mathrm{N}(0,\sigma^2 I)$. This assumption is useful to construct test statistics (see section 1.2.3), although many of the subsequent results do not require normality.

⁹ As shown below, (a) the effects of adding a lagged dependent variable depend on the resulting residual autocorrelation, and (b) omitted regressors lead to biased and inconsistent coefficients.


1.2.2 Properties

Expected value of $b$ (AL, AR, AX): We first take the conditional expectation of (3):
$$ \mathrm{E}[b|X] = \beta + \mathrm{E}[H\varepsilon|X] \qquad H = (X'X)^{-1}X'. $$
Since $H$ is a function of the conditioning variable $X$ (see Review 2), it follows that
$$ \mathrm{E}[b|X] = \beta + H\,\mathrm{E}[\varepsilon|X], $$
and by assumption AX ($\mathrm{E}[\varepsilon|X]=0$) we find that $b$ is unbiased:
$$ \mathrm{E}[b|X] = \beta. $$
By using the law of iterated expectations we can also derive the following unconditional result¹⁰ (again using AX):
$$ \mathrm{E}[b] = \mathrm{E}_X[\mathrm{E}[b|X]] = \beta + \mathrm{E}_X[H\,\mathrm{E}[\varepsilon|X]] = \beta. $$
We note that assumptions AH and AN are not required for unbiasedness, whereas AX is critical. Since a model with a lagged dependent variable violates AX, all coefficients in such a regression will be biased.

Covariance of $b$ (AL, AR, AX, AH): The covariance of $b$ conditional on $X$ is given by
$$ \mathrm{V}[b|X] = \mathrm{E}[(b-\beta)(b-\beta)'|X] = \mathrm{E}[H\varepsilon\varepsilon'H'|X] = H\,\mathrm{E}[\varepsilon\varepsilon'|X]\,H' = H(\sigma^2 I)H' = \sigma^2 HH' = \sigma^2(X'X)^{-1}, \tag{5} $$
since $HH' = (X'X)^{-1}X'X(X'X)^{-1}$. For the special case of a single regressor the variance of $b_1$ is given by
$$ \mathrm{V}[b_1] = \frac{\sigma^2}{\sum_{i=1}^n (x_i-\bar{x})^2} = \frac{\sigma^2}{(n-1)\sigma_x^2}, \tag{6} $$
which shows that the precision of the estimate increases with the sample size and the variance of the regressor $\sigma_x^2$, and decreases with the variance of the disturbances. To derive the unconditional covariance of $b$ we use the variance decomposition
$$ \mathrm{E}[\mathrm{V}[b|X]] = \mathrm{V}[b] - \mathrm{V}[\mathrm{E}[b|X]]. $$
Since $\mathrm{E}[b|X]=\beta$ the second term is zero and
$$ \mathrm{V}[b] = \mathrm{E}[\sigma^2(X'X)^{-1}] = \sigma^2\,\mathrm{E}[(X'X)^{-1}], $$
which implies that the unconditional covariance of $b$ depends on the population covariance of the regressors.

Variance of $e$ (AL, AR, AX, AH): The variance of $b$ is expressed in terms of $\sigma^2$ (the population variance of $\varepsilon$). To estimate the covariance of $b$ from a sample we replace $\sigma^2$ by the unbiased estimator
$$ s_e^2 = \frac{e'e}{n-K} \qquad \mathrm{E}[s_e^2] = \sigma^2. $$
Its square root $s_e$ is the standard error of regression. $s_e$ is measured in the same units as $y$. It may be a more informative measure for the goodness of fit than $R^2$, which is expressed in terms of variances (measured in squared units of $y$). The estimated standard error of $b$ denoted by se[$b$] is the square root of the diagonal of
$$ \hat{\mathrm{V}}[b|X] = s_e^2(X'X)^{-1}. $$

Efficiency (AL, AR, AX, AH): The Gauss-Markov Theorem states that the OLS estimator $b$ is not only unbiased but has the minimum variance of all linear unbiased estimators (BLUE) and is thus efficient. This result holds whether $X$ is stochastic or not. If AN holds (the disturbances are normal) $b$ has the minimum variance of all unbiased (linear or not) estimators (see Greene (2003), p. 47-48).

Sampling distribution of $b$ (AL, AR, AX, AH, AN): Given (3) and AN the distribution of $b$ is normal for given $X$:
$$ b|X \sim \mathrm{N}(\beta,\ \sigma^2(X'X)^{-1}). $$
The sample covariance of $b$ is obtained by replacing $\sigma^2$ with $s_e^2$, and is given by $\hat{\mathrm{V}}[b]$ defined above.

Example 2: The standard error of regression from example 1 is 18.2 billion US$. This can be compared to the standard deviation of real investment which amounts to 34 billion US$. $s_e$ is used to compute the (estimated) standard errors for the estimated coefficients, which are given by
$$ \mathrm{se}[b] = (0.0503\ \ 0.0515\ \ 0.00328\ \ 0.00365)'. $$
Further details can be found in the files investment.R or investment.xls.

¹⁰ To verify that $b$ is unbiased conditionally and unconditionally by simulation one could generate samples of $y=X\beta+\varepsilon$ for fixed $X$ using many realizations of $\varepsilon$. The average over the OLS estimates $b|X$ (corresponding to $\mathrm{E}[b|X]$) should be equal to $\beta$. However, if $X$ is also allowed to vary across samples the average over $b$ (corresponding to the unconditional mean $\mathrm{E}[b]=\mathrm{E}[\mathrm{E}[b|X]]$) should also equal $\beta$.
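The quantities $s_e^2$, $s_e$ and se[$b$] can be computed directly from a fitted model. A short R sketch, continuing the simulated example from section 1.1 (the numbers are illustrative, not those of example 2):

  X   <- model.matrix(fit)            # design matrix of the fitted model
  e   <- resid(fit)
  n   <- nrow(X); K <- ncol(X)
  s2e <- sum(e^2) / (n - K)           # unbiased estimator of sigma^2
  Vb  <- s2e * solve(t(X) %*% X)      # estimated covariance matrix of b
  sqrt(diag(Vb))                      # se[b], matches summary(fit)
  sqrt(s2e)                           # standard error of regression s_e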


1.2.3 Testing hypotheses

Review 4: A null hypothesis $H_0$ formulates a restriction with respect to an unknown parameter of the population: $\theta=\theta_0$. In a two-sided test the alternative hypothesis $H_a$ is $\theta\ne\theta_0$. The test procedure is a rule that rejects $H_0$ if the sample estimate $\hat\theta$ is 'too far away' from $\theta_0$. This rule can be based on the $1-\alpha$ confidence interval $\hat\theta \pm Q(\alpha/2)\,\mathrm{se}[\hat\theta]$, where $Q(\alpha)$ denotes the $\alpha$-quantile of the sampling distribution of $\hat\theta$. $H_0$ is rejected if $\theta_0$ is outside the confidence interval.

If $Y\sim\mathrm{N}(\mu,\sigma^2)$ and $Z=(y-\mu)/\sigma$ then $Z\sim\mathrm{N}(0,1)$. $\Phi(z)=\mathrm{P}[Y\le y]=\Phi((y-\mu)/\sigma)$ is the standard normal distribution function (e.g. $\Phi(-1.96)$=0.025). $z_\alpha$ is the $\alpha$-quantile of the standard normal distribution, such that $\mathrm{P}[Z\le z_\alpha]=\alpha$ (e.g. $z_{0.025}=-1.96$).

Example 3: Consider a sample of $n$ observations from a normal population with mean $\mu$ and standard deviation $\sigma$. The sampling distribution of the sample mean $\bar{y}$ is also normal. The standard error of the mean is $\sigma/\sqrt{n}$. The $1-\alpha$ confidence interval for the unknown mean $\mu$ is $\bar{y}\pm z_{\alpha/2}\,\sigma/\sqrt{n}$. The estimated standard error of the mean $\mathrm{se}[\bar{y}]=s/\sqrt{n}$ is obtained by replacing $\sigma$ with the sample estimate $s$. In this case the $1-\alpha$ confidence interval is given by $\bar{y}\pm T(\alpha/2,n-1)\,s/\sqrt{n}$, where $T(\alpha,n-1)$ denotes the $\alpha$-quantile of the $t$-distribution (e.g. $T(0.025,20)=-2.086$). If $n$ is large the standard normal and $t$-quantiles are practically equal. In that case the interval is given by $\bar{y}\pm z_{\alpha/2}\,s/\sqrt{n}$.

A type I error is committed if $H_0$ is rejected although it is true. The probability of a type I error is the significance level (or size) $\alpha$. If $H_0$ is rejected, $\hat\theta$ is said to be significantly different from $\theta_0$ at a level of $\alpha$. A type II error is committed if $H_0$ is not rejected although it is false. The power of a test is the probability of correctly rejecting a false null hypothesis. The power depends on the true parameter (which is usually unknown).

A test statistic is based on a sample estimate $\hat\theta$ and $\theta_0$. It is a random variable. The distribution of the test statistic (usually under $H_0$) can be used to specify a rule for rejecting $H_0$. $H_0$ is rejected if the test statistic exceeds critical values which depend on $\alpha$ (and other parameters). In a two-sided test the critical values are the $\alpha/2$-quantiles and $1-\alpha/2$-quantiles of the distribution. In a one-sided test of the form $H_0$: $\theta\ge\theta_0$ (and $H_a$: $\theta<\theta_0$) the critical value is the $\alpha$-quantile (this implies that $H_0$ is rejected if $\hat\theta$ is 'far below' $\theta_0$). If $H_0$: $\theta\le\theta_0$ the critical value is the $1-\alpha$ quantile. The p-value is that level of $\alpha$ for which there is indifference between accepting or rejecting $H_0$.

Example 4: We consider a hypothesis about the mean of a population. $\mu=\mu_0$ can be tested against $\mu\ne\mu_0$ using the $t$-statistic (or $t$-ratio) $t=(\bar{y}-\mu_0)/\mathrm{se}[\bar{y}]$. $t$ has a standard normal or $t$-distribution depending on whether $\sigma$ or $s$ is used to compute $\mathrm{se}[\bar{y}]$. If $s$ is used, the $t$-statistic is compared to $T(\alpha/2,n-1)$ in a two-sided test. One-sided tests use $T(\alpha,n-1)$. In a two-sided test, $H_0$ is rejected if $|t|>|T(\alpha/2,n-1)|$.

If $\varepsilon$ is normally distributed the $t$-statistic
$$ t_i = \frac{b_i-\beta_i}{\mathrm{se}[b_i]} $$
has a $t$-distribution with $n-K$ degrees of freedom (df). $\mathrm{se}[b_i]$ (the standard error of $b_i$) is the square root of the $i$-th diagonal element of $\hat{\mathrm{V}}[b]$. $t_i$ can be used to test hypotheses about single elements of $\beta$.

A joint test of $\beta_j=0$ ($j$=1,...,$k$) can be based on the statistic
$$ F = \frac{(n-K)R^2}{k(1-R^2)}, $$
which has an $F$-distribution with df=$(k,n-K)$ if the disturbances are normal.

Example 5: The $t$-statistics for the estimated coefficients from example 1 are given by $(-1.44\ \ 4.59\ \ -1.08\ \ -0.0755)'$. As it turns out only the coefficient of real GNP is significantly different from zero at a level of $\alpha$=5%. The $F$-statistic is 12.8 with a p-value<0.001. Thus, we reject the hypothesis that the coefficients are jointly equal to zero. Further details can be found in the file investment.xls.

Exercise 2: Use the results from exercise 1 and test the estimated coefficients for individual and joint significance.

In general, hypothesis tests about $\beta$ can be based on imposing a linear restriction $r$ (a $K\times 1$ vector consisting of zeros and ones) on $\beta$ and $b$, and comparing $\delta=r'\beta$ to $d=r'b$. If $d$ differs significantly from $\delta$ we conclude that the sample is inconsistent with (or, does not support) the hypothesis expressed by the restriction. Since $b$ is normal, $r'b$ is also normal, and the test statistic
$$ t = \frac{d-\delta}{\mathrm{se}[d]} \qquad \mathrm{se}[d] = \sqrt{r'[s_e^2(X'X)^{-1}]r} $$
has a $t$-distribution with df=$n-K$. We can consider several restrictions at once by using the $m\times K$ matrix $R$ to define $\delta=R\beta$ and $d=Rb$. Under the null that all restrictions hold we can define the Wald statistic
$$ W = (d-\delta)'\left[s_e^2 R(X'X)^{-1}R'\right]^{-1}(d-\delta). \tag{7} $$
$W$ has a $\chi^2_m$-distribution if the sample is large enough (see section 1.5) (or $s_e^2$ in (7) is replaced by the usually unknown $\sigma^2$). Instead, one can use the test statistic $W/m$ which has an $F$-distribution with df=$(m,n-K)$. In small samples, a test based on $W/m$ will be more conservative (i.e. will have larger p-values).

So far, restrictions have been tested using the estimates from the unrestricted model. Alternatively, restrictions may directly be imposed when the parameters are estimated. This will lead to a loss of fit (i.e. $R^2$ will decrease). If $R^2_r$ is based on the parameter vector $b_r$ (where some of the parameters are fixed rather than estimated) and $R^2_u$ is based on the unrestricted estimate, the test statistic
$$ F = \frac{(n-K)(R^2_u-R^2_r)}{m(1-R^2_u)} $$
has an $F$-distribution with df=$(m,n-K)$. It can be shown that $F=W/m$ (see Greene (2003), section 6.3). If $F$ is significantly different from zero, $H_0$ is rejected and the restrictions are considered to be jointly invalid.

The distribution of the test statistics $t$, $F$ and $W$ depends on assumption AN (normality of disturbances). In section 1.3 we will comment on the case that AN does not hold.
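A hedged R sketch of the Wald statistic (7), continuing the simulated example: the joint restriction $\beta_1=\beta_2=0$ is tested with both the $\chi^2_m$ and the $F$ version (Vb, n and K are the objects computed in the sketch after example 2):

  b <- coef(fit)
  R <- rbind(c(0, 1, 0),   # restriction beta_1 = 0
             c(0, 0, 1))   # restriction beta_2 = 0
  d <- R %*% b             # hypothesized value delta = (0, 0)'
  m <- nrow(R)
  W <- drop(t(d) %*% solve(R %*% Vb %*% t(R)) %*% d)   # Wald statistic (7)
  c(W = W, p.chisq = 1 - pchisq(W, m),                 # chi^2_m version
    F = W/m, p.F = 1 - pf(W/m, m, n - K))              # F version, df = (m, n-K)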


1.2.4 Example 6: CAPM, beta-factors and multi-factor models

The Capital Asset Pricing Model (CAPM) considers the equilibrium relation between the expected return of an asset or portfolio ($\mu_i=\mathrm{E}[y_i]$), the risk-free return $r_f$, and the expected return of the market portfolio ($\mu_m=\mathrm{E}[y_m]$). Based on various assumptions (e.g. quadratic utility or normality of returns) the CAPM states that
$$ \mu_i - r_f = \beta_i(\mu_m - r_f). \tag{8} $$
This relation is also known as the security market line (SML). In the CAPM the so-called beta-factor $\beta_i$ defined as
$$ \beta_i = \frac{\mathrm{cov}[y_i,y_m]}{\mathrm{V}[y_m]} $$
is the appropriate measure of an asset's risk. The (total) variance of the asset's returns is an inappropriate measure of risk since a part of this variance can be diversified away by holding the asset in a portfolio. The risk of the market portfolio cannot be diversified any further. The beta-factor $\beta_i$ shows how the asset responds to market-wide movements and measures the market risk or systematic risk of the asset. The risk premium an investor can expect to obtain (or requires) is proportional to $\beta_i$. Assets with $\beta_i>1$ imply more risk than the market and should thus earn a proportionately higher risk premium.

Observed returns of the asset ($y_{it}$; $t$=1,...,$n$) and the market portfolio ($y_{mt}$) can be used to estimate $\beta_i$ or to test the CAPM. Under the assumption that observed returns deviate from expected returns we obtain
$$ y_{it} - \mu_i = u_{it} \qquad y_{mt} - \mu_m = u_{mt}. $$
When we substitute these definitions for the expected values in the CAPM we obtain the so-called market model
$$ y_{it} = \alpha_i + \beta_i y_{mt} + \varepsilon_{it}, $$
where $\alpha_i=(1-\beta_i)r_f$ and $\varepsilon_{it}=u_{it}-\beta_i u_{mt}$. The coefficients $\alpha_i$ and $\beta_i$ in this equation can be estimated by OLS. If we write the regression equation in terms of (observed) excess returns $x_{it}=y_{it}-r_f$ and $x_{mt}=y_{mt}-r_f$ we obtain
$$ x_{it} = \alpha_i + \beta_i x_{mt} + \varepsilon_{it}. $$
Thus the testable implication of the CAPM is that the constant term in a simple linear regression using excess returns should be equal to zero. In addition, the CAPM implies that there must not be any other risk factors than the market portfolio (i.e. the coefficients of such factors should not be significantly different from zero).

We use monthly data on the excess return of two industry portfolios (consumer goods and hi-tech) compiled by French.¹¹ We regress the excess returns of the two industries on the excess market return based on a value-weighted average of all NYSE, AMEX, and NASDAQ firms (all returns are measured in percentage terms). Using data from January 2000 to December 2004 ($n$=60) we obtain the following estimates for the consumer goods portfolio (p-values in parentheses; details can be found in the file capm.wf1)
$$ x_{it} = \underset{(0.36)}{0.343} + \underset{(0.0)}{0.624}\,x_{mt} + e_{it} \qquad R^2=0.54 \quad s_e=2.9, $$
and for the hi-tech portfolio
$$ x_{it} = \underset{(0.11)}{-0.717} + \underset{(0.0)}{1.74}\,x_{mt} + e_{it} \qquad R^2=0.87 \quad s_e=3.43. $$
The coefficients 0.624 and 1.74 indicate that a change in the (excess) market return by one percentage point implies a change in the expected excess return by 0.624 percentage points and 1.74 percentage points, respectively. In other words, the hi-tech portfolio has much higher market risk than the consumer goods portfolio.

The market model can be used to decompose the total variance of an asset into market- and firm-specific variance as follows (assuming that $\mathrm{cov}[y_m,\varepsilon_i]=0$):
$$ \sigma_i^2 = \beta_i^2\sigma_m^2 + \sigma_{\varepsilon i}^2. $$
$\beta_i^2\sigma_m^2$ can be interpreted as the risk that is market-specific or systematic (cannot be diversified since it is due to market-wide movements) and $\sigma_{\varepsilon i}^2$ is firm-specific (or idiosyncratic) risk. Since $R^2$ can also be written as $(\beta_i^2\sigma_m^2)/\sigma_i^2$ it measures the proportion of the market-specific variance in total variance. The $R^2$ from the two equations imply that 53% and 86% of the variance in the portfolios' returns are systematic. The higher $R^2$ from the hi-tech regression indicates that this industry is better diversified than the consumer goods industry. The p-values of the constant terms indicate that the CAPM implication cannot be rejected. This conclusion changes, however, when the sample size is increased.

The CAPM makes an (equilibrium) statement about all assets as expressed by the security market line (8). In order to test the CAPM, beta-factors $\hat\beta_i$ for many assets are estimated from the market model using time-series regressions. Then mean returns $\bar{y}_i$ for each asset (as an average across time) are computed, and the cross-sectional regression
$$ \bar{y}_i = \gamma_f + \gamma_m\hat\beta_i + \varepsilon_i $$
is run. The estimates for $\gamma_f$ and $\gamma_m$ (the market risk premium) are estimates of $r_f$ and ($\mu_m-r_f$) in equation (8). If the CAPM is valid, the mean returns of all assets should be located on the SML, i.e. on the line implied by this regression. However, there are some problems associated with this regression. The usual OLS standard errors of the estimated coefficients are incorrect because of heteroscedasticity in the residuals. In addition, the regressors $\hat\beta_i$ are subject to an errors-in-variables problem since they are not observed and will not correspond to the 'true' beta-factors.

¹¹ http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html. The files capm.wf1 and capm.txt are based on previous versions of data posted there. These files have been compiled using the datasets which are now labelled as "5 Industry Portfolios" and "Fama/French 3 Factors" (which includes the risk-free return $r_f$).
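In R the market model regression and the variance decomposition of this example would look as follows. This is only a sketch: xi and xm stand for an industry excess-return series and the market excess-return series (e.g. read from capm.txt), which are assumed to be loaded already.

  capm <- lm(xi ~ xm)        # market model in excess returns
  summary(capm)              # CAPM implication: the intercept should be zero
  beta <- coef(capm)[2]
  c(systematic    = beta^2 * var(xm),    # beta_i^2 * sigma_m^2
    idiosyncratic = var(resid(capm)))    # sigma_eps,i^2
  summary(capm)$r.squared    # share of systematic variance in total variance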

Fama and MacBeth (1973) have suggested a procedure to improve the precision of the estimates. They first estimate beta-factors $\hat\beta_{it}$ for a large number of assets by running the market model regression using monthly¹² time series of excess returns. The estimated beta-factors are subsequently used as regressors in the cross-sectional regression
$$ y_{it} = \gamma_{ft} + \gamma_{mt}\hat\beta_{it} + \varepsilon_{it}. $$
Note that $\hat\beta_{it}$ is based on an excess return series which ends one month before the cross-sectional regression is estimated (i.e. using $x_{is}$ and $x_{ms}$ for $s$=$t-n$,...,$t-1$). The cross-sectional regression is run in each month of the sample period and a time series of estimates $\hat\gamma_{ft}$ and $\hat\gamma_{mt}$ is obtained. The sample means and the standard errors of $\hat\gamma_{ft}$ and $\hat\gamma_{mt}$ are used as the final estimates for statistical inference.¹³ Although the Fama-MacBeth approach yields improved estimates, Shanken (1992) has pointed out further deficiencies and has suggested a correction.

The CAPM has been frequently challenged by empirical evidence indicating significant risk premia associated with other factors than the market portfolio. A crucial aspect of the CAPM (in addition to assumptions about utility or return distributions) is that the market portfolio must include all available assets (which is hard to achieve in empirical studies). According to the Arbitrage Pricing Theory (APT) by Ross (1976) there exist several risk factors $F_j$ that are common to a set of assets. The factors are assumed to be uncorrelated, but no further assumptions about utility or return distributions are made. These risk factors (and not only the market risk) capture the systematic risk component. Although the APT does not explicitly specify the nature of these factors, empirical research has typically considered two types of factors. One factor type corresponds to macroeconomic conditions such as inflation or industrial production (see Chen et al., 1986), and a second type corresponds to portfolios (see Fama and French, 1992). Considering only two common factors (for notational simplicity) the asset returns are governed by the factor model
$$ y_{it} = \alpha_i + \beta_{i1}F_{t1} + \beta_{i2}F_{t2} + \varepsilon_{it}, $$
where $\beta_{ji}$ are the factor sensitivities (or factor loadings). The expected return of a single asset in this two-factor model is given by
$$ \mathrm{E}[y_i] = \mu_i = \lambda_0 + \lambda_1\beta_{i1} + \lambda_2\beta_{i2}, $$
where $\lambda_j$ is the factor risk premium of $F_j$ and $\lambda_0=r_f$. Using $\mathrm{V}[F_j]=\sigma_j^2$ and $\mathrm{cov}[F_1,F_2]=0$ the total variance of an asset can be decomposed as follows:
$$ \sigma_i^2 = \beta_{i1}^2\sigma_1^2 + \beta_{i2}^2\sigma_2^2 + \sigma_{\varepsilon i}^2. $$
Estimation of the beta-factors is done by factor analysis, which is not treated in this text. For further details of the APT and associated empirical investigations see Roll and Ross (1980).

¹² Using monthly data is not a prerequisite of the procedure. It could be performed using other data frequencies as well.
¹³ See Fama-MacBeth.xlsx for an illustration of the procedure using only 30 assets and the S&P500 index.

We briefly investigate one version of multi-factor models using the so-called Fama-French benchmark factors SMB (small minus big) and HML (high minus low) to test whether excess returns depend on other factors than the market return. The factor SMB measures the difference in returns of portfolios of small and large stocks, and is intended to measure the so-called size effect. HML measures the difference between value stocks (having a high book value relative to their market value) and growth stocks (with a low book-market ratio).¹⁴ The estimated regression equations are (details can be found in the file capm.wf1)
$$ x_{it} = \underset{(0.8)}{0.085} + \underset{(0.0)}{0.68}\,x_{mt} - \underset{(0.30)}{0.089}\,\mathrm{SMB}_t + \underset{(0.0)}{0.29}\,\mathrm{HML}_t + e_t \qquad R^2=0.7 $$
for the consumer goods portfolio and
$$ x_{it} = \underset{(0.07)}{-0.83} + \underset{(0.0)}{1.66}\,x_{mt} + \underset{(0.04)}{0.244}\,\mathrm{SMB}_t - \underset{(0.21)}{0.112}\,\mathrm{HML}_t + e_t \qquad R^2=0.89 $$
for the hi-tech portfolio. Consistent with the CAPM the constant term in the first case is not significant. The beta-factor remains significant in both industries and changes only slightly compared to the market model estimates. However, the results indicate a significant return premium for holding value stocks in the consumer goods industry. For the hi-tech portfolio we find support for a size effect. Overall, the results can be viewed as supporting multi-factor models.

Exercise 3: Retrieve excess returns for industry portfolios of your choice from French's website. Estimate beta-factors in the context of multi-factor models. Interpret the results and test implications of the CAPM.

¹⁴ Further details on the variable definitions and the underlying considerations can be found on French's website http://mba.tuck.dartmouth.edu/pages/faculty/ken.french.
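The following R fragment sketches the two-pass logic of the Fama-MacBeth procedure described earlier in this example, under the simplifying assumption that full-sample betas are used in every cross-section (the rolling, one-month-lagged betas of the original procedure are omitted). Xr (a months-by-assets matrix of excess returns) and xm (the market excess return) are hypothetical objects assumed to be loaded:

  betas  <- apply(Xr, 2, function(xi) coef(lm(xi ~ xm))[2])   # first pass: one beta per asset
  gammas <- t(sapply(1:nrow(Xr), function(t)                  # second pass: one cross-section per month
    coef(lm(Xr[t, ] ~ betas))))                               # columns: gamma_f_t, gamma_m_t
  gbar <- colMeans(gammas)                                    # final estimates
  gse  <- apply(gammas, 2, sd) / sqrt(nrow(gammas))           # Fama-MacBeth standard errors
  cbind(estimate = gbar, t = gbar / gse)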


1.2.5 Example 7: Interest rate parity

We consider a European investor who invests in a riskless US deposit with rate $r_f$. He buys US dollars at the spot exchange rate $S_t$ ($S_t$ is the amount in Euro paid/received for one dollar), invests at $r_f$, and after one period converts back to Euro at the rate $S_{t+1}$. The one-period return on this investment is given by
$$ \ln S_{t+1} - \ln S_t + r_f. $$
Forward exchange rates $F_t$ can be used to hedge against the currency risk (introduced by the unknown $S_{t+1}$) involved in this investment. If $F_t$ denotes the rate fixed at $t$ to buy/sell US dollars in $t+1$ the (certain) return is given by
$$ \ln F_t - \ln S_t + r_f. $$
Since this return is riskless it must equal the return $r_{df}$ from a domestic riskless investment to avoid arbitrage. This leads to the covered interest rate parity (CIRP)
$$ r_{df} - r_f = \ln F_t - \ln S_t. $$
The left hand side is the interest rate differential and the right hand side is the forward premium. The uncovered interest rate parity (UIRP) is defined in terms of the expected spot rate:
$$ r_{df} - r_f = \mathrm{E}_t[\ln S_{t+1}] - \ln S_t. $$
$\mathrm{E}_t[\ln S_{t+1}]$ can differ from $\ln F_t$ if the market pays a risk premium for taking the risk of an unhedged investment. A narrowly defined version of the UIRP assumes risk neutrality and states that the risk premium is zero (see Engel, 1996, for a survey):
$$ \mathrm{E}_t[\ln S_{t+1} - \ln S_t] = \ln F_t - \ln S_t. $$
Observed exchange rates $S_{t+1}$ can deviate from $F_t$, but the expected difference must be zero. The UIRP can be tested using the Fama regression
$$ s_t - s_{t-1} = \beta_0 + \beta_1(f_{t-1} - s_{t-1}) + \varepsilon_t, $$
where $s_t=\ln S_t$ and $f_t=\ln F_t$. The UIRP imposes the testable restrictions $\beta_0$=0 and $\beta_1$=1.¹⁵

We use a data set¹⁶ from Verbeek (2004) and obtain the following results ($t$-statistics in parentheses)
$$ s_t - s_{t-1} = \underset{(0.72)}{0.0023} + \underset{(0.67)}{0.515}\,(f_{t-1} - s_{t-1}) + e_t \qquad R^2 = 0.00165. $$

¹⁵ Hayashi (2000, p. 424) discusses the question why UIRP cannot be tested on the basis of $s_t = \beta_0 + \beta_1 f_{t-1} + \varepsilon_t$.
¹⁶ This data is available from http://eu.wiley.com/legacy/wileychi/verbeek2ed/datasets.html. We use the corrected data set forward2c from chapter 4 (foreign exchange markets). Note that the exchange and forward rates in this dataset are expressed in terms of US dollars paid/received for one Euro. To make the data consistent with the description in this section we have defined the logs of spot and forward rates accordingly (although this does not change the substantive conclusions). Details can be found in the files uirp.R or uirp.xls.


Testing the coefficients individually shows that $b_0$ is not significantly different from 0 and $b_1$ is not significantly different from 1. To test both restrictions at once we define
$$ R = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \qquad \delta = \begin{bmatrix} 0 \\ 1 \end{bmatrix}. $$
The Wald statistic for testing both restrictions equals 3.903 with a p-value of 0.142. The p-value of the $F$-statistic $W/2$=1.952 is 0.144. Alternatively, we can use the $R^2$ from the restricted model with $\beta_0$=0 and $\beta_1$=1. This requires defining restricted residuals according to $(s_t-s_{t-1})-(f_{t-1}-s_{t-1})$. The associated $R^2$ is negative and the $F$-statistic is again 1.952. Thus, the joint test confirms the conclusion derived from testing individual coefficients, and we cannot reject UIRP (which does not mean that UIRP holds!).

Exercise 4: Repeat the analysis and tests from example 7 but use the US dollar/British pound exchange and forward rates in the files forward2c.dat, uirp.xls, or uirp.wf1 to test the UIRP.
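A sketch of this joint Wald test in R, where s and f denote the log spot and forward rate series (e.g. constructed from forward2c) and are assumed to be loaded:

  ds   <- diff(s)                        # s_t - s_{t-1}
  fp   <- head(f - s, -1)                # forward premium f_{t-1} - s_{t-1}
  fama <- lm(ds ~ fp)                    # Fama regression

  b  <- coef(fama); Vb <- vcov(fama)
  d  <- b - c(0, 1)                      # H0: beta_0 = 0, beta_1 = 1 (R = I)
  W  <- drop(t(d) %*% solve(Vb) %*% d)   # Wald statistic
  c(W = W, p = 1 - pchisq(W, 2),
    F = W/2, p.F = 1 - pf(W/2, 2, df.residual(fama)))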


1.2.6 Prediction

Regression models can also be used for out-of-sample prediction. Suppose the estimated model from $n$ observations is $y=Xb+e$ and we want to predict $y^0$ given a new observation of the regressors $x^0$ which has not been included in the estimation (hence: out-of-sample). From the Gauss-Markov theorem it follows that the prediction
$$ \hat{y}^0 = x^{0\prime}b $$
is the BLUE of $\mathrm{E}[y^0]$. Its variance is given by
$$ \mathrm{V}[\hat{y}^0] = \mathrm{V}[x^{0\prime}b] = x^{0\prime}\mathrm{V}[b]x^0 = \sigma^2 x^{0\prime}(X'X)^{-1}x^0, $$
and reflects the sampling error of $b$. The prediction error is
$$ e^0 = y^0 - \hat{y}^0 = x^{0\prime}\beta + \varepsilon^0 - x^{0\prime}b = \varepsilon^0 + x^{0\prime}(\beta-b), $$
and its variance is given by
$$ \mathrm{V}[e^0] = \sigma^2 + \mathrm{V}[x^{0\prime}(\beta-b)] = \sigma^2 + \sigma^2 x^{0\prime}(X'X)^{-1}x^0. $$
The variance can be estimated by using $s_e^2$ in place of $\sigma^2$. For the special case of a single regressor the variance of $e^0$ is given by (see (6) and Kmenta (1971), p. 240)
$$ \sigma^2\left[1 + \frac{1}{n} + \frac{(x^0-\bar{x})^2}{\sum_{i=1}^n (x_i-\bar{x})^2}\right]. $$
This shows that the variance of the prediction (error) increases with the distance of $x^0$ from the mean of the regressors and decreases with the sample size. The (estimated) variance of the disturbances can be viewed as a lower bound for the variance of the out-of-sample prediction error. If $\sigma^2$ is replaced by $s_e^2$ we can compute a $1-\alpha$ prediction interval for $y^0$ from
$$ \hat{y}^0 \pm z_{\alpha/2}\,\mathrm{se}[e^0], $$
where $\mathrm{se}[e^0]$ is the square root of the estimated variance $\hat{\mathrm{V}}[e^0]$. These calculations, using example 1, can be found in the file investment.xls on the sheet prediction.


1.3 Large sample properties of least squares estimates¹⁷

Review 5:¹⁸ We consider the asymptotic properties of an estimator $\hat\theta_n$ which hold as the sample size $n$ grows without bound.

Convergence: The random variable $\hat\theta_n$ converges in probability to the (non-random) constant $c$ if, for any $\epsilon>0$,
$$ \lim_{n\to\infty} \mathrm{P}[|\hat\theta_n - c| > \epsilon] = 0. $$
$c$ is the probability limit of $\hat\theta_n$ and is denoted by $\mathrm{plim}\,\hat\theta_n=c$.

Rules for scalars $x_n$ and $y_n$:
$$ \mathrm{plim}(x_n+y_n) = \mathrm{plim}\,x_n + \mathrm{plim}\,y_n \qquad \mathrm{plim}(x_n y_n) = \mathrm{plim}\,x_n \cdot \mathrm{plim}\,y_n. $$
Rules for vectors and matrices: $\mathrm{plim}\,Xy = \mathrm{plim}\,X \cdot \mathrm{plim}\,y$.
Rule for a nonsingular matrix $X$: $\mathrm{plim}\,X^{-1} = (\mathrm{plim}\,X)^{-1}$.

Consistency: $\hat\theta_n$ is consistent for $\theta$ if $\mathrm{plim}\,\hat\theta_n=\theta$. $\hat\theta_n$ is consistent if the asymptotic bias is zero and the asymptotic variance is zero:
$$ \lim_{n\to\infty} \mathrm{E}[\hat\theta_n]-\theta = 0 \qquad \lim_{n\to\infty} \mathrm{aV}[\hat\theta_n] = 0. $$
Example: The sample mean $\bar{y}$ from a population with $\mu$ and $\sigma^2$ is consistent for $\mu$ since $\mathrm{E}[\bar{y}]=\mu$ and $\mathrm{aV}[\bar{y}]=\sigma^2/n$. Thus $\mathrm{plim}\,\bar{y}=\mu$.

Consistency of a mean of functions: Consider a random sample ($y_1$,...,$y_n$) from a random variable $Y$ and any function $f(y)$. If $\mathrm{E}[f(Y)]$ and $\mathrm{V}[f(Y)]$ are finite constants then
$$ \mathrm{plim}\,\frac{1}{n}\sum_{i=1}^n f(y_i) = \mathrm{E}[f(Y)]. $$

Limiting distribution: $\hat\theta_n$ with cdf $F_n$ converges in distribution to a random variable $\theta$ with cdf $F$ (this is denoted by $\hat\theta_n \xrightarrow{d} \theta$) if $\lim_{n\to\infty}|F_n-F|=0$ for every continuity point of $F$. $F$ is the limiting or asymptotic distribution of $\hat\theta_n$.

A consistent estimator $\hat\theta_n$ is asymptotically normal if
$$ \sqrt{n}(\hat\theta_n-\theta) \xrightarrow{d} \mathrm{N}(0,v) \quad\text{or}\quad \hat\theta_n \overset{a}{\sim} \mathrm{N}(\theta,v/n), $$
where $\mathrm{aV}[\hat\theta_n]=v/n$ is the asymptotic variance of $\hat\theta_n$.

¹⁷ Most of this subsection is based on Greene (2003), sections 5.2 and 5.3.
¹⁸ Greene (2003), section D.
1.3 Large sample properties of least squares estimates23

Central Limit Theorem:If yis the sample mean of a random sample (y1,...,yn) from a distribution with meanand variance2(which need not be normal) pn(y)d!N(0;2) or yaN(;2=n):

Expressed di erently,

z n=y= pn is asymptotically standard normal:znaN(0,1). The nite sample properties of OLS estimates only hold if assumptionsAL,AR,AX, and AHare satis ed.ANis required to obtain the exact distribution ofband to derive (the distribution of) test statistics. Large-sample theory dropsANand adds other assumptions about the data generating mechanism. The sample is assumed to be large enough so that certain asymptotic properties hold, and an approximation of the distribution of OLS estimates can be derived.

1.3.1 Consistency

Consistency relates to the properties ofbasn!1. Therefore we use the formulation b n= +1n X0X 11n X0 :(9) This shows that the large-sample properties ofbndepend on the behavior of the sample averages ofX0XandX0. In addition to the assumptions from the previous subsection we assume that (xi,i) are an i.i.d. sequence of random variables:

Aiid: (xi;i)i.i.d.

To prove consistency we consider the probability limit ofbn: plimbn= + plim" 1n X0X 1 1n X0 # = + plim1n X0X 1 plim1n X0 :(10) We have to make sure that the covariance matrix of regressorsXis 'well behaved'. This requires that all elements ofX0X=nconverge to nite constants (i.e. the corresponding population moments). This is expressed by the assumptionAR: plim1n

X0X=Q;(11)

whereQis a positive de nite matrix.

Regarding the second probability limit in (

10 ),

Greene

( 2003
, p.66) de nes 1n

X0=1n

n X i=1x ii=1n n X i=1w i=wn

1.3 Large sample properties of least squares estimates24

and usesAXto show that E[ wn] =0V[wn] = E[wnw0n] =2n

E[X0X]n

:

The variance of

wnwill converge to zero, which implies that plimwn=0, or plim 1n X0 =0:

Thus the probability limit ofbnis given by

plimbn= +Q10; and we conclude thatbnis consistent: plimbn= :

1.3 Large sample properties of least squares estimates25

1.3.2 Asymptotic normality

Large-sample theory isnotbased on the normality assumptionAN, but derives an ap- proximation of the distribution of OLS estimates. We rewrite ( 9 ) as pn(bn ) =1n X0X 11pn X0 (12) to derive the asymptotic distribution of

pn(bn ) using the central limit theorem. ByARthe probability limit of the rst term on the right hand side of (12) isQ1. Next we

consider the limiting distribution of 1pn

X0=pn(wnE[wn]):

 wnis the average ofni.i.d. random vectorswi=xii. From the previous subsection we know that E[ wn]=0.Greene ( 2003, p.68) shows that the variance ofpn wnconverges to 

2Q. Thus, in analogy to the univariate case, we can apply the central limit theorem.

The means of the i.i.d. random vectorswiconverge to a normal distribution: 1pn

X0=pn

wnd!N(0;2Q): We can now complete the derivation of the limiting distribution of ( 12 ) by includingQ1 to obtain Q 11pn

X0d!N(Q10;Q1(2Q)Q1)

or pn(bn )d!N(0;2Q1)bnaN( ;2n Q1): Note that the asymptotic normality ofbis not based onANbut on the central limit theorem. The asymptotic covariance ofbnis estimated by using (X0X)1to estimate (1/n)Q1ands2e=SSE/(nK) to estimate2: c aV[bn] =s2e(X0X)1: This implies thatt- andF-statistics are asymptotically valid even if the residuals are not normal. IfFhas anF-distribution with df=(m,nk) thenW=mFa2m. In small samples thet-distribution may be a reasonable approximation19even whenAN does not hold. Since it is more conservative than the standard normal, it may be preferable to use thet-distribution. By a similar argument, using theF-distribution (rather than W=mFand the2distribution) can be justi ed in small samples whenANdoes not hold.19 IfANdoes not hold the nite sample distribution of thet-statistic is unknown.

1.3 Large sample properties of least squares estimates26

1.3.3 Time series data

20 With time series data the strict exogeneity assumptionAXis usually hard to maintain. For example, a company's returns may depend on the current, exogenous macroeconomic conditions and the rm's past production (or investment, nance, etc.) decisions. To the extent that the company decides upon the level of production based on past realized returns (which include past disturbances), the current disturbances may be correlated with regressors in future equations. More generally, strict exogeneity might not hold if regressors are policy variables which are set depending on past outcomes. IfAXdoes not hold (e.g. in a model with a lagged dependent variable),bnis biased. In the previous subsections consistency and asymptotic normality have been established on the basis ofAiidandAR. However, with time series data the i.i.d. assumption need not hold and the applicability of limit theorems is not straightforward. Nevertheless, consistent estimates in a time series context can still be obtained. The additional assumptions needed are based on the following concepts. Astochastic processYtis a sequence21of random variablesY1,:::,Y0,Y1,:::,Y+1. An observed sequenceyt(t=1;:::;n) is a sample orrealization(one possible outcome) of the stochastic process. Any statistical inference aboutYtmust be based on thesingle draw y tfrom the so-calledensembleof realizations of the process. Two properties are crucial in this context: the process has to bestationary(i.e. the underlying distribution ofYt does not change witht) andergodic(i.e. each individual observation provides unique information about the process; adjacent observations must not be too similar). More formally, a stationary process isergodicif any two random variablesYtandYt`are asymptotically (i.e.`!1) independent. A stochastic process is characterized by theautocovariance ` `= E[(Yt)(Yt`)]= E[Yt];(13) or theautocorrelation`  `= ` 0= `

2:(14)

A stochastic process isweaklyorcovariance stationaryif E[Y2t]<1and if E[Yt], V[Yt] and `do not depend ont(i.e. `and`only depend on`). IfYtisstrictly stationary the joint distribution ofYtandYt`does not depend on the time shift`. IfYtis weakly stationary and normally distributed thenYtis also strictly stationary. According to theergodic theorem, averages from a single observed sequence will con- verge to the corresponding parameters of the population, if the process is stationary and ergodic. IfYtis stationary and ergodic with E[Yt]=, the sample mean obtained from a single realizationytconverges toasymptotically: lim n!11n yn=nX t=1y t=:20

Most of this subsection is based on

Greene

( 2003
), section 12.4.

21We use the indextsince stochastic processes are frequently viewed in terms of chronologically ordered

sequences acrosstime. However, the index set is arbitrary and everything we say holds as well if the index

refers to other entities (e.g. rms).

1.3 Large sample properties of least squares estimates27

IfYtis covariance stationary it is sucient that1X `=0j `j<1(absolute summability) for the process to be ergodic for the mean. The theorem extends to any ( nite) moment of stationary and ergodic processes. In the special case whereYtis a normal and station- ary process, then absolute summability is enough to insure ergodicity for all moments. Whereas many tests for stationarity are available (see section

2.3.3

), ergodicity is dicult to test and is usuallyassumedto hold. Quickly decayingestimatedautocorrelations can be taken asempiricalevidence of stationarity and ergodicity. In other words, the ergodic theorem implies that consistency does not require independent observations.

Greene

( 2003
, p.73) shows that consistency and asymptotic normality of the

OLS estimator can be preserved in a time-series context by replacingAXwith22AX: E[tjxt`] = 0 (8`0);

replacingARby AR t: plim1n`n X t=`+1x tx0t`=Q(`); whereQ(`) is a nite matrix, and by requiring thatQ(`) has to converge to a matrix of zeros as`!1. These properties ofQ(`) can be summarized by the assumption thatxtis stationary and ergodic. In addition, the autocorrelation`of the disturbancesthas to be zero (for all`), although not always explicitly stated. This has the following implications for models with a lagged dependent variables: y t=1yt1++pytp+z0t +t: Although estimates ofiand are biased (sinceAXis violated), they are consistent providedAXholds,xt=[yt1;:::;ytp;zt] is stationary and ergodic, andtis not auto- correlated. In section

1.7.3

w etak ea closer lo okat the case when tis autocorrelated.22

Other authors (e.g.

Ha yashi

, 2000
, p.109) assume thattandxtare contemporaneously uncorrelated (E[xtt]=0), as implied byAX.

1.4 Maximum likelihood estimation28

1.4 Maximum likelihood estimation

Review 6:

23We consider a random sampleyi(i=1;:::;n) to estimate the parameters

and2of a random variableYN(;2). The maximum likelihood (ML) estimates are those values for the parameters of the underlying distribution which make the observed sample most likely (i.e. would generate it most frequently). Thelik
Politique de confidentialité -Privacy policy