
Lecture Notes on Asymptotic Statistics

Changliang Zou

Prologue

Why asymptotic statistics? The use of asymptotic approximations is two-fold. First, they enable us to find approximate tests and confidence regions. Second, approximations can be used theoretically to study the quality (efficiency) of statistical procedures. — Van der Vaart

Approximate statistical procedures

To carry out a statistical test, we need to know the critical value of the test statistic. Roughly speaking, this means we must know the distribution of the test statistic under the null hypothesis. Because such distributions are often analytically intractable, only approximations are available in practice. Consider for instance the classical $t$-test for location. Given a sample of iid observations $X_1, \ldots, X_n$, we wish to test $H_0 : \mu = \mu_0$. If the observations arise from a normal distribution with mean $\mu_0$, then the distribution of the $t$-test statistic, $\sqrt{n}(\bar X_n - \mu_0)/S_n$, is exactly known, namely $t(n-1)$. However, we may have doubts regarding the normality. If the number of observations is not too small, this does not matter too much: we may act as if $\sqrt{n}(\bar X_n - \mu_0)/S_n \sim N(0,1)$. The theoretical justification is the limiting result, as $n \to \infty$,
$$\sup_x \left| P\!\left(\frac{\sqrt{n}(\bar X_n - \mu)}{S_n} \le x\right) - \Phi(x) \right| \to 0,$$
provided that the variables $X_i$ have a finite second moment. Then, a "large-sample" or "asymptotic" level-$\alpha$ test is to reject $H_0$ if $|\sqrt{n}(\bar X_n - \mu_0)/S_n| > z_{\alpha/2}$. When the underlying distribution is exponential, the approximation is satisfactory if $n \ge 100$. Thus, one aim of asymptotic statistics is to derive the asymptotic distributions of many types of statistics.

There are similar benefits when obtaining confidence intervals. For instance, consider the maximum likelihood estimator $\hat\theta_n$ of dimension $p$ based on a sample of size $n$ from a density $f(X;\theta)$. A major result in asymptotic statistics is that in many situations $\sqrt{n}(\hat\theta_n - \theta)$ is asymptotically normally distributed with zero mean and covariance matrix $I_\theta^{-1}$, where
$$I_\theta = E_\theta\!\left[\left(\frac{\partial \log f(X;\theta)}{\partial\theta}\right)\left(\frac{\partial \log f(X;\theta)}{\partial\theta}\right)^{T}\right]$$
is the Fisher information matrix. Thus, acting as if $\sqrt{n}(\hat\theta_n - \theta) \sim N_p(0, I_\theta^{-1})$, we find that the ellipsoid
$$\left\{\theta : (\theta - \hat\theta_n)^T I_\theta (\theta - \hat\theta_n) \le \frac{\chi^2_{p,\alpha}}{n}\right\}$$
is an approximate $1-\alpha$ confidence region.
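The claim that the normal approximation to the $t$-statistic is satisfactory for exponential data once $n$ is around 100 can be checked by simulation. The sketch below is not part of the original notes: it assumes NumPy is available, and the sample sizes, replication count, and seed are arbitrary choices. It estimates the actual rejection probability of the nominal level-$\alpha$ test under $H_0$.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, z = 0.05, 1.959963984540054     # z_{alpha/2} for alpha = 0.05
mu0 = 1.0                              # Exp(1) has mean 1, so H0 is true

def rejection_rate(n, reps=200_000):
    """Monte Carlo estimate of P(|sqrt(n)(Xbar_n - mu0)/S_n| > z_{alpha/2})."""
    x = rng.exponential(scale=1.0, size=(reps, n))
    xbar = x.mean(axis=1)
    s = x.std(axis=1, ddof=1)
    t = np.sqrt(n) * (xbar - mu0) / s
    return np.mean(np.abs(t) > z)

for n in (10, 30, 100, 500):
    print(n, rejection_rate(n))        # should drift toward alpha = 0.05
```

In runs of this kind the estimated level approaches 0.05 as $n$ grows, in line with the limiting result above.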

Efficiency of statistical procedures

For a relatively small number of statistical problems, there exists an exact, optimal solution: for example, the Neyman-Pearson lemma for finding UMP tests, the Rao-Blackwell theory for finding MVUEs, and the Cramer-Rao theorem. However, an exact optimality theory or procedure is not always available, and then asymptotic optimality theory may help. For instance, to compare two tests, we might compare approximations to their power functions. Consider the foregoing hypothesis problem for location. A well-known nonparametric test statistic is the sign statistic $T_n = n^{-1}\sum_{i=1}^n I_{X_i > \theta_0}$, where the null hypothesis is $H_0 : \theta = \theta_0$ and $\theta$ denotes the median of the distribution of $X$. Comparing the efficiency of the sign test and the $t$-test directly is rather difficult because the exact power functions of the two tests are intractable. However, by the definitions and methods introduced later, we can show that the asymptotic relative efficiency of the sign test versus the $t$-test equals
$$4 f^2(0) \int x^2 f(x)\,dx.$$
To compare estimators, we might compare asymptotic variances rather than exact variances. A major result in this area is that for smooth parametric models maximum likelihood estimators are asymptotically optimal. This roughly means the following: first, MLEs are asymptotically consistent; second, the rate at which MLEs converge to the true value is the fastest possible, typically $\sqrt{n}$; third, their asymptotic variance attains the Cramer-Rao bound. Thus, asymptotics justify the use of the MLE in certain situations. (Even though in general it does not yield the best estimator for finite samples in many cases, it is never the worst one and always leads to a reasonable estimator.)

Contents

• Basic convergence concepts and preliminary theorems (8)
• Transformations of given statistics: the Delta method (4)
• The basic sample statistics: distribution function, moments, quantiles, and order statistics (3)
• Asymptotic theory in parametric inference: MLE, likelihood ratio test, etc. (6)
• $U$-statistics, $M$-estimates and $R$-estimates (6)
• Asymptotic relative efficiency (6)
• Asymptotic theory in nonparametric inference: rank and sign tests (6)
• Goodness of fit (3)
• Nonparametric regression and density estimation (4)
• Advanced topics selected: bootstrap and empirical likelihood (4)

Text books

Billingsley, P. (1995). Probability and Measure, 3rd edition, John Wiley, New York.

DasGupta, A. (2008). Asymptotic Theory of Statistics and Probability, Springer.

Serfling, R. (1980). Approximation Theorems of Mathematical Statistics, John Wiley, New York.

Shao, J. (2003). Mathematical Statistics, 2nd edition, Springer, New York.

Van der Vaart, A. W. (2000). Asymptotic Statistics, Cambridge University Press.

Chapter 1

Basic convergence concepts and preliminary theorems

Throughout this course, there will usually be an underlying probability space $(\Omega, \mathcal{F}, P)$, where $\Omega$ is a set of points, $\mathcal{F}$ is a $\sigma$-field of subsets of $\Omega$, and $P$ is a probability distribution or measure defined on the elements of $\mathcal{F}$. A random variable $X(\omega)$ is a transformation of $\Omega$ into the real line $\mathbb{R}$ such that images $X^{-1}(B)$ of Borel sets $B$ are elements of $\mathcal{F}$. A collection of random variables $X_1(\omega), X_2(\omega), \ldots$ on a given $(\Omega, \mathcal{F})$ will typically be denoted by $X_1, X_2, \ldots$.

1.1 Modes of convergence of a sequence of random variables

Definition 1.1.1 (convergence in probability)  Let $\{X_n\}, X$ be random variables defined on a common probability space. We say $X_n$ converges to $X$ in probability if, for any $\epsilon > 0$, $P(|X_n - X| > \epsilon) \to 0$ as $n \to \infty$, or equivalently
$$\lim_{n\to\infty} P(|X_n - X| < \epsilon) = 1, \quad \text{for every } \epsilon > 0.$$
This is usually written as $X_n \xrightarrow{p} X$. Extension to the vector case: for random $p$-vectors $X_1, X_2, \ldots$ and $X$, we say $X_n \xrightarrow{p} X$ if $\|X_n - X\| \xrightarrow{p} 0$, where $\|z\| = (\sum_{i=1}^p z_i^2)^{1/2}$ denotes the Euclidean distance ($L_2$-norm) for $z \in \mathbb{R}^p$. It is easily seen that $X_n \xrightarrow{p} X$ iff the corresponding component-wise convergence holds.

Example 1.1.1  For iid Bernoulli trials with success probability $p = 1/2$, let $X_n$ denote the number of times in the first $n$ trials that a success is followed by a failure. Denoting $T_i = I\{\text{$i$th trial is a success and $(i+1)$st trial is a failure}\}$, we have $X_n = \sum_{i=1}^{n-1} T_i$, and therefore $E[X_n] = (n-1)/4$ and
$$\mathrm{Var}[X_n] = \sum_{i=1}^{n-1}\mathrm{Var}[T_i] + 2\sum_{i=1}^{n-2}\mathrm{Cov}[T_i, T_{i+1}] = 3(n-1)/16 - 2(n-2)/16 = (n+1)/16.$$
It then follows by an application of Chebyshev's inequality that $X_n/n \xrightarrow{p} 1/4$. [Chebyshev's inequality: $P(|X - \mu| \ge \epsilon) \le \sigma^2/\epsilon^2$.]

Definition 1.1.2 (bounded in probability)  A sequence of random variables $X_n$ is said to be bounded in probability if, for any $\epsilon > 0$, there exists a constant $k$ such that $P(|X_n| > k) \le \epsilon$ for all $n$. Any single random variable (vector) is bounded in probability.

It is convenient to have short expressions for terms that converge to zero or are bounded in probability. If $X_n \xrightarrow{p} 0$, we write $X_n = o_p(1)$, pronounced "small oh-P-one"; the expression $O_p(1)$ ("big oh-P-one") denotes a sequence that is bounded in probability, written $X_n = O_p(1)$. These are the so-called stochastic $o(\cdot)$ and $O(\cdot)$. More generally, for a given sequence of random variables $R_n$,
$$X_n = o_p(R_n) \ \text{means} \ X_n = Y_n R_n \ \text{with} \ Y_n \xrightarrow{p} 0; \qquad X_n = O_p(R_n) \ \text{means} \ X_n = Y_n R_n \ \text{with} \ Y_n = O_p(1).$$
This expresses that the sequence $X_n$ converges in probability to zero or is bounded in probability "at the rate $R_n$". For deterministic sequences $X_n$ and $R_n$, $O_p(\cdot)$ and $o_p(\cdot)$ reduce to the usual $o(\cdot)$ and $O(\cdot)$ from calculus. Obviously, $X_n = o_p(R_n)$ implies $X_n = O_p(R_n)$. An expression we will often use is: for some sequence $a_n$, if $a_n X_n \xrightarrow{p} 0$, then we write $X_n = o_p(a_n^{-1})$; if $a_n X_n = O_p(1)$, then we write $X_n = O_p(a_n^{-1})$.

Definition 1.1.3 (convergence with probability one)  Let $\{X_n\}, X$ be random variables defined on a common probability space. We say $X_n$ converges to $X$ with probability 1 (or almost surely, strongly, almost everywhere) if
$$P\left(\lim_{n\to\infty} X_n = X\right) = 1.$$
This can be written as $P(\omega : X_n(\omega) \to X(\omega)) = 1$. We denote this mode of convergence as $X_n \xrightarrow{wp1} X$ or $X_n \xrightarrow{a.s.} X$. The extension to the random vector case is straightforward.

Almost sure convergence is a stronger mode of convergence than convergence in probability. In fact, a characterization of convergence wp1 is that
$$\lim_{n\to\infty} P(|X_m - X| < \epsilon, \ \text{all } m \ge n) = 1, \quad \text{every } \epsilon > 0. \tag{1.1}$$
It is clear from this equivalent condition that convergence wp1 is stronger than convergence in probability. A proof can be found on page 7 in Serfling (1980).

Example 1.1.2  Suppose $X_1, X_2, \ldots$ is an infinite sequence of iid $U[0,1]$ random variables, and let $X_{(n)} = \max\{X_1, \ldots, X_n\}$. We show that $X_{(n)} \xrightarrow{wp1} 1$. Note that
$$P(|X_{(n)} - 1| \le \epsilon, \ \forall n \ge m) = P(X_{(n)} \ge 1 - \epsilon, \ \forall n \ge m) = P(X_{(m)} \ge 1 - \epsilon) = 1 - (1-\epsilon)^m \to 1 \quad \text{as } m \to \infty.$$
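As an illustration of this almost-sure convergence (not from the original notes; a minimal sketch assuming NumPy, with arbitrary $\epsilon$, path length, and seed), one can follow a single sample path of the running maximum and compare the probability $1-(1-\epsilon)^m$ with the event that the path has already reached $1-\epsilon$ by time $m$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(size=100_000)              # one path of iid U[0,1] draws
running_max = np.maximum.accumulate(x)     # X_(n) = max(X_1, ..., X_n)

eps = 0.01
for m in (100, 500, 1000):
    prob = 1 - (1 - eps) ** m              # P(X_(m) >= 1 - eps) from the example
    print(m, prob, bool(running_max[m - 1] >= 1 - eps))
```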

Definition 1.1.4 (convergence in rth mean)  Let $\{X_n\}, X$ be random variables defined on a common probability space. For $r > 0$, we say $X_n$ converges to $X$ in $r$th mean if
$$\lim_{n\to\infty} E|X_n - X|^r = 0.$$
This is written $X_n \xrightarrow{r\text{th}} X$. It is easily shown that
$$X_n \xrightarrow{r\text{th}} X \ \Rightarrow\ X_n \xrightarrow{s\text{th}} X, \quad 0 < s < r,$$
by Jensen's inequality (if $g(\cdot)$ is a convex function on $\mathbb{R}$, and $X$ and $g(X)$ are integrable r.v.'s, then $g(E[X]) \le E[g(X)]$).

Definition 1.1.5 (convergence in distribution)  Let $\{X_n\}, X$ be random variables. Consider their distribution functions $F_{X_n}(\cdot)$ and $F_X(\cdot)$. We say that $X_n$ converges in distribution (in law) to $X$ if $\lim_{n\to\infty} F_{X_n}(t) = F_X(t)$ at every point $t$ that is a continuity point of $F_X$.

This is written as $X_n \xrightarrow{d} X$ or $F_{X_n} \Rightarrow F_X$.

Example 1.1.3  Consider $X_n \sim \mathrm{Uniform}\{\tfrac{1}{n}, \tfrac{2}{n}, \ldots, \tfrac{n-1}{n}, 1\}$. Then it can be shown easily that the sequence $X_n$ converges in law to $U[0,1]$. Indeed, for any $t \in [\tfrac{i}{n}, \tfrac{i+1}{n})$, the difference between $F_{X_n}(t) = \tfrac{i}{n}$ and $F_X(t) = t$ can be made arbitrarily small if $n$ is sufficiently large ($|\tfrac{i}{n} - t| < n^{-1}$). The result follows from the definition of $\xrightarrow{d}$.

Example 1.1.4  Let $\{X_n\}_{n=1}^\infty$ be a sequence of random variables where $X_n \sim N(0, 1 + n^{-1})$. Taking the limit of the distribution function of $X_n$ as $n \to \infty$ yields $\lim_n F_{X_n}(x) = \Phi(x)$ for all $x \in \mathbb{R}$. Thus, $X_n \xrightarrow{d} N(0,1)$.

According to the remark below the definition of $\xrightarrow{p}$, we know that $X_n \xrightarrow{p} X$ is equivalent to convergence of every one of the sequences of components. The analogous statement for convergence in distribution is false: convergence in distribution of the sequence $X_n$ is stronger than convergence of every one of the sequences of components $X_{ni}$. The point is that the distributions of the components $X_{ni}$ separately do not determine their joint distribution (they might be independent or dependent in many ways). We speak of joint convergence in law versus marginal convergence.

Example 1.1.5  If $X \sim U[0,1]$ and $X_n = X$ for all $n$, and $Y_n = X$ for $n$ odd and $Y_n = 1 - X$ for $n$ even, then $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} U[0,1]$, yet $(X_n, Y_n)$ does not converge in law.

Suppose $\{X_n\}, X$ are integer-valued random variables. It is not hard to show that
$$X_n \xrightarrow{d} X \iff P(X_n = k) \to P(X = k) \ \text{for every integer } k.$$
This is a useful characterization of convergence in law for integer-valued random variables.

1.2 Fundamental results and theorems on convergence

1.2.1 Relationship

The relationships among the four convergence modes are summarized as follows.

Theorem 1.2.1  Let $\{X_n\}, X$ be random variables (vectors).

(i) If $X_n \xrightarrow{wp1} X$, then $X_n \xrightarrow{p} X$.

(ii) If $X_n \xrightarrow{r\text{th}} X$ for some $r > 0$, then $X_n \xrightarrow{p} X$.

(iii) If $X_n \xrightarrow{p} X$, then $X_n \xrightarrow{d} X$.

(iv) If, for every $\epsilon > 0$, $\sum_{n=1}^\infty P(|X_n - X| > \epsilon) < \infty$, then $X_n \xrightarrow{wp1} X$.

Proof. (i) is an obvious consequence of the equivalent characterization (1.1). (ii) For any $\epsilon > 0$,
$$E|X_n - X|^r \ge E[|X_n - X|^r I(|X_n - X| > \epsilon)] \ge \epsilon^r P(|X_n - X| > \epsilon),$$
and thus
$$P(|X_n - X| > \epsilon) \le \epsilon^{-r} E|X_n - X|^r \to 0 \quad \text{as } n \to \infty.$$
(iii) This is a direct application of Slutsky's theorem. (iv) Let $\epsilon > 0$ be given. We have
$$P(|X_m - X| \ge \epsilon \ \text{for some } m \ge n) = P\left(\bigcup_{m=n}^\infty \{|X_m - X| \ge \epsilon\}\right) \le \sum_{m=n}^\infty P(|X_m - X| \ge \epsilon).$$
The last term is the tail of a convergent series and hence goes to zero as $n \to \infty$. □

Example 1.2.1  Consider iid $N(0,1)$ random variables $X_1, X_2, \ldots$, and suppose $\bar X_n$ is the mean of the first $n$ observations. For an $\epsilon > 0$, consider $\sum_{n=1}^\infty P(|\bar X_n| > \epsilon)$. By Markov's inequality, $P(|\bar X_n| > \epsilon) \le E[\bar X_n^4]/\epsilon^4 = 3/(\epsilon^4 n^2)$. Since $\sum_{n=1}^\infty n^{-2} < \infty$, from Theorem 1.2.1-(iv) it follows that $\bar X_n \xrightarrow{wp1} 0$.

1.2.2 Transformation

It turns out that continuous transformations preserve many types of convergence, and this fact is useful in many applications. We record it next. Its proof can be found on page 24 in Serfling (1980).

Theorem 1.2.2 (Continuous Mapping Theorem)  Let $X_1, X_2, \ldots$ and $X$ be random $p$-vectors defined on a probability space, and let $g(\cdot)$ be a vector-valued (including real-valued) continuous function defined on $\mathbb{R}^p$. If $X_n$ converges to $X$ in probability, almost surely, or in law, then $g(X_n)$ converges to $g(X)$ in probability, almost surely, or in law, respectively.

Example 1.2.2  (i) If $X_n \xrightarrow{d} N(0,1)$, then $X_n^2 \xrightarrow{d} \chi_1^2$. (ii) If $(X_n, Y_n) \xrightarrow{d} N_2(0, I_2)$, then $\max\{X_n, Y_n\} \xrightarrow{d} \max\{X, Y\}$, which has the CDF $[\Phi(x)]^2$.

The most commonly considered functions of vectors converging in some stochastic sense are linear and quadratic forms, which are summarized in the following result.

Corollary 1.2.1  Suppose that the $p$-vectors $X_n$ converge to the $p$-vector $X$ in probability, almost surely, or in law. Let $A_{q\times p}$ and $B_{p\times p}$ be matrices. Then $AX_n \to AX$ and $X_n^T B X_n \to X^T B X$ in the given mode of convergence.

Proof. The vector-valued function
$$Ax = \left(\sum_{i=1}^p a_{1i} x_i, \ldots, \sum_{i=1}^p a_{qi} x_i\right)^T$$
and the real-valued function
$$x^T B x = \sum_{i=1}^p \sum_{j=1}^p b_{ij} x_i x_j$$
are continuous functions of $x = (x_1, \ldots, x_p)^T$. □

Example 1.2.3  (i) If $X_n \xrightarrow{d} N_p(\mu, \Sigma)$, then $CX_n \xrightarrow{d} N(C\mu, C\Sigma C^T)$, where $C_{q\times p}$ is a matrix; also, $(X_n - \mu)^T \Sigma^{-1}(X_n - \mu) \xrightarrow{d} \chi_p^2$. (ii) (Sums and products of random variables converging wp1 or in probability) If $X_n \xrightarrow{wp1} X$ and $Y_n \xrightarrow{wp1} Y$, then $X_n + Y_n \xrightarrow{wp1} X + Y$ and $X_n Y_n \xrightarrow{wp1} XY$. Replacing wp1 with convergence in probability, the foregoing statements also hold.

Remark 1.2.1  The condition that $g(\cdot)$ is continuous in Theorem 1.2.2 can be relaxed to $g(\cdot)$ being continuous a.s., i.e., $P(X \in C(g)) = 1$, where $C(g) = \{x : g \text{ is continuous at } x\}$ is called the continuity set of $g$.

Example 1.2.4  (i) If $X_n \xrightarrow{d} X \sim N(0,1)$, then $1/X_n \xrightarrow{d} Z$, where $Z$ has the distribution of $1/X$, even though the function $g(x) = 1/x$ is not continuous at 0. This is because $P(X = 0) = 0$. However, if $X_n = 1/n$ (degenerate distribution) and
$$g(x) = \begin{cases} 1, & x > 0, \\ 0, & x \le 0, \end{cases}$$
then $X_n \xrightarrow{d} 0$ but $g(X_n) \xrightarrow{d} 1 \ne g(0)$. (ii) If $(X_n, Y_n) \xrightarrow{d} N_2(0, I_2)$, then $X_n/Y_n \xrightarrow{d}$ Cauchy.

Example 1.2.5  Let $\{X_n\}_{n=1}^\infty$ be a sequence of independent random variables where $X_n$ has a Poisson($\theta$) distribution. Let $\bar X_n$ be the sample mean computed from $X_1, \ldots, X_n$. By definition, $\bar X_n \xrightarrow{p} \theta$ as $n \to \infty$. If we wish to find a consistent estimator of the standard deviation of $X_n$, which is $\theta^{1/2}$, we can consider $\bar X_n^{1/2}$. Since the square root transformation is continuous at $\theta$ for $\theta > 0$, the CMT implies that $\bar X_n^{1/2} \xrightarrow{p} \theta^{1/2}$ as $n \to \infty$.

In Example 1.2.2, the condition that $(X_n, Y_n) \xrightarrow{d} N_2(0, I_2)$ cannot be relaxed to $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} Y$ with $X$ and $Y$ independent; i.e., we need convergence of the joint CDF of $(X_n, Y_n)$. This is different when $\xrightarrow{d}$ is replaced by $\xrightarrow{p}$ or $\xrightarrow{wp1}$, as in Example 1.2.3-(ii). The following result, which plays an important role in probability and statistics, establishes the convergence in distribution of $X_n + Y_n$ or $X_n Y_n$ when no information regarding the joint CDF of $(X_n, Y_n)$ is provided.

Theorem 1.2.3 (Slutsky's Theorem)  Let $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{p} c$, where $c$ is a finite constant. Then

(i) $X_n + Y_n \xrightarrow{d} X + c$;

(ii) $X_n Y_n \xrightarrow{d} cX$;

(iii) $X_n / Y_n \xrightarrow{d} X/c$ if $c \ne 0$.

Proof. The method of proof is demonstrated sufficiently by proving (i). Choose and fix $t$ such that $t - c$ is a continuity point of $F_X$. Let $\varepsilon > 0$ be such that $t - c + \varepsilon$ and $t - c - \varepsilon$ are also continuity points of $F_X$. Then
$$F_{X_n+Y_n}(t) = P(X_n + Y_n \le t) \le P(X_n + Y_n \le t,\ |Y_n - c| < \varepsilon) + P(|Y_n - c| \ge \varepsilon) \le P(X_n \le t - c + \varepsilon) + P(|Y_n - c| \ge \varepsilon)$$
and, similarly,
$$F_{X_n+Y_n}(t) \ge P(X_n \le t - c - \varepsilon) - P(|Y_n - c| \ge \varepsilon).$$
It follows from the previous two inequalities and the hypotheses of the theorem that
$$F_X(t - c - \varepsilon) \le \liminf_n F_{X_n+Y_n}(t) \le \limsup_n F_{X_n+Y_n}(t) \le F_X(t - c + \varepsilon).$$
Since $t - c$ is a continuity point of $F_X$, and since $\varepsilon$ can be taken arbitrarily small, the above yields
$$\lim_n F_{X_n+Y_n}(t) = F_X(t - c).$$
The result follows from $F_X(t - c) = F_{X+c}(t)$. □

The extension to the vector case is straightforward; (iii) is valid provided $C \ne 0$ is understood as $C$ being invertible. A straightforward but often used consequence of this theorem is that if $X_n \xrightarrow{d} X$ and $X_n - Y_n \xrightarrow{p} 0$, then $Y_n \xrightarrow{d} X$. In asymptotic practice, we often first derive a result such as $Y_n = X_n + o_p(1)$ and then investigate the asymptotic distribution of $X_n$.

Example 1.2.6  (i) Theorem 1.2.1-(iii); furthermore, convergence in probability to a constant is equivalent to convergence in law to that constant. "$\Rightarrow$" follows from part (i); "$\Leftarrow$" can be proved by definition. Because the degenerate distribution function of the constant $c$ is continuous everywhere except at the point $c$, for any $\epsilon > 0$,
$$P(|X_n - c| \ge \epsilon) = P(X_n \ge c + \epsilon) + P(X_n \le c - \epsilon) \to 1 - F_X(c + \epsilon) + F_X(c - \epsilon) = 0.$$
The result follows from the definition of convergence in probability.

Example 1.2.7  Let $\{X_n\}_{n=1}^\infty$ be a sequence of independent random variables where $X_n \sim \mathrm{Gamma}(\alpha_n, \beta_n)$, with $\alpha_n$ and $\beta_n$ sequences of positive real numbers such that $\alpha_n \to \alpha$ and $\beta_n \to \beta$ for some positive real numbers $\alpha$ and $\beta$. Also, let $\hat\beta_n$ be a consistent estimator of $\beta$. We can conclude that $X_n/\hat\beta_n \xrightarrow{d} \mathrm{Gamma}(\alpha, 1)$.

Example 1.2.8 (t-statistic)  Let $X_1, X_2, \ldots$ be iid random variables with $EX_1 = 0$ and $EX_1^2 < \infty$. Then the $t$-statistic $\sqrt{n}\,\bar X_n / S_n$, where $S_n^2 = (n-1)^{-1}\sum_{i=1}^n (X_i - \bar X_n)^2$ is the sample variance, is asymptotically standard normal. To see this, first note that by two applications of the WLLN and the CMT,
$$S_n^2 = \frac{n}{n-1}\left(\frac{1}{n}\sum_{i=1}^n X_i^2 - \bar X_n^2\right) \xrightarrow{p} 1\cdot(EX_1^2 - (EX_1)^2) = \mathrm{Var}(X_1).$$
Again by the CMT, $S_n \xrightarrow{p} \sqrt{\mathrm{Var}(X_1)}$. By the CLT, $\sqrt{n}\,\bar X_n \xrightarrow{d} N(0, \mathrm{Var}(X_1))$. Finally, Slutsky's theorem gives that the sequence of $t$-statistics converges in law to $N(0, \mathrm{Var}(X_1))/\sqrt{\mathrm{Var}(X_1)} = N(0,1)$.
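A quick numerical check of Example 1.2.8 (not in the original notes; it assumes NumPy, uses centered uniform data, and picks $n$, the replication count, and the seed arbitrarily) compares the empirical CDF of the $t$-statistic with $\Phi$ at a few points.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 100_000

x = rng.uniform(size=(reps, n)) - 0.5              # iid data with mean 0
t = np.sqrt(n) * x.mean(axis=1) / x.std(axis=1, ddof=1)

def phi(z):                                        # standard normal CDF via erf
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(z, np.mean(t <= z), phi(z))              # empirical CDF vs Phi(z)
```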

1.2.3 WLLN and SLLN

We next state some theorems known as the laws of large numbers. They concern the limiting behavior of sums of independent random variables. The weak law of large numbers (WLLN) refers to convergence in probability, whereas the strong law of large numbers (SLLN) refers to a.s. convergence. Our first result gives the WLLN and SLLN for a sequence of iid random variables.

Theorem 1.2.4  Let $X_1, X_2, \ldots$ be iid random variables having CDF $F$.

(i) (The WLLN) The existence of constants $a_n$ for which
$$\frac{1}{n}\sum_{i=1}^n X_i - a_n \xrightarrow{p} 0$$
holds iff $\lim_{x\to\infty} x[1 - F(x) + F(-x)] = 0$, in which case we may choose $a_n = \int_{-n}^n x\,dF(x)$.

(ii) (The SLLN) The existence of a constant $c$ for which
$$\frac{1}{n}\sum_{i=1}^n X_i \xrightarrow{wp1} c$$
holds iff $E[X_1]$ is finite and equals $c$.

Example 1.2.9  Suppose $\{X_i\}_{i=1}^\infty$ is a sequence of independent random variables with $X_i \sim t(2)$. The variance of $X_i$ does not exist, but Theorem 1.2.4 still applies to this case, and we can therefore conclude that $\bar X_n \xrightarrow{p} 0$ as $n \to \infty$.

The next result is for sequences of independent but not necessarily identically distributed random variables.

Theorem 1.2.5  Let $X_1, X_2, \ldots$ be random variables with finite expectations.

(i) (The WLLN) Let $X_1, X_2, \ldots$ be uncorrelated with means $\mu_1, \mu_2, \ldots$ and variances $\sigma_1^2, \sigma_2^2, \ldots$. If $\lim_{n\to\infty}\frac{1}{n^2}\sum_{i=1}^n \sigma_i^2 = 0$, then
$$\frac{1}{n}\sum_{i=1}^n X_i - \frac{1}{n}\sum_{i=1}^n \mu_i \xrightarrow{p} 0.$$

(ii) (The SLLN) Let $X_1, X_2, \ldots$ be independent with means $\mu_1, \mu_2, \ldots$ and variances $\sigma_1^2, \sigma_2^2, \ldots$. If $\sum_{i=1}^\infty \sigma_i^2/c_i^2 < \infty$, where $c_n$ is ultimately monotone and $c_n \to \infty$, then
$$c_n^{-1}\sum_{i=1}^n (X_i - \mu_i) \xrightarrow{wp1} 0.$$

(iii) (The SLLN with common mean) Let $X_1, X_2, \ldots$ be independent with common mean $\mu$ and variances $\sigma_1^2, \sigma_2^2, \ldots$. If $\sum_{i=1}^\infty \sigma_i^{-2} = \infty$, then
$$\sum_{i=1}^n \frac{X_i}{\sigma_i^2} \Big/ \sum_{i=1}^n \sigma_i^{-2} \xrightarrow{wp1} \mu.$$

A special case of Theorem 1.2.5-(ii) is obtained by setting $c_i = i$, in which case
$$\frac{1}{n}\sum_{i=1}^n X_i - \frac{1}{n}\sum_{i=1}^n \mu_i \xrightarrow{wp1} 0.$$
The proofs of Theorems 1.2.4 and 1.2.5 can be found in Billingsley (1995).

Example 1.2.10  Suppose $X_i \overset{\text{indep}}{\sim} (\mu, \sigma_i^2)$. Then, by simple calculus, the BLUE (best linear unbiased estimate) of $\mu$ is $\sum_{i=1}^n \sigma_i^{-2} X_i / \sum_{i=1}^n \sigma_i^{-2}$. Suppose now that the $\sigma_i^2$ do not grow at a rate faster than $i$; i.e., for some constant $K$, $\sigma_i^2 \le iK$. Then $\sum_{i=1}^n \sigma_i^{-2}$ clearly diverges as $n \to \infty$, and so by Theorem 1.2.5-(iii) the BLUE of $\mu$ is strongly consistent.

Example 1.2.11  Suppose $(X_i, Y_i)$, $i = 1, \ldots, n$, are iid bivariate samples from some distribution with $E(X_1) = \mu_1$, $E(Y_1) = \mu_2$, $\mathrm{Var}(X_1) = \sigma_1^2$, $\mathrm{Var}(Y_1) = \sigma_2^2$, and $\mathrm{corr}(X_1, Y_1) = \rho$. Let $r_n$ denote the sample correlation coefficient. The almost sure convergence of $r_n$ to $\rho$ follows very easily. We write
$$r_n = \frac{\frac{1}{n}\sum X_i Y_i - \bar X\bar Y}{\sqrt{\left(\frac{\sum X_i^2}{n} - \bar X^2\right)\left(\frac{\sum Y_i^2}{n} - \bar Y^2\right)}};$$
then from the SLLN for iid random variables (Theorem 1.2.4) and the continuous mapping theorem (Theorem 1.2.2; Example 1.2.3-(ii)),
$$r_n \xrightarrow{wp1} \frac{E(X_1 Y_1) - \mu_1\mu_2}{\sigma_1\sigma_2} = \rho.$$

1.2.4 Characterization of convergence in law

Next we provide a collection of basic facts about convergence in distribution. The following theorems provide methodology for establishing convergence in distribution.

Theorem 1.2.6  Let $X, X_1, X_2, \ldots$ be random $p$-vectors.

(i) (The Portmanteau Theorem) $X_n \xrightarrow{d} X$ is equivalent to the following condition: $E[g(X_n)] \to E[g(X)]$ for every bounded continuous function $g$.

(ii) (Levy-Cramer continuity theorem) Let $\Phi_X, \Phi_{X_1}, \Phi_{X_2}, \ldots$ be the characteristic functions of $X, X_1, X_2, \ldots$, respectively. Then $X_n \xrightarrow{d} X$ iff $\lim_{n\to\infty}\Phi_{X_n}(t) = \Phi_X(t)$ for all $t \in \mathbb{R}^p$.

(iii) (Cramer-Wold device) $X_n \xrightarrow{d} X$ iff $c^T X_n \xrightarrow{d} c^T X$ for every $c \in \mathbb{R}^p$.

Proof. (i) See Serfling (1980), page 16. (ii) Shao (2003), page 57. (iii) Assume $c^T X_n \xrightarrow{d} c^T X$ for every $c$; then by Theorem 1.2.6-(ii),
$$\lim_{n\to\infty}\Phi_{X_n}(tc_1, \ldots, tc_p) = \Phi_X(tc_1, \ldots, tc_p) \quad \text{for all } t.$$
With $t = 1$, and since $c$ is arbitrary, it follows by Theorem 1.2.6-(ii) again that $X_n \xrightarrow{d} X$. The converse can be proved by a similar argument. [$\Phi_{c^T X_n}(t) = \Phi_{X_n}(tc)$ and $\Phi_{c^T X}(t) = \Phi_X(tc)$ for any $t \in \mathbb{R}$ and any $c \in \mathbb{R}^p$.] □

A straightforward application of Theorem 1.2.6 is that if $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} c$ for a constant vector $c$, then $(X_n, Y_n) \xrightarrow{d} (X, c)$.

Example 1.2.12  Example 1.1.3 revisited. Consider now the function $g(x) = x^{10}$, $0 \le x \le 1$. Note that $g$ is continuous and bounded. Therefore, by the Portmanteau theorem,
$$E(g(X_n)) = \frac{\sum_{i=1}^n i^{10}}{n^{11}} \to E(g(X)) = \int_0^1 x^{10}\,dx = \frac{1}{11}.$$

Example 1.2.13  For $n \ge 1$, $0 \le p \le 1$, and a given continuous function $g : [0,1] \to \mathbb{R}$, define the sequence
$$B_n(p) = \sum_{k=0}^n g\!\left(\frac{k}{n}\right)\binom{n}{k} p^k (1-p)^{n-k},$$
the so-called Bernstein polynomials. Note that $B_n(p) = E[g(X/n)]$ with $X \sim \mathrm{Bin}(n, p)$. As $n \to \infty$, $X/n \xrightarrow{p} p$ (WLLN), and it follows that $X/n \xrightarrow{d} \delta_p$, the point mass at $p$. Since $g$ is continuous and hence bounded (compact interval), it follows from the Portmanteau theorem that $B_n(p) \to g(p)$.
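A small numerical sketch of the Bernstein-polynomial convergence $B_n(p) \to g(p)$ follows (not part of the notes; it assumes NumPy, and the test function $g$, the point $p = 0.3$, and the values of $n$ are arbitrary choices).

```python
import math
import numpy as np

def bernstein(g, n, p):
    """Bernstein polynomial B_n(p) = E[g(X/n)] with X ~ Bin(n, p)."""
    k = np.arange(n + 1)
    weights = np.array([math.comb(n, i) for i in k]) * p**k * (1 - p)**(n - k)
    return float(np.sum(g(k / n) * weights))

g = lambda x: np.sin(2 * np.pi * x)          # any continuous g on [0, 1]
for n in (10, 100, 1000):
    print(n, bernstein(g, n, 0.3), g(0.3))   # B_n(0.3) approaches g(0.3)
```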

Example 1.2.14  (i) Let $X_1, \ldots, X_n$ be independent random variables having a common CDF and $T_n = X_1 + \cdots + X_n$, $n = 1, 2, \ldots$. Suppose that $E|X_1| < \infty$. It follows from the properties of characteristic functions and a Taylor expansion that the CHF of $X_1$ satisfies
$$\left[\frac{\partial\Phi_X(t)}{\partial t}\right]_{t=0} = \sqrt{-1}\,EX, \qquad \left[\frac{\partial^2\Phi_X(t)}{\partial t^2}\right]_{t=0} = -EX^2,$$
so that
$$\Phi_{X_1}(t) = \Phi_{X_1}(0) + \sqrt{-1}\,\mu t + o(|t|)$$
as $|t| \to 0$, where $\mu = EX_1$. Then it follows that the CHF of $T_n/n$ is
$$\Phi_{T_n/n}(t) = \left[\Phi_{X_1}\!\left(\frac{t}{n}\right)\right]^n = \left[1 + \frac{\sqrt{-1}\,\mu t}{n} + o(|t|n^{-1})\right]^n$$
for any $t \in \mathbb{R}$ as $n \to \infty$. Since $(1 + c_n/n)^n \to \exp\{c\}$ for any complex sequence $c_n$ satisfying $c_n \to c$, we obtain that $\Phi_{T_n/n}(t) \to \exp\{\sqrt{-1}\,\mu t\}$, which is the CHF of the distribution degenerate at $\mu$. By Theorem 1.2.6-(ii), $T_n/n \xrightarrow{d} \mu$. From Example 1.2.6-(i), this also shows that $T_n/n \xrightarrow{p} \mu$ (an informal proof of the WLLN).

(ii) Similarly, $\mu = 0$ and $\sigma^2 = \mathrm{Var}(X_1) < \infty$ imply [by a second-order Taylor expansion]
$$\Phi_{T_n/\sqrt{n}}(t) = \left[1 - \frac{\sigma^2 t^2}{2n} + o(t^2 n^{-1})\right]^n$$
for any $t \in \mathbb{R}$ as $n \to \infty$, which implies that $\Phi_{T_n/\sqrt{n}}(t) \to \exp\{-\sigma^2 t^2/2\}$, the CHF of $N(0, \sigma^2)$. Hence $T_n/\sqrt{n} \xrightarrow{d} N(0, \sigma^2)$.

(iii) Suppose now that $X_1, \ldots, X_n$ are random $p$-vectors and $\mu = EX_1$ and $\Sigma = \mathrm{Cov}(X_1)$ are finite. For any fixed $c \in \mathbb{R}^p$, it follows from the previous discussion that $(c^T T_n - nc^T\mu)/\sqrt{n} \xrightarrow{d} N(0, c^T\Sigma c)$. From Theorem 1.2.6-(iii), we conclude that $(T_n - n\mu)/\sqrt{n} \xrightarrow{d} N_p(0, \Sigma)$.

The following two simple results are frequently useful in calculations.

Theorem 1.2.7  (i) (Prohorov's Theorem) If $X_n \xrightarrow{d} X$ for some $X$, then $X_n = O_p(1)$.

(ii) (Polya's Theorem) If $F_{X_n} \Rightarrow F_X$ and $F_X$ is continuous, then as $n \to \infty$,
$$\sup_{-\infty < x < \infty}|F_{X_n}(x) - F_X(x)| \to 0.$$

Proof. (i) For any given $\varepsilon > 0$, fix a constant $M$ such that $P(|X| \ge M) < \varepsilon$. By the definition of convergence in law, $P(|X_n| \ge M)$ exceeds $P(|X| \ge M)$ by an arbitrarily small amount for sufficiently large $n$. Thus, there exists $N$ such that $P(|X_n| \ge M) < 2\varepsilon$ for all $n \ge N$. The result follows from the definition of $O_p(1)$.

(ii) First, fix $k \in \mathbb{N}$. By the continuity of $F$ there exist points $-\infty = x_0 < x_1 < \cdots < x_k = \infty$ with $F(x_i) = i/k$. By monotonicity we have, for $x_{i-1} \le x \le x_i$,
$$F_{X_n}(x) - F_X(x) \le F_{X_n}(x_i) - F_X(x_{i-1}) = F_{X_n}(x_i) - F_X(x_i) + 1/k,$$
$$F_{X_n}(x) - F_X(x) \ge F_{X_n}(x_{i-1}) - F_X(x_i) = F_{X_n}(x_{i-1}) - F_X(x_{i-1}) - 1/k.$$
Thus $|F_{X_n}(x) - F_X(x)|$ is bounded above by $\sup_i |F_{X_n}(x_i) - F_X(x_i)| + 1/k$, for every $x$. The latter, finite supremum converges to zero because each term converges to zero by the hypothesis, for each fixed $k$. Because $k$ is arbitrary, the result follows. □

The following result can be used to check whether $X_n \xrightarrow{d} X$ when $X$ has a PDF $f$ and $X_n$ has a PDF $f_n$.

Theorem 1.2.8 (Scheffé's Theorem)  Let $\{f_n\}$ be a sequence of densities of absolutely continuous distributions, with $\lim_n f_n(x) = f(x)$ for each $x \in \mathbb{R}^p$. If $f$ is a density function, then $\lim_n \int |f_n(x) - f(x)|\,dx = 0$.

Proof. Put $g_n(x) = [f(x) - f_n(x)]\,I_{f(x) \ge f_n(x)}$. By noting that $\int[f_n(x) - f(x)]\,dx = 0$,
$$\int |f_n(x) - f(x)|\,dx = 2\int g_n(x)\,dx.$$
Note that $0 \le g_n(x) \le f(x)$ for all $x$. Hence, by dominated convergence, $\lim_n \int g_n(x)\,dx = 0$. [Dominated convergence theorem: if $\lim_{n\to\infty} f_n = f$ and there exists an integrable function $g$ such that $|f_n| \le g$, then $\lim_n \int f_n(x)\,dx = \int \lim_n f_n(x)\,dx$.] □

As an example, consider the PDF $f_n$ of the $t$-distribution $t_n$, $n = 1, 2, \ldots$. One can show (exercise) that $f_n \to f$, where $f$ is the standard normal PDF.
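The $t_n \to N(0,1)$ example can be checked numerically: by Scheffé's theorem the $L_1$ distance between the two densities should shrink to zero. The sketch below is not from the notes; it assumes NumPy, writes the $t$ density explicitly via the gamma function, and approximates the integral on a truncated grid, so the numbers are only illustrative.

```python
import math
import numpy as np

def t_pdf(x, n):
    """Density of the t-distribution with n degrees of freedom."""
    c = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
    return c * (1 + x**2 / n) ** (-(n + 1) / 2)

def normal_pdf(x):
    return np.exp(-x**2 / 2) / math.sqrt(2 * math.pi)

x = np.linspace(-10, 10, 20001)            # truncated grid for the integral
dx = x[1] - x[0]
for n in (1, 5, 30, 200):
    l1 = np.abs(t_pdf(x, n) - normal_pdf(x)).sum() * dx
    print(n, l1)                           # L1 distance decreasing toward 0
```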

The following result provides a convergence-of-moments criterion for convergence in law.

Theorem 1.2.9 (Frechet and Shohat Theorem)  Let the distribution functions $F_n$ possess finite moments $\alpha_{nk} = \int t^k\,dF_n(t)$ for $k = 1, 2, \ldots$ and $n = 1, 2, \ldots$. Assume that the limits $\alpha_k = \lim_n \alpha_{nk}$ exist (finite) for each $k$. Then

(i) the limits $\alpha_k$ are the moments of some distribution function $F$;

(ii) if the $F$ given by (i) is unique, then $F_n \Rightarrow F$.

[A sufficient condition: the moment sequence $\alpha_k$ determines the distribution $F$ uniquely if the Carleman condition $\sum_{i=1}^\infty \alpha_{2i}^{-1/(2i)} = \infty$ holds.]

1.2.5 Results on $o_p$ and $O_p$

There are many rules of calculus with $o$ and $O$ symbols, which we will apply without comment. For instance,
$$o_p(1) + o_p(1) = o_p(1), \quad o_p(1) + O_p(1) = O_p(1), \quad O_p(1)o_p(1) = o_p(1),$$
$$(1 + o_p(1))^{-1} = O_p(1), \quad o_p(R_n) = R_n o_p(1), \quad O_p(R_n) = R_n O_p(1), \quad o_p(O_p(1)) = o_p(1).$$
Two more complicated rules are given by the following lemma.

Lemma 1.2.1  Let $g$ be a function defined on $\mathbb{R}^p$ such that $g(0) = 0$. Let $X_n$ be a sequence of random vectors with values in $\mathbb{R}^p$ that converges in probability to zero. Then, for every $r > 0$,

(i) if $g(t) = o(\|t\|^r)$ as $t \to 0$, then $g(X_n) = o_p(\|X_n\|^r)$;

(ii) if $g(t) = O(\|t\|^r)$ as $t \to 0$, then $g(X_n) = O_p(\|X_n\|^r)$.

Proof. Define $f(t) = g(t)/\|t\|^r$ for $t \ne 0$ and $f(0) = 0$. Then $g(X_n) = f(X_n)\|X_n\|^r$.

(i) Because the function $f$ is continuous at zero by assumption, $f(X_n) \xrightarrow{p} f(0) = 0$ by Theorem 1.2.2.

(ii) By assumption there exist $M$ and $\delta > 0$ such that $|f(t)| \le M$ whenever $\|t\| \le \delta$. Thus
$$P(|f(X_n)| > M) \le P(\|X_n\| > \delta) \to 0,$$
and the sequence $f(X_n)$ is bounded in probability. □

1.3 The central limit theorem

The most fundamental result on convergence in law is the central limit theorem (CLT) for sums of random variables. We first state the case of chief importance, iid summands.

Definition 1.3.1  A sequence of random variables $X_n$ is asymptotically normal with $\mu_n$ and $\sigma_n^2$ if $(X_n - \mu_n)/\sigma_n \xrightarrow{d} N(0,1)$, written $X_n$ is $AN(\mu_n, \sigma_n^2)$.

1.3.1 The CLT for the iid case

Theorem 1.3.1 (Lindeberg-Levy)  Let $X_i$ be iid with mean $\mu$ and finite variance $\sigma^2$. Then
$$\frac{\sqrt{n}(\bar X - \mu)}{\sigma} \xrightarrow{d} N(0,1).$$
By Slutsky's theorem, we can write $\sqrt{n}(\bar X - \mu) \xrightarrow{d} N(0, \sigma^2)$. Also, $\bar X$ is $AN(\mu, \sigma^2/n)$. See Billingsley (1995) for a proof.

Example 1.3.1 (Confidence intervals)  This theorem can be used to approximate $P(\bar X \le \mu + k\sigma/\sqrt{n})$ by $\Phi(k)$. This is very useful because the sampling distribution of $\bar X$ is not available except in some special cases. Then, setting $k = \Phi^{-1}(1-\alpha) = z_\alpha$, the interval $[\bar X_n - \sigma z_\alpha/\sqrt{n},\ \bar X_n + \sigma z_\alpha/\sqrt{n}]$ is a confidence interval for $\mu$ of asymptotic level $1 - 2\alpha$. More precisely, the probability that $\mu$ is contained in this interval converges to $1 - 2\alpha$ (how accurate?).
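A simulation of the coverage of this interval (not in the original notes; a sketch assuming NumPy, with exponential data so that $\sigma$ is known to equal 1, and arbitrary sample sizes, replication count, and seed) illustrates the asymptotic level $1 - 2\alpha$.

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, z = 0.05, 1.6448536269514722      # z_alpha for alpha = 0.05
mu, sigma = 1.0, 1.0                     # Exp(1): mean 1, standard deviation 1

def coverage(n, reps=100_000):
    x = rng.exponential(scale=1.0, size=(reps, n))
    xbar = x.mean(axis=1)
    half = z * sigma / np.sqrt(n)
    return np.mean((xbar - half <= mu) & (mu <= xbar + half))

for n in (10, 50, 200, 1000):
    print(n, coverage(n))                # approaches 1 - 2*alpha = 0.90
```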

Example 1.3.2 (Sample variance)  Suppose $X_1, \ldots, X_n$ are iid with mean $\mu$, variance $\sigma^2$, and $E(X_1^4) < \infty$. Consider the asymptotic distribution of $S_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X_n)^2$. Write
$$\sqrt{n}(S_n^2 - \sigma^2) = \sqrt{n}\left(\frac{1}{n-1}\sum_{i=1}^n (X_i - \mu)^2 - \sigma^2\right) - \sqrt{n}\,\frac{n}{n-1}(\bar X_n - \mu)^2.$$
The second term converges to zero in probability, and the first term is asymptotically normal by the CLT. The whole expression is asymptotically normal by Slutsky's theorem, i.e.,
$$\sqrt{n}(S_n^2 - \sigma^2) \xrightarrow{d} N(0, \mu_4 - \sigma^4),$$
where $\mu_4$ denotes the centered fourth moment of $X_1$, and $\mu_4 - \sigma^4$ comes from computing the variance of $(X_1 - \mu)^2$.

Example 1.3.3 (Level of the chi-square test)  Normal theory prescribes rejecting the null hypothesis $H_0 : \sigma^2 \le 1$ for values of $nS_n^2$ exceeding the upper $\alpha$ point $\chi^2_{n-1,\alpha}$ of the $\chi^2_{n-1}$ distribution. If the observations are sampled from a normal distribution, the test has exactly level $\alpha$. However, this is not even approximately the case if the underlying distribution is not normal. The CLT and Example 1.3.2 yield the following two statements:
$$\frac{\chi^2_{n-1} - (n-1)}{\sqrt{2(n-1)}} \xrightarrow{d} N(0,1), \qquad \sqrt{n}\left(\frac{S_n^2}{\sigma^2} - 1\right) \xrightarrow{d} N(0, \kappa + 2),$$
where $\kappa = \mu_4/\sigma^4 - 3$ is the kurtosis of the underlying distribution. The first statement implies that $(\chi^2_{n-1,\alpha} - (n-1))/\sqrt{2(n-1)}$ converges to the upper $\alpha$ point $z_\alpha$ of $N(0,1)$. Thus, the level of the chi-square test satisfies
$$P_{H_0}(nS_n^2 > \chi^2_{n-1,\alpha}) = P\!\left(\sqrt{n}\left(\frac{S_n^2}{\sigma^2} - 1\right) > \frac{\chi^2_{n-1,\alpha} - n}{\sqrt{n}}\right) \to 1 - \Phi\!\left(\frac{z_\alpha\sqrt{2}}{\sqrt{\kappa + 2}}\right).$$
So the asymptotic level reduces to $1 - \Phi(z_\alpha) = \alpha$ iff the kurtosis of the underlying distribution is 0. If the kurtosis goes to infinity, then the asymptotic level approaches $1 - \Phi(0) = 1/2$. We conclude that the level of the chi-square test is nonrobust against departures from normality that affect the value of the kurtosis. If, instead, we were to use a normal approximation to the distribution of $\sqrt{n}(S_n^2/\sigma^2 - 1)$, the problem would not arise, provided that the asymptotic variance $\kappa + 2$ is estimated accurately.

Theorem 1.3.2 (Multivariate CLT for the iid case)  Let $X_i$ be iid random $p$-vectors with mean $\mu$ and covariance matrix $\Sigma$. Then
$$\sqrt{n}(\bar X - \mu) \xrightarrow{d} N_p(0, \Sigma).$$

Proof. By the Cramer-Wold device, this can be proved by finding the limit distribution of the sequence of real variables
$$c^T\!\left(\frac{1}{\sqrt{n}}\sum_{i=1}^n (X_i - \mu)\right) = \frac{1}{\sqrt{n}}\sum_{i=1}^n (c^T X_i - c^T\mu).$$
Because the random variables $c^T X_i - c^T\mu$ are iid with zero mean and variance $c^T\Sigma c$, this sequence is $AN(0, c^T\Sigma c)$ by Theorem 1.3.1. This is exactly the distribution of $c^T X$ if $X$ possesses an $N_p(0, \Sigma)$ distribution. □

Example 1.3.4  Suppose that $X_1, \ldots, X_n$ is a random sample from the Poisson distribution with mean $\theta$. Let $Z_n$ be the proportion of zeros observed, i.e., $Z_n = \frac{1}{n}\sum_{i=1}^n I\{X_i = 0\}$. Let us find the joint asymptotic distribution of $(\bar X_n, Z_n)$. Note that $E(X_1) = \theta$, $EI\{X_1 = 0\} = e^{-\theta}$, $\mathrm{Var}(X_1) = \theta$, $\mathrm{Var}(I\{X_1 = 0\}) = e^{-\theta}(1 - e^{-\theta})$, and $EX_1 I\{X_1 = 0\} = 0$. So $\mathrm{Cov}(X_1, I\{X_1 = 0\}) = -\theta e^{-\theta}$. Hence $\sqrt{n}\left((\bar X_n, Z_n) - (\theta, e^{-\theta})\right) \xrightarrow{d} N_2(0, \Sigma)$, where
$$\Sigma = \begin{pmatrix} \theta & -\theta e^{-\theta} \\ -\theta e^{-\theta} & e^{-\theta}(1 - e^{-\theta}) \end{pmatrix}.$$

It is not as widely known that existence of a variance is not necessary for asymptotic normality of partial sums of iid random variables. A CLT without a finite variance can sometimes be useful. We present the general result below and then give an illustrative example. Feller (1966) contains detailed information on the availability of CLTs without the existence of a variance, along with proofs. First, we need a definition.

Definition 1.3.2  A function $g : \mathbb{R} \to \mathbb{R}$ is called slowly varying at $\infty$ if, for every $t > 0$, $\lim_{x\to\infty} g(tx)/g(x) = 1$.

Examples of slowly varying functions are $\log x$, $x/(1+x)$, and indeed any function with a finite nonzero limit as $x \to \infty$. But, for example, $x$ or $e^{-x}$ are not slowly varying.

Theorem 1.3.3  Let $X_1, X_2, \ldots$ be iid from a CDF $F$ on $\mathbb{R}$. Let $v(x) = \int_{-x}^x y^2\,dF(y)$. Then there exist constants $\{a_n\}, \{b_n\}$ such that
$$\frac{\sum_{i=1}^n X_i - a_n}{b_n} \xrightarrow{d} N(0,1)$$
if and only if $v(x)$ is slowly varying at $\infty$.

If $F$ has a finite second moment, then automatically $v(x)$ is slowly varying at $\infty$. We present an example below where asymptotic normality of the sample partial sums still holds, although the summands do not have a finite variance.

Example 1.3.5  Suppose $X_1, X_2, \ldots$ are iid from a $t$-distribution with 2 degrees of freedom ($t(2)$), which has a finite mean but not a finite variance. The density is given by $f(y) = c/(2 + y^2)^{3/2}$ for some positive $c$. Hence, by direct integration, for some other constant $k$,
$$v(x) = k\sqrt{\frac{1}{2+x^2}}\left[\sqrt{2+x^2}\,\operatorname{arcsinh}(x/\sqrt{2}) - x\right].$$
Therefore, using the fact that $\operatorname{arcsinh}(x) = \log(2x) + O(x^{-2})$ as $x \to \infty$, we get, for any $t > 0$, $v(tx)/v(x) \to 1$ after some algebra. It follows that for iid observations from a $t(2)$ distribution, upon suitable centering and normalizing, the partial sums $\sum_{i=1}^n X_i$ converge to a normal distribution, although the $X_i$'s do not have a finite variance. The centering can be taken to be zero for the centered $t$-distribution; it can be shown that the required normalizing is $b_n = \sqrt{n\log n}$ (why?).

1.3.2 The CLT for the independent not necessarily iid case

Theorem 1.3.4 (Lindeberg-Feller)  Suppose $\{X_n\}$ is a sequence of independent variables with means $\mu_n$ and variances $\sigma_n^2 < \infty$. Let $s_n^2 = \sum_{i=1}^n \sigma_i^2$. If for any $\epsilon > 0$
$$\frac{1}{s_n^2}\sum_{j=1}^n \int_{|x - \mu_j| > \epsilon s_n}(x - \mu_j)^2\,dF_j(x) \to 0, \tag{1.2}$$
where $F_i$ is the CDF of $X_i$, then
$$\frac{\sum_{i=1}^n (X_i - \mu_i)}{s_n} \xrightarrow{d} N(0,1).$$
A proof can be found on page 67 in Shao (2003). Condition (1.2) is called the Lindeberg-Feller condition.
Example 1.3.6  Let $X_1, X_2, \ldots$ be independent variables such that $X_j$ has the uniform distribution on $[-j, j]$, $j = 1, 2, \ldots$. Let us verify that the conditions of Theorem 1.3.4 are satisfied. Note that $EX_j = 0$ and $\sigma_j^2 = \frac{1}{2j}\int_{-j}^j x^2\,dx = j^2/3$ for all $j$. Hence,
$$s_n^2 = \sum_{j=1}^n \sigma_j^2 = \frac{1}{3}\sum_{j=1}^n j^2 = \frac{n(n+1)(2n+1)}{18}.$$
For any $\epsilon > 0$, $n < \epsilon s_n$ for sufficiently large $n$, since $\lim_n n/s_n = 0$. Because $|X_j| \le j \le n$, when $n$ is sufficiently large,
$$E(X_j^2 I\{|X_j| > \epsilon s_n\}) = 0.$$
Consequently, $\lim_{n\to\infty}\sum_{j=1}^n E(X_j^2 I\{|X_j| > \epsilon s_n\}) < \infty$. Since $s_n \to \infty$, Lindeberg's condition holds.

The Lindeberg-Feller theorem is a landmark theorem in probability and statistics. Generally, it is hard to verify the Lindeberg-Feller condition. A simpler theorem is the following.

Theorem 1.3.5 (Liapounov)  Suppose $\{X_n\}$ is a sequence of independent variables with means $\mu_n$ and variances $\sigma_n^2 < \infty$. Let $s_n^2 = \sum_{i=1}^n \sigma_i^2$. If for some $\delta > 0$
$$\frac{1}{s_n^{2+\delta}}\sum_{j=1}^n E|X_j - \mu_j|^{2+\delta} \to 0 \tag{1.3}$$
as $n \to \infty$, then
$$\frac{\sum_{i=1}^n (X_i - \mu_i)}{s_n} \xrightarrow{d} N(0,1).$$
A proof is given in Sen and Singer (1993). For instance, if $s_n \to \infty$, $\sup_{j\ge 1}E|X_j - \mu_j|^{2+\delta} < \infty$, and $s_n^2/n$ is bounded away from zero, then the condition of Liapounov's theorem is satisfied. In practice one usually tries to work with $\delta = 1$ or 2 for algebraic convenience. It can easily be checked that if the $X_i$ are uniformly bounded and $s_n \to \infty$, the condition is immediately satisfied with $\delta = 1$.

Example 1.3.7  Let $X_1, X_2, \ldots$ be independent random variables. Suppose that $X_i$ has the binomial distribution $\mathrm{BIN}(1, p_i)$, $i = 1, 2, \ldots$. For each $i$, $EX_i = p_i$ and $E|X_i - EX_i|^3 = (1-p_i)^3 p_i + p_i^3(1-p_i) \le 2p_i(1-p_i)$. Hence $\sum_{i=1}^n E|X_i - EX_i|^3 \le 2s_n^2 = 2\sum_{i=1}^n E|X_i - EX_i|^2 = 2\sum_{i=1}^n p_i(1-p_i)$. Then Liapounov's condition (1.3) holds with $\delta = 1$ if $s_n \to \infty$. For example, if $p_i = 1/i$, or if $M_1 \le p_i \le M_2$ with the two constants belonging to $(0,1)$, then $s_n \to \infty$ holds. Accordingly, by Liapounov's theorem,
$$\frac{\sum_{i=1}^n (X_i - p_i)}{s_n} \xrightarrow{d} N(0,1).$$
A consequence especially useful in regression is the following theorem, which is also proved in Sen and Singer (1993).

Theorem 1.3.6 (Hajek-Sidak)  Suppose $X_1, X_2, \ldots$ are iid random variables with mean $\mu$ and variance $\sigma^2 < \infty$. Let $c_n = (c_{n1}, c_{n2}, \ldots, c_{nn})$ be a vector of constants such that
$$\frac{\max_{1\le i\le n} c_{ni}^2}{\sum_{j=1}^n c_{nj}^2} \to 0 \tag{1.4}$$
as $n \to \infty$. Then
$$\frac{\sum_{i=1}^n c_{ni}(X_i - \mu)}{\sigma\sqrt{\sum_{j=1}^n c_{nj}^2}} \xrightarrow{d} N(0,1).$$
Condition (1.4) ensures that no coefficient dominates the vector $c_n$, and it is referred to as the Hajek-Sidak condition in the literature. For example, if $c_n = (1, 0, \ldots, 0)$, then the condition would fail, and so would the theorem. The Hajek-Sidak theorem has many applications, including to the regression problem. Here is an important example.

Example 1.3.8 (Simple linear regression)  Consider the simple linear regression model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, where the $\varepsilon_i$ are iid with mean 0 and variance $\sigma^2$ but are not necessarily normally distributed. The least squares estimate of $\beta_1$ based on $n$ observations is
$$\hat\beta_1 = \frac{\sum_{i=1}^n (y_i - \bar y_n)(x_i - \bar x_n)}{\sum_{i=1}^n (x_i - \bar x_n)^2} = \beta_1 + \frac{\sum_{i=1}^n \varepsilon_i(x_i - \bar x_n)}{\sum_{i=1}^n (x_i - \bar x_n)^2}.$$
So $\hat\beta_1 = \beta_1 + \sum_{i=1}^n \varepsilon_i c_{ni}/\sum_{j=1}^n c_{nj}^2$, where $c_{ni} = x_i - \bar x_n$. Hence, by the Hajek-Sidak theorem,
$$\sqrt{\sum_{j=1}^n c_{nj}^2}\ \frac{\hat\beta_1 - \beta_1}{\sigma} = \frac{\sum_{i=1}^n \varepsilon_i c_{ni}}{\sigma\sqrt{\sum_{j=1}^n c_{nj}^2}} \xrightarrow{d} N(0,1),$$
provided
$$\frac{\max_{1\le i\le n}(x_i - \bar x_n)^2}{\sum_{j=1}^n (x_j - \bar x_n)^2} \to 0$$
as $n \to \infty$. For most reasonable designs, this condition is satisfied. Thus, the asymptotic normality of the LSE (least squares estimate) is established under some conditions on the design variables, an important result.
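The following sketch (not part of the notes; it assumes NumPy, a fixed uniform design, centered-exponential errors, and arbitrary values of $n$, the replication count, and the seed) simulates the standardized quantity $\sqrt{\sum_j c_{nj}^2}\,(\hat\beta_1 - \beta_1)/\sigma$ and checks that its mean, variance, and tail behavior are close to those of $N(0,1)$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 200, 50_000
beta0, beta1, sigma = 1.0, 2.0, 1.0
x = rng.uniform(0, 10, size=n)                 # fixed design points
cn = x - x.mean()
scale = np.sqrt(np.sum(cn**2)) / sigma         # standardization from the theorem

z = np.empty(reps)
for r in range(reps):
    eps = rng.exponential(1.0, size=n) - 1.0   # non-normal errors, mean 0, var 1
    y = beta0 + beta1 * x + eps
    b1 = np.sum((y - y.mean()) * cn) / np.sum(cn**2)   # least squares slope
    z[r] = scale * (b1 - beta1)

print(np.mean(z), np.var(z))                   # roughly 0 and 1
print(np.mean(np.abs(z) <= 1.96))              # roughly 0.95
```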

Theorem 1.3.7 (Lindeberg-Feller, multivariate)  Suppose $X_i$ is a sequence of independent vectors with means $\mu_i$, covariances $\Sigma_i$, and distribution functions $F_i$. Suppose that $\frac{1}{n}\sum_{i=1}^n \Sigma_i \to \Sigma$ as $n \to \infty$, and that for any $\epsilon > 0$
$$\frac{1}{n}\sum_{j=1}^n \int_{\|x - \mu_j\| > \epsilon\sqrt{n}} \|x - \mu_j\|^2\,dF_j(x) \to 0.$$
Then
$$\frac{1}{\sqrt{n}}\sum_{i=1}^n (X_i - \mu_i) \xrightarrow{d} N(0, \Sigma).$$

Example 1.3.9 (Multiple regression)  In the linear regression problem, we observe a vector $y = X\beta + \varepsilon$ for a fixed or random matrix $X$ of full rank, and an error vector $\varepsilon$ with iid components with mean zero and variance $\sigma^2$. The least squares estimator of $\beta$ is $\hat\beta = (X^T X)^{-1}X^T y$. This estimator is unbiased and has covariance matrix $\sigma^2(X^T X)^{-1}$. If the error vector $\varepsilon$ is normally distributed, then $\hat\beta$ is exactly normally distributed. Under reasonable conditions on the design matrix, $\hat\beta$ is asymptotically normally distributed for a large range of error distributions. Here we fix $p$ and let $n$ tend to infinity. This follows from the representation
$$(X^T X)^{1/2}(\hat\beta - \beta) = (X^T X)^{-1/2}X^T\varepsilon = \sum_{i=1}^n a_{ni}\varepsilon_i,$$
where $a_{n1}, \ldots, a_{nn}$ are the columns of the $(p\times n)$ matrix $(X^T X)^{-1/2}X^T =: A$. This sequence is asymptotically normal if the vectors $a_{n1}\varepsilon_1, \ldots, a_{nn}\varepsilon_n$ satisfy the Lindeberg conditions. The norming matrix $(X^T X)^{1/2}$ has been chosen to ensure that the vectors in the display have covariance matrix $\sigma^2 I_p$ for every $n$. The remaining condition is
$$\sum_{i=1}^n \|a_{ni}\|^2 E\varepsilon_i^2 I\{\|a_{ni}\|\,|\varepsilon_i| > \epsilon\} \to 0.$$
This can be simplified to other conditions in several ways. Because $\sum_i\|a_{ni}\|^2 = \mathrm{tr}(AA^T) = p$, it suffices that $\max_i E\varepsilon_i^2 I\{\|a_{ni}\|\,|\varepsilon_i| > \epsilon\} \to 0$, which is equivalent to $\max_i\|a_{ni}\| \to 0$. Alternatively, the expectation $E\varepsilon_i^2 I\{\|a_{ni}\|\,|\varepsilon_i| > \epsilon\}$ can be bounded by $\epsilon^{-k}E|\varepsilon_i|^{k+2}\|a_{ni}\|^k$, and a second set of sufficient conditions is
$$\sum_{i=1}^n \|a_{ni}\|^k \to 0, \quad E|\varepsilon_1|^k < \infty, \quad k > 2.$$

1.3.3 CLT for a random number of summands

The canonical CLT for the iid case says that if $X_1, X_2, \ldots$ are iid with mean zero and a finite variance $\sigma^2$, then the sequence of partial sums $T_n = \sum_{i=1}^n X_i$ obeys the central limit theorem in the sense that $\frac{T_n}{\sigma\sqrt{n}} \xrightarrow{d} N(0,1)$. There are some practical problems that arise in applications, for example in sequential statistical analysis, where the number of terms in a partial sum is a random variable. Precisely, $\{N(t)\}$, $t \ge 0$, is a family of (nonnegative) integer-valued random variables, and we want to approximate the distribution of $T_{N(t)}$, where for each fixed $n$, $T_n$ is still the sum of $n$ iid variables as above. The question is whether a CLT still holds under appropriate conditions. Here is the Anscombe-Renyi theorem.

Theorem 1.3.8 (Anscombe-Renyi)  Let $X_i$ be iid with mean $\mu$ and a finite variance $\sigma^2$, and let $\{N_n\}$ be a sequence of (nonnegative) integer-valued random variables and $\{a_n\}$ a sequence of positive constants tending to $\infty$ such that $N_n/a_n \xrightarrow{p} c$, $0 < c < \infty$, as $n \to \infty$. Then
$$\frac{T_{N_n} - N_n\mu}{\sigma\sqrt{N_n}} \xrightarrow{d} N(0,1) \quad \text{as } n \to \infty.$$

Example 1.3.10 (Coupon collection problem)  Consider a problem in which a person keeps purchasing boxes of cereal until she obtains a full set of some $n$ coupons. The assumptions are that the boxes contain each of the $n$ coupons with equal probability, mutually independently. Suppose that the costs of buying the cereal boxes are iid with some mean $\mu$ and some variance $\sigma^2$. If it takes $N_n$ boxes to obtain the complete set of all $n$ coupons, then $N_n/(n\ln n) \xrightarrow{p} 1$ as $n \to \infty$. The total cost to the customer of obtaining the complete set of coupons is $T_{N_n} = X_1 + \cdots + X_{N_n}$. By the Anscombe-Renyi theorem and Slutsky's theorem, $\frac{T_{N_n} - N_n\mu}{\sigma\sqrt{n\ln n}}$ is approximately $N(0,1)$.

[On the distribution of $N_n$: let $t_i$ be the number of boxes needed to collect the $i$th coupon after $i-1$ coupons have been collected. Observe that the probability of collecting a new coupon given $i-1$ already collected is $p_i = (n-i+1)/n$. Therefore $t_i$ has a geometric distribution with expectation $1/p_i$, and $N_n = \sum_{i=1}^n t_i$. By Theorem 1.2.5, we know
$$\frac{1}{n\ln n}N_n \xrightarrow{p} \frac{1}{n\ln n}\sum_{i=1}^n p_i^{-1} = \frac{1}{n\ln n}\sum_{i=1}^n \frac{n}{i} = \frac{1}{\ln n}\sum_{i=1}^n \frac{1}{i} =: \frac{H_n}{\ln n}.$$
Note that $H_n$ is the harmonic number, and hence, using the asymptotics of the harmonic numbers ($H_n = \ln n + \gamma + o(1)$, with $\gamma$ the Euler constant), we obtain $\frac{N_n}{n\ln n} \to 1$.]
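A rough simulation of the coupon-collection cost (not from the notes; it assumes NumPy, takes Exp(1) box costs so $\mu = \sigma = 1$, and uses arbitrary values of $n$ and the replication count) shows that the standardized total cost $(T_{N_n} - N_n\mu)/(\sigma\sqrt{n\ln n})$ behaves approximately like $N(0,1)$; for finite $n$ the variance is slightly above 1 because $N_n/(n\ln n)$ has not fully converged.

```python
import numpy as np

rng = np.random.default_rng(5)
n_coupons, reps = 200, 5_000
mu, sigma = 1.0, 1.0                         # Exp(1) box costs

vals = np.empty(reps)
for r in range(reps):
    seen, boxes = set(), 0
    while len(seen) < n_coupons:             # buy boxes until all coupons appear
        seen.add(int(rng.integers(n_coupons)))
        boxes += 1
    cost = rng.exponential(mu, size=boxes).sum()
    vals[r] = (cost - boxes * mu) / (sigma * np.sqrt(n_coupons * np.log(n_coupons)))

print(np.mean(vals), np.var(vals))           # near 0 and a little above 1
```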

1.3.4 Central limit theorems for dependent sequences

The assumption that the observed data $X_1, X_2, \ldots$ form an independent sequence is often one of technical convenience. Real data frequently exhibit some dependence, or at least some correlation at small lags. Exact sampling distributions for fixed $n$ are even more complicated for dependent data than in the independent case, and so asymptotics remain useful. In this subsection, we present CLTs for some important dependence structures. The cases of stationary $m$-dependence and sampling without replacement are considered.

Stationary $m$-dependence

We start with an example to illustrate that a CLT for sample means can hold even if the summands are not independent.

Example 1.3.11  Suppose $X_1, X_2, \ldots$ is a stationary Gaussian sequence with $E(X_i) = \mu$ and $\mathrm{Var}(X_i) = \sigma^2 < \infty$. Then, for each $n$, $\sqrt{n}(\bar X_n - \mu)$ is normally distributed, and so $\sqrt{n}(\bar X_n - \mu) \xrightarrow{d} N(0, \tau^2)$, provided $\tau^2 = \lim_{n\to\infty}\mathrm{Var}(\sqrt{n}(\bar X_n - \mu)) < \infty$. But
$$\mathrm{Var}(\sqrt{n}(\bar X_n - \mu)) = \sigma^2 + \frac{1}{n}\sum_{i\ne j}\mathrm{Cov}(X_i, X_j) = \sigma^2 + \frac{2}{n}\sum_{i=1}^n (n-i)\gamma_i,$$
where $\gamma_i = \mathrm{Cov}(X_1, X_{i+1})$. Therefore $\tau^2 < \infty$ if and only if $\frac{2}{n}\sum_{i=1}^n (n-i)\gamma_i$ has a finite limit, say $\rho$, in which case $\sqrt{n}(\bar X_n - \mu) \xrightarrow{d} N(0, \sigma^2 + \rho)$.

What is going on qualitatively is that $\frac{2}{n}\sum_{i=1}^n (n-i)\gamma_i$ converges when $|\gamma_i| \to 0$ adequately fast. Instances of this are when only a fixed finite number of the $\gamma_i$ are nonzero, or when $\gamma_i$ is damped exponentially, i.e., $\gamma_i = O(a^i)$ for some $|a| < 1$. It turns out that there are general CLTs for sample averages under such conditions. The case of $m$-dependence is provided below.

Definition 1.3.3  A stationary sequence $\{X_n\}$ is called $m$-dependent for a given fixed $m$ if $(X_1, \ldots, X_i)$ and $(X_j, X_{j+1}, \ldots)$ are independent whenever $j - i > m$.

Theorem 1.3.9 (m-dependent sequence)  Let $\{X_i\}$ be a stationary $m$-dependent sequence. Let $E(X_i) = \mu$ and $\mathrm{Var}(X_i) = \sigma^2 < \infty$. Then $\sqrt{n}(\bar X_n - \mu) \xrightarrow{d} N(0, \tau^2)$, where
$$\tau^2 = \sigma^2 + 2\sum_{i=2}^{m+1}\mathrm{Cov}(X_1, X_i).$$
See Lehmann (1999) for a proof. $m$-dependent data arise either from standard time series models or as models in their own right. For example, if $\{Z_i\}$ are iid random variables and $X_i = a_1 Z_{i-1} + a_2 Z_{i-2}$, $i \ge 3$, then $\{X_i\}$ is 1-dependent. This is a simple moving average process of use in time series analysis. A more general $m$-dependent sequence is $X_i = h(Z_i, Z_{i+1}, \ldots, Z_{i+m})$ for some function $h$.

Example 1.3.12  Suppose the $Z_i$ are iid with a finite variance $\sigma^2$, and let $X_i = (Z_i + Z_{i+1})/2$. Then, obviously,
$$\sum_{i=1}^n X_i = \frac{Z_1 + Z_{n+1}}{2} + \sum_{i=2}^n Z_i.$$
Then, by Slutsky's theorem, $\sqrt{n}(\bar X_n - \mu) \xrightarrow{d} N(0, \sigma^2)$. Notice that we write $\sqrt{n}(\bar X_n - \mu)$ as two parts, one of which is dominant and produces the CLT, while the other is asymptotically negligible. This is essentially the method of proof of the CLT for more general $m$-dependent sequences.

Sampling without replacement

Dependent data also naturally arise in sampling without replacement from a finite population. Central limit theorems are available, and we will present them shortly. But let us start with an illustrative example.

Example 1.3.13  Suppose, among $N$ objects in a population, $D$ are of type 1 and $N - D$ of type 2. A sample without replacement of size $n$ is taken, and let $X$ be the number of sampled units of type 1. We can regard the $D$ type 1 units as having numerical values $X_1 = \cdots = X_D = 1$ and the rest as having values $X_{D+1} = \cdots = X_N = 0$, so that $X = \sum_{i=1}^n X_{Ni}$, where $X_{N1}, \ldots, X_{Nn}$ correspond to the sampled units.

Of course, $X$ has the hypergeometric distribution
$$P(X = x) = \frac{\binom{D}{x}\binom{N-D}{n-x}}{\binom{N}{n}}, \quad 0 \le x \le D.$$
Two configurations can be considered: (a) $n$ is fixed and $D/N \to p$, $0 < p < 1$, with $N \to \infty$. In this case, by applying Stirling's approximation to $N!$ and $D!$, $P(X = x) \to \binom{n}{x}p^x(1-p)^{n-x}$, and so $X \xrightarrow{d} \mathrm{Bin}(n, p)$. (b) $n, N, N - n \to \infty$ and $D/N \to p$, $0 < p < 1$. This is the case where convergence of $X$ to normality holds. Here is a general result; again, see Lehmann (1999) for a proof.

Theorem 1.3.10  For $N \ge 1$, let $\pi_N$ be a finite population with numerical values $X_1, X_2, \ldots, X_N$. Let $X_{N1}, X_{N2}, \ldots, X_{Nn}$ be the values of the units of a sample without replacement of size $n$. Let $\bar X_n = \sum_{i=1}^n X_{Ni}/n$ and $\bar X_N = \sum_{i=1}^N X_i/N$. Suppose $n, N - n \to \infty$, and either

(a) $\dfrac{\max_{1\le i\le N}(X_i - \bar X_N)^2}{\sum_{i=1}^N (X_i - \bar X_N)^2} \to 0$ and $n/N \to \tau$, $0 < \tau < 1$, as $N \to \infty$; or

(b) $\dfrac{N\max_{1\le i\le N}(X_i - \bar X_N)^2}{\sum_{i=1}^N (X_i - \bar X_N)^2} = O(1)$ as $N \to \infty$.

Then
$$\frac{\bar X_n - E(\bar X_n)}{\sqrt{\mathrm{Var}(\bar X_n)}} \xrightarrow{d} N(0,1).$$
Example 1.3.14  Suppose $X_{N1}, \ldots, X_{Nn}$ is a sample without replacement from the set $\{1, 2, \ldots, N\}$, and let $\bar X_n = \sum_{i=1}^n X_{Ni}/n$. Then, by direct calculation,
$$E(\bar X_n) = \frac{N+1}{2}, \qquad \mathrm{Var}(\bar X_n) = \frac{(N-n)(N+1)}{12n}.$$
Furthermore,
$$\frac{N\max_{1\le i\le N}(X_i - \bar X_N)^2}{\sum_{i=1}^N (X_i - \bar X_N)^2} = \frac{3(N-1)}{N+1} = O(1).$$
Hence, by Theorem 1.3.10,
$$\frac{\bar X_n - E(\bar X_n)}{\sqrt{\mathrm{Var}(\bar X_n)}} \xrightarrow{d} N(0,1).$$

1.3.5 Accuracy of CLT

Suppose a sequence of CDFs $F_{X_n} \xrightarrow{d} F_X$ for some $F_X$. Such a weak convergence result is usually used to approximate the true value of $F_{X_n}(x)$ at some fixed $n$ and $x$ by $F_X(x)$. However, the weak convergence result by itself says absolutely nothing about the accuracy of approximating $F_{X_n}(x)$ by $F_X(x)$ for that particular value of $n$. To approximate $F_{X_n}(x)$ by $F_X(x)$ for a given finite $n$ is a leap of faith unless we have some idea of the error committed, i.e., $|F_{X_n}(x) - F_X(x)|$. More specifically, if for a sequence of random variables $X_1, \ldots, X_n$
$$\frac{\bar X_n - E(\bar X_n)}{\sqrt{\mathrm{Var}(\bar X_n)}} \xrightarrow{d} Z \sim N(0,1),$$
then we need some idea of the error
$$\left|P\!\left(\frac{\bar X_n - E(\bar X_n)}{\sqrt{\mathrm{Var}(\bar X_n)}} \le x\right) - \Phi(x)\right|$$
in order to use the central limit theorem for a practical approximation with some degree of confidence. The first result in this direction for the iid case is the classic Berry-Esseen theorem. Typically, these accuracy measures give bounds on the error in the appropriate CLT for any fixed $n$, under assumptions about the moments of $X_i$. In the canonical iid case with a finite variance, the CLT says that $\sqrt{n}(\bar X - \mu)/\sigma$ converges in law to $N(0,1)$. By Polya's theorem, the uniform error $\Delta_n = \sup_{-\infty<x<\infty}|P(\sqrt{n}(\bar X_n - \mu)/\sigma \le x) - \Phi(x)| \to 0$ as $n \to \infty$.

The following results are the classic Berry-Esseen uniform bound and an extension of the Berry-Esseen inequality to the case of independent but not iid variables; a proof can be found in Petrov (1975). Under an additional (third) moment assumption, the Berry-Esseen inequality asserts the rate $O(n^{-1/2})$ for this convergence.

Theorem 1.3.11  (i) (Berry-Esseen; iid case)  Let $X_1, \ldots, X_n$ be iid with $E(X_1) = \mu$, $\mathrm{Var}(X_1) = \sigma^2$, and $\beta_3 = E|X_1 - \mu|^3 < \infty$. Then there exists a universal constant $C$, not depending on $n$ or the distribution of the $X_i$, such that
$$\sup_x\left|P\!\left(\frac{\sqrt{n}(\bar X_n - \mu)}{\sigma} \le x\right) - \Phi(x)\right| \le \frac{C\beta_3}{\sigma^3\sqrt{n}}.$$

(ii) (Independent but not iid case)  Let $X_1, \ldots, X_n$ be independent with $E(X_i) = \mu_i$, $\mathrm{Var}(X_i) = \sigma_i^2$, and $\beta_{3i} = E|X_i - \mu_i|^3 < \infty$. Then there exists a universal constant $C^*$, not depending on $n$ or the distributions of the $X_i$, such that
$$\sup_x\left|P\!\left(\frac{\bar X_n - E(\bar X_n)}{\sqrt{\mathrm{Var}(\bar X_n)}} \le x\right) - \Phi(x)\right| \le \frac{C^*\sum_{i=1}^n \beta_{3i}}{\left(\sum_{i=1}^n \sigma_i^2\right)^{3/2}}.$$

This is the best possible rate in the sense of not being subject to improvement without narrowing the class of distribution functions considered. For some specific underlying CDFs $F_X$, better rates of convergence in the CLT may be possible. This issue will become clearer when we discuss asymptotic expansions for $P(\sqrt{n}(\bar X_n - \mu)/\sigma \le x)$. In Theorem 1.3.11-(i), the universal constant $C$ may be taken as $C = 0.8$.

Example 1.3.15  The Berry-Esseen bound is uniform in $x$, and it is valid for any $n \ge 1$. While these are positive features of the theorem, it may not be possible to establish that $\Delta_n \le \epsilon$ for some preassigned $\epsilon > 0$ by using the Berry-Esseen theorem unless $n$ is very large. Let us see an illustrative example. Suppose $X_1, \ldots, X_n \overset{\text{iid}}{\sim} \mathrm{BIN}(1, p)$ and $n = 100$. Suppose we want the CLT approximation to be accurate to within an error of $\Delta_n = 0.005$. In the Bernoulli case, $\beta_3 = pq(1 - 2pq)$, where $q = 1 - p$. Using $C = 0.8$, the uniform Berry-Esseen bound is
$$\Delta_n \le \frac{0.8\,pq(1 - 2pq)}{(pq)^{3/2}\sqrt{n}}.$$
This is less than the prescribed $\Delta_n = 0.005$ iff $pq > 0.4784$, which does not hold for any $0 < p < 1$. Even for $p = 0.5$, the bound is less than or equal to $\Delta_n = 0.005$ only when $n > 25{,}000$, which is a very large sample size. Of course, this is not necessarily a flaw of the Berry-Esseen inequality itself, because the desire to have a uniform error of at most $\Delta_n = 0.005$ is a tough demand, and a fairly large value of $n$ is probably needed to have such a small error in the CLT.
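The arithmetic in this example is easy to reproduce (a sketch, not from the notes; plain Python, with the constant $C = 0.8$ quoted above): evaluate the bound at $n = 100$ and search for the smallest $n$ at which it falls below 0.005 when $p = 0.5$.

```python
import math

def be_bound(p, n, C=0.8):
    """Berry-Esseen bound C*beta3/(sigma^3*sqrt(n)) for Bernoulli(p) summands."""
    q = 1.0 - p
    beta3 = p * q * (1.0 - 2.0 * p * q)
    sigma3 = (p * q) ** 1.5
    return C * beta3 / (sigma3 * math.sqrt(n))

print(be_bound(0.5, 100))          # about 0.08, far from the target 0.005
n = 1
while be_bound(0.5, n) > 0.005:
    n += 1
print(n)                           # roughly 25,600, matching "n > 25,000" above
```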

Example 1.3.16  As an example of independent variables that are not iid, consider $X_i \sim \mathrm{BIN}(1, i^{-1})$, $i \ge 1$, and let $S_n = \sum_{i=1}^n X_i$. Then $E(S_n) = \sum_{i=1}^n i^{-1}$, $\mathrm{Var}(S_n) = \sum_{i=1}^n (i-1)/i^2$, and $\beta_{3i} = (i-1)(i^2 - 2i + 2)/i^4$. Therefore, from Theorem 1.3.11-(ii),
$$\Delta_n \le C^*\,\frac{\sum_{i=1}^n (i-1)(i^2 - 2i + 2)/i^4}{\left(\sum_{i=1}^n (i-1)/i^2\right)^{3/2}}.$$
Observe now that $\sum_{i=1}^n (i-1)/i^2 = \log n + O(1)$ and $\sum_{i=1}^n (i-1)(i^2 - 2i + 2)/i^4 = \log n + O(1)$. Substituting these back into the Berry-Esseen bound, one obtains with some minor algebra that $\Delta_n = O((\log n)^{-1/2})$.

For $x$ sufficiently large, while $n$ remains fixed, the quantities $F_{X_n}(x)$ and $F_X(x)$ each become so close to 1 that the bound given in Theorem 1.3.11 is too crude. There has been a parallel development of bounds on the error in the CLT at a particular $x$, as opposed to bounds on the uniform error. Such bounds are called local Berry-Esseen bounds. Many different types of local bounds are available. We present here just one.

Theorem 1.3.12  Let $X_1, \ldots, X_n$ be independent with $E(X_i) = \mu_i$, $\mathrm{Var}(X_i) = \sigma_i^2$, and $E|X_i - \mu_i|^{2+\delta} < \infty$ for some $0 < \delta \le 1$. Then
$$\left|P\!\left(\frac{\bar X_n - E(\bar X_n)}{\sqrt{\mathrm{Var}(\bar X_n)}} \le x\right) - \Phi(x)\right| \le \frac{D}{1 + |x|^{2+\delta}}\cdot\frac{\sum_{i=1}^n E|X_i - \mu_i|^{2+\delta}}{\left(\sum_{i=1}^n \sigma_i^2\right)^{1+\delta/2}}$$
for some universal constant $0 < D < \infty$.

Such local bounds are useful in proving convergence of global error criteria such as $\int|F_{X_n}(x) - \Phi(x)|^p\,dx$, or for establishing approximations to the moments of $F_{X_n}$. Uniform error bounds would be useless for these purposes. If the third absolute moments are finite, an explicit value for the universal constant $D$ can be chosen to be 31. A good reference for local bounds is Serfling (1980).

Error bounds for normal approximations to many other types of statistics besides sample means are known, such as results for statistics that are smooth functions of means. The order of the error depends on the conditions one assumes on the nature of the function. We will discuss this problem in Chapter 2 after we introduce the Delta method.

1.3.6 Edgeworth and Cornish-Fisher expansions

We now consider the important topic of writing asymptotic expansions for the CDFs of centered and normalized statistics. When the statistic is a sample mean, let $Z_n = \sqrt{n}(\bar X_n - \mu)/\sigma$ and $F_{Z_n}(x) = P(Z_n \le x)$, where $X_1, \ldots, X_n$ are iid with a CDF $F$ having mean $\mu$ and variance $\sigma^2 < \infty$. The CLT says that $F_{Z_n}(x) \to \Phi(x)$ for every $x$, and the Berry-Esseen theorem says $|F_{Z_n}(x) - \Phi(x)| = O(n^{-1/2})$ uniformly in $x$ if $X$ has three moments. If we change the approximation $\Phi(x)$ to $\Phi(x) + C_1(F)p_1(x)\phi(x)/\sqrt{n}$ for some suitable constant $C_1(F)$ and a suitable polynomial $p_1(x)$, we can assert that
$$\left|F_{Z_n}(x) - \Phi(x) - \frac{C_1(F)p_1(x)\phi(x)}{\sqrt{n}}\right| = O(n^{-1}),$$
uniformly in $x$. Expansions of the form
$$F_{Z_n}(x) = \Phi(x) + \sum_{s=1}^k \frac{q_s(x)}{n^{s/2}} + o(n^{-k/2}) \quad \text{uniformly in } x$$
are known as Edgeworth expansions for $Z_n$. One needs some conditions on $F$ and enough moments of $X$ to carry the expansion to $k$ terms for a given $k$. An excellent reference for the main results on Edgeworth expansions is Hall (1992).

The coefficients in the Edgeworth expansion for means depend on the cumulants of $F$, which share a functional relationship with the sequence of moments of $F$. Cumulants are also useful in many other contexts, for example the saddlepoint approximation. We start with the definition and recursive representations of the sequence of cumulants of a distribution. The term cumulant was coined by Fisher (1931).
Definition 1.3.4  Let $X \sim F$ have a finite m.g.f. $\psi(t)$ in some neighborhood of zero, and let $K(t) = \log\psi(t)$ when it exists. The $r$th cumulant of $X$ (or of $F$) is defined as
$$\kappa_r = \frac{d^r}{dt^r}K(t)\Big|_{t=0}.$$
Equivalently, the cumulants of $X$ are the coefficients in the power series expansion $K(t) = \sum_{n=1}^\infty \kappa_n t^n/n!$ within the radius of convergence of $K(t)$. By equating coefficients in $e^{K(t)}$ with those in $\psi(t)$, it is easy to express the first few moments (and therefore the first few central moments) in terms of the cumulants. Indeed, letting $c_i = E(X^i)$ and $\mu_i = E(X - \mu)^i$, one obtains the expressions
$$c_1 = \kappa_1, \quad c_2 = \kappa_2 + \kappa_1^2, \quad c_3 = \kappa_3 + 3\kappa_1\kappa_2 + \kappa_1^3, \quad c_4 = \kappa_4 + 4\kappa_1\kappa_3 + 3\kappa_2^2 + 6\kappa_1^2\kappa_2 + \kappa_1^4,$$
$$\mu_2 = \sigma^2 = \kappa_2, \quad \mu_3 = \kappa_3, \quad \mu_4 = \kappa_4 + 3\kappa_2^2.$$
In general, the cumulants satisfy the recursion relation
$$\kappa_n = c_n - \sum_{j=1}^{n-1}\binom{n-1}{j-1}c_{n-j}\kappa_j,$$
which yields
$$\kappa_1 = \mu, \quad \kappa_2 = \sigma^2, \quad \kappa_3 = \mu_3, \quad \kappa_4 = \mu_4 - 3\mu_2^2.$$
The higher-order ones are quite complex but can be found in Kendall's Advanced Theory of Statistics.

Example 1.3.17  Suppose $X \sim N(\mu, \sigma^2)$. Of course, $\kappa_1 = \mu$ and $\kappa_2 = \sigma^2$. Since $K(t) = t\mu + t^2\sigma^2/2$ is a quadratic, all derivatives of $K(t)$ of order higher than 2 vanish. Consequently, $\kappa_r = 0$ for $r > 2$. If $X \sim \mathrm{Poisson}(\lambda)$, then $K(t) = \lambda(e^t - 1)$, and therefore all derivatives of $K(t)$ are equal to $\lambda e^t$. It follows that $\kappa_r = \lambda$ for $r \ge 1$. These are two interesting special cases with neat structure that have served as the basis for stochastic modeling.

Now let us consider the expansion for (functions of) means. To illustrate the idea, consider $Z_n$. Assume that the m.g.f. of $W = (X_1 - \mu)/\sigma$ is finite and positive in a neighborhood of 0. The m.g.f. of $Z_n$ is equal to
$$\psi_n(t) = \left[\exp\{K(t/\sqrt{n})\}\right]^n = \exp\left\{\frac{t^2}{2} + \sum_{j=3}^\infty \frac{\kappa_j t^j}{j!\,n^{(j-2)/2}}\right\},$$
where $K(t)$ is the cumulant generating function of $W$ and the $\kappa_j$ are the corresponding cumulants ($\kappa_1 = 0$, $\kappa_2 = 1$, $\kappa_3 = EW^3$, and $\kappa_4 = EW^4 - 3$). Using the series expansion of the exponential around $e^{t^2/2}$, we obtain that
$$\psi_n(t) = e^{t^2/2} + n^{-1/2}r_1(t)e^{t^2/2} + \cdots + n^{-j/2}r_j(t)e^{t^2/2} + \cdots, \tag{1.5}$$
where $r_j$ is a polynomial of degree $3j$ depending on $\kappa_3, \ldots, \kappa_{j+2}$ but not on $n$, $j = 1, 2, \ldots$.

For example, it can be shown that
$$r_1(t) = \frac{1}{6}\kappa_3 t^3, \qquad r_2(t) = \frac{1}{24}\kappa_4 t^4 + \frac{1}{72}\kappa_3^2 t^6.$$
Since $\psi_n(t) = \int e^{tx}\,dF_{Z_n}(x)$ and $e^{t^2/2} = \int e^{tx}\,d\Phi(x)$, expansion (1.5) suggests the inverse expansion
$$F_{Z_n}(x) = \Phi(x) + n^{-1/2}R_1(x) + \cdots + n^{-j/2}R_j(x) + \cdots,$$
where $R_j(x)$ is a function satisfying $\int e^{tx}\,dR_j(x) = r_j(t)e^{t^2/2}$, $j = 1, 2, \ldots$. Thus the $R_j$'s can be obtained once the $r_j$'s are derived. For example,
$$R_1(x) = -\frac{1}{6}\kappa_3(x^2 - 1)\phi(x),$$
$$R_2(x) = -\left[\frac{1}{24}\kappa_4\,x(x^2 - 3) + \frac{1}{72}\kappa_3^2\,x(x^4 - 10x^2 + 15)\right]\phi(x).$$

The CLT for means fails to capture possible skewness in the distribution of the mean for a given finite $n$ because all normal distributions are symmetric. By expanding the CDF to the next term, the skewness can be captured. Expansion to another term also adjusts for the kurtosis. Although expansions to any number of terms are available under the existence of enough moments, usually an expansion to two terms after the leading term is of the most practical importance. Indeed, expansions to three terms or more can be unstable due to the presence of the polynomials in the expansions. We present the two-term expansion next. A rigorous statement of the Edgeworth expansion for a more general $Z_n$ will be introduced in the next chapter after establishing the multivariate Delta theorem. The proof can be found in Hall (1992).

Theorem 1.3.13 (Two-term Edgeworth expansion)  Suppose $F$ is absolutely continuous and $E_F(X^4) < \infty$. Then
$$F_{Z_n}(x) = \Phi(x) + \frac{C_1(F)p_1(x)\phi(x)}{\sqrt{n}} + \frac{[C_2(F)p_2(x) + C_3(F)p_3(x)]\phi(x)}{n} + O(n^{-3/2}),$$
uniformly in $x$, where
$$C_1(F) = \frac{E(X-\mu)^3}{6\sigma^3}, \quad C_2(F) = \frac{E(X-\mu)^4/\sigma^4 - 3}{24}, \quad C_3(F) = \frac{[E(X-\mu)^3]^2}{72\,\sigma^6},$$
$$p_1(x) = 1 - x^2, \quad p_2(x) = 3x - x^3, \quad p_3(x) = 10x^3 - 15x - x^5.$$

Note that the terms $C_1(F)$ and $C_2(F)$ can be viewed as skewness and kurtosis corrections for the departure of $F_{Z_n}(x)$ from normality, respectively. It is useful to mention here that the corresponding formal two-term expansion for the density of $Z_n$ is given by
$$\phi(z) + n^{-1/2}C_1(F)(z^3 - 3z)\phi(z) + n^{-1}\left[C_3(F)(z^6 - 15z^4 + 45z^2 - 15) + C_2(F)(z^4 - 6z^2 + 3)\right]\phi(z).$$

One of the uses of an Edgeworth expansion in statistics is approximation of the power of a test. In the one-parameter regular exponential family, the natural sufficient statistic is a sample mean, and standard tests are based on this statistic. So the Edgeworth expansion for sample means of iid random variables can be used to approximate the power of such tests. Here is an example.

Example 1.3.18  Suppose $X_1, \ldots, X_n \overset{\text{iid}}{\sim} \mathrm{Exp}(\lambda)$ and we wish to test $H_0 : \lambda = 1$ vs. $H_1 : \lambda > 1$. The UMP test rejects $H_0$ for large values of $\sum_{i=1}^n X_i$. If the cutoff value is found by using the CLT, then the test rejects $H_0$ for $\bar X_n > 1 + k/\sqrt{n}$, where $k = z_\alpha$. The power at an alternative $\lambda$ equals
$$\text{Power} = P_\lambda\!\left(\bar X_n > 1 + k/\sqrt{n}\right) = P_\lambda\!\left(\frac{\bar X_n - \lambda}{\lambda/\sqrt{n}} > \frac{1 + k/\sqrt{n} - \lambda}{\lambda/\sqrt{n}}\right) = 1 - P_\lambda\!\left(\frac{\bar X_n - \lambda}{\lambda/\sqrt{n}} \le \frac{\sqrt{n}(1-\lambda)}{\lambda} + \frac{k}{\lambda}\right) \to 1.$$
For a more useful approximation, the Edgeworth expansion is used. For example, the general one-term Edgeworth expansion for sample means,
$$F_n(x) = \Phi(x) + \frac{C_1(F)(1 - x^2)\phi(x)}{\sqrt{n}} + O(n^{-1}),$$
can be used to approximate the power expression above. Algebra reduces the one-term Edgeworth expression to the formal approximation
$$\text{Power} \approx \Phi\!\left(\frac{\sqrt{n}(\lambda - 1) - k}{\lambda}\right) + \frac{1}{3\sqrt{n}}\left[\frac{(\sqrt{n}(\lambda - 1) - k)^2}{\lambda^2} - 1\right]\phi\!\left(\frac{\sqrt{n}(\lambda - 1) - k}{\lambda}\right).$$
This is a much more useful approximation than simply saying that for large $n$ the power is close to 1.

For constructing asymptotically correct confidence intervals for a parameter on the basis of an asymptotically normal statistic, the first-order approximation to the quantiles of the statistic (suitably centered and normalized) comes from using the central limit theorem. Just as Edgeworth expansions produce more accurate expansions for the CDF of the statistic than does the central limit theorem alone, higher-order expansions for the quantiles produce more accurate approximations than does the normal quantile alone. These higher-order expansions for quantiles are essentially obtained by recursively inverting Edgeworth expansions, starting with the normal quantile as the initial approximation. They are called Cornish-Fisher expansions. We briefly present the case of sample means. Let the standardized cumulants be the quantities $\rho_r = \kappa_r/\sigma^r$.

Theorem 1.3.14  Let $X_1, \ldots, X_n$ be iid with absolutely continuous CDF $F$ having a finite m.g.f. in some open neighborhood of zero. Let $Z_n = \sqrt{n}(\bar X_n - \mu)/\sigma$ and $H_n(x) = P_F(Z_n \le x)$. Then
$$H_n^{-1}(\alpha) = z_\alpha + \frac{(z_\alpha^2 - 1)\rho_3}{6\sqrt{n}} + \frac{(z_\alpha^3 - 3z_\alpha)\rho_4}{24n} - \frac{(2z_\alpha^3 - 5z_\alpha)\rho_3^2}{36n} + O(n^{-3/2}).$$
Using Taylor expansions at $z_\alpha$ for $\Phi(w_{n\alpha})$, $p_1(w_{n\alpha})\phi(w_{n\alpha})$, and $p_2(w_{n\alpha})\phi(w_{n\alpha})$, together with the fact that $\phi'(x) = -x\phi(x)$, one can obtain this theorem by inverting the Edgeworth expansion.

Example 1.3.19  Let $W_n \sim \chi_n^2$; then $Z_n = (W_n - n)/\sqrt{2n} \xrightarrow{d} N(0,1)$ as $n \to \infty$, so a first-order approximation to the upper $\alpha$th quantile of $W_n$ is just $n + z_\alpha\sqrt{2n}$. The Cornish-Fisher expansion should produce a more accurate approximation. To verify this, we will need the standardized cumulants, which are $\rho_3 = 2\sqrt{2}$ and $\rho_4 = 12$. Now substituting into the theorem above, we get the two-term Cornish-Fisher expansion
$$\chi^2_{n,\alpha} = n + z_\alpha\sqrt{2n} + \frac{2}{3}(z_\alpha^2 - 1) + \frac{z_\alpha^3 - 7z_\alpha}{9\sqrt{2n}}.$$
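The gain from the Cornish-Fisher correction is easy to see numerically. The sketch below is not from the notes; it assumes SciPy is available for the exact $\chi^2$ and normal quantiles, and the choices of $\alpha$ and $n$ are arbitrary.

```python
import numpy as np
from scipy import stats

alpha = 0.05
z = stats.norm.ppf(1 - alpha)                  # upper alpha point of N(0,1)

for n in (5, 20, 100):
    exact = stats.chi2.ppf(1 - alpha, df=n)    # true upper alpha quantile of chi^2_n
    clt = n + z * np.sqrt(2 * n)               # first-order (CLT) approximation
    cf = clt + (2.0 / 3.0) * (z**2 - 1) + (z**3 - 7 * z) / (9 * np.sqrt(2 * n))
    print(n, exact, clt, cf)                   # Cornish-Fisher is much closer
```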

1.3.7 The law of the iterated logarithm

The law of the iterated logarithm (LIL) complements the CLT by describing the precise extremes of the fluctuations of the sequence of random variables
$$\frac{\sum_{i=1}^n (X_i - \mu)}{\sigma n^{1/2}}, \quad n = 1, 2, \ldots.$$
The CLT states that this sequence converges in law to $N(0,1)$, but it does not otherwise provide information about the fluctuations of these random variables about the expected value 0. The LIL asserts that the extreme fluctuations of this sequence are essentially of the exact order of magnitude $(2\log\log n)^{1/2}$. The classical iid case is covered by the following theorem.

Theorem 1.3.15 (Hartman and Wintner)  Let $\{X_i\}$ be iid with mean $\mu$ and finite variance $\sigma^2$. Then
$$\limsup_{n\to\infty}\frac{\sum_{i=1}^n (X_i - \mu)}{(2\sigma^2 n\log\log n)^{1/2}} = 1 \quad \text{wp1}, \qquad \liminf_{n\to\infty}\frac{\sum_{i=1}^n (X_i - \mu)}{(2\sigma^2 n\log\log n)^{1/2}} = -1 \quad \text{wp1}.$$
In other words: with probability 1, for any $\epsilon > 0$, only finitely many of the events
$$\frac{\sum_{i=1}^n (X_i - \mu)}{(2\sigma^2 n\log\log n)^{1/2}} > 1 + \epsilon, \qquad \frac{\sum_{i=1}^n (X_i - \mu)}{(2\sigma^2 n\log\log n)^{1/2}} < -1 - \epsilon, \quad n = 1, 2, \ldots,$$
are realized, whereas infinitely many of the events
$$\frac{\sum_{i=1}^n (X_i - \mu)}{(2\sigma^2 n\log\log n)^{1/2}} > 1 - \epsilon, \qquad \frac{\sum_{i=1}^n (X_i - \mu)}{(2\sigma^2 n\log\log n)^{1/2}} < -1 + \epsilon, \quad n = 1, 2, \ldots,$$
occur. That is, with probability 1, for any $\epsilon > 0$, all but finitely many of these fluctuations fall within the boundaries $\pm(1+\epsilon)(2\log\log n)^{1/2}$, and moreover the boundaries $\pm(1-\epsilon)(2\log\log n)^{1/2}$ are reached infinitely often.

In the LIL theorem, what is going on is that, for a given $n$, there is some collection of sample points $\omega$ for which the partial sum $S_n - n\mu$ stays in a specific $\sqrt{n}$-neighborhood of zero.
But this collection keeps changing with changing $n$, and any particular $\omega$ is sometimes in the collection and at other times out of it. Such unlucky values of $n$ are unbounded, giving rise to the LIL phenomenon. The exact rate $\sqrt{n\log\log n}$ is a technical aspect and cannot be explained intuitively.

The LIL also complements, indeed refines, the SLLN (assuming existence of second moments). In terms of the average dealt with by the SLLN, $\frac{1}{n}\sum_{i=1}^n X_i - \mu$, the LIL asserts that the extreme fluctuations are essentially of the exact order of magnitude
$$\frac{\sigma(2\log\log n)^{1/2}}{n^{1/2}}.$$
Thus, with probability 1, for any $\epsilon > 0$, the infinite sequence of "confidence intervals"
$$\left\{\frac{1}{n}\sum_{i=1}^n X_i \pm (1+\epsilon)\,\frac{\sigma(2\log\log n)^{1/2}}{n^{1/2}}\right\}$$
contains $\mu$ with only finitely many exceptions. In this asymptotic fashion, the LIL provides the basis for concepts of 100% confidence intervals. The LIL also provides an example of almost sure convergence being truly stronger than convergence in probability.

Example 1.3.20  Let $X_1, X_2, \ldots$ be iid with a finite variance. Then
$$\frac{S_n - n\mu}{\sqrt{2n\log\log n}} = \frac{S_n - n\mu}{\sqrt{n}}\cdot\frac{1}{\sqrt{2\log\log n}} = O_p(1)\cdot o(1) = o_p(1).$$
But, by the LIL, $\frac{S_n - n\mu}{\sqrt{2n\log\log n}}$ does not converge a.s. to zero. Hence convergence in probability is weaker than almost sure convergence, in general.
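A single simulated path can only hint at the LIL, since it is a statement about limsup and liminf, but the following sketch (not in the notes; it assumes NumPy, standard normal summands, and an arbitrary path length and seed) shows the scaled partial sums staying near the $\pm 1$ boundaries for large $n$.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000_000
s = np.cumsum(rng.normal(size=n))             # partial sums, mu = 0, sigma = 1

idx = np.arange(999, n)                       # look only at n >= 1000
scaled = s[idx] / np.sqrt(2 * (idx + 1) * np.log(np.log(idx + 1)))

print(scaled.max(), scaled.min())             # extremes of the path, near +1 and -1
```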

References

Billingsley, P. (1995). Probability and Measure, 3rd edition, John Wiley, New York.

Petrov, V. (1975). Limit Theorems for Sums of Independent Random Variables (translation from Russian), Springer-Verlag, New York.

Serfling, R. (1980). Approximation Theorems of Mathematical Statistics, John Wiley, New York.

Shao, J. (2003). Mathematical Statistics, 2nd edition, Springer, New York.

Van der Vaart, A. W. (2000). Asymptotic Statistics, Cambridge University Press.

Chapter 2

Transformations of given statistics:

The delta method

Distributions of transformations of a statistic are of importance in applications. Suppose an estimator $T_n$ for a parameter $\theta$ is available, but the quantity of interest is $g(\theta)$ for some known function $g$. A natural estimator is $g(T_n)$. The aim is to deduce the asymptotic behavior of $g(T_n)$ from that of $T_n$. A first result is an immediate consequence of the continuous mapping theorem. Of greater interest is a similar question concerning limit distributions. In particular, if $\sqrt{n}(T_n - \theta)$ converges in law to a limit distribution, is the same true for $\sqrt{n}[g(T_n) - g(\theta)]$? If $g$ is differentiable, then the answer is affirmative.

2.1 Basic result

The delta theorem says how to approximate the distribution of a transformation of a statistic in large samples if we can approximate the distribution of the statistic itself. We first treat the univariate case and present the basic delta theorem as follows.
Theorem 2.1.1 (Delta Theorem)  Let $T_n$ be a sequence of statistics such that
$$\sqrt{n}(T_n - \theta) \xrightarrow{d} N(0, \sigma^2(\theta)). \tag{2.1}$$
Let $g : \mathbb{R} \to \mathbb{R}$ be once differentiable at $\theta$ with $g'(\theta) \ne 0$. Then
$$\sqrt{n}[g(T_n) - g(\theta)] \xrightarrow{d} N(0, [g'(\theta)]^2\sigma^2(\theta)).$$

Proof. First note that it follows from the assumed CLT for $T_n$ that $T_n$ converges in probability to $\theta$, and hence $T_n - \theta = o_p(1)$. The proof of the theorem now follows from a simple application of Taylor's theorem, which says that
$$g(x_0 + h) = g(x_0) + hg'(x_0) + o(h)$$
if $g$ is differentiable at $x_0$. Therefore
$$g(T_n) = g(\theta) + (T_n - \theta)g'(\theta) + o_p(T_n - \theta).$$
That the remainder