INTRODUCTION TO THE BOOTSTRAP

Date: 18.465, revised March 28, 2015.

Suppose we have observed X_1, ..., X_n (not necessarily real numbers; they can be in any space S) and suppose for simplicity that the n observations are all different. We can form the empirical measure P_n = (1/n) ∑_{j=1}^n δ_{X_j}, where δ_x(A) := 1_A(x) := 1 if x ∈ A and 0 otherwise, for any subset A ⊂ S. Let P be the unknown probability distribution from which we assume the X_j are i.i.d. We would like to estimate some functional T(P). Here "functional" just means a function whose domain is a relatively abstract space, in this case a space of probability measures P, including P_n and P^B_n defined below. For P defined on the real line, an example of a functional T(P) is the median of P, defined as the midpoint of the interval of medians if the interval does not reduce to a point. For P defined on any space, another example of a real-valued functional is T(P) = E_P f := ∫ f dP, where f is a bounded function which in measure-theoretic terms is measurable, or in probability terms is a random variable with respect to P, so that since it is bounded, its expectation is well-defined, as is its variance.

We can give a point estimate T(P_n) of T(P), but we'd like to know how uncertain the estimate is, for example to give a confidence interval for T(P) if T is real-valued, without knowing anything more about P than the observations X_1, ..., X_n summarized in P_n. For example, we don't assume that P belongs to any particular parametric family.

What the bootstrap does is to resample from the given sample, i.e. to take X^B_1, ..., X^B_K i.i.d. (P_n). Here we're sampling "with replacement" from the original sample. In general one could consider different values of the bootstrap sample size K, but the default choice, to be used here, will be K = n. Thus from X^B_1, ..., X^B_n we can form the bootstrap empirical measure

P^B_n := (1/n) ∑_{k=1}^n δ_{X^B_k}.

We will be interested not in the (unconditional) distribution of P^B_n but in its conditional distribution given P_n. To estimate this distribution one can use a Monte Carlo method: one repeats the resampling some large number R of times (R may be called the number of bootstrap replications), giving R i.i.d. values of P^B_n, all for the same P_n, from which one can estimate the conditional distribution of T(P^B_n) given P_n.


This requires altogether Rn i.i.d. samples from a given P_n and, to find the sampling distribution of a functional T(P^B_n), R separate evaluations of this functional. The intent of the method is as follows: to get a confidence set (such as a confidence interval if T is real-valued) for the unknown T(P), we'd like to know the distribution of T(P*_n) - T(P), where P*_n is a random empirical measure from P (and so to be distinguished from the observed P_n from which P^B_n is sampled), and we estimate this from the conditional distribution of T(P^B_n) - T(P_n) given P_n, which we can observe.

When is equality in distribution preserved? When random variables X and Y have the same distribution, or in other words are equal in distribution, we will write X =_d Y (X is equal in distribution to Y). It follows that if c is any constant, then X + c =_d Y + c and cX =_d cY. But for example let X and Y be i.i.d. N(0,1). Then X =_d Y and X =_d X, but X + X ≠_d X + Y, because X + X = 2X is N(0,4) while X + Y is N(0,2).
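To make the Monte Carlo scheme described above concrete, here is a minimal R sketch (not from the notes; the sample x, the choice of the median as T, and all variable names are hypothetical illustrations):

# Minimal sketch of the Monte Carlo bootstrap in base R.
set.seed(1)
x <- rnorm(25)          # a hypothetical observed sample X_1, ..., X_n
n <- length(x)
R <- 1000               # number of bootstrap replications
T.hat <- median(x)      # T(P_n)
T.boot <- replicate(R, median(sample(x, n, replace = TRUE)))  # T(P^B_n), R times
# The empirical distribution of T.boot - T.hat estimates the conditional
# distribution of T(P^B_n) - T(P_n) given P_n.
quantile(T.boot - T.hat, c(0.025, 0.975))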

Need for computation. For reasonably large R (and n), the bootstrap is a computer-intensive method. The availability of computers made possible the invention of the bootstrap by Efron (1979); see also the exposition by Efron and Tibshirani (1993). For example, the paper by Suzuki and Shimodaira (2006), 3rd page, mentions a bootstrap calculation taking over 7 hours on one processor, or 24 minutes on 20 parallel processors.

High-level probability theory of the bootstrap. Let's see what can be said about the bootstrap from a theoretical viewpoint. The bootstrap has good properties for suitable sample means or collections of them. Let g be a function on S. Then ∫ g dP_n is just the observed sample mean of the g(X_j). We know that if g is a random variable with finite variance, so that ∫ g(x)² dP(x) < ∞, then by the central limit theorem, √n ∫ g d(P_n - P) converges in distribution as n → ∞ to a normal variable with mean 0 and the same variance as g. Moreover, if g_1, ..., g_k are random variables each with finite variance, then the random vector √n {∫ g_i d(P_n - P)}_{i=1}^k converges in distribution to a vector, say {G_P(g_i)}_{i=1}^k, with k-variate normal distribution having mean 0 and the same covariance matrix as that of the g_i for P. Since P is unknown, it's useful to take the functions g_i to be bounded, so we can be sure that finite means, variances and covariances will exist. The convergence to normality holds uniformly over some infinite families of functions, for example on the real line over the set of all indicator functions 1_{(-∞, x]} for all x, as shown in the KMT (Komlós-Major-Tusnády) theorem, made more precise by Bretagnolle and Massart. General conditions on a family of functions for such uniform central limit theorems to hold are given for example in van der Vaart and Wellner (1996) and Dudley (2014). Moreover, Giné and Zinn (1990) proved under general conditions that if the uniform central limit theorem holds over a family F of functions, then it holds also for the bootstrapped empirical process √n(P^B_n - P_n) conditional on P_n, in probability as n → ∞. Expositions are given in van der Vaart and Wellner (1996, §3.6) and Dudley (2014, §§9.2-9.4).

We saw, however, that even in the most classical case of empirical distribution functions, the Bretagnolle-Massart theorem didn't give a fast enough rate of convergence to be of direct practical use, and that quantiles for the supremum norm of classical empirical processes (Kolmogorov statistics) seemed to converge to their limits at a 1/√n rate rather than the (log n)/√n rate given by the KMT theorem. In general, still less is known about the speed of convergence of empirical processes to their limiting Gaussian processes. In some cases the convergence is known to be slow. For example, in Euclidean space R^d, let B_d be the set of all balls B(x, r) for x ∈ R^d and all r > 0. Let P be the uniform distribution on the unit cube. Then √n(P_n - P) converges in distribution, with respect to uniform convergence over B_d, to G_P, but Beck (1985) showed that for d ≥ 2 the convergence is no faster than at the rate O(n^{-1/(2d)}). Rather, the Giné-Zinn theorem gives us some overall reassurance that the bootstrap works asymptotically rather generally.

An example of a functional with general, in fact possibly infinite-dimensional, values is as follows: let F be a class of bounded measurable functions, and let T(P) := {∫ f dP : f ∈ F}. Such functionals arise in the Giné-Zinn theorem mentioned previously. In this course we'll be concerned, at least for the time being, with real-valued functionals.

Definition. For a real-valued functional T and for X_1, ..., X_n i.i.d. (P) with empirical measure P_n, we'll say that the bootstrap is valid for T and P if there exists some t > 0 such that as n → ∞:

(a) the distribution of n^t (T(P*_n) - T(P)) converges to that of a finite-valued, non-degenerate random variable Y, where non-degenerate means that P(Y = 0) < 1;

(b) the conditional distribution of n^t (T(P^B_n) - T(P_n)) given P_n converges to that of the same Y as n → ∞, in probability with respect to P_n, where the last phrase means that as n → ∞, the probability that P_n is such that the given conditional distribution is close to that of Y approaches 1.

Remarks. If part (a) of the definition holds for some t > 0, then it holds only for that t, because if 0 < s < t < u then n^s (T(P_n) - T(P)) → 0 in probability, and n^u (T(P_n) - T(P)) is not bounded in probability, so it cannot converge in distribution.

If (a) holds, then most often in practice t = 1/2. For example, let T(P) = ∫ g dP for some bounded function g which is a random variable with respect to P, with variance σ² > 0 depending on P. Then part (a) of the definition holds with t = 1/2, as √n(T(P_n) - T(P)) converges in distribution to N(0, σ²) by the central limit theorem. In this case the bootstrap is valid, as the conditional distribution of √n(T(P^B_n) - T(P_n)) given P_n does converge to the same limiting distribution, in probability with respect to P_n (or so it seems, by the Lindeberg triangular-arrays central limit theorem), but the bootstrap is not helpful or needed. One can estimate σ² by s²_g = (1/(n-1)) ∑_{j=1}^n (g(X_j) - ∫ g dP_n)² and apply the central limit theorem directly.
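As an illustration of the t = 1/2 case, the following R sketch (not from the notes; it takes g(x) = x, which is unbounded but has finite variance here, and all names are invented) compares the spread of the bootstrap quantities √n(T(P^B_n) - T(P_n)) with the estimate of σ:

# Sketch: bootstrap distribution of sqrt(n)*(T(P^B_n) - T(P_n)) for the mean.
set.seed(2)
n <- 200
x <- rexp(n)                # hypothetical sample; Var = 1 for Exp(1)
T.n <- mean(x)              # T(P_n)
z <- replicate(2000, sqrt(n) * (mean(sample(x, n, replace = TRUE)) - T.n))
c(sd(z), sd(x))             # both approximate sigma, as the theory predicts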

The bootstrap - basic properties. Suppose again for convenience that X_1, ..., X_n are all distinct. The probability that a given observation, say X_j, is omitted from the bootstrap sample, i.e. X^B_k ≠ X_j for all k = 1, ..., n, is (1 - 1/n)^n, which converges to 1/e as n → ∞. Thus, on average, for large n, a fraction of about 1/e of the original observations is omitted from the bootstrap sample. Further, as n → ∞, if n_j is the number of times X_j is selected in one bootstrap sample of size n, then (n_1, ..., n_n) has a multinomial (n; 1/n, ..., 1/n) distribution, so the marginal distribution of each n_j is binomial(n, 1/n), which converges as n → ∞ to a Poisson(1) distribution, i.e. Pr(n_j = k) → 1/(e k!) as n → ∞ for each k = 0, 1, ....

Since the X_j are all different, each choice of n_1, ..., n_n gives a different value of P^B_n. The number of possible choices of integers n_j ≥ 0 such that ∑_{j=1}^n n_j = n is C(2n-1, n), as is known from basic combinatorics. [It can be seen as follows: consider the set of all strings of 2n-1 characters consisting of n 1's and n-1 0's. There are clearly C(2n-1, n) such strings. There is a one-to-one correspondence between such strings and choices of the n_j as follows. Let n_1 be the number of 1's before the first 0, let n_j be the number of 1's between the (j-1)st and jth 0's for j = 2, ..., n-1, and let n_n be the number of 1's after the last 0.]

As n → ∞, it can be seen via Stirling's formula that C(2n-1, n) is asymptotic to 4^n n^b C for some b and some constant C, where the dominant factor 4^n grows geometrically with n. So, unless n is rather small, it's not practicable to find the exact distribution of T(P^B_n) given P_n, as one would have to compute T at roughly 4^n different P^B_n's. One would also need to compute, for each possible n_1, ..., n_n, the multinomial probability (n!/(n_1! ··· n_n!)) n^{-n}. So there is a need for bootstrap sampling.
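These facts are easy to check numerically; the following R lines (illustrative, not from the notes) evaluate the omission probability, the Poisson(1) limit for the counts n_j, and the growth of C(2n-1, n):

# Omission probability (1 - 1/n)^n vs. its limit 1/e, for a few n:
n <- c(10, 50, 250); cbind(n, (1 - 1/n)^n, exp(-1))
# Counts n_j in one bootstrap sample: binomial(n, 1/n) vs. Poisson(1):
rbind(dbinom(0:3, 250, 1/250), dpois(0:3, 1))
# Number of distinct bootstrap samples C(2n-1, n) grows like 4^n:
choose(2 * (5:10) - 1, 5:10)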

In such sampling we do R bootstrap replications for some large enough R. Specifically, let X^B_{ki} be i.i.d. (P_n) for k = 1, ..., n and i = 1, ..., R. Let

P^B_{ni} := (1/n) ∑_{k=1}^n δ_{X^B_{ki}}

for i = 1, ..., R. Thus we have R independent copies of P^B_n. We can form the R i.i.d. random variables T_i := T(P^B_{ni}).

The bootstrap for real-valued X_j, order statistics, and quantiles. For X_j real, the bootstrap sample has its own order statistics X^B_(k), k = 1, ..., n, which for given P_n have a discrete distribution. As will be seen in PS6, the probability distributions of these order statistics can be evaluated in terms of binomial distributions. So it's unnecessary actually to do bootstrap sampling in these cases, as we have the exact distribution. As to be found in PS6, one can get approximate bootstrap 100(1-α)% confidence intervals for quantiles. (They can only be approximate because of the discrete distribution of the bootstrap order statistics.) But as will also be seen in PS6, one can directly get nonparametric confidence intervals for quantiles without the bootstrap. Then one can compare the bootstrap and non-bootstrap confidence intervals to see if they agree exactly or approximately. If they do, that can give further reassurance of the validity of the bootstrap, even if it is not really needed in this case. For extreme order statistics, however, the bootstrap may not be valid.
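The binomial evaluation just mentioned rests on the fact that X^B_(k) ≤ X_(j) exactly when at least k of the n bootstrap draws fall among {X_(1), ..., X_(j)}, an event of probability j/n per draw. A short R sketch (my reading of the PS6 method, not quoted from it; the values of n and k are illustrative):

# Exact conditional distribution of a bootstrap order statistic:
# Pr(X^B_(k) <= X_(j) | P_n) = Pr(Binomial(n, j/n) >= k).
n <- 25; k <- 13                       # e.g. the bootstrap sample median for n = 25
p.le <- 1 - pbinom(k - 1, n, (1:n)/n)  # Pr(X^B_(k) <= X_(j)), j = 1, ..., n
pmf <- diff(c(0, p.le))                # Pr(X^B_(k) = X_(j))
round(pmf[10:16], 3)                   # mass concentrates near j = 13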

Example. Consider the functional T(P) = sup{x : P((-∞, x]) = 0}. Then T(P_n) = X_(1) and T(P^B_n) = X^B_(1). We will have X^B_(1) = X_(j_1) for some j_1. Suppose that X_1, ..., X_n are all distinct. Then X_(1) < X_(2) < ··· < X_(n). By definition of bootstrap sampling, Pr(j_1 ≥ j) = ((n - (j-1))/n)^n for j = 1, ..., n, which converges as n → ∞ to e^{1-j}. It follows that lim_{n→∞} Pr(j_1 = j) = e^{1-j} - e^{-j} = q^{j-1} p, where q = 1/e and p = 1 - q. Thus the distribution of j_1 converges to a geometric(p) distribution, and we have the asymptotic distribution of X^B_(1) in terms of the X_(j).

For simplicity, let P be U[0,1], so that T(P) = 0. The following is known: for the order statistics X_(1) < X_(2) < ··· < X_(n) from U[0,1], define X_(0) = 0, X_(n+1) = 1, and s_j = X_(j) - X_(j-1) for j = 1, ..., n+1. Then for each n ≥ 1, the joint distribution of the spacings {s_j}_{j=1}^{n+1} equals that of {Y_j/S_{n+1}}_{j=1}^{n+1}, where Y_1, ..., Y_{n+1} are i.i.d. standard exponential random variables and S_{n+1} = ∑_{i=1}^{n+1} Y_i. A reference for this is Shorack and Wellner (2009, §8.2, Proposition 1, p. 335).

Since EY_j = 1 for each j, by the law of large numbers S_{n+1}/(n+1) → 1 as n → ∞, and so S_{n+1} ~ n+1 ~ n. It follows that n(T(P_n) - T(P)) = n(X_(1) - 0) = n s_1 converges in distribution to standard exponential, so part (a) in the definition of bootstrap validity holds with t = 1.

However, n(T(P^B_n) - T(P_n)) = n(X^B_(1) - X_(1)) equals 0 with probability converging to p > 0, so it does not have the same limiting distribution as in part (a), and the bootstrap is not valid for this T and P. There would be a similar failure for the same T and any P with a density f such that f(x) approaches a positive limit as x ↓ a for some a, and f(x) = 0 for x < a, such as U[a,b] or the distribution of a + X where X has an exponential(λ) density. Here a might be unknown and we might want to estimate it.
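One can see this failure directly by simulation; the R sketch below (illustrative, with invented names) estimates the atom at 0, namely Pr(X^B_(1) = X_(1)) = 1 - (1 - 1/n)^n → 1 - 1/e ≈ 0.632:

# Bootstrap failure for the minimum: n(X^B_(1) - X_(1)) has an atom at 0.
set.seed(3)
n <- 100
x <- runif(n)
hits <- replicate(5000, min(sample(x, n, replace = TRUE)) == min(x))
c(mean(hits), 1 - (1 - 1/n)^n, 1 - exp(-1))  # all close to 0.632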

The bootstrap and standard errors. Recall that if X_1, ..., X_n are i.i.d. with finite mean μ, variance σ² and standard deviation σ, then for X̄ = (X_1 + ··· + X_n)/n we have EX̄ = μ and Var(X̄) = σ²/n, so the standard deviation of X̄ is σ/√n, which is called the standard error of the mean. It can be estimated by ŜE = s_X/√n, where s²_X = (1/(n-1)) ∑_{j=1}^n (X_j - X̄)². For n large enough, by the central limit theorem, √n(X̄ - μ) is approximately N(0, σ²), so we can get approximate 100(1-α)% confidence intervals for μ with endpoints

X̄ ± ŜE z_{α/2},

where P(Z ≥ z_β) = β for a N(0,1) variable Z. Applying the same idea to the bootstrap (Efron and Tibshirani, Chapter 6), where now T is general or complicated enough that we cannot treat it as directly as we can for means or quantiles, but we can calculate T(P_n): suppose we observe a given P_n and take R i.i.d. bootstrap samples giving P^B_{ni}, i = 1, ..., R; recall T_i = T(P^B_{ni}); take the sample mean T̄ = (1/R) ∑_{i=1}^R T_i and sample variance (s^B_T)² = (1/(R-1)) ∑_{i=1}^R (T_i - T̄)². One can get an approximate confidence interval for T(P) for the unknown P as described below.
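In R, the classical interval for the mean looks like this (a minimal sketch with an invented sample):

# Approximate 100(1-alpha)% normal-theory CI for the mean.
set.seed(4)
x <- rgamma(50, shape = 2); alpha <- 0.05
se <- sd(x) / sqrt(length(x))                # estimated standard error
mean(x) + c(-1, 1) * qnorm(1 - alpha/2) * se # endpoints of the interval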


For the R bootstrap library "boot," one begins with a sample y such as (X_1, ..., X_n). For example, suppose the functional T to be bootstrapped is the median. One can create a bootstrap object, which may be called y.boot, by

>library(boot)
>set.seed(101)
>y.boot = boot(y, function(x,i) median(x[i]), R = 1000)

Here i is the vector of resampled indices defining one bootstrap sample; boot calls the given function once for each of the R replications. If one then types "y.boot" one gets output with labels in the first line, "original bias std.error", and numbers under each. This output relates mainly to the normal-based confidence intervals, which Venables and Ripley, and I, de-emphasize. The first number, "original," is just T(P_n), in this case the sample median of the sample y. The second number, "bias," equals T̄ - T(P_n), recalling that T̄ is the sample mean of the T_i.

"Standard error" means the standard deviation of a statistic, or an estimate of it. Sometimes, and especially when called "standard error of the mean," it means the standard deviation of a sample mean, or an estimate of it; namely, for i.i.d. random variables T_i with standard deviation σ, the standard deviation of T̄ is σ/√R, estimated by s^B_T/√R.

In this situation, the relevant statistic is an individual T_i = T(P^B_{ni}). Assuming that T(P*_n) - T(P) is approximately normally distributed with mean μ (not necessarily 0 in general) and standard deviation σ, one would estimate σ by the sample standard deviation s^B_T of the T_i.

In a simple "toy" example y = (0, 1, 3), the mean of the bootstrap sample median m^B is 1.2593 and its median is 1, so that the true bias of T is 0.2593. Applying "boot" with R = 1000, the estimated bias it gave was 0.246. The true standard deviation of m^B is 1.1086, and the estimated "standard error" given was 1.1067. Dividing by √R would give something much smaller. Of course, one would like n much larger than 3, so that the bootstrap would become valid and approximate normality might hold. The example was chosen just to check the meaning of the outputs of some of R's bootstrap functions.
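For n = 3 the exact bootstrap distribution is easy to enumerate (3³ = 27 equally likely index triples), which is one way to check the "true" values above; an illustrative R sketch, not from the notes:

# Exact bootstrap distribution of the median for y = (0, 1, 3):
y <- c(0, 1, 3)
idx <- expand.grid(1:3, 1:3, 1:3)             # all 27 bootstrap samples
mB <- apply(idx, 1, function(i) median(y[i])) # the 27 bootstrap medians
mean(mB)                                      # 34/27 = 1.2593, so true bias = 0.2593
sqrt(mean(mB^2) - mean(mB)^2)                 # exact sd = 1.1086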

The normal-based confidence intervals from the bootstrap work as follows. Assume, as in general with the bootstrap, that the conditional distribution of T(P^B_n) - T(P_n) given P_n is approximately the same as the distribution of T(P*_n) - T(P), and now moreover that this distribution is approximately N(μ, σ²) for some μ and some σ > 0. One then estimates μ by μ̂ = T̄ - T(P_n), which is the "bias," and σ by the sample standard deviation of the T_i, which is the "standard error." If T(P_n) - T(P) as a random variable has approximately this distribution, then a point estimate of T(P) is T(P_n) - μ̂. (This may be somewhat surprising, since one might have thought T(P_n) itself was the natural point estimator of T(P).)

Besides what is displayed at first, a bootstrap object such as y.boot above actually has much more information in it, including the R replication values T_i themselves. The function boot.ci gives a choice of confidence intervals, in which those of this form are called "normal" in the output, abbreviated "norm" in the command, as in

>boot.ci(y.boot, conf = c(0.90, 0.95), type = c("norm", "basic", "perc"))

where "basic" and "perc," as we'll see below, use the order statistics T_(i). In PS6 you can see how they behave in some cases.

Using the bootstrap order statistics. An idea seemingly better than the standard error approach in bootstrapping is to use not only the sample mean and variance of the bootstrap observations T_i but all the order statistics T_(1) ≤ T_(2) ≤ ··· ≤ T_(R). From these one can estimate quantiles of the distribution of the T_i for the given P_n. For 0 < q < 1, the qth sample quantile of the T_i is defined as T_(⌈Rq⌉) if Rq is not an integer, where ⌈x⌉ is defined as the least integer ≥ x, or as (1/2)[T_(Rq) + T_(Rq+1)] if Rq is an integer, as in the familiar case of the sample median where q = 1/2. For the similar case of Monte Carlo sampling, where we have some N instead of R, recall that in finding quantiles for the dip statistic, the Hartigans used N = 9999 and Maechler used N = 10^6 + 1, so that Nq is not an integer for any q of interest.
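This sample-quantile rule is easy to apply directly to the bootstrap replicates; a sketch (illustrative; it assumes the y.boot object created above, whose replicates are stored in y.boot$t):

# q-th sample quantile of the bootstrap replicates T_i, by the rule above.
boot.quantile <- function(t, q) {
  t <- sort(t); R <- length(t)
  if (R * q != floor(R * q)) t[ceiling(R * q)]       # Rq not an integer
  else (t[R * q] + t[R * q + 1]) / 2                 # Rq an integer
}
# e.g. endpoints of an approximate 90% percentile interval:
# boot.quantile(y.boot$t, 0.05); boot.quantile(y.boot$t, 0.95)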