Lecture 10: March 1

Lecturer: Alessandro Rinaldo Scribes: Wanshan Li

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.

They may be distributed outside this class only with the permission of the Instructor.

10.1 Conditional Expectation

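Given a probability space $(\Omega, \mathcal{F}, P)$, let $\mathcal{C} \subseteq \mathcal{F}$ be a sub-$\sigma$-field of $\mathcal{F}$, and fix a set $A \in \mathcal{F}$. Our goal is to define the conditional probability $P(A \mid \mathcal{C})$. The point is that $\mathcal{C}$ provides us with additional information, so $P(A \mid \mathcal{C})$ may differ from $P(A)$.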

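First we consider the special case $\mathcal{C} = \sigma(B_1, \ldots, B_n)$, where $\{B_1, \ldots, B_n\}$ is a partition of $\Omega$. The additional information here is that, for any $\omega \in \Omega$ and each $k$, one knows whether $\omega \in B_k$ or not.

Define $f : \Omega \to \mathbb{R}$ by

$$
f(\omega) =
\begin{cases}
\dfrac{P(A \cap B_k)}{P(B_k)}, & \text{if } \omega \in B_k \text{ and } P(B_k) > 0,\\[4pt]
c_k, & \text{if } \omega \in B_k \text{ and } P(B_k) = 0,
\end{cases}
\tag{10.1}
$$

where $c_k \in \mathbb{R}$ can be any constant. Now we define the conditional probability, as a real-valued function on $\Omega$, by $\Pr(A \mid \mathcal{C})(\omega) = f(\omega)$. The following fact shows that this definition is reasonable: for each $k$ with $P(B_k) > 0$,

$$
P(A \cap B_k) = P(A \mid B_k)\,P(B_k) = \int_{B_k} \Pr(A \mid \mathcal{C})(\omega)\, dP(\omega),
$$

and when $P(B_k) = 0$ the outer two expressions are both $0$, whatever the choice of $c_k$.

Now let $\mathcal{C}$ be a generic sub-$\sigma$-field of $\mathcal{F}$. We can create a measure $\nu$ on $(\Omega, \mathcal{C})$ given by $\nu(B) = P(A \cap B)$. (Notice that $P$ is originally defined on $(\Omega, \mathcal{F})$, but here we can treat it as a probability measure on $(\Omega, \mathcal{C})$ since $\mathcal{C} \subseteq \mathcal{F}$.) If we can find some $\mathcal{C}$-measurable function $f$ such that

$$
\nu(B) = P(A \cap B) = \int_B f(\omega)\, dP(\omega) \quad \text{for all } B \in \mathcal{C},
$$

then we define the function $f$ to be the conditional probability of $A$ given $\mathcal{C}$, and denote it by $f = \Pr(A \mid \mathcal{C})$. Thus, by our definition,

1) $\Pr(A \mid \mathcal{C})(\cdot)$ is $\mathcal{C}$-measurable.

2) For all $B \in \mathcal{C}$, $\nu(B) = P(A \cap B) = \int_B \Pr(A \mid \mathcal{C})(\omega)\, dP(\omega)$.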
Remark. There exist many versions of $\Pr(A \mid \mathcal{C})(\cdot)$, but by properties 1) and 2), these versions are equal to each other a.s. $[P]$.
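To make the partition case concrete, here is a small sketch on a finite sample space. The space, the point masses, the event `A`, and the helper names (`prob`, `cond_prob`, `satisfies_property_2`) are all hypothetical choices for illustration. It builds $\Pr(A \mid \mathcal{C})$ from (10.1), checks property 2) on every $B \in \sigma(B_1, B_2, B_3)$, and shows that changing $c_k$ on a block of probability zero yields a second version.

```python
from itertools import combinations

# Hypothetical finite probability space: Omega = {0, ..., 5} with these point masses.
P = {0: 0.1, 1: 0.2, 2: 0.15, 3: 0.25, 4: 0.3, 5: 0.0}
A = {1, 3, 4}                          # the event A in F (illustrative choice)
partition = [{0, 1}, {2, 3, 4}, {5}]   # B_1, B_2, B_3; note that P(B_3) = 0


def prob(S):
    """P(S) for a subset S of Omega."""
    return sum(P[w] for w in S)


def cond_prob(A, partition, c=0.0):
    """Pr(A | C)(omega) as in (10.1); c is used on every block with P(B_k) = 0."""
    f = {}
    for B in partition:
        pB = prob(B)
        value = prob(A & B) / pB if pB > 0 else c
        for w in B:
            f[w] = value
    return f


def satisfies_property_2(f):
    """Check P(A ∩ B) = ∫_B f dP for every B in sigma(partition), i.e. every union of blocks."""
    for r in range(len(partition) + 1):
        for blocks in combinations(partition, r):
            B = set().union(*blocks)
            integral = sum(f[w] * P[w] for w in B)
            if abs(integral - prob(A & B)) > 1e-12:
                return False
    return True


f = cond_prob(A, partition)          # one version (c_3 = 0)
g = cond_prob(A, partition, c=0.9)   # another version (c_3 = 0.9), differing only on a null set
print(f[0], f[2])                    # values on B_1 and B_2
print(satisfies_property_2(f), satisfies_property_2(g))
```

Both versions pass the check even though they disagree at outcome 5, which has probability zero, in line with the Remark.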


Let $X(\omega) = \mathbf{1}_A(\omega)$. Then we may write

$$\Pr(A \mid \mathcal{C}) = E(X \mid \mathcal{C}).$$

By generalizing $X$ from an indicator function to an arbitrary integrable random variable, we obtain the definition of conditional expectation.

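Definition 10.1. Given a probability space $(\Omega, \mathcal{F}, P)$, let $\mathcal{C} \subseteq \mathcal{F}$ be a sub-$\sigma$-field of $\mathcal{F}$, and let $X$ be an $\mathcal{F}/\mathcal{B}$-measurable random variable with $E|X| < \infty$. The conditional expectation of $X$ given $\mathcal{C}$ is any real-valued function $h : \Omega \to \mathbb{R}$ such that

1) $h$ is $\mathcal{C}/\mathcal{B}$-measurable;

2) $\int_B h(\omega)\, dP(\omega) = \int_B X(\omega)\, dP(\omega)$ for all $B \in \mathcal{C}$.

Such an $h$ is denoted by $E[X \mid \mathcal{C}]$.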


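Writing $f = E[X \mid \mathcal{C}]$ means that $f$ is a version of $E[X \mid \mathcal{C}]$. By 1) and 2) in the definition, if $h_1$ and $h_2$ are two versions of $E[X \mid \mathcal{C}]$, then $h_1(\omega) = h_2(\omega)$ a.s. $[P]$. Conversely, if $h_1$ is a version of $E[X \mid \mathcal{C}]$, $h_2$ is $\mathcal{C}$-measurable, and $h_1(\omega) = h_2(\omega)$ a.s. $[P]$, then $h_2$ is also a version of $E[X \mid \mathcal{C}]$.

If $\mathcal{C} = \{\emptyset, \Omega\}$, then $E[X \mid \mathcal{C}] = E[X]$.

If $X$ itself is $\mathcal{C}/\mathcal{B}$-measurable, then $X = E[X \mid \mathcal{C}]$.

If $X = a$ a.s., then $E[X \mid \mathcal{C}] = a$ a.s.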
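A minimal sketch, assuming a finite $\Omega$ and $\mathcal{C}$ generated by a partition: the block average of $X$ is a version of $E[X \mid \mathcal{C}]$, and the three examples above can be checked numerically. The arrays `p`, `X`, `labels` and the function names `cond_exp`, `is_version` are hypothetical.

```python
import numpy as np

# Hypothetical finite space Omega = {0, ..., 5}; entry i of each array refers to outcome i.
p = np.array([0.1, 0.2, 0.15, 0.25, 0.2, 0.1])  # P({omega}), sums to 1
X = np.array([3.0, -1.0, 4.0, 1.0, 5.0, 9.0])   # an integrable random variable


def cond_exp(V, labels):
    """A version of E[V | C] for C = sigma(partition); labels[i] names the block of outcome i.

    The result is constant on each block (property 1). On a block B with P(B) > 0 its value
    is (∫_B V dP) / P(B); blocks of probability zero get 0, which is one admissible choice.
    """
    h = np.zeros(len(V))
    for k in np.unique(labels):
        B = labels == k
        pB = p[B].sum()
        h[B] = (p[B] * V[B]).sum() / pB if pB > 0 else 0.0
    return h


def is_version(h, V, labels):
    """Property 2) on each block; by additivity it then holds for every B in C."""
    return all(
        np.isclose((p[labels == k] * h[labels == k]).sum(), (p[labels == k] * V[labels == k]).sum())
        for k in np.unique(labels)
    )


labels = np.array([0, 0, 1, 1, 1, 2])              # partition {0, 1}, {2, 3, 4}, {5}
h = cond_exp(X, labels)
print(h, is_version(h, X, labels))
print(cond_exp(X, np.zeros(6, dtype=int)))         # C = {∅, Omega}: the constant E[X]
print(np.allclose(cond_exp(X, np.arange(6)), X))   # X is C-measurable: E[X | C] = X
print(cond_exp(np.full(6, 2.0), labels))           # X = 2 everywhere: E[X | C] = 2
```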

10.1.1 Two perspectives

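RN derivative. One may ask, "Does this function exist?" The answer is yes, and one can demonstrate this using the Radon-Nikodym (RN) derivative. Assume $X \ge 0$ a.s. A sketch of the proof: define $\nu(B) = \int_B X(\omega)\, dP(\omega)$ for $B \in \mathcal{C}$. Then $\nu$ is a finite measure on $(\Omega, \mathcal{C})$ (since $E|X| < \infty$), and $\nu \ll P$ on $\mathcal{C}$, because $P(B) = 0$ implies $\nu(B) = 0$. By the RN theorem, there exists a $\mathcal{C}$-measurable $h$ such that

$$\nu(B) = \int_B h(\omega)\, dP(\omega) \quad \text{for all } B \in \mathcal{C}.$$

Then, by definition, the RN derivative $h$ is the conditional expectation $E[X \mid \mathcal{C}]$. For a general integrable $X$, apply this argument to $X^+$ and $X^-$ separately and take the difference.

Projection. An alternative perspective is to think of $E[X \mid \mathcal{C}]$ as a "projection". Let $X$ be a random variable on $(\Omega, \mathcal{F}, P)$ with $EX^2 < \infty$, and let $\mathcal{C} \subseteq \mathcal{F}$. Consider $L^2(\Omega, \mathcal{C}, P)$, the Hilbert space of random variables that are $\mathcal{C}$-measurable and square-integrable. Then one can show that a random variable $Z \in L^2(\Omega, \mathcal{C}, P)$ is the conditional expectation of $X$ if and only if $Z$ is the orthogonal projection of $X$ onto $L^2(\Omega, \mathcal{C}, P)$, that is,

$$E[W(X - Z)] = 0 \quad \text{for all } W \in L^2(\Omega, \mathcal{C}, P),$$

or equivalently,

$$Z = \operatorname*{argmin}_{W \in L^2(\Omega, \mathcal{C}, P)} E(X - W)^2.$$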
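The projection view can be checked in a finite setting. When $\mathcal{C}$ is generated by a partition, $L^2(\Omega, \mathcal{C}, P)$ is spanned by the block indicators, so minimizing $E(X - W)^2$ becomes a weighted least-squares problem, solved here with `numpy.linalg.lstsq` after scaling each row by $\sqrt{P(\{\omega\})}$. This is a sketch with hypothetical data (`p`, `X`, `labels`); it compares the least-squares fit with the block averages and checks the orthogonality condition against each indicator.

```python
import numpy as np

# Hypothetical finite space with 8 outcomes; C = sigma(partition) given by `labels`.
p = np.array([0.05, 0.15, 0.1, 0.2, 0.1, 0.15, 0.05, 0.2])
X = np.array([2.0, -1.0, 0.5, 3.0, 1.0, 4.0, -2.0, 0.0])
labels = np.array([0, 0, 1, 1, 1, 2, 2, 2])

# Every W in L^2(Omega, C, P) is constant on blocks, so W = D @ beta,
# where column k of D is the indicator of block k.
D = (labels[:, None] == np.unique(labels)[None, :]).astype(float)

# Weighted least squares: minimize sum_omega p(omega) * (X - D beta)^2, i.e. E(X - W)^2.
sw = np.sqrt(p)
beta, *_ = np.linalg.lstsq(sw[:, None] * D, sw * X, rcond=None)
Z_proj = D @ beta

# Block averages, i.e. the conditional expectation computed directly from the partition.
Z_avg = np.array([(p[labels == k] * X[labels == k]).sum() / p[labels == k].sum() for k in labels])

print(np.allclose(Z_proj, Z_avg))
# Orthogonality E[W (X - Z)] = 0 for each indicator W = 1_{B_k}; these span the subspace.
print(np.allclose(D.T @ (p * (X - Z_proj)), 0.0))
```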


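Now, from this perspective, if we let $\mathcal{C} = \sigma(Y)$, where $Y$ is a random variable on $(\Omega, \mathcal{F}, P)$, then by Theorem 39 in the notes,

$$
E[X \mid Y] := E[X \mid \mathcal{C}] = \operatorname*{argmin}_{g(Y):\ g \text{ measurable},\ E[g(Y)^2] < \infty} E\bigl(X - g(Y)\bigr)^2.
$$

Recall that the usual machinery for defining $E[X \mid Y]$, when $(X, Y)$ has a joint density $f_{X,Y}$, is

$$
E[X \mid Y] = g(Y), \quad \text{where } g(y) = \int_{\mathbb{R}} x\, f_{X \mid Y}(x \mid y)\, dx = \int_{\mathbb{R}} x\, \frac{f_{X,Y}(x, y)}{f_Y(y)}\, dx \quad (f_Y(y) > 0).
$$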
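As a sanity check of the density formula, the sketch below evaluates $g(y)$ by numerical integration for an assumed standard bivariate normal $(X, Y)$ with correlation $\rho$, for which $E[X \mid Y = y] = \rho y$ is known in closed form. The choice of distribution, the value of `rho`, and the grid range and resolution are arbitrary; the denominator $\int f_{X,Y}(x, y)\,dx$ plays the role of $f_Y(y)$.

```python
import numpy as np

rho = 0.5  # assumed correlation of a standard bivariate normal (X, Y)


def f_XY(x, y):
    """Joint density of a standard bivariate normal with correlation rho."""
    q = (x**2 - 2 * rho * x * y + y**2) / (1 - rho**2)
    return np.exp(-q / 2) / (2 * np.pi * np.sqrt(1 - rho**2))


def g(y, x_grid=np.linspace(-12.0, 12.0, 24001)):
    """g(y) = ∫ x f_{X,Y}(x, y) dx / ∫ f_{X,Y}(x, y) dx, via Riemann sums on a uniform grid.

    The denominator approximates f_Y(y); the grid spacing cancels in the ratio.
    """
    fx = f_XY(x_grid, y)
    return (x_grid * fx).sum() / fx.sum()


for y in [-1.0, 0.0, 0.7, 2.0]:
    print(y, g(y), rho * y)  # numerical g(y) next to the closed form rho * y
```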


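Example 10.2. Let $X_1, X_2$ be i.i.d. Uniform$(0, 1)$, $Y = \max\{X_1, X_2\}$, and $X = X_1$. Then one version of $E[X \mid Y]$ is $h(Y) = \frac{3}{4} Y$. In addition, another version is given by

$$
h_1(Y) =
\begin{cases}
\frac{3}{4} Y, & \text{if } Y \text{ is irrational},\\
c, & \text{if } Y \text{ is rational},
\end{cases}
$$

for any constant $c$: since $Y$ has a continuous distribution, $P(Y \in \mathbb{Q}) = 0$, so $h_1(Y) = h(Y)$ a.s.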
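Why $\frac{3}{4}Y$: given $Y = y$, by symmetry $X_1$ is the maximum $y$ with probability $1/2$; otherwise it is the minimum, which given $Y = y$ is Uniform$(0, y)$ with mean $y/2$. Hence $E[X_1 \mid Y = y] = \frac{1}{2}y + \frac{1}{2}\cdot\frac{y}{2} = \frac{3}{4}y$. A rough Monte Carlo sketch (sample size, seed, and bins are arbitrary choices) compares bin averages and checks the projection property $E[(X_1 - \frac{3}{4}Y)\,g(Y)] \approx 0$ for a few bounded functions $g$.

```python
import numpy as np

rng = np.random.default_rng(2024)  # seed chosen arbitrarily
n = 1_000_000
X1, X2 = rng.uniform(size=n), rng.uniform(size=n)
Y = np.maximum(X1, X2)

# Average of X1 within narrow bins of Y versus 3/4 times the bin's average Y.
edges = np.linspace(0.0, 1.0, 21)
idx = np.digitize(Y, edges) - 1  # bin index 0..19, since Y lies in [0, 1)
for k in range(0, 20, 4):
    in_bin = idx == k
    print(f"Y in [{edges[k]:.2f}, {edges[k + 1]:.2f}): "
          f"mean X1 = {X1[in_bin].mean():.4f}, 0.75 * mean Y = {0.75 * Y[in_bin].mean():.4f}")

# Projection view: E[(X1 - 0.75 Y) g(Y)] should be close to 0 for bounded measurable g.
for name, gY in [("1", np.ones(n)), ("Y", Y), ("Y^2", Y**2), ("1{Y>0.5}", (Y > 0.5).astype(float))]:
    print(name, np.mean((X1 - 0.75 * Y) * gY))
```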


10.1.2 Properties

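Some basic properties of conditional expectation mirror those of ordinary expectation, including:

1) Linearity. If $E[X]$, $E[Y]$, and $E[X + Y]$ all exist, then $E[X \mid \mathcal{C}] + E[Y \mid \mathcal{C}]$ is a version of $E[X + Y \mid \mathcal{C}]$.

2) Monotonicity. If $X_1 \le X_2$ a.s., then $E[X_1 \mid \mathcal{C}] \le E[X_2 \mid \mathcal{C}]$ a.s.

3) Jensen's inequality. Let $E(X)$ be finite. If $\phi$ is a convex function and $\phi(X) \in L^1$, then $E[\phi(X) \mid \mathcal{C}] \ge \phi(E[X \mid \mathcal{C}])$ a.s.

4) Convergence theorems: the monotone convergence theorem and the dominated convergence theorem.

Theorem 10.3 (Convergence theorem). Let $\mathcal{C}$ be a sub-$\sigma$-field of $\mathcal{F}$.

1) (Monotone) If $0 \le X_n \le X$ a.s. for all $n$ and $X_n \to X$ a.s., then $E[X_n \mid \mathcal{C}] \to E[X \mid \mathcal{C}]$ a.s.

2) (Dominated) If $X_n \to X$ a.s. and $|X_n| \le Y$ a.s., where $Y \in L^1$, then $E[X_n \mid \mathcal{C}] \to E[X \mid \mathcal{C}]$ a.s.

Proposition 10.4 (Tower property of conditional expectation). If $\mathcal{C}_1 \subseteq \mathcal{C}_2 \subseteq \mathcal{F}$ are sub-$\sigma$-fields and $E|X| < \infty$, then $E[X \mid \mathcal{C}_1]$ is a version of $E[E[X \mid \mathcal{C}_2] \mid \mathcal{C}_1]$. In particular, $E[X] = E[E[X \mid \mathcal{C}]]$ (take $\mathcal{C}_1 = \{\emptyset, \Omega\}$ and $\mathcal{C}_2 = \mathcal{C}$).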
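The tower property, linearity, and Jensen's inequality can be spot-checked on a finite space with two nested partitions, the coarser one generating $\mathcal{C}_1$ and the finer one generating $\mathcal{C}_2$. This is a minimal sketch with hypothetical numbers (`p`, `X`, `V2`, `fine`, `coarse`); it verifies the identities for one example rather than proving them.

```python
import numpy as np

# Hypothetical finite space with 6 outcomes and nested partitions C1 ⊆ C2.
p = np.array([0.1, 0.2, 0.15, 0.25, 0.2, 0.1])
X = np.array([3.0, -1.0, 4.0, 1.0, 5.0, 9.0])
V2 = X[::-1].copy()                    # a second random variable, for linearity
fine = np.array([0, 0, 1, 1, 2, 3])    # generates C2
coarse = np.array([0, 0, 0, 0, 1, 1])  # generates C1; each fine block lies inside one coarse block


def cond_exp(V, labels):
    """Block-average version of E[V | sigma(partition)]; all blocks here have P > 0."""
    out = np.empty(len(V))
    for k in np.unique(labels):
        B = labels == k
        out[B] = (p[B] * V[B]).sum() / p[B].sum()
    return out


inner = cond_exp(X, fine)                                         # E[X | C2]
print(np.allclose(cond_exp(inner, coarse), cond_exp(X, coarse)))  # tower: E[E[X|C2]|C1] = E[X|C1]
print(np.isclose((p * inner).sum(), (p * X).sum()))               # E[E[X|C2]] = E[X]
print(np.allclose(cond_exp(X + V2, fine), cond_exp(X, fine) + cond_exp(V2, fine)))  # linearity

# Jensen with the convex function phi(x) = x^2: E[phi(X)|C] >= phi(E[X|C]) pointwise.
print(np.all(cond_exp(X**2, fine) >= cond_exp(X, fine) ** 2 - 1e-12))
```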

10.2 Regular Conditional Probability

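Notice that $\Pr(\cdot \mid \mathcal{C})(\cdot)$ is a function defined on $\mathcal{F} \times \Omega$.

By definition, for $A \in \mathcal{F}$, $\Pr(A \mid \mathcal{C})(\cdot)$ is a version of $E[\mathbf{1}_A \mid \mathcal{C}](\cdot)$.

We would like $\Pr(\cdot \mid \mathcal{C})(\omega)$ to be a probability measure on $(\Omega, \mathcal{F})$ for every $\omega \in \Omega$. It is easy to see that $\Pr(A \mid \mathcal{C})(\cdot) \in [0, 1]$ a.s. $[P]$ as a function of $\omega$ on $(\Omega, \mathcal{C}, P)$. We can also prove that it is countably additive a.e. $[P]$: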


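Proposition 10.5. If $\{A_n\}_{n=1}^{\infty}$ is a sequence of disjoint $\mathcal{F}$-measurable sets, then

$$W(\omega) = \sum_{n=1}^{\infty} \Pr(A_n \mid \mathcal{C})(\omega)$$

is a version of $\Pr\bigl(\bigcup_{n=1}^{\infty} A_n \mid \mathcal{C}\bigr)$.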

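This proposition means that, given a sequence $\{A_n\}_{n=1}^{\infty}$ of disjoint $\mathcal{F}$-measurable sets, for $[P]$-a.e. $\omega$ we have

$$\sum_{n=1}^{\infty} \Pr(A_n \mid \mathcal{C})(\omega) = \Pr\Bigl(\bigcup_{n=1}^{\infty} A_n \Bigm| \mathcal{C}\Bigr)(\omega).$$

In general, however, for the collection of functions $\{\Pr(A \mid \mathcal{C})(\cdot) : A \in \mathcal{F}\}$ and a given $\omega \in \Omega$, $\Pr(\cdot \mid \mathcal{C})(\omega)$ is not necessarily countably additive, and therefore is not necessarily a probability measure. Even in the a.s. $[P]$ sense (with respect to $\omega \in \Omega$), $\Pr(\cdot \mid \mathcal{C})(\omega)$ need not be a probability measure. The intuition is as follows. For a given sequence $\{A_n\}_{n=1}^{\infty}$ of disjoint $\mathcal{F}$-measurable sets, the equality $\sum_{n=1}^{\infty} \Pr(A_n \mid \mathcal{C})(\omega) = \Pr(\bigcup_{n=1}^{\infty} A_n \mid \mathcal{C})(\omega)$ holds only a.s. $[P]$: it may fail for $\omega$ in a $P$-null set $N(\{A_n\}_{n=1}^{\infty})$. Therefore, to make $\Pr(\cdot \mid \mathcal{C})(\omega)$ a probability measure (again in the a.s. $[P]$ sense), we want

$$P\bigl(N(\{A_n\}_{n=1}^{\infty})\bigr) = 0$$

to hold simultaneously over all possible choices of the sequence $\{A_n\}_{n=1}^{\infty}$. To ensure this, we need

$$P\Bigl(\bigcup_{\{A_n\}} N(\{A_n\}_{n=1}^{\infty})\Bigr) = 0.$$

However, since there are uncountably many sequences $\{A_n\}_{n=1}^{\infty}$, this is an uncountable union of null sets, and it need not have probability $0$. When this nontrivial property holds, we call $\Pr(\cdot \mid \mathcal{C})(\cdot) : \mathcal{A} \times \Omega \to [0, 1]$ a regular conditional probability.

Definition 10.6 (Regular conditional probability). Given a probability space $(\Omega, \mathcal{F}, P)$, let $\mathcal{A} \subseteq \mathcal{F}$ be a sub-$\sigma$-field. We say that the function $\Pr(\cdot \mid \mathcal{C})(\cdot) : \mathcal{A} \times \Omega \to [0, 1]$ is a regular conditional probability (rcp) if

1) for all $A \in \mathcal{A}$, $\Pr(A \mid \mathcal{C})(\cdot)$ is a version of $E[\mathbf{1}_A \mid \mathcal{C}]$;

2) for $[P]$-a.e. $\omega \in \Omega$, $\Pr(\cdot \mid \mathcal{C})(\omega)$ is a probability measure on $(\Omega, \mathcal{A})$.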

10.2.1 Regular Conditional Distribution

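Let $\mathcal{A} = \sigma(X)$ for some random variable $X$ that is $\mathcal{F}/\mathcal{B}$-measurable. For each $B \in \mathcal{B}$, let

$$\mu_{X \mid \mathcal{C}}(B \mid \mathcal{C})(\omega) = \Pr\bigl(X^{-1}(B) \mid \mathcal{C}\bigr)(\omega).$$

Then the function $\mu_{X \mid \mathcal{C}}(\cdot \mid \mathcal{C})(\cdot) : \mathcal{B} \times \Omega \to [0, 1]$ is called a regular conditional distribution of $X$ given $\mathcal{C}$ when

1) for all $B \in \mathcal{B}$, $\mu_{X \mid \mathcal{C}}(B \mid \mathcal{C})(\cdot)$ is a version of $E[\mathbf{1}_{\{X \in B\}} \mid \mathcal{C}]$;

2) for $[P]$-a.e. $\omega \in \Omega$, $\mu_{X \mid \mathcal{C}}(\cdot \mid \mathcal{C})(\omega)$ is a probability measure on $(\mathbb{R}, \mathcal{B})$.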
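When $\mathcal{C} = \sigma(Y)$ and $(X, Y)$ takes finitely many values, a regular conditional distribution can be written down directly as the kernel $\omega \mapsto P(X \in B \mid Y = Y(\omega))$. The sketch below builds it from a hypothetical joint pmf and checks conditions 1) and 2); since every set in $\sigma(Y)$ is a union of sets $\{Y = y_j\}$, checking those sets suffices for condition 1). It only illustrates the definition: the difficulty discussed above comes from uncountably many exceptional null sets, which a finite example cannot exhibit. The names `kernel` and `mu` are illustrative.

```python
import numpy as np
from itertools import combinations

# Hypothetical joint pmf of (X, Y): row i is x = x_vals[i], column j is y = y_vals[j].
x_vals = np.array([0.0, 1.0, 2.0])
y_vals = np.array([-1.0, 1.0])
pmf = np.array([[0.10, 0.20],
                [0.25, 0.05],
                [0.15, 0.25]])

p_Y = pmf.sum(axis=0)   # marginal pmf of Y
kernel = pmf / p_Y      # kernel[i, j] = P(X = x_i | Y = y_j)


def mu(B_idx, j):
    """mu_{X|C}(B | C)(omega) for any omega with Y(omega) = y_vals[j]; B given by x-indices."""
    return kernel[list(B_idx), j].sum()


# Condition 2): for each value of Y, B -> mu(B | C)(omega) is a probability measure on x_vals.
print(np.allclose(kernel.sum(axis=0), 1.0), np.all(kernel >= 0))

# Condition 1): for every B and every generating set {Y = y_j} of sigma(Y),
# E[1{X in B} 1{Y = y_j}] equals E[mu(B | C) 1{Y = y_j}].
ok = True
for r in range(len(x_vals) + 1):
    for B_idx in combinations(range(len(x_vals)), r):
        for j in range(len(y_vals)):
            lhs = pmf[list(B_idx), j].sum()
            rhs = mu(B_idx, j) * p_Y[j]
            ok = ok and bool(np.isclose(lhs, rhs))
print(ok)
```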
