Stat260: Bayesian Modeling and Inference                         Lecture Date: February 8th, 2010

The Conjugate Prior for the Normal Distribution

Lecturer: Michael I. Jordan                                      Scribe: Teodor Mihai Moldovan

We will look at the Gaussian distribution from a Bayesian point of view. In the standard form, the likelihood has two parameters, the mean $\mu$ and the variance $\sigma^2$:

$$P(x_1, x_2, \dots, x_n \mid \mu, \sigma^2) \propto \frac{1}{\sigma^n} \exp\left( -\frac{1}{2\sigma^2} \sum (x_i - \mu)^2 \right) \quad (1)$$

Our aim is to find conjugate prior distributions for these parameters. We will investigate the hyperparameter (prior parameter) update relations and the problem of predicting new data from old data: $P(x_{\text{new}} \mid x_{\text{old}})$.

1  Fixed variance ($\sigma^2$), random mean ($\mu$)

Keeping $\sigma^2$ fixed, the conjugate prior for $\mu$ is a Gaussian:

$$P(\mu \mid \mu_0, \sigma_0^2) \propto \frac{1}{\sigma_0} \exp\left( -\frac{1}{2\sigma_0^2} (\mu - \mu_0)^2 \right) \quad (2)$$

Remark 1. In practice, when little is known about $\mu$, it is common to set the location hyperparameter $\mu_0$ to zero and the scale $\sigma_0$ to some large value.

1.1  Posterior for a single measurement ($n = 1$)

We want to put together the prior (2) and the likelihood (1) to get the posterior $P(\mu \mid x)$. For now, assume we have only one measurement ($n = 1$). There are several ways to do this:

- We could multiply the two distributions directly and complete the square in the exponent.
- Note that $\mu$ and $x$ have a joint Gaussian distribution. Then the conditional $\mu \mid x$ is also a Gaussian, for whose parameters we know formulas:

Lemma 2. Assume $(z_1, z_2)$ is distributed according to a bivariate Gaussian. Then $z_1 \mid z_2$ is Gaussian distributed with parameters:

$$E(z_1 \mid z_2) = E(z_1) + \frac{\mathrm{Cov}(z_1, z_2)}{\mathrm{Var}(z_2)} \left( z_2 - E(z_2) \right) \quad (3)$$

$$\mathrm{Var}(z_1 \mid z_2) = \mathrm{Var}(z_1) - \frac{\mathrm{Cov}^2(z_1, z_2)}{\mathrm{Var}(z_2)} \quad (4)$$



Remark 3. These formulas are extremely useful, so you should memorize them. They are easily derived based on the notion of a Schur complement of a matrix.

We apply this lemma with the correspondence $x \to z_2$, $\mu \to z_1$:

$$x = \mu + \sigma \varepsilon, \quad \varepsilon \sim N(0, 1)$$
$$\mu = \mu_0 + \sigma_0 \tilde{\varepsilon}, \quad \tilde{\varepsilon} \sim N(0, 1)$$

$$E(x) = \mu_0 \quad (5)$$

$$\mathrm{Var}(x) = E(\mathrm{Var}(x \mid \mu)) + \mathrm{Var}(E(x \mid \mu)) = \sigma^2 + \sigma_0^2 \quad (6)$$

$$\mathrm{Cov}(x, \mu) = E\left[ (x - \mu_0)(\mu - \mu_0) \right] = \sigma_0^2 \quad (7)$$
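These three moments are easy to check by simulation. A minimal sketch in Python (NumPy assumed; the hyperparameter values are illustrative, not from the notes):

```python
import numpy as np

# Illustrative hyperparameters (not from the notes): mu0 = 0.5, sigma0 = 2, sigma = 1.
mu0, sigma0, sigma = 0.5, 2.0, 1.0
rng = np.random.default_rng(0)

# Generative process: mu ~ N(mu0, sigma0^2), then x | mu ~ N(mu, sigma^2).
mu = mu0 + sigma0 * rng.standard_normal(1_000_000)
x = mu + sigma * rng.standard_normal(1_000_000)

print(x.mean())             # approx mu0                  (5)
print(x.var())              # approx sigma^2 + sigma0^2   (6)
print(np.cov(x, mu)[0, 1])  # approx sigma0^2             (7)
```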

Using equations (3) and (4):

$$E(\mu \mid x) = \mu_0 + \frac{\sigma_0^2}{\sigma^2 + \sigma_0^2}(x - \mu_0) = \frac{\sigma_0^2}{\sigma^2 + \sigma_0^2} \underbrace{x}_{\text{MLE}} + \frac{\sigma^2}{\sigma^2 + \sigma_0^2} \underbrace{\mu_0}_{\text{prior mean}} \quad (8)$$

$$\mathrm{Var}(\mu \mid x) = \frac{\sigma^2 \sigma_0^2}{\sigma^2 + \sigma_0^2} = \left( \frac{1}{\sigma_0^2} + \frac{1}{\sigma^2} \right)^{-1} = (\tau_{\text{prior}} + \tau_{\text{data}})^{-1} \quad (9)$$

Definition 4. $1/\sigma^2$ is usually called the precision and is denoted by $\tau$.

The posterior mean is usually a convex combination of the prior mean and the MLE. The posterior precision is, in this case, the sum of the prior precision and the data precision: $\tau_{\text{post}} = \tau_{\text{prior}} + \tau_{\text{data}}$.

We summarize our results so far:

Lemma 5. Assume $x \mid \mu \sim N(\mu, \sigma^2)$ and $\mu \sim N(\mu_0, \sigma_0^2)$. Then:

$$\mu \mid x \sim N\left( \frac{\sigma_0^2}{\sigma^2 + \sigma_0^2} x + \frac{\sigma^2}{\sigma^2 + \sigma_0^2} \mu_0, \ \left( \frac{1}{\sigma_0^2} + \frac{1}{\sigma^2} \right)^{-1} \right)$$

1.2  Posterior for multiple measurements ($n \geq 1$)

Now look at the posterior update for multiple measurements. We could adapt our previous derivation, but that would be tedious, since we would have to use the multivariate version of Lemma 2. Instead we will reduce the problem to the univariate case, with the sample mean $\bar{x} = (\sum x_i)/n$ as the new variable:

$$x_i \mid \mu \sim N(\mu, \sigma^2) \text{ i.i.d.} \implies \bar{x} \mid \mu \sim N\left( \mu, \frac{\sigma^2}{n} \right) \quad (10)$$

$$P(x_1, x_2, \dots, x_n \mid \mu) \propto \frac{1}{\sigma^n} \exp\left( -\frac{1}{2\sigma^2} \sum (x_i - \mu)^2 \right) \propto \exp\left( -\frac{1}{2\sigma^2} \left( \sum x_i^2 - 2\mu \sum x_i + n\mu^2 \right) \right)$$
$$\propto \exp\left( -\frac{n}{2\sigma^2} \left( -2\mu\bar{x} + \mu^2 \right) \right) \propto \exp\left( -\frac{n}{2\sigma^2} (\bar{x} - \mu)^2 \right) \propto P(\bar{x} \mid \mu) \quad (11)$$
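Step (11) says the data enter the likelihood only through $\bar{x}$, up to a factor free of $\mu$. A minimal numerical sketch of that claim (SciPy assumed; the data are illustrative): the gap between the full-data log-likelihood and the $\bar{x}$ log-likelihood is the same constant for every $\mu$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
sigma = 1.0
x = rng.normal(2.0, sigma, size=20)   # illustrative data
xbar, n = x.mean(), len(x)

def loglik_full(mu):
    # log P(x_1, ..., x_n | mu)
    return norm.logpdf(x, mu, sigma).sum()

def loglik_xbar(mu):
    # log P(xbar | mu), with xbar | mu ~ N(mu, sigma^2 / n) as in (10)
    return norm.logpdf(xbar, mu, sigma / np.sqrt(n))

# The same mu-independent constant appears for every mu:
print([round(loglik_full(m) - loglik_xbar(m), 6) for m in (-1.0, 0.0, 2.5)])
```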


Then for the posterior probability we get

$$P(\mu \mid x_1, x_2, \dots, x_n) \propto P(x_1, x_2, \dots, x_n \mid \mu) P(\mu) \propto P(\bar{x} \mid \mu) P(\mu) \propto P(\mu \mid \bar{x}) \quad (12)$$

We can now plug $\bar{x}$ into our previous result and we get:

Lemma 6. Assume $x_i \mid \mu \sim N(\mu, \sigma^2)$ i.i.d. and $\mu \sim N(\mu_0, \sigma_0^2)$. Then:

$$\mu \mid x_1, x_2, \dots, x_n \sim N\left( \frac{\sigma_0^2}{\frac{\sigma^2}{n} + \sigma_0^2} \bar{x} + \frac{\frac{\sigma^2}{n}}{\frac{\sigma^2}{n} + \sigma_0^2} \mu_0, \ \left( \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2} \right)^{-1} \right)$$
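Lemma 6 can be checked against a brute-force grid posterior. A minimal sketch (NumPy/SciPy assumed; prior and data are illustrative):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
mu0, sigma0, sigma = 0.0, 2.0, 1.0        # illustrative prior and noise scale
x = rng.normal(1.5, sigma, size=10)
xbar, n = x.mean(), len(x)

# Closed form (Lemma 6).
s2n = sigma**2 / n
post_mean = (sigma0**2 * xbar + s2n * mu0) / (s2n + sigma0**2)
post_var = 1.0 / (1.0 / sigma0**2 + n / sigma**2)

# Brute force: prior times likelihood on a grid, normalized numerically.
grid = np.linspace(-5.0, 5.0, 20001)
dx = grid[1] - grid[0]
logp = norm.logpdf(grid, mu0, sigma0) + norm.logpdf(x[:, None], grid, sigma).sum(axis=0)
p = np.exp(logp - logp.max())
p /= p.sum() * dx

mean_num = (grid * p).sum() * dx
var_num = ((grid - mean_num) ** 2 * p).sum() * dx
print(post_mean, mean_num)   # means agree
print(post_var, var_num)     # variances agree
```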

2  Random variance ($\sigma^2$), fixed mean ($\mu$)

2.1  Posterior

Assuming $\mu$ is fixed, the conjugate prior for $\sigma^2$ is an inverse Gamma distribution:

$$z \mid \alpha, \beta \sim IG(\alpha, \beta): \quad P(z \mid \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} z^{-\alpha - 1} \exp\left( -\frac{\beta}{z} \right) \quad (13)$$

For the posterior we get another inverse Gamma:

$$P(\sigma^2 \mid \alpha, \beta, x) \propto (\sigma^2)^{-(\alpha + \frac{n}{2}) - 1} \exp\left( -\frac{\beta + \frac{1}{2} \sum (x_i - \mu)^2}{\sigma^2} \right) \propto (\sigma^2)^{-\alpha_{\text{post}} - 1} \exp\left( -\frac{\beta_{\text{post}}}{\sigma^2} \right) \quad (14)$$

Lemma 7. If $x_i \mid \mu, \sigma^2 \sim N(\mu, \sigma^2)$ i.i.d. and $\sigma^2 \sim IG(\alpha, \beta)$, then:

$$\sigma^2 \mid x_1, x_2, \dots, x_n \sim IG\left( \alpha + \frac{n}{2}, \ \beta + \frac{1}{2} \sum (x_i - \mu)^2 \right)$$

If we re-parametrize in terms of precisions, the conjugate prior is a Gamma distribution:

$$\lambda \mid \alpha, \beta \sim \mathrm{Ga}(\alpha, \beta): \quad P(\lambda \mid \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} \lambda^{\alpha - 1} \exp(-\beta \lambda) \quad (15)$$

And the posterior is:

$$P(\lambda \mid \alpha, \beta, x) \propto \lambda^{(\alpha + \frac{n}{2}) - 1} \exp\left( -\left( \beta + \frac{1}{2} \sum (x_i - \mu)^2 \right) \lambda \right) \quad (16)$$

Lemma 8. If $x_i \mid \mu, \lambda \sim N(\mu, \lambda^{-1})$ i.i.d. and $\lambda \sim \mathrm{Ga}(\alpha, \beta)$, then:

$$\lambda \mid x_1, x_2, \dots, x_n \sim \mathrm{Ga}\left( \alpha + \frac{n}{2}, \ \beta + \frac{1}{2} \sum (x_i - \mu)^2 \right)$$

Remark 9. Should we prefer working with variances or precisions? We should prefer both:

- Variances add when we marginalize.
- Precisions add when we condition.
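The same kind of grid check works for Lemma 8. A minimal sketch with illustrative values (fixed mean, $\mathrm{Ga}(2, 1)$ prior on the precision), comparing the closed-form posterior mean of $\lambda$ with a numerical one:

```python
import numpy as np
from scipy.stats import norm, gamma

rng = np.random.default_rng(3)
mu, alpha, beta = 0.0, 2.0, 1.0      # illustrative: fixed mean, Ga(alpha, beta) prior
x = rng.normal(mu, 0.5, size=30)     # true precision is 4

# Closed form (Lemma 8), rate parametrization as in (15).
a_post = alpha + len(x) / 2
b_post = beta + 0.5 * np.sum((x - mu) ** 2)

# Grid check: prior times likelihood over lambda.
lam = np.linspace(1e-3, 15.0, 20001)
dlam = lam[1] - lam[0]
logp = gamma.logpdf(lam, a=alpha, scale=1 / beta) \
     + norm.logpdf(x[:, None], mu, 1 / np.sqrt(lam)).sum(axis=0)
p = np.exp(logp - logp.max())
p /= p.sum() * dlam

print(a_post / b_post, (lam * p).sum() * dlam)  # posterior means agree
```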


2.2  Prediction

We might want to compute the probability of getting some new data given old data. This can be done by marginalizing out parameters:

$$P(x_{\text{new}} \mid x, \mu, \alpha, \beta) = \int P(x_{\text{new}} \mid x, \mu, \alpha, \beta, \lambda) \, P(\lambda \mid x, \mu, \alpha, \beta) \, d\lambda$$
$$= \int P(x_{\text{new}} \mid \mu, \lambda) \, P(\lambda \mid x, \alpha, \beta) \, d\lambda$$
$$= \int P(x_{\text{new}} \mid \mu, \lambda) \, P(\lambda \mid \alpha_{\text{post}}, \beta_{\text{post}}) \, d\lambda \quad (17)$$

This integral "smears" the Gaussian into a heavier-tailed distribution, which will turn out to be a Student's t-distribution:

$$\lambda \mid \alpha, \beta \sim \mathrm{Ga}(\alpha, \beta), \qquad x \mid \mu, \lambda \sim N(\mu, \lambda^{-1})$$

$$P(x \mid \mu, \alpha, \beta) = \int \frac{\beta^\alpha}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\beta\lambda} \cdot \frac{\lambda^{1/2}}{\sqrt{2\pi}} \exp\left( -\frac{\lambda}{2} (x - \mu)^2 \right) d\lambda$$
$$= \frac{\beta^\alpha}{\Gamma(\alpha)\sqrt{2\pi}} \int \lambda^{(\alpha + \frac{1}{2}) - 1} e^{-\left( \beta + \frac{(x - \mu)^2}{2} \right)\lambda} \, d\lambda \qquad \text{(Gamma integral; use the memorized normalizing constant)}$$
$$= \frac{\beta^\alpha}{\Gamma(\alpha)\sqrt{2\pi}} \cdot \frac{\Gamma\left( \alpha + \frac{1}{2} \right)}{\left( \beta + \frac{(x - \mu)^2}{2} \right)^{\alpha + \frac{1}{2}}} = \frac{\Gamma\left( \alpha + \frac{1}{2} \right)}{\Gamma(\alpha)} (2\pi\beta)^{-1/2} \left( 1 + \frac{(x - \mu)^2}{2\beta} \right)^{-\left( \alpha + \frac{1}{2} \right)} \quad (18)$$

Remark 10. The Student-t density has three parameters, $\alpha$, $\beta$, $\mu$, and is symmetric around $\mu$. When $\alpha$ is an integer or a half-integer we get simplifications using the formulas $\Gamma(k + 1) = k\Gamma(k)$ and $\Gamma(1/2) = \sqrt{\pi}$.

The following is another useful parametrization for the Student's t-distribution, with $p = 2\alpha$ and $\lambda = \alpha/\beta$:

$$P(x \mid \mu, p, \lambda) = \frac{\Gamma\left( \frac{p + 1}{2} \right)}{\Gamma\left( \frac{p}{2} \right)} \left( \frac{\lambda}{p\pi} \right)^{1/2} \left( 1 + \frac{\lambda}{p} (x - \mu)^2 \right)^{-\frac{p + 1}{2}} \quad (19)$$

with two interesting special cases:

- If $p = 1$ we get a Cauchy distribution.
- If $p \to \infty$ we get a Gaussian distribution.

Remark 11. We might want to sample from a Student's t-distribution. We would sample $\lambda_i \sim \mathrm{Ga}(\alpha, \beta)$, then sample $x_i \sim N(\mu, \lambda_i^{-1})$, collect the $x_i$, and repeat.
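Remark 11's scheme is easy to validate against the closed form: by (19), the compound draws should match a Student-t with $p = 2\alpha$ degrees of freedom, location $\mu$, and scale $\sqrt{\beta/\alpha}$. A minimal sketch (SciPy assumed; hyperparameter values are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
mu, alpha, beta = 1.0, 3.0, 2.0                    # illustrative hyperparameters

# Remark 11: sample lambda_i ~ Ga(alpha, rate=beta), then x_i ~ N(mu, 1/lambda_i).
lam = rng.gamma(shape=alpha, scale=1 / beta, size=100_000)
xs = rng.normal(mu, 1 / np.sqrt(lam))

# Equation (19) with p = 2*alpha and lambda = alpha/beta corresponds to
# a Student-t with df = 2*alpha, loc = mu, scale = sqrt(beta/alpha).
t = stats.t(df=2 * alpha, loc=mu, scale=np.sqrt(beta / alpha))
print(stats.kstest(xs, t.cdf))                     # large p-value: the samples match
```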


3  Both variance ($\sigma^2$) and mean ($\mu$) are random

Now we want to put a prior on $\mu$ and $\sigma^2$ together. We could simply multiply the prior densities we obtained in the previous two sections, implicitly assuming $\mu$ and $\sigma^2$ are independent. Unfortunately, if we did that, we would not get a conjugate prior. One way to see this is that if we believe that our data is generated according to the graphical model in Figure 1, we find that, conditioned on $x$, the two parameters $\mu$ and $\sigma^2$ are, in fact, dependent, and this should be expressed by a conjugate prior.

[Figure 1: $\mu$ and $\sigma^2$ are dependent conditioned on $x$]

We will use the following prior distribution which, as we will show, is conjugate to the Gaussian likelihood:

$$x_i \mid \mu, \lambda \sim N(\mu, \lambda^{-1}) \text{ i.i.d.}, \qquad \mu \mid \lambda \sim N\left( \mu_0, (n_0\lambda)^{-1} \right), \qquad \lambda \sim \mathrm{Ga}(\alpha, \beta)$$

3.1  Posterior

First look at $\mu \mid x, \lambda$. This is the simpler part, as we can use Lemma 6 (with $\sigma^2 = \lambda^{-1}$ and $\sigma_0^2 = (n_0\lambda)^{-1}$):

$$\mu \mid x, \lambda \sim N\left( \frac{n}{n + n_0} \bar{x} + \frac{n_0}{n + n_0} \mu_0, \ \left( (n + n_0)\lambda \right)^{-1} \right) \quad (20)$$

Next, look at $\lambda \mid x$. We get this by expressing the joint density $P(\mu, \lambda \mid x)$ and marginalizing out $\mu$:

$$P(\mu, \lambda \mid x) \propto P(\lambda) \, P(\mu \mid \lambda) \, P(x \mid \mu, \lambda) \quad (21)$$

$$\propto \lambda^{\alpha - 1} e^{-\beta\lambda} \cdot \lambda^{1/2} \exp\left( -\frac{n_0\lambda}{2} (\mu - \mu_0)^2 \right) \cdot \lambda^{n/2} \exp\left( -\frac{\lambda}{2} \sum (x_i - \mu)^2 \right)$$

Using the trick $x_i - \mu = (x_i - \bar{x}) + (\bar{x} - \mu)$:

$$\propto \lambda^{\alpha + \frac{n}{2} - 1} \exp\left( -\left( \beta + \frac{1}{2} \sum (x_i - \bar{x})^2 \right) \lambda \right) \cdot \lambda^{1/2} \exp\left( -\frac{\lambda}{2} \left( n_0 (\mu - \mu_0)^2 + n (\bar{x} - \mu)^2 \right) \right) \quad (22)$$

As we integrate out $\mu$ we get the normalization constant

$$\lambda^{-1/2} \exp\left( -\frac{n n_0 \lambda}{2(n + n_0)} (\bar{x} - \mu_0)^2 \right)$$

which leads to a Gamma posterior for $\lambda$:

$$P(\lambda \mid x) \propto \lambda^{\alpha + \frac{n}{2} - 1} \exp\left( -\left( \beta + \frac{1}{2} \sum (x_i - \bar{x})^2 + \frac{n n_0}{2(n + n_0)} (\bar{x} - \mu_0)^2 \right) \lambda \right) \quad (23)$$

To summarize:

Lemma 12. If we assume:

$$x_i \mid \mu, \lambda \sim N(\mu, \lambda^{-1}) \text{ i.i.d.}, \qquad \mu \mid \lambda \sim N\left( \mu_0, (n_0\lambda)^{-1} \right), \qquad \lambda \sim \mathrm{Ga}(\alpha, \beta)$$

then the posterior is:

$$\mu \mid \lambda, x \sim N\left( \frac{n}{n + n_0} \bar{x} + \frac{n_0}{n + n_0} \mu_0, \ \left( (n + n_0)\lambda \right)^{-1} \right)$$

$$\lambda \mid x \sim \mathrm{Ga}\left( \alpha + \frac{n}{2}, \ \beta + \frac{1}{2} \sum (x_i - \bar{x})^2 + \frac{n n_0}{2(n + n_0)} (\bar{x} - \mu_0)^2 \right)$$
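Lemma 12 amounts to a four-number hyperparameter update. A minimal sketch of the update (illustrative data and prior; the helper name is ours, not from the notes):

```python
import numpy as np

def normal_gamma_posterior(x, mu0, n0, alpha, beta):
    """Posterior hyperparameters of the Normal-Gamma prior (Lemma 12)."""
    n, xbar = len(x), np.mean(x)
    mu_post = (n * xbar + n0 * mu0) / (n + n0)     # mean of mu given lambda
    n_post = n + n0                                # precision multiplier for mu
    a_post = alpha + n / 2
    b_post = (beta + 0.5 * np.sum((x - xbar) ** 2)
              + n * n0 * (xbar - mu0) ** 2 / (2 * (n + n0)))
    return mu_post, n_post, a_post, b_post

rng = np.random.default_rng(5)
x = rng.normal(2.0, 1.5, size=50)                  # illustrative data
print(normal_gamma_posterior(x, mu0=0.0, n0=1.0, alpha=2.0, beta=2.0))
```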

3.2  Prediction

$$P(x_{\text{new}} \mid x) = \int\!\!\int \mathrm{Gamma}(\lambda \mid x) \, \mathrm{Gaussian}(\mu \mid \lambda, x) \, \mathrm{Gaussian}(x_{\text{new}} \mid \mu, \lambda) \, d\mu \, d\lambda$$
$$= \int \mathrm{Gamma}(\lambda \mid x) \left[ \int \mathrm{Gaussian}(\mu \mid \lambda, x) \, \mathrm{Gaussian}(x_{\text{new}} \mid \mu, \lambda) \, d\mu \right] d\lambda$$
$$= \int \mathrm{Gamma}(\lambda \mid x) \, \mathrm{Gaussian}(x_{\text{new}} \mid \lambda, x) \, d\lambda$$
$$= \text{Student-t}(x_{\text{new}} \mid x)$$

The inner integral marginalizes $\mu$, leaving a Gaussian in $x_{\text{new}}$ given $\lambda$; the remaining Gamma mixture over $\lambda$ then produces a Student-t, exactly as in Section 2.2.
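The chain of integrals above can be mimicked by ancestral sampling: draw $\lambda$ from its Gamma posterior, draw $\mu$ given $\lambda$, then draw $x_{\text{new}}$. A minimal sketch (illustrative posterior hyperparameters, e.g. as produced by the previous sketch; the Student-t scale below is our deduction from the correspondence in (19), not stated in the notes):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

# Illustrative posterior hyperparameters (mu_post, n_post, a_post, b_post).
mu_post, n_post, a_post, b_post = 1.9, 51.0, 27.0, 60.0

# Ancestral sampling: lambda ~ Ga(a_post, rate=b_post),
# mu | lambda ~ N(mu_post, 1/(n_post*lambda)), x_new | mu, lambda ~ N(mu, 1/lambda).
lam = rng.gamma(shape=a_post, scale=1 / b_post, size=100_000)
mu = rng.normal(mu_post, 1 / np.sqrt(n_post * lam))
x_new = rng.normal(mu, 1 / np.sqrt(lam))

# Analytic predictive: x_new | lambda ~ N(mu_post, (1 + 1/n_post)/lambda), so by (19)
# the marginal is Student-t with df = 2*a_post and scale sqrt(b_post*(n_post+1)/(a_post*n_post)).
t_pred = stats.t(df=2 * a_post, loc=mu_post,
                 scale=np.sqrt(b_post * (n_post + 1) / (a_post * n_post)))
print(stats.kstest(x_new, t_pred.cdf))             # large p-value: the samples match
```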