[POLS 4150] Probability Distributions and the Normal Distribution
L. Jason Anastasopoulos
Summarizing probability distributions
- At their core, probability distributions are just functions, like those you might remember from calculus, e.g. $f(x) = x^2$.
- They are functions that are defined, however, by their parameters.
- These parameters are typically the mean and variance.
Normal distribution $= f(\mu, \sigma) = N(\mu, \sigma)$
Normal distribution $= f(\bar{x}, s) = N(\bar{x}, s)$
- For example, where the normal distribution lies on the x-axis depends upon its mean, or expected value.
- How fat the normal distribution is depends on its standard deviation (or its variance, which is just $s^2$).
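To see how the two parameters act on the curve, here is a minimal R sketch using the base `dnorm()` density function. The grid `x` and the particular means and standard deviations are illustrative assumptions, chosen only to make the effect visible.

```r
# Evaluate normal densities on a grid (values chosen only for illustration).
x <- seq(0, 100, by = 0.5)

d_a <- dnorm(x, mean = 36, sd = 5)   # centered at 36
d_b <- dnorm(x, mean = 60, sd = 5)   # same spread, shifted right
d_c <- dnorm(x, mean = 60, sd = 10)  # same center as d_b, twice the sd

# Changing the mean moves the peak along the x-axis.
peak_a <- x[which.max(d_a)]
peak_b <- x[which.max(d_b)]

# Doubling the standard deviation halves the height of the peak,
# because the total area under the curve must stay equal to 1.
height_ratio <- max(d_b) / max(d_c)

print(c(peak_a = peak_a, peak_b = peak_b, height_ratio = height_ratio))
```

The mean only relocates the curve, while the standard deviation trades height for width.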
$V_{Clinton,GA} \sim N(36, 5)$
- When we describe a variable in terms of its distribution, we usually specify what kind of distribution it follows, and the mean and the standard deviation of that distribution.
- The variable $V_{Clinton,GA}$ is the county vote share for Hillary Clinton in the 2016 election for the 159 counties in GA.
[Figure: Clinton vote share in GA counties]
[Figure: Trump vote share in GA counties]
Trump vote share in GA counties
- If $V_{Trump,GA}$ is a variable containing Trump vote share for GA counties,
- and $V_{Trump,GA} \sim N(60, 5)$,
- what are $a$ and $b$ in the equation $P(a < V_{Trump,GA} < b) \approx 0.95$?
- $a = 60 - 2 \cdot 5 = 50$, $b = 60 + 2 \cdot 5 = 70$.
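The two-standard-deviation bounds can be checked numerically. The sketch below assumes the $N(60, 5)$ model from the slide and uses base R's `pnorm()` and `qnorm()`; the variable names are hypothetical.

```r
mu <- 60
sigma <- 5

# Bounds two standard deviations either side of the mean.
a <- mu - 2 * sigma
b <- mu + 2 * sigma

# Probability mass between a and b under N(60, 5).
p_within <- pnorm(b, mean = mu, sd = sigma) - pnorm(a, mean = mu, sd = sigma)

# The exact central 95% interval uses about 1.96 standard deviations, not 2.
exact_bounds <- qnorm(c(0.025, 0.975), mean = mu, sd = sigma)

print(c(a = a, b = b, p_within = p_within))
print(exact_bounds)
```

The mass between 50 and 70 comes out slightly above 0.95, which is why the two-standard-deviation rule is treated as an approximation.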
- What is $p$ in the equation $P(55 < V_{Trump,GA} < \dots) = p$?
The Normal Distribution
- The most important distribution for statistical inference.
- This is because the normal distribution does a great job of describing lots of things that don't have a normal distribution.
- As we showed above, it is defined by two parameters: the mean and the standard deviation.
The standard normal distribution
[Figure: standard normal density curve; horizontal axis z from -4 to 4, vertical axis from 0.0 to 0.4]
- The standard normal distribution is a normal distribution that has a mean of 0 and a standard deviation of 1.
- It is often referred to as $N(0, 1)$.
- Why do we care about the standard normal distribution?
- As it turns out, we can use the standard normal distribution to obtain the probability that a random variable will take on certain values.
- In the standard normal distribution, each value represents how many standard deviations from the mean an observation is:
$z = \frac{x - \mu}{\sigma}$
Example: Trump vote share in GA counties
- Recall that we were discussing Trump's vote share in GA counties in the 2016 election and modeled the distribution as $N(60, 5)$,
- where $\mu = 60$ and $\sigma = 5$.
$z = \frac{x - \mu}{\sigma} = \frac{60 - 60}{5} = 0$
$z = \frac{x - \mu}{\sigma} = \frac{65 - 60}{5} = 1$
$z = \frac{x - \mu}{\sigma} = \frac{55 - 60}{5} = -1$
$z = \frac{x - \mu}{\sigma} = \frac{70 - 60}{5} = 2$
$z = \frac{x - \mu}{\sigma} = \frac{50 - 60}{5} = -2$
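The same standardization can be applied to several vote shares at once. This is a minimal sketch assuming the $N(60, 5)$ model above; `z_score` is a hypothetical helper, not a base R function.

```r
mu <- 60
sigma <- 5

# Hypothetical helper: number of standard deviations between x and the mean.
z_score <- function(x, mu, sigma) {
  (x - mu) / sigma
}

# A few vote shares (in percent) to standardize.
x <- c(60, 65, 55, 70, 50, 40)
z <- z_score(x, mu, sigma)

print(data.frame(x = x, z = z))
```

A vote share of 50 sits two standard deviations below the mean, while 40 sits four below it.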
Finding probability values with the standard normal distribution
- Say we wanted to find $P(V_{Trump,GA} > 62\%)$ or $P(V_{Trump,GA} < 58\%)$.
- Unfortunately we can't use the 68-95-99.7% rule here, because 62 and 58 are not a whole number of standard deviations from the mean.
- But, as mentioned before, we can use the standard normal distribution to find the probability for any interval.
Finding probability values with the standard normal distribution: procedure
- To do this we actually have to do two things:
1. Standardize the value that we want to find the probability for (i.e. find the z-score).
2. Use a normal probability distribution table to find probability values.
Example: Trump vote share in GA counties
[Figure: normal density curve for Trump vote share; horizontal axis from 40 to 80, vertical axis from 0.00 to 0.08]
- Find $P(V_{Trump,GA} > 62\%)$
Step 1: Calculate the z-score for $x = 62$
$z = \frac{x - \mu}{\sigma} = \frac{62 - 60}{5} = 0.4$
- This z-score implies that 62 is 0.4 standard deviations above the mean of 60.
Step 2: Find the probability value corresponding to this z-score
There are two ways to do this:
1. Look at the table 592 of the textbook.
2. Use the pnorm() function in R.
Finding the probability value using the table
[Figure: standard normal density curve; horizontal axis from -4 to 4, vertical axis from 0.0 to 0.4]
- The table gives the probability of a value greater than the z-score that you calculate.
- For example, here $P(Z > 0.4) = 0.3446$.
Thus:
$P(V_{Trump,GA} > 62\%) = P\left(\frac{V_{Trump,GA} - \mu}{\sigma} > \frac{62 - 60}{5}\right) = P(Z > 0.4) = 0.3446$
Finding the probability value using R
- The function pnorm() in R can give you the same table values and more.
- Let's explore this a bit.
pnorm(0.4, mean = 0, sd = 1, lower.tail = FALSE)
## [1] 0.3445783
[Figure: standard normal density curve with the area P(Z > 0.4) marked]
pnorm(0.4, mean = 0, sd = 1, lower.tail = TRUE)
## [1] 0.6554217
[Figure: standard normal density curve with the area P(Z < 0.4) marked]
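The standardization step and the lookup can also be combined, since `pnorm()` accepts `mean` and `sd` arguments. The following is a small sketch assuming the $N(60, 5)$ model from the example; the variable names are illustrative.

```r
mu <- 60
sigma <- 5
x <- 62

# Route 1: standardize by hand, then use the standard normal.
z <- (x - mu) / sigma
p_from_z <- pnorm(z, mean = 0, sd = 1, lower.tail = FALSE)

# Route 2: let pnorm() work on the original scale.
p_direct <- pnorm(x, mean = mu, sd = sigma, lower.tail = FALSE)

# The lower tail is the complement of the upper tail.
p_below <- 1 - p_direct

print(c(z = z, p_from_z = p_from_z, p_direct = p_direct, p_below = p_below))
```

Both routes should agree, so the z-score step matters mainly when working from a printed table.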
Going back to the Trump vote share in Georgia..
- So what we've basically found here, using the tables in the book and R, is that the probability of finding a Georgia county that had a Trump vote share above 62% is roughly 34.5%.
- In other words, under this model about 34.5% of GA counties had a Trump vote share over 62%.
- What if we wanted to know the probability of a GA county having a Trump vote share below 62%?
$P(V_{Trump,GA} < 62\%) = 1 - P(V_{Trump,GA} > 62\%) = 1 - 0.345 = 0.655$
- What if we wanted to know the probability of a GA county having a Trump vote share below 45% or above 95%?
- But now we change the distribution to $N(\mu = 60, \sigma = 10)$.
Let's start by figuring out $P(V_{Trump,GA} < 45\%)$
[Figure: normal density curve for N(60, 10); horizontal axis from 20 to 100, vertical axis from 0.00 to 0.04; area P(V(Trump, GA) < 45) marked]
$P(V_{Trump,GA} < 45\%) = P\left(\frac{V_{Trump,GA} - \mu}{\sigma} < \frac{45 - 60}{10}\right) = P(Z < -1.5) = \, ?$
Let's first conceptualize this by converting the distribution to a standard normal
[Figure: standard normal density curve; horizontal axis from -4 to 4; area P(Z < -1.5) marked]
Method 1: Using the table to find $P(Z < -1.5)$
[Figure: standard normal density curve with the area P(Z > 1.5) marked]
- The table only has $P(Z > 1.5) = 0.067$.
- Question: Does $P(Z > 1.5) = P(Z < -1.5)$?
- Answer: Yes! Because the distribution is symmetric.
[Figures: standard normal density curves with P(Z > 1.5), P(Z < -1.5), and both areas together marked]
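The symmetry argument can be verified numerically. This sketch uses only base R's `pnorm()`; the last step assumes the $N(60, 10)$ model from this example, and the variable names are illustrative.

```r
# Upper tail beyond +1.5 and lower tail beyond -1.5 on the standard normal.
upper_tail <- pnorm(1.5, lower.tail = FALSE)
lower_tail <- pnorm(-1.5, lower.tail = TRUE)

# Symmetry about 0 means the two tail areas agree.
print(c(upper_tail = upper_tail, lower_tail = lower_tail))
print(isTRUE(all.equal(upper_tail, lower_tail)))

# Same lower-tail probability computed on the original scale, N(60, 10).
p_below_45 <- pnorm(45, mean = 60, sd = 10)
print(p_below_45)
```

Because of this symmetry, a table that lists only upper-tail areas is enough to answer lower-tail questions as well.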
So now we found $P(V_{Trump,GA} < 45\%)$
$P(V_{Trump,GA} < 45\%) = P\left(\frac{V_{Trump,GA} - \mu}{\sigma} < \frac{45 - 60}{10}\right) = P(Z < -1.5) = 0.067$
We can also find $P(V_{Trump,GA} < 45\%)$ with R
pnorm(-1.5, mean = 0, sd = 1, lower.tail = TRUE)
## [1] 0.0668072
Let's move on to finding $P(V_{Trump,GA} > 95\%)$
[Figure: normal density curve for N(60, 10); horizontal axis from 20 to 100, vertical axis from 0.00 to 0.04; area P(V(Trump, GA) > 95) marked]
$P(V_{Trump,GA} > 95\%) = P\left(\frac{V_{Trump,GA} - \mu}{\sigma} > \frac{95 - 60}{10}\right) = P(Z > 3.5) = 0.000233$
We can also find this with R
pnorm(3.5, mean = 0, sd = 1, lower.tail = FALSE)
## [1] 0.0002326291
Finally putting this all together
- What if we wanted to know the probability of a GA county having a Trump vote share below 45% or above 95%?
- Because a county cannot be in both tails at once, $P(V_{Trump,GA} < 45\%)$ or $P(V_{Trump,GA} > 95\%)$ is just the sum:
$P(V_{Trump,GA} < 45\%) + P(V_{Trump,GA} > 95\%) = 0.067 + 0.0002 = 0.0672$
Sampling distributions
- Recall the distinction we made between estimation of parameters and statistics.
- Parameters are unknown values that we estimate using a statistic.
- Statistics are values that are estimated from samples.
- Sampling distributions approximate population distributions.