Notes on Probability Theory and
Statistics
Antonis Demos
(Athens University of Economics and Business)
October 2002
Part I
Probability Theory
Chapter 1
INTRODUCTION
1.1 Set Theory Digression
A set is defined as any collection of objects, which are called points or elements. The biggest possible collection of points under consideration is called the space, universe, or universal set. For probability theory the space is called the sample space and denoted by $\Omega$. A set $A$ is called a subset of $B$ (we write $A \subseteq B$ or $B \supseteq A$) if every element of $A$ is also an element of $B$. $A$ is called a proper subset of $B$ (we write $A \subset B$ or $B \supset A$) if every element of $A$ is also an element of $B$ and there is at least one element of $B$ which does not belong to $A$. Two sets $A$ and $B$ are called equivalent sets or equal sets (we write $A = B$) if $A \subseteq B$ and $B \subseteq A$. If a set has no points, it is called the empty or null set and denoted by $\emptyset$. The complement of a set $A$ with respect to the space $\Omega$, denoted by $\bar{A}$, $A^c$, or $\Omega - A$, is the set of all points that are in $\Omega$ but not in $A$. The intersection of two sets $A$ and $B$ is the set that consists of the common elements of the two sets, and it is denoted by $A \cap B$ or $AB$. The union of two sets $A$ and $B$ is the set that consists of all points that are in $A$ or in $B$ or in both (but counted only once), and it is denoted by $A \cup B$. The set difference of two sets $A$ and $B$, denoted by $A - B$, is the set that consists of all points in $A$ that are not in $B$.
Properties of Set Operations
Commutative: $A \cup B = B \cup A$ and $A \cap B = B \cap A$.
Associative: $A \cup (B \cup C) = (A \cup B) \cup C$ and $A \cap (B \cap C) = (A \cap B) \cap C$.
Distributive: $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$ and $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$.
$\bar{\bar{A}} = A$, i.e. the complement of the complement of $A$ is $A$ itself.
If $A$ is a subset of $\Omega$ (the space), then: $A \cap \Omega = A$, $A \cup \Omega = \Omega$, $A \cap \emptyset = \emptyset$, $A \cup \emptyset = A$, $A \cap A = A$, $A \cup A = A$, $A \cap \bar{A} = \emptyset$, and $A \cup \bar{A} = \Omega$.
De Morgan Laws: $\overline{A \cup B} = \bar{A} \cap \bar{B}$, and $\overline{A \cap B} = \bar{A} \cup \bar{B}$.
Disjoint or mutually exclusive sets are sets whose intersection is the empty set, i.e. $A$ and $B$ are mutually exclusive if $A \cap B = \emptyset$. Subsets $A_1, A_2, \ldots$ are mutually exclusive if $A_i \cap A_j = \emptyset$ for any $i \neq j$.

Uncertainty or variability is prevalent in many situations, and it is the purpose of probability theory to understand and quantify this notion. The basic situation is an experiment whose outcome is unknown before it takes place, e.g., a) coin tossing, b) throwing a die, c) choosing at random a number from $\mathbb{N}$, d) choosing at random a number from $(0,1)$. The sample space is the collection or totality of all possible outcomes of a conceptual experiment. An event is a subset of the sample space. The class of all events associated with a given experiment is defined to be the event space. Let us describe the sample space, i.e. the set of all possible relevant outcomes, of the above experiments: e.g., $\Omega = \{H, T\}$, $\Omega = \{1,2,3,4,5,6\}$. In both of these examples we have a finite sample space. In example c) the sample space is countably infinite, whereas in d) it is uncountably infinite.

Classical or a priori probability: if a random experiment can result in $n$ mutually exclusive and equally likely outcomes, and if $n_A$ of these outcomes have an attribute $A$, then the probability of $A$ is the fraction $n_A/n$, i.e. $P(A) = n_A/n$,
where $n = n_A + n_{\bar{A}}$.

Example: Consider drawing an ace (event $A$) from a deck of 52 cards. What is $P(A)$? We have $n(A) = 4$ and $n(\bar{A}) = 48$. Then $n = n(A) + n(\bar{A}) = 4 + 48 = 52$ and
$$P(A) = \frac{4}{52} = \frac{1}{13}.$$

Frequency or a posteriori probability: this is the ratio $n_A/n$ of the number of times $n_A$ that an event $A$ has occurred out of $n$ trials, i.e. $P(A) = n_A/n$. Example: Assume that we flip a coin 1000 times and observe 450 heads. Then the a posteriori probability is $P(A) = n_A/n = 450/1000 = 0.45$ (this is also the relative frequency). Notice that the a priori probability is in this case 0.5.

Subjective probability: this is based on intuition or judgment. We shall be concerned with a priori probabilities. These probabilities involve, many times, the counting of possible outcomes.
1.1.1 Some Counting Problems
Some more sophisticated discrete problems require counting techniques. For example: a) What is the probability of getting four of a kind in a five-card poker hand? b) What is the probability that two people in a classroom have the same birthday? The sample space in both cases, although discrete, can be quite large, and it is not feasible to write out all possible outcomes.
1. Duplication is permissible and Order is important (Multiple Choice Arrangement), i.e. the element AA is permitted and AB is a different element from BA. In the case where we want to arrange $n$ objects in $r$ places, the number of possible outcomes is given by $n^r$. Example: Find all possible arrangements of the letters A, B, C, and D taken two at a time when duplication is allowed and order is important. The result according to the formula, with $n = 4$ and $r = 2$, is $n^r = 4^2 = 16$ possible arrangements. To find the result we can also use a tree diagram.
2. Duplication is not permissible and Order is important (Permutation Arrangement), i.e. the element AA is not permitted and AB is a different element from BA. In the case where we want to permute $n$ objects in $r$ places, the number of possible outcomes is given by
$$P(n, r) = n \times (n-1) \times \cdots \times (n - r + 1) = \frac{n!}{(n-r)!}.$$
Example: Find all possible permutations of the letters A, B, C, and D taken two at a time when duplication is not allowed and order is important. The result according to the formula, with $n = 4$ and $r = 2$, is
$$\frac{4!}{(4-2)!} = 4 \cdot 3 = 12.$$
3. Duplication is not permissible and Order is not important (Combination Arrangement), i.e. the element AA is not permitted and AB is not a different element from BA. In the case where we want the combinations of $n$ objects in $r$ places, the number of possible outcomes is given by
$$C(n, r) = \frac{P(n, r)}{r!} = \frac{n!}{r!(n-r)!} = \binom{n}{r}.$$
Example: Find all possible combinations of the letters A, B, C, and D taken two at a time when duplication is not allowed and order is not important. The result according to the formula, with $n = 4$ and $r = 2$, is
$$\binom{4}{2} = \frac{4!}{2!(4-2)!} = \frac{4 \cdot 3}{2} = 6.$$
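The three counting formulas can be verified by direct enumeration. The following is a small illustrative sketch (the script and its variable names are ours, not part of the text):

```python
from itertools import product, permutations, combinations

letters = ["A", "B", "C", "D"]  # n = 4 objects, arranged r = 2 at a time

# 1. Duplication allowed, order important: n^r
print(len(list(product(letters, repeat=2))))   # 16 = 4^2

# 2. No duplication, order important: n!/(n-r)!
print(len(list(permutations(letters, 2))))     # 12 = 4!/2!

# 3. No duplication, order unimportant: n!/(r!(n-r)!)
print(len(list(combinations(letters, 2))))     # 6 = C(4,2)
```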
Let us now define probability rigorously.
1.1.2 Definition of Probability
Consider a collection of sets $A_\gamma$ with index $\gamma$, denoted by $\{A_\gamma : \gamma \in \Gamma\}$. We can define, for an index set $\Gamma$ of arbitrary cardinality (the cardinal number of a set is the number of elements of this set):
$$\bigcup_{\gamma \in \Gamma} A_\gamma = \{\omega : \omega \in A_\gamma \text{ for some } \gamma \in \Gamma\},$$
$$\bigcap_{\gamma \in \Gamma} A_\gamma = \{\omega : \omega \in A_\gamma \text{ for all } \gamma \in \Gamma\}.$$
A collection is exhaustive if $\bigcup_\gamma A_\gamma = \Omega$ (a partition, if also disjoint), and is pairwise exclusive or disjoint if $A_i \cap A_j = \emptyset$ for $i \neq j$.

To define probabilities we need some further structure. This is because in uncountable cases we cannot just define probability for all subsets of $\Omega$, as there are some sets on the real line whose probability cannot be determined, i.e., they are unmeasurable. We shall define probability on a family $\mathcal{A}$ of subsets of $\Omega$, of which we require the following structure.

Definition 1. Let $\mathcal{A}$ be a non-empty class of subsets of $\Omega$. $\mathcal{A}$ is an algebra (field) if
1. $\bar{A} \in \mathcal{A}$ whenever $A \in \mathcal{A}$;
2. $A_1 \cup A_2 \in \mathcal{A}$ whenever $A_1, A_2 \in \mathcal{A}$.
$\mathcal{A}$ is a $\sigma$-algebra ($\sigma$-field) if, in addition,
3. $\bigcup_{n=1}^{\infty} A_n \in \mathcal{A}$ whenever $A_n \in \mathcal{A}$, $n = 1, 2, 3, \ldots$
Note that since $\mathcal{A}$ is non-empty, (1) and (2) imply $\Omega \in \mathcal{A}$ and $\emptyset \in \mathcal{A}$. Note also that $\bigcap_{n=1}^{\infty} A_n \in \mathcal{A}$. The largest $\sigma$-algebra is the set of all subsets of $\Omega$, denoted by $\mathcal{P}(\Omega)$, and the smallest is $\{\emptyset, \Omega\}$. We can generate a $\sigma$-algebra from any collection of subsets by adding to the collection the complements and countable unions of its elements. For example, let $\Omega = \mathbb{R}$ and
$$\mathcal{B}_0 = \{[a,b],\ (a,b],\ [a,b),\ (a,b) : a, b \in \mathbb{R}\},$$
and let $\mathcal{A} = \sigma(\mathcal{B}_0)$ consist of all intervals and countable unions of intervals and complements thereof. This is called the Borel $\sigma$-algebra $\mathcal{B}$ and is the usual $\sigma$-algebra we work with when $\Omega = \mathbb{R}$. The $\sigma$-algebra $\mathcal{B} \subset \mathcal{P}(\mathbb{R})$, i.e., there are sets in $\mathcal{P}(\mathbb{R})$ not in $\mathcal{B}$; these are some pretty nasty, non-constructive ones (such as Vitali sets). We can alternatively construct the Borel $\sigma$-algebra by considering $\mathcal{J}$, the set of all intervals of the form $(-\infty, b]$, $b \in \mathbb{R}$: we can prove that $\sigma(\mathcal{J}) = \sigma(\mathcal{B}_0)$. We can now give the definition of probability measure, which is due to Kolmogorov.
Definition 2. Given a sample space $\Omega$ and a $\sigma$-algebra $\mathcal{A}$, a probability measure is a mapping $P$ from $\mathcal{A}$ to $\mathbb{R}$ such that
1. $P(A) \geq 0$ for all $A \in \mathcal{A}$;
2. $P(\Omega) = 1$;
3. if $A_1, A_2, \ldots$ are pairwise disjoint, i.e., $A_i \cap A_j = \emptyset$ for all $i \neq j$, then
$$P\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} P(A_n).$$
In such a way we have a probability space $(\Omega, \mathcal{A}, P)$. When $\Omega$ is discrete we usually take $\mathcal{A} = \mathcal{P}(\Omega)$. When $\Omega = \mathbb{R}$ or some subinterval thereof, we take $\mathcal{A} = \sigma(\mathcal{B}_0)$. $\mathcal{A}$ is a matter of choice and will depend on the problem. In many discrete cases, the problem can usually be written such that outcomes are equally likely,
$$P(\{\omega\}) = \frac{1}{n}, \qquad n = \#(\Omega).$$
In continuous cases, $P$ is usually like Lebesgue measure, e.g. $P((a,b)) = b - a$ on $\Omega = (0,1)$.
Properties of $P$
1. $P(\emptyset) = 0$.
2. $P(A) \leq 1$.
3. $P(\bar{A}) = 1 - P(A)$.
4. $P(\bar{A} \cap B) = P(B) - P(A \cap B)$.
5. If $A \subseteq B$ then $P(A) \leq P(B)$.
6. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$. More generally, for events $A_1, A_2, \ldots, A_n \in \mathcal{A}$ we have:
$$P\left[\bigcup_{i=1}^{n} A_i\right] = \sum_{i=1}^{n} P[A_i] - \sum_{i<j} P[A_i A_j] + \sum_{i<j<k} P[A_i A_j A_k] - \cdots + (-1)^{n+1} P[A_1 A_2 \cdots A_n].$$
For $n = 3$ the above formula is:
$$P[A_1 \cup A_2 \cup A_3] = P[A_1] + P[A_2] + P[A_3] - P[A_1 A_2] - P[A_1 A_3] - P[A_2 A_3] + P[A_1 A_2 A_3].$$
7. $P\left(\bigcup_{n=1}^{\infty} A_n\right) \leq \sum_{n=1}^{\infty} P(A_n)$.
Proofs involve manipulating sets to obtain disjoint sets and then applying the axioms.
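Property 6 for $n = 3$ is easy to confirm by brute-force enumeration on a finite, equally-likely sample space. An illustrative sketch (our own check, using the two-dice events that reappear later in the independence examples):

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))       # two fair dice, 36 outcomes
P = lambda E: Fraction(len(E), len(omega))         # classical probability

A1 = {w for w in omega if (w[0] + w[1]) % 2 == 1}  # odd total
A2 = {w for w in omega if w[0] == 1}               # ace on the first die
A3 = {w for w in omega if w[0] + w[1] == 7}        # total of seven

lhs = P(A1 | A2 | A3)
rhs = (P(A1) + P(A2) + P(A3)
       - P(A1 & A2) - P(A1 & A3) - P(A2 & A3)
       + P(A1 & A2 & A3))
assert lhs == rhs
print(lhs)  # 7/12
```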
1.2 Conditional Probability and Independence
In many statistical applications we have variables $X$ and $Y$ (or events $A$ and $B$) and want to explain or predict $Y$ from $X$ (or $B$ from $A$); we are then interested not only in marginal probabilities but in conditional ones as well, i.e., we want to incorporate some information in our predictions. Let $A$ and $B$ be two events in $\mathcal{A}$ of a probability space with probability function $P(\cdot)$.

Definition 3. The conditional probability of an event $A$ given an event $B$, denoted by $P(A|B)$, is given by
$$P(A|B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) > 0,$$
and is left undefined if $P(B) = 0$.

From the above formula it is evident that $P(A \cap B) = P(A|B)P(B) = P(B|A)P(A)$ if both $P(A)$ and $P(B)$ are nonzero. Notice that when speaking of conditional probabilities we are conditioning on some given event $B$; that is, we are assuming that the experiment has resulted in some outcome in $B$. $B$, in effect, then becomes our "new" sample space. All probability properties of the previous section apply to conditional probabilities as well, i.e., $P(\cdot|B)$ is a probability measure. In particular:
1. $P(A|B) \geq 0$;
2. $P(\Omega|B) = 1$;
3. $P\left(\bigcup_{n=1}^{\infty} A_n \,\middle|\, B\right) = \sum_{n=1}^{\infty} P(A_n|B)$ for any pairwise disjoint events $\{A_n\}_{n=1}^{\infty}$.

Note that if $A$ and $B$ are mutually exclusive events, $P(A|B) = 0$. When $A \subseteq B$, $P(A|B) = P(A)/P(B) \geq P(A)$, with strict inequality unless $P(B) = 1$. When $B \subseteq A$, $P(A|B) = 1$.
However, there is an additional property (law), called the Law of Total Probability, which states that:

LAW OF TOTAL PROBABILITY: For any events $A$ and $B$,
$$P(A) = P(A \cap B) + P(A \cap \bar{B}).$$
More generally, for a given probability space $(\Omega, \mathcal{A}, P)$, if $B_1, B_2, \ldots$ is a collection of mutually exclusive events in $\mathcal{A}$ satisfying $\bigcup_{n=1}^{\infty} B_n = \Omega$ and $P[B_n] > 0$ for $n = 1, 2, \ldots$, then for every $A \in \mathcal{A}$,
$$P[A] = \sum_{n=1}^{\infty} P[A|B_n] P[B_n].$$

Another important theorem in probability is the so-called Bayes' Theorem, which states:

BAYES RULE: Given a probability space $(\Omega, \mathcal{A}, P)$, if $B_1, B_2, \ldots$ is a collection of mutually exclusive events in $\mathcal{A}$ satisfying $\bigcup_{n=1}^{\infty} B_n = \Omega$ and $P[B_n] > 0$ for $n = 1, 2, \ldots$, then for every $A \in \mathcal{A}$ for which $P[A] > 0$ we have:
$$P[B_k|A] = \frac{P[A|B_k] P[B_k]}{\sum_{n=1}^{\infty} P[A|B_n] P[B_n]}.$$
Notice that for events $A$ and $B \in \mathcal{A}$ which satisfy $P[A] > 0$ and $P[B] > 0$ we have:
$$P(B|A) = \frac{P(A|B) P(B)}{P(A|B) P(B) + P(A|\bar{B}) P(\bar{B})}.$$
This follows from the definition of conditional probability and the law of total probability. The probability $P(B)$ is a prior probability and $P(A|B)$ frequently is a likelihood, while $P(B|A)$ is the posterior.
Finally, the Multiplication Rule states:

Given a probability space $(\Omega, \mathcal{A}, P)$, if $A_1, A_2, \ldots, A_n$ are events in $\mathcal{A}$ for which $P[A_1 A_2 \cdots A_{n-1}] > 0$, then:
$$P[A_1 A_2 \cdots A_n] = P[A_1] P[A_2|A_1] P[A_3|A_1 A_2] \cdots P[A_n|A_1 A_2 \cdots A_{n-1}].$$

Example: A plant has two machines. Machine A produces 60% of the total output, with a fraction defective of 0.02. Machine B produces the rest of the output, with a fraction defective of 0.04. If a single unit of output is observed to be defective, what is the probability that this unit was produced by machine A? Let $A$ be the event that the unit was produced by machine A, $B$ the event that it was produced by machine B, and $D$ the event that the unit is defective. Then we ask: what is $P[A|D]$? But $P[A|D] = P[A \cap D]/P[D]$. Now $P[A \cap D] = P[D|A]P[A] = 0.02 \times 0.6 = 0.012$. Also $P[D] = P[D|A]P[A] + P[D|B]P[B] = 0.012 + 0.04 \times 0.4 = 0.028$. Consequently, $P[A|D] = 0.012/0.028 \approx 0.429$, and notice that $P[B|D] = 1 - P[A|D] \approx 0.571$. We can also use a tree diagram to evaluate $P[A \cap D]$ and $P[D]$.

Example: A marketing manager believes the market demand potential of a new product to be high with a probability of 0.30, average with probability of 0.50, or low with a probability of 0.20. From a sample of 20 employees, 14 indicated a very favorable reception to the new product. In the past such an employee response (14 out of 20 favorable) has occurred with the following probabilities: if the actual demand is high, the probability of a favorable reception is 0.80; if the actual demand is average, the probability of a favorable reception is 0.55; and if the actual demand is low, the probability of a favorable reception is 0.30. Thus, given a favorable reception, what is the probability of actual high demand?
Again, what we ask is $P[H|F] = P[H \cap F]/P[F]$. Now $P[F] = P[H]P[F|H] + P[A]P[F|A] + P[L]P[F|L] = 0.24 + 0.275 + 0.06 = 0.575$. Also $P[H \cap F] = P[F|H]P[H] = 0.24$. Hence
$$P[H|F] = \frac{0.24}{0.575} = 0.4174.$$

Example: There are five boxes, numbered 1 to 5. Each box contains 10 balls. Box $i$ has $i$ defective balls and $10 - i$ non-defective balls, $i = 1, 2, \ldots, 5$. Consider the following random experiment: first a box is selected at random, and then a ball is selected at random from the selected box. 1) What is the probability
that a defective ball will be selected? 2) If we have already selected the ball and noted that it is defective, what is the probability that it came from box 5?

Let $D$ denote the event that a defective ball is selected and $B_i$ the event that box $i$ is selected, $i = 1, 2, \ldots, 5$. Note that $P[B_i] = 1/5$ for $i = 1, \ldots, 5$, and $P[D|B_i] = i/10$. Question 1) asks: what is $P[D]$? Using the theorem of total probability we have:
$$P[D] = \sum_{i=1}^{5} P[D|B_i] P[B_i] = \sum_{i=1}^{5} \frac{i}{10} \cdot \frac{1}{5} = \frac{3}{10}.$$
Notice that the total number of defective balls is 15 out of 50. Hence in this case we can say directly that $P[D] = 15/50 = 3/10$. This is true because the probabilities of choosing each of the 5 boxes are the same. Question 2) asks: what is $P[B_5|D]$? Since box 5 contains more defective balls than box 4, which contains more defective balls than box 3, and so on, we expect to find that $P[B_5|D] > P[B_4|D] > P[B_3|D] > P[B_2|D] > P[B_1|D]$. We apply Bayes' theorem:
$$P[B_5|D] = \frac{P[D|B_5] P[B_5]}{\sum_{i=1}^{5} P[D|B_i] P[B_i]} = \frac{(1/2)(1/5)}{3/10} = \frac{1}{3}.$$
Similarly,
$$P[B_i|D] = \frac{P[D|B_i] P[B_i]}{\sum_{j=1}^{5} P[D|B_j] P[B_j]} = \frac{(i/10)(1/5)}{3/10} = \frac{i}{15}, \qquad i = 1, 2, \ldots, 5.$$
Notice that unconditionally all the $B_i$ were equally likely.

Let $A$ and $B$ be two events in $\mathcal{A}$ of a probability space with probability function $P(\cdot)$. Events $A$ and $B$ are defined independent if and only if one of the following conditions is satisfied:
(i) $P[A \cap B] = P[A]P[B]$;
(ii) $P[A|B] = P[A]$ if $P[B] > 0$;
(iii) $P[B|A] = P[B]$ if $P[A] > 0$.
These are equivalent definitions, except that (i) does not really require (ii), (iii)'s condition $P(A), P(B) > 0$. Notice that the property of two events $A$ and $B$ being independent and the property that $A$ and $B$ are mutually exclusive are distinct, though related, properties. We know that if $A$ and $B$ are mutually exclusive then $P[A \cap B] = 0$. Now if these events are also independent then $P[A \cap B] = P[A]P[B]$, and consequently $P[A]P[B] = 0$, which means that either $P[A] = 0$ or $P[B] = 0$. Hence two mutually exclusive events are independent if $P[A] = 0$ or $P[B] = 0$. On the other hand, if $P[A] \neq 0$ and $P[B] \neq 0$, then if $A$ and $B$ are independent they cannot be mutually exclusive, and if they are mutually exclusive they cannot be independent. Also notice that independence is not transitive, i.e., $A$ independent of $B$ and $B$ independent of $C$ does not imply that $A$ is independent of $C$.

Example: Consider tossing two dice. Let $A$ denote the event of an odd total, $B$ the event of an ace on the first die, and $C$ the event of a total of seven. We ask the following: (i) Are $A$ and $B$ independent? (ii) Are $A$ and $C$ independent? (iii) Are $B$ and $C$ independent?
(i) $P[A|B] = 1/2$, $P[A] = 1/2$, hence $P[A|B] = P[A]$ and consequently $A$ and $B$ are independent.
(ii) $P[A|C] = 1 \neq P[A] = 1/2$, hence $A$ and $C$ are not independent.
(iii) $P[C|B] = 1/6 = P[C]$, hence $B$ and $C$ are independent.
Notice that although $A$ and $B$ are independent and $B$ and $C$ are independent, $A$ and $C$ are not independent.

Let us extend the independence of two events to several ones:
For a given probability space $(\Omega, \mathcal{A}, P)$, let $A_1, A_2, \ldots, A_n$ be $n$ events in $\mathcal{A}$.

Events $A_1, A_2, \ldots, A_n$ are defined to be independent if and only if:
$$P[A_i A_j] = P[A_i]P[A_j] \quad \text{for } i \neq j,$$
$$P[A_i A_j A_k] = P[A_i]P[A_j]P[A_k] \quad \text{for } i \neq j,\ j \neq k,\ i \neq k,$$
and so on, up to
$$P\left[\bigcap_{i=1}^{n} A_i\right] = \prod_{i=1}^{n} P[A_i].$$
Notice that pairwise independence does not imply independence, as the following example shows.

Example: Consider tossing two dice. Let $A_1$ denote the event of an odd face on the first die, $A_2$ the event of an odd face on the second die, and $A_3$ the event of an odd total. Then we have:
$$P[A_1]P[A_2] = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4} = P[A_1 \cap A_2],$$
$$P[A_1]P[A_3] = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4} = P[A_3|A_1]P[A_1] = P[A_1 \cap A_3],$$
$$P[A_2 \cap A_3] = \frac{1}{4} = P[A_2]P[A_3],$$
hence $A_1, A_2, A_3$ are pairwise independent. However, notice that
$$P[A_1 \cap A_2 \cap A_3] = 0 \neq \frac{1}{8} = P[A_1]P[A_2]P[A_3].$$
Hence $A_1, A_2, A_3$ are not independent.
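Before moving on, the box example and Bayes' rule are easy to check numerically. A minimal sketch (exact arithmetic via fractions; the variable names are ours):

```python
from fractions import Fraction

prior = {i: Fraction(1, 5) for i in range(1, 6)}         # P[B_i] = 1/5
lik   = {i: Fraction(i, 10) for i in range(1, 6)}        # P[D|B_i] = i/10

p_d = sum(lik[i] * prior[i] for i in prior)              # total probability
posterior = {i: lik[i] * prior[i] / p_d for i in prior}  # Bayes' rule

print(p_d)                                    # 3/10
print(posterior[5])                           # 1/3
print([str(posterior[i]) for i in range(1, 6)])  # i/15 for i = 1..5
```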
Chapter 2
RANDOM VARIABLES, DISTRIBUTION FUNCTIONS, AND
DENSITIES
The probability space $(\Omega, \mathcal{A}, P)$ is not particularly easy to work with. In practice, we often need to work with spaces with some structure (metric spaces). It is convenient, therefore, to work with a cardinalization of $\Omega$ by using the notion of a random variable. Formally, a random variable $X$ is just a mapping from the sample space to the real line, i.e.,
$$X: \Omega \to \mathbb{R},$$
with a certain property: it is a measurable mapping, i.e.,
$$A_B = \{\omega : X(\omega) \in B\} = X^{-1}(B) \in \mathcal{A},$$
where $\mathcal{B}$ is a $\sigma$-algebra on $\mathbb{R}$; for any $B$ in $\mathcal{B}$ the inverse image belongs to $\mathcal{A}$. The probability measure $P_X$ can then be defined by
$$P_X(B) = P\left(X^{-1}(B)\right).$$
It is straightforward to show that $\{X^{-1}(B): B \in \mathcal{B}\}$ is a $\sigma$-algebra whenever $\mathcal{B}$ is. Therefore $P_X$ is a probability measure obeying Kolmogorov's axioms. Hence we have transferred $(\Omega, \mathcal{A}, P)$ to $(\mathbb{R}, \mathcal{B}, P_X)$, where $\mathcal{B}$ is the Borel $\sigma$-algebra when $X(\Omega) = \mathbb{R}$ or any uncountable set, and $\mathcal{B}$ is $\mathcal{P}(X(\Omega))$ when $X(\Omega)$ is finite. The function $X(\cdot)$ must be such that the set $A_r$, defined by $A_r = \{\omega : X(\omega) \leq r\}$, belongs to $\mathcal{A}$ for every real number $r$, as elements of $\mathcal{B}$ are generated by left-closed intervals of $\mathbb{R}$.

The important part of the definition is that in terms of a random experiment, $\Omega$ is the totality of outcomes of that random experiment, and the function, or random variable, $X(\cdot)$ with domain $\Omega$ makes some real number correspond to each outcome of the experiment. The fact that we also require the collection of $\omega$'s for which $X(\omega) \leq r$ to be an event (i.e. an element of $\mathcal{A}$) for each real number $r$ is not much of a restriction, since the use of random variables is, in our case, to describe only events.

Example: Consider the experiment of tossing a single coin. Let the random variable $X$ denote the number of heads. In this case $\Omega = \{\text{head}, \text{tail}\}$, and $X(\omega) = 1$ if $\omega = \text{head}$, and $X(\omega) = 0$ if $\omega = \text{tail}$. So the random variable $X$ associates a real number with each outcome of the experiment. To show that $X$ satisfies the definition, we should show that $\{\omega : X(\omega) \leq r\}$ belongs to $\mathcal{A}$ for every real number $r$. Here $\mathcal{A} = \{\emptyset, \{\text{head}\}, \{\text{tail}\}, \Omega\}$. Now if $r < 0$, then $\{\omega : X(\omega) \leq r\} = \emptyset$; if $0 \leq r < 1$, then $\{\omega : X(\omega) \leq r\} = \{\text{tail}\}$; and if $r \geq 1$, then $\{\omega : X(\omega) \leq r\} = \Omega$. Hence, for each $r$ the set $\{\omega : X(\omega) \leq r\}$ belongs to $\mathcal{A}$ and consequently $X(\cdot)$ is a random variable. In the above example the random variable is described in terms of the random experiment, as opposed to its functional form, which is the usual case.

We can now work with $(\mathbb{R}, \mathcal{B}, P_X)$, which has metric structure and algebra. For example, we toss two dice, in which case the sample space is
$$\Omega = \{(1,1), (1,2), \ldots, (6,6)\}.$$
We can define two random variables, the Sum and the Product, with ranges
$$S(\Omega) = \{2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12\},$$
$$T(\Omega) = \{1, 2, 3, 4, 5, 6, 8, 9, 10, \ldots, 36\}.$$
The simplest form of random variables are the indicators
$$I_A(\omega) = \begin{cases} 1 & \omega \in A \\ 0 & \omega \notin A, \end{cases}$$
with associated $\sigma$-algebra $\{\emptyset, A, \bar{A}, \Omega\}$.

Finally, we give a formal definition of a continuous real-valued random variable.

Definition 4. A random variable $X$ is continuous if its probability measure $P_X$ is absolutely continuous with respect to Lebesgue measure $\lambda$, i.e., $P_X(B) = 0$ whenever $\lambda(B) = 0$.
2.0.1 Distribution Functions
Associated with each random variable $X$ there is the distribution function
$$F_X(x) = P_X((-\infty, x]) = P(X \leq x),$$
defined for all $x \in \mathbb{R}$. This function effectively replaces $P_X$. Note that we can reconstruct $P_X$ from $F_X$.

EXAMPLE. $\Omega = \{\text{head}, \text{tail}\}$, $X(\text{head}) = 1$, $X(\text{tail}) = 0$, $P(\{\text{head}\}) = P(\{\text{tail}\}) = 1/2$. Then:
If $x < 0$, $F_X(x) = 0$. If $0 \leq x < 1$, $F_X(x) = 1/2$. If $x \geq 1$, $F_X(x) = 1$.

EXAMPLE. The logit c.d.f. is
$$F(x) = \frac{1}{1 + e^{-x}}.$$
It is continuous everywhere, asymptotes to 0 and 1 at $\mp\infty$ respectively, and is strictly increasing.

Note that the distribution function $F_X(x)$ of a continuous random variable is a continuous function. The distribution function of a discrete random variable is a step function.

Theorem 5. A function $F(\cdot)$ is a c.d.f. of a random variable $X$ if and only if the following three conditions hold:
1. $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$;
2. $F$ is a nondecreasing function of $x$;
3. $F$ is right-continuous, i.e., for all $x_0$, $\lim_{x \to x_0^+} F(x) = F(x_0)$.
Note also that any such $F$ is continuous except at a set of points of Lebesgue measure zero (its at most countably many jump points).
2.0.2 Discrete Random Variables.
As we have already said, a random variable $X$ is defined to be discrete if the range of $X$ is countable. If a random variable $X$ is discrete, then its corresponding cumulative distribution function $F_X(\cdot)$ is defined to be discrete, i.e. a step function. By the range of $X$ being countable we mean that there exists a finite or denumerable set of real numbers, say $x_1, x_2, \ldots$, such that $X$ takes on values only in that set. If $X$ is discrete with distinct values $x_1, x_2, \ldots$, then $\Omega = \bigcup_j \{\omega : X(\omega) = x_j\}$ and $\{X = x_i\} \cap \{X = x_j\} = \emptyset$ for $i \neq j$. Hence $1 = P[\Omega] = \sum_j P[X = x_j]$ by the third axiom of probability.

If $X$ is a discrete random variable with distinct values $x_1, x_2, \ldots$, then the function $f_X(\cdot)$, defined by
$$f_X(x) = \begin{cases} P[X = x_j] & x = x_j,\ j = 1, 2, \ldots \\ 0 & x \neq x_j, \end{cases}$$
is defined to be the discrete density function of $X$. Notice that the discrete density function tells us how likely or probable each of the values of a discrete random variable is. It also enables one to calculate the probability of events described in terms of the discrete random variable. Also notice that for any discrete random variable, $F_X(\cdot)$ can be obtained from $f_X(\cdot)$, and vice versa.

Example: Consider the experiment of tossing a single die. Let $X$ denote the number of spots on the upper face. Then for this case we have: $X$ takes any value from the set $\{1, 2, 3, 4, 5, 6\}$, so $X$ is a discrete random variable. The density function of $X$ is $f_X(x) = P[X = x] = 1/6$ for any $x \in \{1, 2, 3, 4, 5, 6\}$, and 0 otherwise. The cumulative distribution function of $X$ is $F_X(x) = P[X \leq x] = \sum_{j=1}^{\lfloor x \rfloor} P[X = j]$, where $\lfloor x \rfloor$ denotes the integer part of $x$. Notice that $x$ can be any real number; however, the points of interest are the elements of $\{1, 2, 3, 4, 5, 6\}$. Notice also that in this case $\Omega = \{1, 2, 3, 4, 5, 6\}$ as well, and we do not need any reference to $\mathcal{A}$.

Example: Consider the experiment of tossing two dice. Let $X$ denote the total of the upturned faces. Then for this case we have: $\Omega = \{(1,1), (1,2), \ldots, (1,6), (2,1), (2,2), \ldots, (2,6), (3,1), \ldots, (6,6)\}$, a total of (using the multiplication rule) $36 = 6^2$ elements. $X$ takes values from the set $\{2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12\}$.

The density function is:
$$f_X(x) = P[X = x] = \begin{cases} 1/36 & x = 2, 12 \\ 2/36 & x = 3, 11 \\ 3/36 & x = 4, 10 \\ 4/36 & x = 5, 9 \\ 5/36 & x = 6, 8 \\ 6/36 & x = 7 \\ 0 & \text{otherwise.} \end{cases}$$

The cumulative distribution function is:
$$F_X(x) = P[X \leq x] = \begin{cases} 0 & x < 2 \\ 1/36 & 2 \leq x < 3 \\ 3/36 & 3 \leq x < 4 \\ 6/36 & 4 \leq x < 5 \\ 10/36 & 5 \leq x < 6 \\ 15/36 & 6 \leq x < 7 \\ 21/36 & 7 \leq x < 8 \\ 26/36 & 8 \leq x < 9 \\ 30/36 & 9 \leq x < 10 \\ 33/36 & 10 \leq x < 11 \\ 35/36 & 11 \leq x < 12 \\ 1 & x \geq 12. \end{cases}$$
Notice that, again, we do not need any reference toA. In fact we can speak of discrete density functions without reference to some random variable at all.
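One way to double-check the pmf and c.d.f. tables above is to enumerate the 36 outcomes mechanically; an illustrative sketch (variable names are ours):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

totals = Counter(a + b for a, b in product(range(1, 7), repeat=2))
f = {x: Fraction(c, 36) for x, c in sorted(totals.items())}   # density f_X

F, cum = {}, Fraction(0)
for x in sorted(f):                                           # step-function cdf
    cum += f[x]
    F[x] = cum

print(f[7])   # 1/6 = 6/36
print(F[6])   # 5/12 = 15/36
```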
Any function $f(\cdot)$ with domain the real line and counterdomain $[0, 1]$ is defined to be a discrete density function if for some countable set $x_1, x_2, \ldots$ it has the following properties: i) $f(x_j) \geq 0$ for $j = 1, 2, \ldots$; ii) $f(x) = 0$ for $x \neq x_j$, $j = 1, 2, \ldots$; iii) $\sum_j f(x_j) = 1$, where the summation is over the points $x_1, x_2, \ldots$
2.0.3 Continuous Random Variables
A random variable $X$ is called continuous if there exists a function $f_X(\cdot)$ such that
$$F_X(x) = \int_{-\infty}^{x} f_X(u)\, du$$
for every real number $x$. In such a case $F_X(x)$ is the cumulative distribution and the function $f_X(\cdot)$ is the density function. Notice that, according to the above definition, the density function is not uniquely determined: if a function changes value at a few points, its integral is unchanged. Furthermore, notice that $f_X(x) = dF_X(x)/dx$.

The notations for discrete and continuous density functions are the same, yet they have different interpretations. We know that for discrete random variables $f_X(x) = P[X = x]$, which is not true for continuous random variables. Furthermore, for discrete random variables $f_X(\cdot)$ is a function with domain the real line and counterdomain the interval $[0, 1]$, whereas for continuous random variables $f_X(\cdot)$ is a function with domain the real line and counterdomain the interval $[0, \infty)$. Note that for a continuous r.v.
$$P(X = x) \leq P(x - h < X \leq x) = F_X(x) - F_X(x - h) \to 0 \quad \text{as } h \downarrow 0,$$
by the continuity of $F_X$. The set $\{X = x\}$ is an example of a set of measure (in this case the measure is $P_X$ or $F_X$) zero. In fact, any countable set is of measure zero under a distribution which is absolutely continuous with respect to Lebesgue measure. Because the probability of a singleton is zero,
$$P(a < X \leq b) = P(a \leq X \leq b) = P(a < X < b)$$
for any $a, b$.

Example: Let $X$ be the random variable representing the length of a telephone conversation. One could model this experiment by assuming that the distribution of $X$ is given by $F_X(x) = 1 - e^{-\lambda x}$, where $\lambda$ is some positive number and the random variable can take values only in the interval $[0, \infty)$. The density function is $f_X(x) = dF_X(x)/dx = \lambda e^{-\lambda x}$. If we assume that telephone conversations are measured in minutes,
$$P[5 < X \leq 10] = \int_5^{10} f_X(x)\, dx = \int_5^{10} \lambda e^{-\lambda x}\, dx = e^{-5\lambda} - e^{-10\lambda},$$
and for $\lambda = 1/5$ we have $P[5 < X \leq 10] = e^{-1} - e^{-2} \approx 0.23$.

The example above indicates that the density functions of continuous random variables are used to calculate probabilities of events defined in terms of the corresponding continuous random variable $X$, i.e. $P[a < X \leq b] = \int_a^b f_X(x)\, dx$. Again we can give the definition of the density function without any reference to the random variable, i.e. any function $f(\cdot)$ with domain the real line and counterdomain $[0, \infty)$ is defined to be a probability density function if and only if (i) $f(x) \geq 0$ for all $x$; (ii) $\int_{-\infty}^{\infty} f(x)\, dx = 1$.

In practice, when we refer to a certain distribution of a random variable, we state its density or cumulative distribution function. However, notice that not all random variables are either discrete or continuous.
Chapter 3
EXPECTATIONS AND MOMENTS OF RANDOM VARIABLES
An extremely useful concept in problems involving random variables or distributions is that of expectation.
3.0.4 Mean or Expectation
Let $X$ be a random variable. The mean or expected value of $X$, denoted by $E[X]$ or $\mu_X$, is defined by:
(i) $E[X] = \sum_j x_j P[X = x_j] = \sum_j x_j f_X(x_j)$, if $X$ is a discrete random variable with counterdomain the countable set $\{x_1, x_2, \ldots\}$;
(ii) $E[X] = \int_{-\infty}^{\infty} x f_X(x)\, dx$, if $X$ is a continuous random variable with density function $f_X(x)$, and if either $\left|\int_{-\infty}^{0} x f_X(x)\, dx\right| < \infty$ or $\left|\int_{0}^{\infty} x f_X(x)\, dx\right| < \infty$ or both;
(iii) $E[X] = \int_0^{\infty} [1 - F_X(x)]\, dx - \int_{-\infty}^{0} F_X(x)\, dx$, for an arbitrary random variable $X$.
(i) and (ii) are used in practice to find the mean for discrete and continuous random variables, respectively; (iii) is used for the mean of a random variable that is neither discrete nor continuous.

Notice that in the above definition we assume that the sum and the integrals exist. Also, the summation in (i) runs over the possible values of $j$, and the $j$-th term is the value of the random variable multiplied by the probability that the random variable takes this value. Hence $E[X]$ is an average of the values that the random variable takes on, where each value is weighted by the probability that the random variable takes this value. Values that are more probable receive more weight. The same is true in the integral form in (ii): there the value $x$ is multiplied by the approximate probability that $X$ equals the value $x$, i.e. $f_X(x)\, dx$, and then integrated over all values.

Notice that in the definition of the mean of a random variable, only density functions or cumulative distributions were used. Hence we have really defined the mean for these functions without reference to random variables. We then call the defined mean the mean of the cumulative distribution or of the appropriate density function. Hence, we can speak of the mean of a distribution or density function as well as of the mean of a random variable.

Notice that $E[X]$ is the center of gravity (or centroid) of the unit mass that is determined by the density function of $X$. So the mean of $X$ is a measure of where the values of the random variable are centered or located, i.e. a measure of central location.

Example: Consider the experiment of tossing two dice, and let $X$ denote the total of the upturned faces. Then for this case we have:
$$E[X] = \sum_{x=2}^{12} x f_X(x) = 7.$$

Example: Consider a random variable $X$ that can take only two possible values, 1 and $-1$, each with probability 0.5. Then the mean of $X$ is
$$E[X] = 1 \cdot 0.5 + (-1) \cdot 0.5 = 0.$$
Notice that the mean in this case is not one of the possible values of $X$.

Example: Consider a continuous random variable $X$ with density function $f_X(x) = \lambda e^{-\lambda x}$ for $x \in [0, \infty)$. Then
$$E[X] = \int_{-\infty}^{\infty} x f_X(x)\, dx = \int_0^{\infty} x \lambda e^{-\lambda x}\, dx = \frac{1}{\lambda}.$$

Example: Consider a continuous random variable $X$ with density function $f_X(x) = x^{-2}$ for $x \in [1, \infty)$. Then
$$E[X] = \int_{-\infty}^{\infty} x f_X(x)\, dx = \int_1^{\infty} x^{-1}\, dx = \lim_{x \to \infty} \log x = \infty,$$
so we say that the mean does not exist, or that it is infinite.
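The weighted-average definition (i) can be evaluated directly from the two-dice pmf tabulated in Chapter 2. An illustrative check that $E[X] = 7$ (and, anticipating the variance section below, that $\mathrm{Var}[X] = 210/36$):

```python
from fractions import Fraction

# pmf of the total of two dice, as derived earlier
f = {2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 6, 8: 5, 9: 4, 10: 3, 11: 2, 12: 1}
f = {x: Fraction(c, 36) for x, c in f.items()}

mean = sum(x * p for x, p in f.items())
var  = sum((x - mean) ** 2 * p for x, p in f.items())
print(mean)  # 7
print(var)   # 35/6 = 210/36
```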
Median of $X$: When $F_X$ is continuous and strictly increasing, we can define the median of $X$, denoted $\text{med}(X)$, as the unique solution $m$ of
$$F_X(m) = \frac{1}{2}.$$
Since in this case $F_X^{-1}(\cdot)$ exists, we can alternatively write $m = F_X^{-1}\left(\frac{1}{2}\right)$. For discrete random variables there may be many $m$ that satisfy this, or none. Suppose
$$X = \begin{cases} 0 & \text{w.p. } 1/3 \\ 1 & \text{w.p. } 1/3 \\ 2 & \text{w.p. } 1/3; \end{cases}$$
then there does not exist an $m$ with $F_X(m) = \frac{1}{2}$. Also, if
$$X = \begin{cases} 0 & \text{w.p. } 1/4 \\ 1 & \text{w.p. } 1/4 \\ 2 & \text{w.p. } 1/4 \\ 3 & \text{w.p. } 1/4, \end{cases}$$
then any $1 \leq m < 2$ is an adequate median.

Note that if $E(X^2)$ exists, then so does $E(X)$, but not vice versa. Also, when the support is infinite, the expectation does not necessarily exist. If $\int_0^{\infty} x f_X(x)\, dx = \infty$ but $\int_{-\infty}^{0} |x| f_X(x)\, dx < \infty$, then $E(X) = \infty$. If $\int_0^{\infty} x f_X(x)\, dx = \infty$ and $\int_{-\infty}^{0} |x| f_X(x)\, dx = \infty$, then $E(X)$ is not defined.

Example [Cauchy]: $f_X(x) = \frac{1}{\pi} \frac{1}{1 + x^2}$. This density function is symmetric about zero, and one is tempted to say that $E(X) = 0$. But $\int_0^{\infty} x f_X(x)\, dx = \infty$ and $\int_{-\infty}^{0} |x| f_X(x)\, dx = \infty$, so $E(X)$ does not exist according to the above definition.

Now consider $Y = g(X)$, where $g$ is a (piecewise) monotonic continuous function. Then
$$E(Y) = \int y f_Y(y)\, dy = \int g(x) f_X(x)\, dx = E(g(X)).$$
Theorem 6. Expectation has the following properties:
1. [Linearity] $E(a_1 g_1(X) + a_2 g_2(X) + a_3) = a_1 E(g_1(X)) + a_2 E(g_2(X)) + a_3$.
2. [Monotonicity] If $g_1(x) \leq g_2(x)$ for all $x$, then $E(g_1(X)) \leq E(g_2(X))$.
3. [Jensen's inequality] If $g(\cdot)$ is a weakly convex function, i.e., $g(\lambda x + (1 - \lambda) y) \leq \lambda g(x) + (1 - \lambda) g(y)$ for all $x$, $y$, and all $\lambda$ with $0 < \lambda < 1$, then $E(g(X)) \geq g(E(X))$.

An Interpretation of Expectation

We claim that $E(X)$ is the unique minimizer of $E(X - b)^2$ with respect to $b$, assuming that the second moment of $X$ is finite.

Theorem 7. Suppose that $E(X^2)$ exists and is finite. Then $E(X)$ is the unique minimizer of $E(X - b)^2$ with respect to $b$. This theorem says that the expectation is the closest quantity to $X$ in mean square error.
3.0.5 Variance
Let $X$ be a random variable and let $\mu_X$ be $E[X]$. The variance of $X$, denoted by $\sigma_X^2$ or $\text{Var}[X]$, is defined by:
(i) $\text{Var}[X] = \sum_j (x_j - \mu_X)^2 P[X = x_j] = \sum_j (x_j - \mu_X)^2 f_X(x_j)$, if $X$ is a discrete random variable with counterdomain the countable set $\{x_1, x_2, \ldots\}$;
(ii) $\text{Var}[X] = \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\, dx$, if $X$ is a continuous random variable with density function $f_X(x)$;
(iii) $\text{Var}[X] = \int_0^{\infty} 2x \left[1 - F_X(x) + F_X(-x)\right] dx - \mu_X^2$, for an arbitrary random variable $X$.
The variances are defined only if the series in (i) is convergent or if the integrals in (ii) or (iii) exist. Again, the variance of a random variable is defined in terms of the density function or cumulative distribution function of the random variable and, consequently, variance can be defined in terms of these functions without reference to a random variable.

Notice that variance is a measure of spread, since if the values of the random variable $X$ tend to be far from their mean, the variance of $X$ will be larger than the variance of a comparable random variable whose values tend to be near their mean. It is clear from (i), (ii) and (iii) that the variance is a nonnegative number.

If $X$ is a random variable with variance $\sigma_X^2$, then the standard deviation of $X$, denoted by $\sigma_X$, is defined as $\sqrt{\text{Var}(X)}$. The standard deviation of a random variable, like the variance, is a measure of spread or dispersion of the values of the random variable. In many applications it is preferable to the variance, since it has the same measurement units as the random variable itself.

Example: Consider the experiment of tossing two dice, and let $X$ denote the total of the upturned faces. Then for this case we have ($\mu_X = 7$):
$$\text{Var}[X] = \sum_{x=2}^{12} (x - \mu_X)^2 f_X(x) = \frac{210}{36}.$$

Example: Consider a random variable $X$ that can take only two possible values, 1 and $-1$, each with probability 0.5. Then the variance of $X$ is ($\mu_X = 0$):
$$\text{Var}[X] = 0.5 \cdot 1^2 + 0.5 \cdot (-1)^2 = 1.$$

Example: Consider a random variable $X$ that can take only two possible values, 10 and $-10$, each with probability 0.5. Then we have:
$$\mu_X = E[X] = 10 \cdot 0.5 + (-10) \cdot 0.5 = 0, \qquad \text{Var}[X] = 0.5 \cdot 10^2 + 0.5 \cdot (-10)^2 = 100.$$
Notice that in the last two examples the two random variables have the same mean but different variance, the variance being larger for the random variable with values further away from the mean.

Example: Consider a continuous random variable $X$ with density function $f_X(x) = \lambda e^{-\lambda x}$ for $x \in [0, \infty)$. Then ($\mu_X = 1/\lambda$):
$$\text{Var}[X] = \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\, dx = \int_0^{\infty} \left(x - \frac{1}{\lambda}\right)^2 \lambda e^{-\lambda x}\, dx = \frac{1}{\lambda^2}.$$

Example: Consider a continuous random variable $X$ with density function $f_X(x) = x^{-2}$ for $x \in [1, \infty)$. Then we know that the mean of $X$ does not exist. Consequently, we cannot define the variance.

Notice that
$$\text{Var}(X) = E\left[(X - E(X))^2\right] = E\left(X^2\right) - E^2(X),$$
and that
$$\text{Var}(a + bX) = b^2\, \text{Var}(X), \qquad \sigma(a + bX) = |b|\, \sigma(X),$$
i.e., $\sigma(X)$ changes proportionally. Variance and standard deviation measure dispersion: higher variance means more spread out. The interquartile range, $F_X^{-1}(3/4) - F_X^{-1}(1/4)$, the range of the middle half, always exists and is an alternative measure of dispersion.
3.0.6 Higher Moments of a Random Variable
If $X$ is a random variable, the $r$-th raw moment of $X$, denoted by $\mu_r'$, is defined as
$$\mu_r' = E[X^r],$$
if this expectation exists. Notice that $\mu_1' = E[X] = \mu_X$, the mean of $X$.

If $X$ is a random variable, the $r$-th central moment of $X$ about $a$ is defined as $E[(X - a)^r]$. If $a = \mu_X$, we have the $r$-th central moment of $X$ about $\mu_X$, denoted by $\mu_r$, which is:
$$\mu_r = E[(X - \mu_X)^r].$$

We have measures defined in terms of quantiles to describe some of the characteristics of random variables or density functions. The $q$-th quantile of a random variable $X$ or of its corresponding distribution, denoted by $\xi_q$, is defined as the smallest number $\xi$ satisfying $F_X(\xi) \geq q$. If $X$ is a continuous random variable, then the $q$-th quantile of $X$ is given as the smallest number $\xi$ satisfying $F_X(\xi) = q$.

The median of a random variable $X$, denoted by $\text{med}_X$, $\text{med}(X)$, or $\xi_{1/2}$, is the 0.5 quantile. Notice that if $X$ is a continuous random variable, the median of $X$ satisfies
$$\int_{-\infty}^{\text{med}(X)} f_X(x)\, dx = \frac{1}{2} = \int_{\text{med}(X)}^{\infty} f_X(x)\, dx,$$
so the median of $X$ is any number that has half the mass of $X$ to its right and the other half to its left. The median and the mean are measures of central location.

The third moment about the mean, $\mu_3 = E[(X - E(X))^3]$, is called a measure of asymmetry, or skewness. Symmetrical distributions can be shown to have $\mu_3 = 0$. Distributions can be skewed to the left or to the right. However, knowledge of the third moment gives no clue as to the shape of the distribution, i.e. it could be the case that $\mu_3 = 0$ but the distribution is far from symmetrical. The ratio $\mu_3 / \sigma^3$ is unitless and is called the coefficient of skewness. An alternative measure of skewness is provided by the ratio (mean $-$ median)/(standard deviation).

The fourth moment about the mean, $\mu_4 = E[(X - E(X))^4]$, is used as a measure of kurtosis, which is a degree of flatness of a density near its center. The coefficient of kurtosis is defined as $\mu_4/\sigma^4 - 3$; positive values are sometimes used to indicate that a density function is more peaked around its center than the normal (leptokurtic distributions), while negative values of the coefficient of kurtosis are indicative of a distribution which is flatter around its center than the standard normal (platykurtic distributions). This measure suffers from the same failing as the measure of skewness, i.e. it does not always measure what it is supposed to.

While a particular moment, or a few of the moments, may give little information about a distribution, the entire set of moments will (under regularity conditions) determine the distribution exactly. In applied statistics the first two moments are of great importance, but the third and fourth are also useful.
3.0.7 Moment Generating Functions
Finally, we turn to the moment generating function (mgf) and the characteristic function (cf). The mgf is defined as
$$m_X(t) = E\left(e^{tX}\right) = \int e^{tx}\, dF_X(x)$$
for any real $t$, provided this integral exists in some neighborhood of 0. It is essentially the Laplace transform associated with $F_X(\cdot)$, and there is a corresponding inversion formula. The mgf is of limited use, since it does not exist for many random variables. The cf is applicable more generally, since it always exists:
$$\varphi_X(t) = E\left(e^{itX}\right) = \int e^{itx}\, dF_X(x) = \int \cos(tx)\, dF_X(x) + i \int \sin(tx)\, dF_X(x).$$
This essentially is the Fourier transform associated with $F_X(\cdot)$, and there is a well defined inversion formula
$$f_X(x) = \frac{1}{2\pi} \int e^{-itx} \varphi_X(t)\, dt.$$
If $X$ is symmetric about zero, the complex part of $\varphi_X$ is zero. Also,
$$\varphi_X^{(j)}(0) = i^j E\left(X^j\right), \qquad j = 1, 2, 3, \ldots,$$
so the moments of $X$ are related to the derivatives of the cf at the origin. If
$$\varphi_X(t) = \int \exp(itx)\, dF_X(x),$$
notice that
$$\frac{d^j \varphi_X(t)}{dt^j} = \int (ix)^j \exp(itx)\, dF_X(x)$$
and
$$\left.\frac{d^j \varphi_X(t)}{dt^j}\right|_{t=0} = \int (ix)^j\, dF_X(x) = i^j E\left(X^j\right) = i^j \mu_j',$$
the $j$-th uncentered moment. Now, expanding $\varphi_X(t)$ in powers of $it$, we get
$$\varphi_X(t) = \varphi_X(0) + \mu_1'(it) + \cdots + \mu_j' \frac{(it)^j}{j!} + \cdots = 1 + \mu_1'(it) + \cdots + \mu_j' \frac{(it)^j}{j!} + \cdots$$

The cumulants are defined as the coefficients $\kappa_1, \kappa_2, \ldots$ of the identity in $t$:
$$\exp\left(\kappa_1(it) + \kappa_2 \frac{(it)^2}{2!} + \cdots + \kappa_j \frac{(it)^j}{j!} + \cdots\right) = 1 + \mu_1'(it) + \cdots + \mu_j' \frac{(it)^j}{j!} + \cdots = \varphi_X(t) = \int \exp(itx)\, dF_X(x).$$

The cumulant-moment connection: Suppose $X$ is a random variable with moments $\mu_1', \mu_2', \ldots$. Then $X$ has cumulants $\kappa_1, \kappa_2, \ldots$ and
$$\mu_{n+1}' = \sum_{j=0}^{n} \binom{n}{j} \kappa_{j+1} \mu_{n-j}', \qquad n = 0, 1, \ldots$$
Writing this out for $n = 0, \ldots, 3$ produces:
$$\mu_1' = \kappa_1,$$
$$\mu_2' = \kappa_2 + \kappa_1 \mu_1',$$
$$\mu_3' = \kappa_3 + 2\kappa_2 \mu_1' + \kappa_1 \mu_2',$$
$$\mu_4' = \kappa_4 + 3\kappa_3 \mu_1' + 3\kappa_2 \mu_2' + \kappa_1 \mu_3'.$$
These recursive formulas can be used to calculate the $\kappa$'s efficiently from the $\mu'$'s, and vice versa. When $X$ has mean 0, that is, when $\mu_1' = 0 = \kappa_1$, $\mu_j'$ becomes $\mu_j = E[(X - E(X))^j]$, so the above formulas simplify to:
$$\mu_2 = \kappa_2, \qquad \mu_3 = \kappa_3, \qquad \mu_4 = \kappa_4 + 3\kappa_2^2.$$
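The recursion is easy to implement. As an illustrative check, recall the standard fact that all cumulants of a Poisson($\lambda$) variable equal $\lambda$; the sketch below (our own, with hypothetical function names) recovers $\kappa_1 = \cdots = \kappa_4 = \lambda$ from the raw moments:

```python
from math import comb

def cumulants_from_moments(mu):
    """mu[r] = E[X^r] for r = 0..m (with mu[0] = 1); returns kappa[1..m]."""
    m = len(mu) - 1
    kappa = [0.0] * (m + 1)
    for n in range(m):  # solve the recursion for kappa_{n+1}
        s = sum(comb(n, j) * kappa[j + 1] * mu[n - j] for j in range(n))
        kappa[n + 1] = mu[n + 1] - s
    return kappa[1:]

lam = 2.0
# raw moments of Poisson(lam): E X = l, E X^2 = l + l^2,
# E X^3 = l + 3l^2 + l^3, E X^4 = l + 7l^2 + 6l^3 + l^4
mu = [1.0, lam, lam + lam**2, lam + 3*lam**2 + lam**3,
      lam + 7*lam**2 + 6*lam**3 + lam**4]
print(cumulants_from_moments(mu))  # [2.0, 2.0, 2.0, 2.0]
```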
3.0.8 Expectations of Functions of Random Variables

Product and Quotient

Let $Z = g(X, Y) = X/Y$, with $E(X) = \mu_X$ and $E(Y) = \mu_Y \neq 0$. Expanding $Z = X/Y$ in a second-order Taylor series around $(\mu_X, \mu_Y)$ we have
$$Z \approx \frac{\mu_X}{\mu_Y} + \frac{1}{\mu_Y}(X - \mu_X) - \frac{\mu_X}{\mu_Y^2}(Y - \mu_Y) + \frac{\mu_X}{\mu_Y^3}(Y - \mu_Y)^2 - \frac{1}{\mu_Y^2}(X - \mu_X)(Y - \mu_Y),$$
since
$$\frac{\partial Z}{\partial X} = \frac{1}{Y}, \quad \frac{\partial Z}{\partial Y} = -\frac{X}{Y^2}, \quad \frac{\partial^2 Z}{\partial X^2} = 0, \quad \frac{\partial^2 Z}{\partial X \partial Y} = \frac{\partial^2 Z}{\partial Y \partial X} = -\frac{1}{Y^2}, \quad \frac{\partial^2 Z}{\partial Y^2} = \frac{2X}{Y^3},$$
all evaluated at $(\mu_X, \mu_Y)$. Taking expectations we have
$$E\left(\frac{X}{Y}\right) \approx \frac{\mu_X}{\mu_Y} + \frac{\mu_X}{\mu_Y^3}\, \text{Var}(Y) - \frac{1}{\mu_Y^2}\, \text{Cov}(X, Y).$$
For the variance, take again the variance of the Taylor expansion and, keeping only terms up to second order, we have:
$$\text{Var}\left(\frac{X}{Y}\right) \approx \left(\frac{\mu_X}{\mu_Y}\right)^2 \left[\frac{\text{Var}(X)}{\mu_X^2} + \frac{\text{Var}(Y)}{\mu_Y^2} - \frac{2\, \text{Cov}(X, Y)}{\mu_X \mu_Y}\right].$$
Chapter 4
EXAMPLES OF PARAMETRIC UNIVARIATE DISTRIBUTIONS
A parametric family of density functions is a collection of density functions that are indexed by a quantity called a parameter. For example, let $f(x; \lambda) = \lambda e^{-\lambda x}$ for $x > 0$ and some $\lambda > 0$. Here $\lambda$ is the parameter, and as $\lambda$ ranges over the positive numbers, the collection $\{f(\cdot; \lambda) : \lambda > 0\}$ is a parametric family of density functions.
4.0.9 Discrete Distributions
UNIFORM:

Suppose that for $j = 1, 2, \ldots, N$
$$P(X = x_j \mid N) = \frac{1}{N},$$
where $\{x_1, x_2, \ldots, x_N\} = \mathcal{X}$ is the support. Then
$$E(X) = \frac{1}{N} \sum_{j=1}^{N} x_j, \qquad \text{Var}(X) = \frac{1}{N} \sum_{j=1}^{N} x_j^2 - \left(\frac{1}{N} \sum_{j=1}^{N} x_j\right)^2.$$
The c.d.f. here is
$$F(x) = \frac{1}{N} \sum_{j=1}^{N} \mathbf{1}(x_j \leq x).$$
Bernoulli
A random variable whose outcomes have been classified into two categories, called "success" and "failure", represented by the letters s and f respectively, is called a Bernoulli trial. If a random variable $X$ is defined as 1 if a Bernoulli trial results in success and 0 if the same Bernoulli trial results in failure, then $X$ has a Bernoulli distribution with parameter $p = P[\text{success}]$. The definition of this distribution is: a random variable $X$ has a Bernoulli distribution if the discrete density of $X$ is given by
$$f_X(x) = f_X(x; p) = \begin{cases} p^x (1-p)^{1-x} & x = 0, 1 \\ 0 & \text{otherwise,} \end{cases}$$
where $p = P[X = 1]$, $0 \leq p \leq 1$. For the above defined random variable $X$ we have:
$$E[X] = p, \qquad \text{Var}[X] = p(1-p).$$
BINOMIAL:
Consider a random experiment consisting of $n$ repeated independent Bernoulli trials with $p$ the probability of success at each individual trial. Let the random variable $X$ represent the number of successes in the $n$ repeated trials. Then $X$ follows a binomial distribution. The definition of this distribution is:

A random variable $X$ has a binomial distribution, $X \sim B(n, p)$, if the discrete density of $X$ is given by
$$f_X(x) = f_X(x; n, p) = \begin{cases} \binom{n}{x} p^x (1-p)^{n-x} & x = 0, 1, \ldots, n \\ 0 & \text{otherwise,} \end{cases}$$
where $p = P[X = 1]$, i.e., the probability of success in each independent Bernoulli trial, and $n$ is the total number of trials. For the above defined random variable $X$ we have:
$$E[X] = np, \qquad \text{Var}[X] = np(1-p), \qquad \text{mgf: } m_X(t) = \left[p e^t + (1-p)\right]^n.$$

Example: Consider a stock with value $S = 50$. Each period the stock moves up or down, independently, in discrete steps of 5. The probability of going up is $p = 0.7$ and of going down $1 - p = 0.3$. What are the expected value and the variance of the value of the stock after 3 periods?

If we call $X$ the random variable which counts successes (a success if the stock moves up, a failure if the stock moves down), then $P[\text{up}] = P[X_j = 1] = 0.7$ in each period, and $X \sim B(3, 0.7)$. Now $X$ can take the values $0, 1, 2, 3$, i.e., no success, 1 success and 2 failures, etc. The value of the stock in each case and the probabilities are:
$$S = 35: \quad f_X(0) = \binom{3}{0} p^0 (1-p)^3 = 1 \cdot 0.3^3 = 0.027,$$
$$S = 45: \quad f_X(1) = \binom{3}{1} p^1 (1-p)^2 = 3 \cdot 0.7 \cdot 0.3^2 = 0.189,$$
$$S = 55: \quad f_X(2) = \binom{3}{2} p^2 (1-p)^1 = 3 \cdot 0.7^2 \cdot 0.3 = 0.441,$$
$$S = 65: \quad f_X(3) = \binom{3}{3} p^3 (1-p)^0 = 1 \cdot 0.7^3 = 0.343.$$

Hence the expected stock value is
$$E[S] = 35 \cdot 0.027 + 45 \cdot 0.189 + 55 \cdot 0.441 + 65 \cdot 0.343 = 56,$$
and
$$\text{Var}[S] = (35 - 56)^2 \cdot 0.027 + (-11)^2 \cdot 0.189 + (-1)^2 \cdot 0.441 + 9^2 \cdot 0.343 = 63.$$
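The stock example lends itself to a short computation; an illustrative sketch:

```python
from math import comb

n, p = 3, 0.7
values = {k: 50 + 5 * (2 * k - n) for k in range(n + 1)}   # 35, 45, 55, 65
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

mean = sum(values[k] * pmf[k] for k in pmf)
var = sum((values[k] - mean) ** 2 * pmf[k] for k in pmf)
print(mean, var)  # 56.0 63.0 (up to float rounding)
```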
Hypergeometric
Let $X$ denote the number of defective balls in a sample of size $n$ when sampling is done without replacement from a box containing $M$ balls, out of which $K$ are defective. Then $X$ has a hypergeometric distribution. The definition of this distribution is: a random variable $X$ has a hypergeometric distribution if the discrete density of $X$ is given by
$$f_X(x) = f_X(x; M, K, n) = \begin{cases} \dfrac{\binom{K}{x} \binom{M-K}{n-x}}{\binom{M}{n}} & x = 0, 1, \ldots, n \\ 0 & \text{otherwise,} \end{cases}$$
where $M$ is a positive integer, $K$ is a nonnegative integer that is at most $M$, and $n$ is a positive integer that is at most $M$. For this distribution we have:
$$E[X] = n\frac{K}{M}, \qquad \text{Var}[X] = n\frac{K}{M}\left(1 - \frac{K}{M}\right)\frac{M - n}{M - 1}.$$
Notice the difference between the binomial and the hypergeometric: for the binomial distribution we have Bernoulli trials, i.e., independent trials with fixed probability of success or failure, whereas in the hypergeometric in each trial the probability of success or failure changes depending on the previous results.
Geometric
Consider a sequence of independent Bernoulli trials with $p$ equal to the probability of success on an individual trial. Let the random variable $X$ represent the number of failures before the first success. Then $X$ has a geometric distribution. The definition of this distribution is: a random variable $X$ has a geometric distribution, $X \sim g(p)$, if the discrete density of $X$ is given by
$$f_X(x) = f_X(x; p) = \begin{cases} p(1-p)^x & x = 0, 1, 2, \ldots \\ 0 & \text{otherwise,} \end{cases}$$
where $p$ is the probability of success in each Bernoulli trial. For this distribution we have:
$$E[X] = \frac{1-p}{p}, \qquad \text{Var}[X] = \frac{1-p}{p^2}.$$
It is worth noticing that the binomial distribution $B(n, p)$ can be approximated by a Poisson distribution $P(\lambda)$ (see below). The approximation is more valid as $n \to \infty$ and $p \to 0$ in such a way that $np = \lambda$.
POISSON:
A random variable $X$ has a Poisson distribution, $X \sim P(\lambda)$, if the discrete density of $X$ is given by
$$f_X(x) = P(X = x \mid \lambda) = \frac{e^{-\lambda} \lambda^x}{x!}, \qquad x = 0, 1, 2, 3, \ldots$$
In calculations with the Poisson distribution we may use the fact that
$$e^{\lambda} = \sum_{x=0}^{\infty} \frac{\lambda^x}{x!}.$$
Employing the above we can prove that
$$E(X) = \lambda, \qquad E(X(X-1)) = \lambda^2, \qquad \text{Var}(X) = \lambda.$$
The Poisson distribution provides a realistic model for many random phenomena. Since the values of a Poisson random variable are nonnegative integers, any random phenomenon for which a count of some sort is of interest is a candidate for modeling by assuming a Poisson distribution. Such a count might be the number of fatal traffic accidents per week in a given place, the number of telephone calls per hour arriving at the switchboard of a company, the number of pieces of information arriving per hour, etc.

Example: It is known that the average number of daily changes in excess of 1%, for a specific stock index, occurring in each six-month period is 5. What is the probability of having one such change within the next 6 months? What is the probability of at least 3 changes within the same period?

We model the number of in-excess-of-1% changes, $X$, within the next 6 months as a Poisson random variable. We know that $E[X] = \lambda = 5$. Hence
$$f_X(x) = \frac{e^{-\lambda} \lambda^x}{x!} = \frac{e^{-5} 5^x}{x!},$$
for $x = 0, 1, 2, \ldots$. Then $P[X = 1] = f_X(1) = \frac{e^{-5} \cdot 5^1}{1!} = 0.0337$. Also,
$$P[X \geq 3] = 1 - P[X < 3] = 1 - P[X=0] - P[X=1] - P[X=2] = 1 - \frac{e^{-5} 5^0}{0!} - \frac{e^{-5} 5^1}{1!} - \frac{e^{-5} 5^2}{2!} = 0.875.$$
We can approximate the binomial with the Poisson; the approximation is better the smaller the $p$ and the larger the $n$.
4.0.10 Continuous Distributions
UNIFORM ON $[a, b]$

A very simple distribution for a continuous random variable is the uniform distribution. Its density function is
$$f(x \mid a, b) = \begin{cases} \frac{1}{b-a} & x \in [a, b] \\ 0 & \text{otherwise,} \end{cases}$$
and
$$F(x \mid a, b) = \int_{-\infty}^{x} f(u \mid a, b)\, du = \frac{x - a}{b - a}, \qquad a \leq x \leq b,$$
where $a < b$. The random variable $X$ is then defined to be uniformly distributed over the interval $[a, b]$. Now if $X$ is uniformly distributed over $[a, b]$, then
$$E(X) = \frac{a + b}{2}, \qquad \text{Var}(X) = \frac{(b - a)^2}{12}.$$
If $X \sim U[a, b]$ then $X = a + (b - a)U$, where $U \sim U[0, 1]$. Notice that if a random variable is uniformly distributed over one of the following intervals, $[a, b)$, $(a, b]$, $(a, b)$, the density function, expected value and variance do not change.
Exponential Distribution
If a random variable $X$ has a density function given by
$$f_X(x) = f_X(x; \lambda) = \begin{cases} \lambda e^{-\lambda x} & x \geq 0 \\ 0 & \text{otherwise,} \end{cases}$$
where $\lambda > 0$, then $X$ is defined to have a (negative) exponential distribution. For this random variable $X$ we have
$$E[X] = \frac{1}{\lambda}, \qquad \text{Var}[X] = \frac{1}{\lambda^2}.$$
Pareto-Levy or Stable Distributions
The stable distributions are a natural generalization of the normal in that, as their name suggests, they are stable under addition, i.e., a sum of independent stable random variables is also a random variable of the same type. However, nonnormal stable distributions have more probability mass in the tail areas than the normal. In fact, the nonnormal stable distributions are so fat-tailed that their variance and all higher moments are infinite. Closed form expressions for the density functions of stable random variables are available only for the cases of the normal and the Cauchy. If a random variable $X$ has a density function given by
$$f_X(x) = f_X(x; \alpha, \beta) = \frac{1}{\pi} \frac{\beta}{\beta^2 + (x - \alpha)^2}, \qquad -\infty < x < \infty,$$
where $-\infty < \alpha < \infty$ and $\beta > 0$, then $X$ is defined to have a Cauchy distribution. Notice that for this random variable even the mean does not exist.
Normal or Gaussian:
We say that $X \sim N(\mu, \sigma^2)$ when
$$f\left(x \mid \mu, \sigma^2\right) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}},$$
with
$$E(X) = \mu, \qquad \text{Var}(X) = \sigma^2.$$
The distribution is symmetric about $\mu$; it is also unimodal and positive everywhere.

Notice that
$$Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$
is the standard normal distribution.
Lognormal Distribution
Let $X$ be a positive random variable, and let a new random variable $Y$ be defined as $Y = \log X$. If $Y$ has a normal distribution, then $X$ is said to have a lognormal distribution. The density function of a lognormal distribution is given by
$$f(x; \mu, \sigma^2) = \frac{1}{x\sqrt{2\pi\sigma^2}}\, e^{-\frac{(\log x - \mu)^2}{2\sigma^2}}, \qquad x > 0,$$
where $\mu$ and $\sigma^2$ are parameters such that $-\infty < \mu < \infty$ and $\sigma^2 > 0$. We have
$$E[X] = e^{\mu + \frac{1}{2}\sigma^2}, \qquad \text{Var}[X] = e^{2\mu + 2\sigma^2} - e^{2\mu + \sigma^2}.$$
Notice that if $X$ is lognormally distributed then
$$E[\log X] = \mu, \qquad \text{Var}[\log X] = \sigma^2.$$
Gamma and $\chi^2$

$$f(x \mid \alpha, \lambda) = \frac{1}{\Gamma(\alpha)}\, \lambda^{\alpha} x^{\alpha - 1} e^{-\lambda x}, \qquad x > 0, \quad \alpha, \lambda > 0.$$
$\alpha$ is a shape parameter and $\lambda$ is a scale parameter. Here $\Gamma(\alpha) = \int_0^{\infty} x^{\alpha - 1} e^{-x}\, dx$ is the gamma function, with $\Gamma(n + 1) = n!$ for integer $n$. The $\chi_k^2$ distribution is the special case $\alpha = k/2$ and $\lambda = 1/2$.

Notice that we can approximate the Poisson and binomial distributions by the normal, in the sense that if a random variable $X$ is distributed as Poisson with parameter $\lambda$, then $\frac{X - \lambda}{\sqrt{\lambda}}$ is distributed approximately as standard normal. On the other hand, if $X \sim B(n, p)$, then
$$\frac{X - np}{\sqrt{np(1-p)}} \approx N(0, 1).$$
The standard normal is an important distribution for another reason as well. Assume that we have a sample of $n$ independent random variables, $X_1, X_2, \ldots, X_n$, which are coming from the same distribution with mean $\mu$ and variance $\sigma^2$. Then we have the following:
$$\frac{1}{\sigma\sqrt{n}} \sum_{i=1}^{n} (X_i - \mu) \xrightarrow{d} N(0, 1).$$
This is the well known Central Limit Theorem for independent observations.
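A simulation illustrating the CLT (a sketch; the exponential parent distribution is our arbitrary choice of a skewed example): standardized means behave like $N(0,1)$ draws.

```python
import random, statistics

random.seed(1)
lam, n, reps = 1.0, 100, 20_000            # exponential parent: mu = sd = 1/lam
mu, sigma = 1 / lam, 1 / lam

z = [(sum(random.expovariate(lam) for _ in range(n)) - n * mu)
     / (sigma * n ** 0.5) for _ in range(reps)]

print(statistics.mean(z), statistics.stdev(z))   # ~ 0 and ~ 1
print(sum(abs(v) <= 1.96 for v in z) / reps)     # ~ 0.95, as for N(0,1)
```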
4.1 Multivariate Random Variables
We now consider the extension to multiple random variables, i.e., $X = (X_1, X_2, \ldots, X_k)' \in \mathbb{R}^k$.

The joint pmf, $f_X(x)$, is a function with
$$P(X \in A) = \sum_{x \in A} f_X(x).$$

The joint pdf, $f_X(x)$, is a function with
$$P(X \in A) = \int_A f_X(x)\, dx.$$
This is a multivariate integral, and in general difficult to compute. If $A$ is a rectangle, $A = [a_1, b_1] \times \cdots \times [a_k, b_k]$, then
$$\int_A f_X(x)\, dx = \int_{a_k}^{b_k} \cdots \int_{a_1}^{b_1} f_X(x_1, \ldots, x_k)\, dx_1 \cdots dx_k.$$

The joint c.d.f. is defined similarly:
$$F_X(x) = \sum_{y_1 \leq x_1, \ldots, y_k \leq x_k} f_X(y_1, y_2, \ldots, y_k)$$
in the discrete case, and
$$F_X(x) = P(X_1 \leq x_1, \ldots, X_k \leq x_k) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_k} f_X(y_1, y_2, \ldots, y_k)\, dy_k \cdots dy_1$$
in the continuous case. The multivariate c.d.f. has coordinate-wise properties similar to those of a univariate c.d.f.

For continuously differentiable c.d.f.'s,
$$f_X(x) = \frac{\partial^k F_X(x)}{\partial x_1 \partial x_2 \cdots \partial x_k}.$$
4.1.1 Conditional Distributions and Independence
We defined conditional probability $P(A|B) = P(A \cap B)/P(B)$ for events with $P(B) \neq 0$. We now want to define conditional distributions of $Y \mid X$. In the discrete case there is no problem:
$$f_{Y|X}(y|x) = P(Y = y \mid X = x) = \frac{f_{X,Y}(x, y)}{f_X(x)}$$
when the event $\{X = x\}$ has nonzero probability. Likewise we can define
$$F_{Y|X}(y|x) = P(Y \leq y \mid X = x) = \frac{\sum_{z \leq y} f_{X,Y}(x, z)}{f_X(x)}.$$

Note that $f_{Y|X}(\cdot|x)$ is a density function and $F_{Y|X}(\cdot|x)$ is a c.d.f.:
1) $f_{Y|X}(y|x) \geq 0$ for all $y$;
2) $\sum_y f_{Y|X}(y|x) = \frac{\sum_y f_{X,Y}(x, y)}{f_X(x)} = \frac{f_X(x)}{f_X(x)} = 1$.
In the continuous case, it appears a bit anomalous to talk about $P(Y \leq y \mid X = x)$, since $\{X = x\}$ itself has zero probability of occurring. Still, we define the conditional density function
$$f_{Y|X}(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)}$$
in terms of the joint and marginal densities. It turns out that $f_{Y|X}(\cdot|x)$ has the properties of a p.d.f.:
1) $f_{Y|X}(y|x) \geq 0$;
2) $\int f_{Y|X}(y|x)\, dy = \frac{\int f_{X,Y}(x, y)\, dy}{f_X(x)} = \frac{f_X(x)}{f_X(x)} = 1$.
We can define expectations within the conditional distribution,
$$E(Y \mid X = x) = \int y f_{Y|X}(y|x)\, dy = \frac{\int y f_{X,Y}(x, y)\, dy}{\int f_{X,Y}(x, y)\, dy},$$
and higher moments of the conditional distribution.
4.1.2 Independence
We say that $X$ and $Y$ are independent (denoted by $X \perp Y$) if
$$P(X \in A, Y \in B) = P(X \in A)\, P(Y \in B)$$
for all events $A$, $B$ in the relevant $\sigma$-algebras. This is equivalent to the c.d.f. version, which is simpler to state and apply:
$$F_{X,Y}(x, y) = F_X(x)\, F_Y(y).$$
In fact, we also work with the equivalent density versions
$$f_{X,Y}(x, y) = f_X(x)\, f_Y(y), \qquad f_{Y|X}(y|x) = f_Y(y), \qquad F_{Y|X}(y|x) = F_Y(y).$$

If $X \perp Y$, then $g(X) \perp h(Y)$ for any measurable functions $g$ and $h$.

We can generalise the notion of independence to multiple random variables. Thus $X$, $Y$, and $Z$ are mutually independent if:
$$f_{X,Y,Z}(x, y, z) = f_X(x)\, f_Y(y)\, f_Z(z),$$
$$f_{X,Y}(x, y) = f_X(x)\, f_Y(y), \qquad f_{X,Z}(x, z) = f_X(x)\, f_Z(z), \qquad f_{Y,Z}(y, z) = f_Y(y)\, f_Z(z)$$
for all $x, y, z$.
4.1.3 Examples of Multivariate Distributions
Multivariate Normal
We say that $X = (X_1, X_2, \ldots, X_k)' \sim N(\mu, \Sigma)$ when
$$f(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{k/2} \left[\det(\Sigma)\right]^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)' \Sigma^{-1} (x - \mu)\right),$$
where $\Sigma$ is a $k \times k$ covariance matrix,
$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1k} \\ \vdots & \ddots & \vdots \\ \sigma_{k1} & \cdots & \sigma_{kk} \end{pmatrix},$$
and $\det(\Sigma)$ is the determinant of $\Sigma$.
Theorem 8. (a) If $X \sim N(\mu, \Sigma)$ then $X_i \sim N(\mu_i, \sigma_{ii})$ (this is shown by integration of the joint density with respect to the other variables).
(b) The conditional distributions of $X = (X_1', X_2')'$ are normal too:
$$X_1 \mid X_2 = x_2 \sim N\left(\mu_{1|2}, \Sigma_{1|2}\right),$$
where
$$\mu_{1|2} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2), \qquad \Sigma_{1|2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}.$$
(c) If $\Sigma$ is diagonal, then $X_1, X_2, \ldots, X_k$ are mutually independent. In this case $\det(\Sigma) = \sigma_{11} \sigma_{22} \cdots \sigma_{kk}$ and
$$-\frac{1}{2}(x - \mu)' \Sigma^{-1} (x - \mu) = -\frac{1}{2} \sum_{i=1}^{k} \frac{(x_i - \mu_i)^2}{\sigma_{ii}},$$
so that
$$f(x \mid \mu, \Sigma) = \prod_{i=1}^{k} \frac{1}{\sqrt{2\pi\sigma_{ii}}} \exp\left(-\frac{1}{2} \frac{(x_i - \mu_i)^2}{\sigma_{ii}}\right).$$
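Theorem 8(b) in code, using numpy for the block computations. This is an illustrative sketch with made-up numbers (the mean vector and covariance matrix are ours):

```python
import numpy as np

mu = np.array([1.0, 2.0, 0.0])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])

# condition the first component (X1) on the last two (X2) at the point x2:
x2 = np.array([1.5, -0.5])
S11, S12 = Sigma[:1, :1], Sigma[:1, 1:]
S21, S22 = Sigma[1:, :1], Sigma[1:, 1:]

mu_cond = mu[:1] + S12 @ np.linalg.inv(S22) @ (x2 - mu[1:])
Sigma_cond = S11 - S12 @ np.linalg.inv(S22) @ S21
print(mu_cond, Sigma_cond)
```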
4.1.4 More on Conditional Distributions
We now consider the relationship between two, or more, random variables when they are not independent. In this case, the conditional density $f_{Y|X}$ and c.d.f. $F_{Y|X}$ in general vary with the conditioning point $x$. Likewise for the conditional mean $E(Y|X)$, conditional median $\text{med}(Y|X)$, conditional variance $\text{Var}(Y|X)$, conditional cf $E\left(e^{itY} \mid X\right)$, and other functionals, all of which characterize the relationship between $Y$ and $X$. Note that this is a directional concept, unlike covariance, and so for example $E(Y|X)$ can be very different from $E(X|Y)$.

Regression Models:

We start with random variables $(Y, X)$. We can write for any such random variables
$$Y = \underbrace{E(Y|X)}_{\text{regression function}} + \underbrace{\varepsilon}_{\text{random error}}.$$
By construction $\varepsilon$ satisfies $E(\varepsilon|X) = 0$, but $\varepsilon$ is not necessarily independent of $X$.

For example, $\text{Var}(\varepsilon|X) = E\left((Y - E(Y|X))^2 \mid X\right) = \text{Var}(Y|X) = \sigma^2(X)$ can be expected to vary with $x$ as much as $m(X) = E(Y|X)$. A convenient and popular simplification is to assume that
$$E(Y|X) = \alpha + \beta X, \qquad \text{Var}(Y|X) = \sigma^2.$$
For example, in the bivariate normal distribution, $Y \mid X$ has
$$E(Y|X) = \mu_Y + \rho \frac{\sigma_Y}{\sigma_X}(X - \mu_X), \qquad \text{Var}(Y|X) = \sigma_Y^2 \left(1 - \rho^2\right),$$
and in fact $\varepsilon \perp X$. We have the following result about conditional expectations.
Theorem 9. (1) $E(Y) = E[E(Y|X)]$.
(2) $E(Y|X)$ minimizes $E\left[(Y - g(X))^2\right]$ over all measurable functions $g(\cdot)$.
(3) $\text{Var}(Y) = \text{Var}[E(Y|X)] + E[\text{Var}(Y|X)]$.

Proof. (1) Write $m(x) = E(Y \mid X = x) = \int y f_{Y|X}(y|x)\, dy$. Then we have
$$E(Y) = \int\!\!\int y f_{X,Y}(x, y)\, dy\, dx = \int \left(\int y f_{Y|X}(y|x)\, dy\right) f_X(x)\, dx = \int m(x) f_X(x)\, dx = E(E(Y|X)).$$
(2) Write
$$E\left[(Y - g(X))^2\right] = E\left[(Y - E(Y|X) + E(Y|X) - g(X))^2\right]$$
$$= E[Y - E(Y|X)]^2 + 2E\{[Y - E(Y|X)][E(Y|X) - g(X)]\} + E[E(Y|X) - g(X)]^2.$$
The middle term is zero (condition on $X$: $E(Y - E(Y|X) \mid X) = 0$ and $E(Y|X) - g(X)$ is measurable with respect to $X$), so
$$E\left[(Y - g(X))^2\right] = E[Y - E(Y|X)]^2 + E[E(Y|X) - g(X)]^2 \geq E[Y - E(Y|X)]^2,$$
with equality if and only if $g(X) = E(Y|X)$ almost surely.
(3) Write $Y - E(Y) = [Y - E(Y|X)] + [E(Y|X) - E(Y)]$. Then
$$\text{Var}(Y) = E[Y - E(Y)]^2 = E[Y - E(Y|X)]^2 + E[E(Y|X) - E(Y)]^2 + 2E\{[Y - E(Y|X)][E(Y|X) - E(Y)]\}.$$
The first term is $E[Y - E(Y|X)]^2 = E\left\{E\left\{[Y - E(Y|X)]^2 \mid X\right\}\right\} = E[\text{Var}(Y|X)]$.
The second term is $E[E(Y|X) - E(Y)]^2 = \text{Var}[E(Y|X)]$, since $E[E(Y|X)] = E(Y)$.
The third term is zero, as $\varepsilon = Y - E(Y|X)$ is such that $E(\varepsilon|X) = 0$, and $E(Y|X) - E(Y)$ is measurable with respect to $X$. $\blacksquare$
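A Monte Carlo illustration of the variance decomposition in (3), for an assumed model of our own choosing, $X \sim N(0,1)$ and $Y \mid X \sim N(X, 2^2)$:

```python
import random, statistics

random.seed(2)
n = 200_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [random.gauss(x, 2) for x in xs]   # E(Y|X) = X, Var(Y|X) = 4

var_y = statistics.pvariance(ys)
var_EyX = statistics.pvariance(xs)      # Var[E(Y|X)] = Var(X) = 1
E_varYX = 4.0                           # Var(Y|X) is constant here
print(var_y, var_EyX + E_varYX)         # both ~ 5
```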
Covariance
$$\text{Cov}(X, Y) = E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)E(Y).$$

Note that if $X$ or $Y$ is a constant then $\text{Cov}(X, Y) = 0$. Also
$$\text{Cov}(a + bX, c + dY) = bd\, \text{Cov}(X, Y).$$
An alternative measure of association is given by the correlation coefficient
$$\rho_{XY} = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}.$$

Note that
$$\rho_{a+bX,\, c+dY} = \text{sign}(b) \times \text{sign}(d) \times \rho_{XY}.$$

If $E(Y|X) = \mu_Y = E(Y)$ almost surely, then $\text{Cov}(X, Y) = 0$. Also, if $X$ and $Y$ are independent random variables, then $\text{Cov}(X, Y) = 0$. Both the covariance and the correlation of random variables $X$ and $Y$ are measures of a linear relationship of $X$ and $Y$ in the following sense: $\text{Cov}[X, Y]$ will be positive when $(X - \mu_X)$ and $(Y - \mu_Y)$ tend to have the same sign with high probability, and $\text{Cov}[X, Y]$ will be negative when $(X - \mu_X)$ and $(Y - \mu_Y)$ tend to have opposite signs with high probability. The actual magnitude of $\text{Cov}[X, Y]$ does not say much about how strong the linear relationship between $X$ and $Y$ is. This is because the variability of $X$ and $Y$ is also important. The correlation coefficient does not have this problem, as we divide the covariance by the product of the standard deviations. Furthermore, the correlation is unitless and $-1 \leq \rho \leq 1$.

These properties are very useful for evaluating the expected return and standard deviation of a portfolio. Assume $R_A$ and $R_B$ are the returns on assets $A$ and $B$, and their variances are $\sigma_A^2$ and $\sigma_B^2$, respectively. Assume that we form a portfolio of the two assets with weights $w_A$ and $w_B$, respectively. If the correlation of the returns of these assets is $\rho$, find the expected return and standard deviation of the portfolio.
If $R_p$ is the return of the portfolio, then $R_p = w_A R_A + w_B R_B$. The expected portfolio return is $E[R_p] = w_A E[R_A] + w_B E[R_B]$. The variance of the portfolio is
$$\text{Var}[R_p] = \text{Var}[w_A R_A + w_B R_B] = E\left[(w_A R_A + w_B R_B)^2\right] - \left(E[w_A R_A + w_B R_B]\right)^2$$
$$= w_A^2 \left\{E[R_A^2] - (E[R_A])^2\right\} + w_B^2 \left\{E[R_B^2] - (E[R_B])^2\right\} + 2 w_A w_B \left\{E[R_A R_B] - E[R_A]E[R_B]\right\}$$
$$= w_A^2\, \text{Var}[R_A] + w_B^2\, \text{Var}[R_B] + 2 w_A w_B\, \text{Cov}[R_A, R_B] = w_A^2 \sigma_A^2 + w_B^2 \sigma_B^2 + 2 w_A w_B \rho \sigma_A \sigma_B.$$

In a vector format we have:
$$E[R_p] = \begin{pmatrix} w_A & w_B \end{pmatrix} \begin{pmatrix} E[R_A] \\ E[R_B] \end{pmatrix} \quad \text{and} \quad \text{Var}[R_p] = \begin{pmatrix} w_A & w_B \end{pmatrix} \begin{pmatrix} \sigma_A^2 & \rho \sigma_A \sigma_B \\ \rho \sigma_A \sigma_B & \sigma_B^2 \end{pmatrix} \begin{pmatrix} w_A \\ w_B \end{pmatrix}.$$

From the above example we can see that $\text{Var}[aX + bY] = a^2\, \text{Var}[X] + b^2\, \text{Var}[Y] + 2ab\, \text{Cov}[X, Y]$ for random variables $X$ and $Y$ and constants $a$ and $b$. In fact we can generalize the formula above to several random variables $X_1, X_2, \ldots, X_n$ and constants $a_1, a_2, \ldots, a_n$, i.e.,
$$\text{Var}\left[a_1 X_1 + a_2 X_2 + \cdots + a_n X_n\right] = \sum_{i=1}^{n} a_i^2\, \text{Var}[X_i] + 2 \sum_{i < j} a_i a_j\, \text{Cov}[X_i, X_j].$$
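The portfolio formulas in vector form, as a short illustrative sketch with assumed numbers (weights, returns and correlation are ours):

```python
import numpy as np

w = np.array([0.6, 0.4])       # portfolio weights w_A, w_B
mu = np.array([0.08, 0.12])    # expected returns E[R_A], E[R_B]
sd = np.array([0.15, 0.25])    # standard deviations sigma_A, sigma_B
rho = 0.3
Sigma = np.array([[sd[0]**2, rho * sd[0] * sd[1]],
                  [rho * sd[0] * sd[1], sd[1]**2]])

exp_ret = w @ mu               # w' mu
var_p = w @ Sigma @ w          # w' Sigma w
print(exp_ret, var_p ** 0.5)   # expected return and standard deviation
```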
4.2 Inequalities
This section gives some inequalities that are useful in establishing a variety of probabilistic results.
4.2.1 Markov
Let $X$ be a random variable and consider a function $g(\cdot)$ such that $g(x) \geq 0$ for all $x \in \mathbb{R}$. Assume that $E[g(X)]$ exists. Then
$$P[g(X) \geq k] \leq \frac{E[g(X)]}{k}, \qquad k > 0.$$

Proof: Assume that $X$ is a continuous random variable (the discrete case follows analogously) with p.d.f. $f(\cdot)$. Define $A_1 = \{x \mid g(x) \geq k\}$ and $A_2 = \{x \mid g(x) < k\}$. Then
$$E[g(X)] = \int_{A_1} g(x) f(x)\, dx + \int_{A_2} g(x) f(x)\, dx \geq \int_{A_1} g(x) f(x)\, dx \geq k \int_{A_1} f(x)\, dx = k\, P[g(X) \geq k]. \ \blacksquare$$
4.2.2 Chebychev's Inequality
$$P[|X - E(X)| \geq r] \leq \frac{\text{Var}(X)}{r^2},$$
or alternatively
$$P\left[|X - E(X)| \geq r\sqrt{\text{Var}(X)}\right] \leq \frac{1}{r^2}.$$

Proof: To prove the above, assume that $E(X) = 0$ and compare $\mathbf{1}(|X| \geq r)$ with $X^2/r^2$. Clearly $\mathbf{1}(|X| \geq r) \leq X^2/r^2$, and it follows that $P[|X| \geq r] = E[\mathbf{1}(|X| \geq r)] \leq E[X^2]/r^2 = \text{Var}(X)/r^2$. Alternatively, apply Markov's inequality by setting $g(x) = [x - E(X)]^2$ and $k = r^2$ (or $k = r^2\, \text{Var}(X)$ for the second form). $\blacksquare$
4.2.3 Minkowski
Let $X$ and $Y$ be random variables such that $E(|X|^p) < \infty$ and $E(|Y|^p) < \infty$ for some $p \geq 1$. Then
$$\left[E(|X + Y|^p)\right]^{1/p} \leq \left[E(|X|^p)\right]^{1/p} + \left[E(|Y|^p)\right]^{1/p}.$$
For $p = 1$ we have the triangular inequality.
4.2.4 Triangle
$$|x + y| \leq |x| + |y|, \quad \text{and hence} \quad E|X + Y| \leq E|X| + E|Y|.$$
4.2.5 Cauchy-Schwarz
$$E^2(XY) \leq E\left(X^2\right) E\left(Y^2\right), \qquad \left(\sum_i x_i y_i\right)^2 \leq \left(\sum_i x_i^2\right)\left(\sum_i y_i^2\right).$$

Proof: Let $0 \leq h(t) = E\left[(tX - Y)^2\right] = t^2 E(X^2) - 2t E(XY) + E(Y^2)$. Then the function $h(t)$ is a quadratic function in $t$ which is increasing as $t \to \pm\infty$. It has a unique minimum at
$$h'(t) = 0 \iff 2t E(X^2) - 2E(XY) = 0 \iff t = \frac{E(XY)}{E(X^2)}.$$
Hence
$$0 \leq h\left(\frac{E(XY)}{E(X^2)}\right) = E(Y^2) - \frac{E^2(XY)}{E(X^2)} \iff E^2(XY) \leq E(X^2) E(Y^2). \ \blacksquare$$
4.2.6 Hölder's Inequality
For any $p, q > 1$ satisfying $\frac{1}{p} + \frac{1}{q} = 1$ we have
$$E|XY| \leq \left[E(|X|^p)\right]^{1/p} \left[E(|Y|^q)\right]^{1/q}.$$
In fact the Cauchy-Schwarz inequality corresponds to $p = q = 2$.
4.2.7 Jensen Inequality
Let $X$ be a random variable with mean $E[X]$, and let $g(\cdot)$ be a convex function. Then
$$E[g(X)] \geq g(E[X]).$$
Now, a continuous function $g(\cdot)$ with domain and counterdomain the real line is called convex if for any $x_0$ on the real line there exists a line which goes through the point $(x_0, g(x_0))$ and lies on or under the graph of the function $g(\cdot)$. Also, if $g''(x_0) \geq 0$ for all $x_0$, then $g(\cdot)$ is convex.
Part II
Statistical Inference
Chapter 5
SAMPLING THEORY
To proceed we shall recall the following definitions. Let $X_1, X_2, \ldots, X_n$ be random variables all defined on the same probability space $(\Omega, \mathcal{A}, P)$. The joint cumulative distribution function of $X_1, X_2, \ldots, X_n$, denoted by $F_{X_1, \ldots, X_n}(\cdot, \ldots, \cdot)$, is defined as
$$F_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = P[X_1 \leq x_1;\ X_2 \leq x_2;\ \ldots;\ X_n \leq x_n]$$
for all $(x_1, x_2, \ldots, x_n)$.

Let $X_1, X_2, \ldots, X_n$ be discrete random variables. Then the joint discrete density function of these, denoted by $f_{X_1, \ldots, X_n}(\cdot, \ldots, \cdot)$, is defined to be
$$f_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = P[X_1 = x_1;\ X_2 = x_2;\ \ldots;\ X_n = x_n]$$
for $(x_1, x_2, \ldots, x_n)$ a value of $(X_1, X_2, \ldots, X_n)$, and is 0 otherwise.

Let $X_1, X_2, \ldots, X_n$ be continuous random variables. Then the joint continuous density function of these, denoted by $f_{X_1, \ldots, X_n}(\cdot, \ldots, \cdot)$, is defined to be a function such that
$$F_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = \int_{-\infty}^{x_n} \cdots \int_{-\infty}^{x_1} f_{X_1, \ldots, X_n}(u_1, \ldots, u_n)\, du_1 \cdots du_n$$
for all $(x_1, \ldots, x_n)$.

The totality of elements which are under discussion and about which information is desired will be called the target population. The statistical problem is to find out something about a certain target population. It is generally impossible or impractical to examine the entire population, but one may examine a part of it (a sample from it) and, on the basis of this limited investigation, make inferences regarding the entire target population. The problem immediately arises as to how the sample of the population should be selected. Of practical importance is the case of a simple random sample, usually called a random sample, which can be defined as follows:

Let the random variables $X_1, X_2, \ldots, X_n$ have a joint density $f_{X_1, \ldots, X_n}(x_1, \ldots, x_n)$ that factors as follows:
$$f_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n),$$
where $f(\cdot)$ is the common density of each $X_i$. Then $X_1, X_2, \ldots, X_n$ is defined to be a random sample of size $n$ from a population with density $f(\cdot)$. Note that identical distribution can be weakened: we could have a different population for each $i$, reflecting heterogeneous individuals. Also, in time series we might want to allow dependence, i.e., $X_i$ and $X_{i-1}$ are dependent. When we are dealing with a finite population, sampling without replacement causes some heterogeneity, since if $X_1 = x_1$, then the distribution of $X_2$ must be affected.
5.1 Sample Statistics
Asample statisticis a function of observable random variables, which is itself an observable random variable, which does not contain any unknown parameters, i.e. a sample statistic is any quantity we can write as a measurable function,( 1 ).
For example, let $X_1,X_2,\dots,X_n$ be a random sample from the density $f(\cdot)$. Then the $r$-th sample moment, denoted by $M_r'$, is defined as:
$$M_r'=\frac1n\sum_{i=1}^n X_i^r.$$
In particular, if $r=1$, we get the sample mean, which is usually denoted by $\bar X$ or $\bar X_n$; that is:
$$\bar X_n=\frac1n\sum_{i=1}^n X_i.$$
Also the $r$-th sample central moment (about $\bar X_n$), denoted by $M_r$, is defined as:
$$M_r=\frac1n\sum_{i=1}^n\left(X_i-\bar X_n\right)^r.$$
In particular, if $r=2$, we get the sample variance and the sample standard deviation,
$$S_n^2=\frac1n\sum_{i=1}^n\left(X_i-\bar X_n\right)^2,\qquad S_n=\sqrt{S_n^2},$$
or maybe another sample statistic for the variance,
$$S^2=\frac1{n-1}\sum_{i=1}^n\left(X_i-\bar X_n\right)^2.$$
We can also get the sample median,
$$\operatorname{Med}=\operatorname{med}\{X_1,\dots,X_n\}=\begin{cases}X_{(k+1)} & n=2k+1,\\[2pt] \tfrac12\left[X_{(k)}+X_{(k+1)}\right] & n=2k,\end{cases}$$
the empirical cumulative distribution function
$$F_n(x)=\frac1n\sum_{i=1}^n 1(X_i\le x),$$
and the empirical characteristic function
$$\phi_n(t)=\frac1n\sum_{j=1}^n e^{itX_j}=\frac1n\sum_{j=1}^n\cos(tX_j)+\frac{i}{n}\sum_{j=1}^n\sin(tX_j).$$
These are analogues of the corresponding population characteristics and will be shown to be similar to them when $n$ is large. We calculate the properties of these statistics: (1) exact properties; (2) asymptotic properties.
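The statistics above translate directly into code. The following sketch (ours, not from the notes; the normal sample and all names are arbitrary) computes them with numpy:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=1_000)

m1     = x.mean()                 # sample mean  (r = 1 raw moment)
m2_raw = np.mean(x**2)            # second raw sample moment
s2_n   = np.mean((x - m1)**2)     # sample variance with divisor n
s2     = x.var(ddof=1)            # alternative statistic, divisor n-1
median = np.median(x)             # sample median

def ecdf(t, sample=x):
    """Empirical cdf F_n(t) = (1/n) * #{X_i <= t}."""
    return np.mean(sample <= t)

print(m1, m2_raw, s2_n, s2, median, ecdf(5.0))
```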
5.2 Means and Variances
We can prove the following theorems:
Theorem 10 Let $X_1,X_2,\dots,X_n$ be a random sample from the density $f(\cdot)$. The expected value of the $r$-th sample moment is equal to the $r$-th population moment, i.e. the sample moment is an unbiased estimator of the population moment (proof omitted).
Theorem 11 Let $X_1,X_2,\dots,X_n$ be a random sample from a density $f(\cdot)$, and let $\bar X_n=\frac1n\sum_{i=1}^n X_i$ be the sample mean. Then
$$E\left[\bar X_n\right]=\mu,\qquad\operatorname{var}\left[\bar X_n\right]=\frac1n\sigma^2,$$
where $\mu$ and $\sigma^2$ are the mean and variance of $f(\cdot)$, respectively. Notice that this is true for any distribution $f(\cdot)$, provided that $\sigma^2$ is not infinite.
Proof
$E\left[\bar X_n\right]=E\left[\frac1n\sum_{i=1}^n X_i\right]=\frac1n\sum_{i=1}^n E[X_i]=\frac1n\sum_{i=1}^n\mu=\frac1n n\mu=\mu$. Also
$$\operatorname{var}\left[\bar X_n\right]=\operatorname{var}\left[\frac1n\sum_{i=1}^n X_i\right]=\frac1{n^2}\sum_{i=1}^n\operatorname{var}[X_i]=\frac1{n^2}\sum_{i=1}^n\sigma^2=\frac1{n^2}n\sigma^2=\frac1n\sigma^2.$$
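A small simulation (ours; the uniform population and all constants are arbitrary) illustrating Theorem 11 for a non-normal population:

```python
import numpy as np

# Simulate many samples of size n and compare the mean and variance of the
# sample mean with mu and sigma^2/n (Theorem 11).
rng = np.random.default_rng(3)
n, reps = 25, 100_000
samples = rng.uniform(0.0, 1.0, size=(reps, n))   # mu = 1/2, sigma^2 = 1/12
xbar = samples.mean(axis=1)

print(xbar.mean(), "approx", 0.5)
print(xbar.var(), "approx", (1 / 12) / n)
```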
Theorem 12 Let $X_1,X_2,\dots,X_n$ be a random sample from a density $f(\cdot)$, and let $S^2$ be defined as above. Then
$$E\left[S^2\right]=\sigma^2,\qquad\operatorname{var}\left[S^2\right]=\frac1n\left(\mu_4-\frac{n-3}{n-1}\sigma^4\right),$$
where $\sigma^2$ and $\mu_4$ are the variance and the 4th central moment of $f(\cdot)$, respectively. Notice that this is true for any distribution $f(\cdot)$, provided that $\mu_4$ is not infinite.
Proof
We shall prove first the following identity, which will be used later:
$$\sum_{i=1}^n(X_i-\mu)^2=\sum_{i=1}^n\left(X_i-\bar X_n\right)^2+n\left(\bar X_n-\mu\right)^2.$$
Indeed,
$$\sum(X_i-\mu)^2=\sum\left(X_i-\bar X_n+\bar X_n-\mu\right)^2=\sum\left[\left(X_i-\bar X_n\right)+\left(\bar X_n-\mu\right)\right]^2=$$
$$=\sum\left[\left(X_i-\bar X_n\right)^2+2\left(X_i-\bar X_n\right)\left(\bar X_n-\mu\right)+\left(\bar X_n-\mu\right)^2\right]=$$
$$=\sum\left(X_i-\bar X_n\right)^2+2\left(\bar X_n-\mu\right)\sum\left(X_i-\bar X_n\right)+n\left(\bar X_n-\mu\right)^2=$$
$$=\sum_{i=1}^n\left(X_i-\bar X_n\right)^2+n\left(\bar X_n-\mu\right)^2,$$
since $\sum_{i=1}^n\left(X_i-\bar X_n\right)=0$.
Using the above identity we obtain:
$$E\left[S^2\right]=E\left[\frac1{n-1}\sum_{i=1}^n\left(X_i-\bar X_n\right)^2\right]=\frac1{n-1}E\left[\sum_{i=1}^n\left(X_i-\mu\right)^2-n\left(\bar X_n-\mu\right)^2\right]=$$
$$=\frac1{n-1}\left[\sum_{i=1}^n E\left(X_i-\mu\right)^2-nE\left(\bar X_n-\mu\right)^2\right]=\frac1{n-1}\left[n\sigma^2-n\frac{\sigma^2}{n}\right]=\frac1{n-1}\left[n\sigma^2-\sigma^2\right]=\sigma^2.$$
The derivation of the variance of $S^2$ is omitted.
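A simulation sketch (ours; the normal population and constants are arbitrary) contrasting the two variance statistics of Section 5.1 and illustrating the unbiasedness of $S^2$:

```python
import numpy as np

# The n-1 divisor is unbiased for sigma^2 (Theorem 12); the 1/n divisor is not.
rng = np.random.default_rng(4)
sigma2, n, reps = 4.0, 10, 100_000
samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))

print("E[S^2]   ~", samples.var(axis=1, ddof=1).mean(), "(target", sigma2, ")")
print("E[S_n^2] ~", samples.var(axis=1, ddof=0).mean(), "(biased low)")
```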
Theorem 13 Let $X_1,\dots,X_n$ be a random sample from a population with mean $\mu$, variance $\sigma^2$, skewness $\mu_3$, and kurtosis $\mu_4$. Then,
(3) $E\left(F_n(x)\right)=F(x)$ and $\operatorname{var}\left(F_n(x)\right)=\frac1n F(x)\left(1-F(x)\right)$;
(4) the characteristic function of $\bar X_n$ is $\phi_{\bar X_n}(t)=\left[\phi_X\!\left(\tfrac tn\right)\right]^n$.
Proof
$E\left(F_n(x)\right)=E\left(\frac1n\sum_{i=1}^n 1(X_i\le x)\right)=E\left(1(X_i\le x)\right)=F(x)$. Also
$$\operatorname{var}\left(F_n(x)\right)=E\left[F_n(x)-F(x)\right]^2=E\left\{\frac1n\sum_{i=1}^n\left[1(X_i\le x)-F(x)\right]\right\}^2=$$
$$=\frac1{n^2}\sum_{i=1}^n E\left\{1(X_i\le x)-F(x)\right\}^2+\frac1{n^2}\sum_{i\ne j}E\left\{1(X_i\le x)-F(x)\right\}\left\{1(X_j\le x)-F(x)\right\}=$$
$$=\frac1n E\left\{1(X_i\le x)-F(x)\right\}^2=\frac1n\left\{E\left[1(X_i\le x)\right]-F^2(x)\right\}=\frac1n F(x)\left[1-F(x)\right],$$
since the cross terms vanish by independence.
5.3 Sampling from the Normal Distribution
Theorem 14 Let $\bar X_n$ denote the sample mean of a random sample of size $n$ from a normal distribution with mean $\mu$ and variance $\sigma^2$. Then
(1) $\bar X_n\sim N\!\left(\mu,\tfrac{\sigma^2}{n}\right)$;
(2) $\bar X_n$ and $S^2$ are independent;
(3) $\dfrac{(n-1)S^2}{\sigma^2}\sim\chi^2_{n-1}$;
(4) $\dfrac{\bar X_n-\mu}{S/\sqrt n}\sim t_{n-1}$.
Proof
(1) From a theorem above we have that $\phi_{\bar X_n}(t)=\left[\phi_X\!\left(\tfrac tn\right)\right]^n$. Now $\phi_X(t)=\exp\left(i\mu t-\tfrac12\sigma^2 t^2\right)$. Hence
$$\phi_{\bar X_n}(t)=\left[\exp\left(i\mu\tfrac tn-\tfrac12\sigma^2\left(\tfrac tn\right)^2\right)\right]^n=\exp\left(i\mu t-\tfrac12\tfrac{\sigma^2}{n}t^2\right),$$
which is the characteristic function of a normal distribution with mean $\mu$ and variance $\tfrac{\sigma^2}{n}$.
(2) For $n=2$ we have that if $X_1\sim N(0,1)$ and $X_2\sim N(0,1)$ then $\bar X=\frac{X_1+X_2}{2}$ and $S^2=\frac{(X_1-X_2)^2}{2}$. Define $Y_1=\frac{X_1+X_2}{2}$ and $Y_2=\frac{X_1-X_2}{2}$. Then $Y_1$ and $Y_2$ are uncorrelated and, by normality, independent.
5.3.1 The Gamma Function
The gamma function is defined as:
$$\Gamma(t)=\int_0^\infty x^{t-1}e^{-x}\,dx,\qquad t>0.$$
Notice that $\Gamma(t+1)=t\Gamma(t)$, as
$$\Gamma(t+1)=\int_0^\infty x^{t}e^{-x}\,dx=\left[-x^{t}e^{-x}\right]_0^\infty+t\int_0^\infty x^{t-1}e^{-x}\,dx=t\Gamma(t),$$
and if $t$ is an integer then $\Gamma(t+1)=t!$. Also, if $k$ is again an integer, then
$$\Gamma\!\left(k+\tfrac12\right)=\frac{1\cdot3\cdot5\cdots(2k-1)}{2^k}\sqrt\pi.$$
Finally, $\Gamma\!\left(\tfrac12\right)=\sqrt\pi$.
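These identities are easy to check numerically, for instance with scipy's gamma function (an illustrative check, not part of the notes):

```python
from math import factorial, pi, sqrt

from scipy.special import gamma

print(gamma(5), factorial(4))        # Gamma(t+1) = t!  for integer t
print(gamma(0.5), sqrt(pi))          # Gamma(1/2) = sqrt(pi)

k = 3                                # Gamma(k + 1/2) = 1*3*...*(2k-1)/2^k * sqrt(pi)
print(gamma(k + 0.5), (1 * 3 * 5) / 2**k * sqrt(pi))
```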
Recall that if $X$ is a random variable with density
$$f(x)=\frac{1}{\Gamma(k/2)}\left(\frac12\right)^{k/2}x^{\frac k2-1}e^{-\frac12 x},\qquad 0<x<\infty,$$
where $\Gamma(\cdot)$ is the gamma function, then $X$ is defined to have a chi-square distribution with $k$ degrees of freedom.
Notice that if $X$ is distributed as above then:
$$E[X]=k,\qquad\operatorname{var}[X]=2k.$$
We can prove the following theorem
Theorem 15 If the random variables $X_i$, $i=1,2,\dots,k$, are normally and independently distributed with means $\mu_i$ and variances $\sigma_i^2$, then
$$U=\sum_{i=1}^k\left(\frac{X_i-\mu_i}{\sigma_i}\right)^2$$
has a chi-square distribution with $k$ degrees of freedom. Proof omitted.
Furthermore,
Theorem 16 If the random variables $X_i$, $i=1,2,\dots,n$, are normally and independently distributed with mean $\mu$ and variance $\sigma^2$, and $S^2=\frac1{n-1}\sum_{i=1}^n\left(X_i-\bar X\right)^2$, then
$$U=\frac{(n-1)S^2}{\sigma^2}\sim\chi^2_{n-1},$$
where $\chi^2_{n-1}$ is the chi-square distribution with $n-1$ degrees of freedom. Proof omitted.
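A simulation sketch (ours; the constants are arbitrary) of Theorem 16: the statistic $(n-1)S^2/\sigma^2$ computed from normal samples should behave like a $\chi^2_{n-1}$ variable.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
mu, sigma, n, reps = 2.0, 1.5, 8, 100_000
s2 = rng.normal(mu, sigma, size=(reps, n)).var(axis=1, ddof=1)
u = (n - 1) * s2 / sigma**2

print(u.mean(), "approx", n - 1)           # E[chi2_k] = k
print(u.var(),  "approx", 2 * (n - 1))     # var[chi2_k] = 2k
print(stats.kstest(u, "chi2", args=(n - 1,)).pvalue)  # should not be tiny
```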
5.3.2 The F Distribution
If $X$ is a random variable with density
$$f(x)=\frac{\Gamma\!\left(\frac{m+n}{2}\right)}{\Gamma\!\left(\frac m2\right)\Gamma\!\left(\frac n2\right)}\left(\frac mn\right)^{\frac m2}x^{\frac m2-1}\left[1+\frac mn x\right]^{-\frac{m+n}{2}},\qquad x>0,$$
where $\Gamma(\cdot)$ is the gamma function, then $X$ is defined to have an F distribution with m and n degrees of freedom.
Notice that if $X$ is distributed as above then:
$$E[X]=\frac{n}{n-2},\qquad\operatorname{var}[X]=\frac{2n^2(m+n-2)}{m(n-2)^2(n-4)}.$$
Theorem 17 If the random variables $U$ and $V$ are independently distributed as chi-square with $m$ and $n$ degrees of freedom, respectively, i.e. $U\sim\chi^2_m$ and $V\sim\chi^2_n$ independently, then
$$X=\frac{U/m}{V/n}\sim F_{m,n},$$
where $F_{m,n}$ is the $F$ distribution with $m$ and $n$ degrees of freedom. Proof omitted.
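Theorem 17 can be illustrated by building the ratio from simulated chi-squares and comparing it with scipy's F distribution (an illustrative sketch; degrees of freedom are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
m, n, reps = 5, 12, 200_000
u = rng.chisquare(m, size=reps)
v = rng.chisquare(n, size=reps)
x = (u / m) / (v / n)              # should be F with (m, n) degrees of freedom

print(x.mean(), "approx", n / (n - 2))
print(stats.kstest(x, "f", args=(m, n)).pvalue)
```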
5.3.3 The Student-t Distribution
If $X$ is a random variable with density
$$f(x)=\frac{\Gamma\!\left(\frac{k+1}{2}\right)}{\Gamma\!\left(\frac k2\right)}\frac{1}{\sqrt{k\pi}}\left[1+\frac{x^2}{k}\right]^{-\frac{k+1}{2}},\qquad -\infty<x<\infty,$$
where $\Gamma(\cdot)$ is the gamma function, then $X$ is defined to have a t distribution with k degrees of freedom.
Notice that ifis distributed as above then:
[]=0 []= n2 Theorem 18If the random variablesandare independently distributed as stan- dard normal and chi-square with, respectively i.e.v((01)andv 2 inde- pendently, then p =v where is thedistribution withdegrees of freedom. Proof omitted. The above Theorems are very useful especially to get the distribution of various tests and construct confidence intervals.
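Theorem 18 can be illustrated in the same way (an illustrative sketch; the degrees of freedom are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
k, reps = 6, 200_000
z = rng.standard_normal(reps)
u = rng.chisquare(k, size=reps)
x = z / np.sqrt(u / k)             # should be t with k degrees of freedom

print(x.var(), "approx", k / (k - 2))      # variance of t_k
print(stats.kstest(x, "t", args=(k,)).pvalue)
```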
Chapter 6
POINT AND INTERVAL ESTIMATION
The problem of estimation is defined as follows. Assume that some characteristic of the elements in a population can be represented by a random variable $X$ whose density is $f_X(\cdot;\theta)=f(\cdot;\theta)$, where the form of the density is assumed known except that it contains an unknown parameter $\theta$ (if $\theta$ were known, the density function would be completely specified, and there would be no need to make inferences about it).
Further assume that the values $x_1,x_2,\dots,x_n$ of a random sample $X_1,X_2,\dots,X_n$ from $f(\cdot;\theta)$ can be observed. On the basis of the observed sample values $x_1,\dots,x_n$ it is desired to estimate the value of the unknown parameter $\theta$ or the value of some function, say $\tau(\theta)$, of the unknown parameter. The estimation can be made in two ways. The first, called point estimation, is to let the value of some statistic, say $t(X_1,\dots,X_n)$, represent, or estimate, the unknown $\tau(\theta)$. Such a statistic is called a point estimator. The second, called interval estimation, is to define two statistics, say $t_1(X_1,\dots,X_n)$ and $t_2(X_1,\dots,X_n)$, where $t_1(X_1,\dots,X_n)<t_2(X_1,\dots,X_n)$, so that $\left(t_1(X_1,\dots,X_n),\,t_2(X_1,\dots,X_n)\right)$ constitutes an interval for which the probability can be determined that it contains the unknown $\tau(\theta)$.
6.1 Parametric Point Estimation
The point estimation admits two problems. The first is to devise some means of obtaining a statistic to use as an estimator. The second, to select criteria and techniques
to define and find a "best" estimator among many possible estimators.
6.1.1 Methods of Finding Estimators
Any statistic (a known function of observable random variables that is itself a random variable) whose values are used to estimate $\tau(\theta)$, where $\tau(\cdot)$ is some function of the parameter $\theta$, is defined to be an estimator of $\tau(\theta)$. Notice that for specific values of the realized random sample the estimator takes a specific value, called an estimate.
6.1.2 Method of Moments
Let $f(\cdot;\theta_1,\theta_2,\dots,\theta_k)$ be a density of a random variable $X$ which has $k$ parameters $\theta_1,\theta_2,\dots,\theta_k$. As before, let $\mu_r'$ denote the $r$-th moment, i.e. $\mu_r'=E[X^r]$. In general $\mu_r'$ will be a known function of the $k$ parameters $\theta_1,\dots,\theta_k$. Denote this by writing $\mu_r'=\mu_r'(\theta_1,\dots,\theta_k)$. Let $X_1,\dots,X_n$ be a random sample from the density $f(\cdot;\theta_1,\dots,\theta_k)$ and, as before, let $M_j'$ be the $j$-th sample moment, i.e. $M_j'=\frac1n\sum_{i=1}^n X_i^j$. Then, equating sample moments to population ones, we get $k$ equations with $k$ unknowns, i.e.
$$M_j'=\mu_j'(\theta_1,\dots,\theta_k),\qquad j=1,2,\dots,k.$$
Let the solution of these equations be $\hat\theta_1,\hat\theta_2,\dots,\hat\theta_k$. We say that these $k$ estimators are the estimators of $\theta_1,\dots,\theta_k$ obtained by the method of moments.
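As an illustration of the method (our own example, not taken from the notes), consider a Gamma$(\alpha,\beta)$ sample with $\mu_1'=\alpha\beta$ and $\mu_2'=\alpha(\alpha+1)\beta^2$; solving the two moment equations gives the estimators computed below.

```python
import numpy as np

# Method of moments for Gamma(alpha, beta) (shape alpha, scale beta):
#   mu'_1 = alpha*beta,   mu'_2 = alpha*(alpha+1)*beta^2
# so  alpha_hat = M1^2 / (M2 - M1^2)  and  beta_hat = (M2 - M1^2) / M1.
rng = np.random.default_rng(8)
alpha, beta = 2.0, 3.0
x = rng.gamma(shape=alpha, scale=beta, size=10_000)

m1 = x.mean()            # M'_1
m2 = np.mean(x**2)       # M'_2
alpha_hat = m1**2 / (m2 - m1**2)
beta_hat = (m2 - m1**2) / m1
print(alpha_hat, beta_hat)   # should be close to (2, 3)
```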