Chi-Squared Tests

Semester 1

Goodness of Fit

Up to now, we have tested hypotheses concerning the values of population parameters such as the population mean or proportion. We have not considered testing hypotheses about the form of a population's distribution. We next consider the problem of determining whether or not a population follows a particular distribution.

For example, we may be interested in determining whether the number of emails arriving per minute at a server follows a Poisson distribution or not. Similarly, we may wish to test if the lengths of components from an automated process follow a normal distribution. Another similar question is whether a 6-sided die is fair or not.

The general procedure for testing hypotheses on the distribution of a population is as follows.

(i) The null hypothesis $H_0$ is that some distribution describes the population.

(ii) We choose a sample of size $n$ from the population. This might involve recording the results of $n$ rolls of the die or recording the number of emails arriving in $n$ minute-long intervals.

(iii) The observations in our sample are grouped into $k$ "bins" or "classes" and we record the number of observations that fall into each bin, or the frequency of each bin. $O_i$ represents the number of observations, or observed frequency, for the $i$th bin.

(iv) Under the null hypothesis, we can calculate the expected frequency $E_i$ for each bin.

(v) The test statistic we use is

$$\chi_0^2 = \sum_{i=1}^{k} \frac{(E_i - O_i)^2}{E_i}.$$

If the null hypothesis is true, then $\chi_0^2$ has approximately a chi-squared distribution with $k - p - 1$ degrees of freedom. Here $p$ is the number of parameters of the distribution that we have to estimate with our sample data.

(vi) We reject $H_0$ at the significance level $\alpha$ if the value of $\chi_0^2$ calculated with our sample data exceeds the critical value $\chi^2_{k-p-1,\,\alpha}$, which we obtain from a table of chi-square critical values.
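
Steps (iv) to (vi) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the die-roll counts are made up for the example and do not come from the notes.

```python
def chi_square_statistic(observed, expected):
    """Return the sum over all bins of (E_i - O_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of bins")
    return sum((e - o) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(k, p):
    """k bins, p parameters estimated from the sample data."""
    return k - p - 1


# Hypothetical data: 60 rolls of a die; a fair die gives E_i = 60/6 = 10 per face.
observed = [8, 12, 9, 11, 10, 10]
expected = [60 / 6] * 6

stat = chi_square_statistic(observed, expected)
df = degrees_of_freedom(k=6, p=0)  # no parameters estimated for a fair die
print(stat, df)
```

The value of `stat` would then be compared with the tabulated critical value for `df` degrees of freedom at the chosen significance level.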

For example:

For testing the fairness of a die, we would use 6 bins for the numbers 1 to 6 and $O_i$ would count the number of times the number $i$ came up in the $n$ rolls. For the email server example, the bins might represent 0 emails, 1 email, 2 emails, 3 emails and $\geq 4$ emails. In this case, we would have 5 bins in total, and $O_1$ would count the number of minutes in which we received 0 emails, $O_2$ the number of minutes in which we received 1 email, $O_3$ the number of minutes in which we received 2 emails and so on.

If the 6-sided die is fair, then the expected frequency is $E_i = n/6$ for each bin. If the number of emails arriving at the server per minute follows a Poisson distribution with mean $\lambda$, the expected number of minutes in which no emails arrive would be $e^{-\lambda} n$. The expected number of minutes in which 1 email arrives would be $\frac{e^{-\lambda} \lambda^{1}}{1!} n$ and so on.
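
As a rough illustration, the two kinds of expected frequency can be computed as follows. The function names, the choice to fold the upper tail into a final "or more" bin, and the values $\lambda = 2$, $n = 100$ are assumptions made for this sketch.

```python
import math


def fair_die_expected(n):
    """Expected frequency n/6 for each of the six faces of a fair die."""
    return [n / 6] * 6


def poisson_expected(lam, n, max_count):
    """Expected frequencies for the bins 0, 1, ..., max_count - 1 emails,
    followed by one final bin for 'max_count or more' emails."""
    probs = [math.exp(-lam) * lam ** x / math.factorial(x) for x in range(max_count)]
    probs.append(1.0 - sum(probs))  # remaining probability goes to the last bin
    return [n * p for p in probs]


die = fair_die_expected(60)
pois = poisson_expected(lam=2.0, n=100, max_count=4)  # bins 0, 1, 2, 3, >= 4
print(die)
print([round(e, 2) for e in pois])
```

Putting the leftover probability in the last bin makes the expected frequencies sum to $n$, just as the observed frequencies do.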

The key point is that we can compute expected frequencies based on the null hypothesis and then compare the expected frequencies with the actual frequencies observed in a sample. When the deviation between the expected frequencies and the observed frequencies is too large, we reject the null hypothesis concerning the population. To determine when the difference between observed and expected frequencies is too large, we use a special distribution known as the chi-squared distribution.

Poisson Goodness of Fit

Example

The number of emails arriving at a server per minute is claimed to follow a Poisson distribution. To test this claim, the number of emails arriving in 70 randomly chosen 1-minute intervals is recorded. The table below summarises the results.

Number of emails    0    1    2    3    4
Frequency          13   22   23   12    0

Test the hypothesis that the number of emails per minute follows a Poisson distribution. Use a significance level of $\alpha = 0.05$.

To calculate the expected frequencies, we need the Poisson parameter $\lambda$. This is simply the mean number of emails per minute. We need to estimate this from the sample data:

$$\lambda = \frac{13(0) + 22(1) + 23(2) + 12(3)}{70} = 1.49.$$

Our null hypothesis is $H_0$: Number of emails per minute has a Poisson distribution with $\lambda = 1.49$.

$H_1$: Number of emails per minute does not have a Poisson distribution with $\lambda = 1.49$.

Significance level: $\alpha = 0.05$.

Test statistic: We treat the last two bins as one (as no minutes contained 4 or more emails), so the number of bins is $k = 4$.

$$\chi_0^2 = \sum_{i=1}^{4} \frac{(E_i - O_i)^2}{E_i}.$$

We reject $H_0$ if our sample data gives a value of $\chi_0^2 > \chi^2_{2,\,0.05} = 5.99$. We have $k - p - 1 = 4 - 1 - 1 = 2$ degrees of freedom, two fewer than the number of bins, because we have to estimate the parameter $\lambda$ from sample data.

To do the calculation, we require:

Number of emails   Observed Freq.   Expected Freq.
0                  13               15.78
1                  22               23.51
2                  23               17.51
3 or more          12               13.2

The actual value of $\chi_0^2$ is then:

$$\frac{(15.78 - 13)^2}{15.78} + \frac{(23.51 - 22)^2}{23.51} + \frac{(17.51 - 23)^2}{17.51} + \frac{(13.2 - 12)^2}{13.2},$$

which is equal to 2.417. As this is less than the critical value of 5.99, we cannot reject the null hypothesis at the 5% level of significance.
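
The Poisson calculation can be re-traced with a short script. This is a sketch rather than part of the original notes: the variable names are illustrative, and the critical value uses the closed form $-2\ln\alpha$, which is valid only for a chi-squared distribution with 2 degrees of freedom. Because the script keeps unrounded expected frequencies, the statistic can differ from 2.417 in the third decimal place.

```python
import math

observed = [13, 22, 23, 12]  # minutes with 0, 1, 2 and "3 or more" emails
n = sum(observed)            # 70 one-minute intervals

# Estimate lambda as the sample mean, rounded to 1.49 as in the notes.
lam = round(sum(x * o for x, o in enumerate(observed)) / n, 2)

# Poisson probabilities for 0, 1, 2 emails; the last bin takes the remainder.
probs = [math.exp(-lam) * lam ** x / math.factorial(x) for x in range(3)]
probs.append(1.0 - sum(probs))
expected = [n * p for p in probs]

stat = sum((e - o) ** 2 / e for o, e in zip(observed, expected))

# Special case (assumption stated above): for 2 degrees of freedom the upper
# tail probability is exp(-x / 2), so the critical value is -2 * ln(alpha).
alpha = 0.05
critical = -2.0 * math.log(alpha)

print([round(e, 2) for e in expected], round(stat, 3), round(critical, 2))
print("reject H0" if stat > critical else "cannot reject H0")
```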

Binomial Goodness of Fit

It is also possible to perform a goodness of fit test for distributions other than the Poisson distribution. The approach is essentially the same: all that changes is the distribution used to calculate the expected frequencies. We next consider an example based on the binomial distribution.

Example

Bits are sent over a communications channel in packets of 8. In order to characterise the performance of this channel, 80 packets are sent over the channel and the number of corrupted bits in each packet is recorded. The results of this experiment are recorded below.

Number of corrupt bits    0    1    2    3    4
Number of packets        35   31   10    4    0

Test the hypothesis that the number of corrupted bits in a packet sent over this channel follows a binomial distribution. Use a significance level of $\alpha = 0.025$.

To calculate the expected frequencies, we need the binomial parameter $p$. We need to estimate this from the sample data. Out of the 640 bits sent over the channel, 63 were corrupt. So our estimate of $p$ is

$$\frac{63}{640} = 0.098.$$

$H_0$: Population is binomial with $p = 0.098$.

$H_1$: Population is not binomial.

Significance level: $\alpha = 0.025$.

Test statistic: We treat the last two bins as one (as no packets contain 4 or more corrupt bits), so the number of bins is $k = 4$.

$$\chi_0^2 = \sum_{i=1}^{4} \frac{(E_i - O_i)^2}{E_i}.$$

We reject $H_0$ if our sample data gives a value of $\chi_0^2 > \chi^2_{2,\,0.025} = 7.378$. We have $4 - 1 - 1 = 2$ degrees of freedom, two fewer than the number of bins, because we have to estimate the parameter $p$ from sample data.

To do the calculation, we require:

Number of Corrupt Bits   Observed Freq.   Expected Freq.
0                        35               35.04
1                        31               30.48
2                        10               11.6
3 or more                 4                2.88

The actual value of $\chi_0^2$ is then:

$$\frac{(35.04 - 35)^2}{35.04} + \frac{(30.48 - 31)^2}{30.48} + \frac{(11.6 - 10)^2}{11.6} + \frac{(2.88 - 4)^2}{2.88} = 0.665.$$

As this is less than the critical value of 7.378, we cannot reject the null hypothesis at the 2.5% level of significance.
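
A similar sketch for the binomial example is given below. The variable names are illustrative, `math.comb` requires Python 3.8 or later, and the critical value again relies on the 2-degrees-of-freedom closed form $-2\ln\alpha$. The expected frequencies are computed without intermediate rounding, so they and the resulting statistic can differ slightly from the tabulated values while leading to the same conclusion.

```python
import math

bits_per_packet = 8
observed = [35, 31, 10, 4]   # packets with 0, 1, 2 and "3 or more" corrupt bits
n_packets = sum(observed)    # 80 packets

# Estimate p as corrupt bits / total bits = 63 / 640, rounded to 0.098.
corrupt_bits = sum(x * o for x, o in enumerate(observed))
p_hat = round(corrupt_bits / (n_packets * bits_per_packet), 3)

# Binomial probabilities for 0, 1, 2 corrupt bits; the last bin takes the rest.
probs = [
    math.comb(bits_per_packet, x) * p_hat ** x * (1 - p_hat) ** (bits_per_packet - x)
    for x in range(3)
]
probs.append(1.0 - sum(probs))
expected = [n_packets * q for q in probs]

stat = sum((e - o) ** 2 / e for o, e in zip(observed, expected))

# Closed form valid only for 2 degrees of freedom (assumption stated above).
alpha = 0.025
critical = -2.0 * math.log(alpha)

print([round(e, 2) for e in expected], round(stat, 3), round(critical, 3))
print("reject H0" if stat > critical else "cannot reject H0")
```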

Normal Goodness of Fit

The final example of goodness of fit that we shall consider is for the normal distribution. For this case, the situation is a little more complicated as the distribution is continuous. This means that we need to be more careful in selecting the bins. In practice, it is usual to choose bins so that the expected frequency for each bin is the same. We shall see how to do this in an example below.

Example

A text processing tool can be downloaded from a particular webserver. The administrator of the server wishes to test if the download times are adequately described by a normal distribution. A random sample of 80 users is selected and their download times recorded. The mean and standard deviation of the download times (in seconds) for the sample are 20.2 and 2.1 respectively.

Suppose we wish to use 8 bins.

We first find the intervals that divide the standard normal distribution into 8 parts of equal probability. From the table of standard normal probabilities we can see that these intervals are:

$$(-\infty, -1.15],\quad (-1.15, -0.675],\quad (-0.675, -0.32],\quad (-0.32, 0]$$

and their mirror images on the other side of 0. This allows us to construct the bins in which to group our data.
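
Instead of reading the cut points from a table, they can be computed with the standard library. The sketch below assumes Python 3.8 or later (for `statistics.NormalDist`) and uses the sample mean 20.2 and standard deviation 2.1 from the example; the variable names are illustrative. The computed values agree with the table-based ones up to rounding.

```python
from statistics import NormalDist

k = 8                 # number of equal-probability bins
mean, sd = 20.2, 2.1  # sample mean and standard deviation (seconds)

# Standard normal cut points at cumulative probabilities 1/8, 2/8, ..., 7/8.
z_cuts = [NormalDist().inv_cdf(i / k) for i in range(1, k)]

# Rescale to download times: x = mean + z * sd.
edges = [mean + z * sd for z in z_cuts]

# Each bin has probability 1/8, so with 80 users every bin expects 80/8.
expected_per_bin = 80 / k

print([round(z, 3) for z in z_cuts])
print([round(x, 3) for x in edges])
print(expected_per_bin)
```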

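The first bin will be $x \le 20.2 - 1.15(2.1) = 17.785$, the second bin will be $17.785 < x \le 18.7825$ and so on. The table below shows the observed frequencies for the 80 users in the sample as well as the expected frequencies for each bin.

Download Time (x)          Observed Freq.   Expected Freq.
x ≤ 17.785                  8               10
17.785 < x ≤ 18.7825        [...]           10
18.7825 < x ≤ 19.528        [...]           10
19.528 < x ≤ 20.2           [...]           10
20.2 < x ≤ 20.872           [...]           10
20.872 < x ≤ 21.6175        [...]           10
21.6175 < x ≤ 22.615        [...]           10
x > 22.615                  [...]           10
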
We can now test the data for normality at a 5% level of significance following the same procedure as before.

1. $H_0$: The form of the distribution is normal.
2. $H_1$: The form of the distribution is non-normal.
3. Significance level: $\alpha = 0.05$.
4. Test statistic:

$$\chi_0^2 = \sum_{i=1}^{8} \frac{(E_i - O_i)^2}{E_i}.$$

5. We reject $H_0$ if our sample data gives a value of $\chi_0^2 > \chi^2_{5,\,0.05} = 11.07$. The number of degrees of freedom is $8 - 2 - 1 = 5$ because we estimated two parameters from the data.
6. The actual value of $\chi_0^2$ for our sample is 3. As this is not greater than the critical value of 11.07, we cannot reject the null hypothesis at the 5% level of significance.

Contingency Tables
Another use of the chi-square distribution is to assess the independence of two different ways of classifying a population. For example, we could classify drivers according to age and according to the insurance premium they pay and test whether these classifications are independent. Another example would be to classify computer users by the number of times their computer crashes per week and also by the operating system they use.

If the first way of classifying the population has $r$ levels ($r$ different age categories for the drivers) and the second has $c$ levels ($c$ different categories of insurance premiums), the observed frequencies can be recorded in an $r \times c$ contingency table with $r$ rows and $c$ columns. A sample of $n$ observations is selected. For $1 \le i \le r$ and $1 \le j \le c$, we let $O_{ij}$ denote the frequency observed for level $i$ in the first classification and level $j$ in the second classification. $t_i$ denotes the total number of observations in category $i$ in the first way of classifying the population and $s_j$ denotes the total number of observations in category $j$ in the second way of classifying the population.

If the two methods of classification are independent, then the expected frequency of "cell" $(i, j)$, denoted $E_{ij}$, is

$$E_{ij} = \frac{t_i s_j}{n}.$$

Similar to goodness of fit tests, we use the test statistic

$$\chi_0^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(E_{ij} - O_{ij})^2}{E_{ij}},$$

which has approximately a chi-square distribution with $(r - 1)(c - 1)$ degrees of freedom if the hypothesis of independence is true.
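
The expected-frequency formula $E_{ij} = t_i s_j / n$ might be coded as below. The function name and the small 2 x 2 table are made up purely for illustration.

```python
def expected_frequencies(table):
    """Return E_ij = t_i * s_j / n for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]        # t_i
    col_totals = [sum(col) for col in zip(*table)]  # s_j
    n = sum(row_totals)
    return [[t * s / n for s in col_totals] for t in row_totals]


# Hypothetical 2 x 2 table: row totals 30 and 70, column totals 40 and 60.
table = [[10, 20],
         [30, 40]]
expected = expected_frequencies(table)
dof = (len(table) - 1) * (len(table[0]) - 1)  # (r - 1)(c - 1)
print(expected, dof)
```

Note that the expected table has the same row and column totals as the observed one, which is why only $(r - 1)(c - 1)$ cells are free to vary.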

Example

A car rental company wishes to test if the age of a car rented to a customer and the customer's level of satisfaction are independent of each other. When a customer returns a car they rate their level of satisfaction as one of dissatisfied, no opinion, satisfied, very satisfied. The company only rents out cars that are two years old or less. A random sample of 60 customers is selected and the results of the sample are recorded in the table below.

                       Satisfaction Level
Age of Car   Dissatisfied   No Opinion   Satisfied   Very Satisfied
New                4             6           13             7
1 Year             3            10            6             1
2 Years            3             3            4             0

Test the hypothesis that the level of satisfaction is independent of the age of the car at a 5% level of significance.

1. $H_0$: The two methods of classification are independent.
2. $H_1$: The two methods of classification are not independent.
3. Significance level: $\alpha = 0.05$.
4. Test statistic:

$$\chi_0^2 = \sum_{i=1}^{3} \sum_{j=1}^{4} \frac{(E_{ij} - O_{ij})^2}{E_{ij}}.$$

5. Reject $H_0$ if the value of $\chi_0^2$ is greater than $\chi^2_{6,\,0.05} = 12.59$. We have $6 = (3 - 1)(4 - 1)$ degrees of freedom.
6. The value of $\chi_0^2$ for our sample is 9.91.
7. As this is not greater than the critical value of 12.59, we cannot reject $H_0$ at the 5% level of significance.
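
The value 9.91 can be checked with a few lines of Python applied to the car rental table. This is only a verification sketch with illustrative variable names; the critical value 12.59 is the tabulated figure quoted in the test, not something the script computes.

```python
# Observed frequencies: rows are New, 1 Year, 2 Years; columns are
# Dissatisfied, No Opinion, Satisfied, Very Satisfied.
observed = [
    [4, 6, 13, 7],
    [3, 10, 6, 1],
    [3, 3, 4, 0],
]

row_totals = [sum(row) for row in observed]        # t_i: 30, 20, 10
col_totals = [sum(col) for col in zip(*observed)]  # s_j: 10, 19, 23, 8
n = sum(row_totals)                                # 60 customers

# Sum (E_ij - O_ij)^2 / E_ij over all cells, with E_ij = t_i * s_j / n.
stat = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        stat += (e - o) ** 2 / e

dof = (len(observed) - 1) * (len(observed[0]) - 1)  # (3 - 1)(4 - 1) = 6
critical = 12.59                                    # tabulated value for 6 d.f. at 5%

print(round(stat, 2), dof)
print("reject H0" if stat > critical else "cannot reject H0")
```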
