Maths/stats support 11 t-test



Guinness and the t-test

The t-test was invented by a chemist called William Gosset. Gosset worked for the Guinness Brewery in Dublin and was concerned with making Guinness that tasted the same every time. The key to achieving a consistent product lay in the standardisation of ingredients. He sampled the ingredients and measured various properties of the barley and hops and other stuff that goes into beer. He ended up with small sets of data that proved difficult to analyse because statisticians up to then (late nineteenth century) had only dealt with large samples. Gosset was a statistical genius though, so he devised the t-test to help him compare mean values of the various parameters he measured.

The Brewery would not let its employees publish work done for the Guinness Company because they thought it would give an advantage to their competitors. Gosset therefore published his results under the pseudonym 'Student'. Only a few people knew who 'Student' was and speculation was rife among statisticians as to who it might be. It was all very exciting and added further to the romance and sheer joie de vivre of their statistical world. Gosset was described by a fellow statistician as 'the Faraday of statistics' (Michael Faraday was one of the all-time great physicists).

When to use a t-test

t-tests compare the means of two sets of data. The data should be normally distributed, unmatched, interval data (see 'Deciding which statistical test to use' in Maths/stats support 9 What test should I use? (M0.09S)). It is ideal if you have 25 or more measurements in each set. If you have such big samples (25 or more in each set) you can use it on ordinal data and you don't have to worry about the data being normally distributed (unless they are severely skewed).

Obviously you can 'compare' the means of two sets of data by subtracting one mean from the other to work out the difference. The t-test does that, but it also takes into account the degree of overlap between the two sets of data and lets you say how certain you are in saying that the means are, or are not, significantly different from each other.

Let us suppose that you live in Blue-town. Somewhat bizarrely you are extremely proud of the fact that you and your fellow townspeople are generally very tall (much taller than those vertically challenged nincompoops down the road in Red-town). You decide to celebrate the glorious height of the Blue-towners by measuring lots of them and plotting the graph you see in Figure 1.

Maths/stats support 11 t-test
Student skills
M0.11S
Salters-Nuffield Advanced Biology, Harcourt Education Ltd 2006. ©University of York Science Education Group.
This sheet may have been altered from the original.

[Figure 1: Height of the Blue-towners. Number of people plotted against height/metres, with the mean height in Blue-town marked. Note that we get a symmetrical graph (normal distribution) with the mean value being found at the midpoint of the distribution.]

After spending a while gloating about Blue-town's enormous mean height, you decide to rub it in by visiting Red-town (probably in disguise) and measuring some of the shorties who live there. Once you have demonstrated their shortness your joy will be complete (you're quite sad really, aren't you?).

You visit Red-town and measure as many people as you can get to agree to be measured and obtain the data displayed in Figure 2. It's truly appalling. Even the tallest person in Blue-town is shorter than the shortest person in Red-town. Your dubious heightist agenda has been shown up for the moronic conceit that it is. If anyone were to ask themselves the question: 'Is the mean height in Blue-town significantly different from the mean height in Red-town?', the answer would not be in doubt: Yes, the mean in Red-town is much bigger and there is not even any overlap between the two datasets.

In an attempt to find somewhere (anywhere!) where mean height is less than in Blue-town you visit Purple-town and get the results displayed in Figure 3. Now the answer to the question 'Is the mean height in Blue-town significantly different from the mean height in Purple-town?' is not nearly so obvious. Not only are the two means quite close together but there is also a considerable amount of overlap between the two distributions (lots of readings in common).

The t-test allows you to answer the question 'are the means significantly different?'. It takes into account the degree of overlap in the two sets of data and allows you to say how certain you are in either accepting that there is no significant difference, or stating that there is a significant difference.
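To see why overlap matters, here is a minimal Python sketch. The heights are invented purely for illustration (they are not survey data from this sheet), and the helper name t_statistic is hypothetical; it uses the formula for t given later in this sheet under 'Calculating t'. Both pairs of towns differ by the same 0.04 m in mean height, but only the tightly clustered pair gives a large t.

```python
import math
import statistics


def t_statistic(a, b):
    """t = (mean_a - mean_b) / sqrt(var_a / n_a + var_b / n_b)."""
    difference = statistics.mean(a) - statistics.mean(b)
    # statistics.variance is the sample variance (divides by n - 1).
    spread = statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    return difference / math.sqrt(spread)


# Hypothetical heights in metres, invented for illustration only.
blue_tight = [1.78, 1.79, 1.80, 1.81, 1.82]    # little spread, little overlap
purple_tight = [1.74, 1.75, 1.76, 1.77, 1.78]
blue_wide = [1.60, 1.70, 1.80, 1.90, 2.00]     # large spread, lots of overlap
purple_wide = [1.56, 1.66, 1.76, 1.86, 1.96]

t_tight = t_statistic(blue_tight, purple_tight)
t_wide = t_statistic(blue_wide, purple_wide)

print(f"Same difference in means, little overlap: t = {t_tight:.2f}")
print(f"Same difference in means, lots of overlap: t = {t_wide:.2f}")
```

With these made-up numbers the first t comes out at about 4.0 and the second at about 0.4, so the same difference in means is far more convincing when the two distributions barely overlap.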


[Figure 2: Height survey data for Blue-towners and Red-towners. Number of people plotted against height/metres, with the mean height in Blue-town and the mean height in Red-town marked.]

[Figure 3: Height survey results for Blue-towners, Red-towners and Purple-towners. Number of people plotted against height/metres, with the mean heights in Blue-town, Purple-town and Red-town marked.]

How to do a t-test

An informal study of Irish pubs in London and Dublin revealed that in London it seemed to take less time to pour a pint of Guinness than it did in Dublin. In honour of the great man Gosset it was decided to conduct research as to whether the mean pouring times for a pint of Guinness are significantly different in Dublin and London. As with all of these kinds of tests you begin with a null hypothesis: There is no significant difference between the mean pouring time for a pint of Guinness in Dublin and London. Data were collected by highly trained professionals on the time taken to pour a pint of Guinness in 25 different pubs in each city (Table 1). A quick way to check if the data are normally distributed is to construct a tally chart (Table 2).


Table 1 Time taken in minutes to pour one pint of Guinness in 25 different pubs at each of two locations.

Pint number    Mean time to pour a pint/minutes
               Dublin    London
1              8.4       3.0
2              10.3      11.0
3              10.6      10.2
4              14.1      9.3
5              13.0      9.3
6              9.4       6.9
7              9.2       6.7
8              11.6      6.0
9              11.2      5.8
10             10.0      7.2
11             12.0      6.8
12             13.1      8.2
13             12.3      8.1
14             9.1       8.0
15             9.7       9.4
16             12.2      10.1
17             12.4      4.2
18             11.4      4.3
19             11.1      5.6
20             11.0      5.7
21             12.7      7.4
22             13.0      7.3
23             11.2      8.4
24             10.2      7.9
25             10.8      7.6

Table 2 Tally chart for 'time taken to pour one pint of Guinness' data. It helps if you put the data into an Excel spreadsheet and sort each set.

Size class/minutes    Dublin     London
3.0-3.9                          /
4.0-4.9                          //
5.0-5.9                          ///
6.0-6.9                          ////
7.0-7.9                          /////
8.0-8.9               /          ////
9.0-9.9               ////       ///
10.0-10.9             /////      //
11.0-11.9             //////     /
12.0-12.9             /////
13.0-13.9             ///
14.0-14.9             /
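If you would rather not tally by hand, the chart can be rebuilt with a short script. This is only a sketch: the data are copied from Table 1, and the function name tally is an assumption made for illustration.

```python
from collections import Counter

# Pouring times in minutes, copied from Table 1.
dublin = [8.4, 10.3, 10.6, 14.1, 13.0, 9.4, 9.2, 11.6, 11.2, 10.0,
          12.0, 13.1, 12.3, 9.1, 9.7, 12.2, 12.4, 11.4, 11.1, 11.0,
          12.7, 13.0, 11.2, 10.2, 10.8]
london = [3.0, 11.0, 10.2, 9.3, 9.3, 6.9, 6.7, 6.0, 5.8, 7.2,
          6.8, 8.2, 8.1, 8.0, 9.4, 10.1, 4.2, 4.3, 5.6, 5.7,
          7.4, 7.3, 8.4, 7.9, 7.6]


def tally(times):
    """Count how many readings fall in each one-minute size class."""
    # int(8.4) == 8, so the key 8 stands for the size class 8.0-8.9.
    return Counter(int(t) for t in times)


dublin_tally = tally(dublin)
london_tally = tally(london)

print(f"{'Size class':<12} {'Dublin':<10} London")
for lower in range(3, 15):
    label = f"{lower}.0-{lower}.9"
    # A Counter returns 0 for a missing class, giving an empty tally.
    print(f"{label:<12} {'/' * dublin_tally[lower]:<10} {'/' * london_tally[lower]}")
```

Each column should rise to a single peak and fall away on both sides, which is the rough shape you are looking for when checking for a normal distribution.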

Next we calculate a value of the test statistic, t, and compare this calculated value with a value obtained from a table of critical values of t at a significance level that we choose (usually 5%). We can then decide whether to accept or reject the null hypothesis (H0). With a t-test, if the calculated value is bigger than the critical value we reject the null hypothesis.

Calculating t

In order to calculate t, you need to know three things about each set of data:

• The number of items in each dataset (n). For both sets of data, Dublin and London, $n = 25$.

• The mean ($\bar{x}$). Calculate the mean of each set of data using the formula:

$\bar{x} = \frac{\sum x}{n}$

where $\sum$ = the sum of, $x$ = a piece of data and $n$ = number of items of data.

Dublin: $\sum x = 280$ and $n = 25$, so $\bar{x} = \frac{280}{25} = 11.2$ minutes

London: $\sum x = 184.4$ and $n = 25$, so $\bar{x} = \frac{184.4}{25} \approx 7.4$ minutes

• The variance ($s^2$). The variance is a measure of the spread of the data either side of the mean. If there are a lot of values spread either side of the mean the data have a large variance. If the values are clustered around the mean with not much spread, the data have a small variance (see Figure 4).

If you know how to work the statistical functions on your calculator or can use a spreadsheet you can save a lot of time and effort at this point. All you have to do is stick all the data in the memory (separately for each set); pressing the appropriate buttons will then give you the mean, the variance and the number of items in each dataset. The instructions will show you how to do this. If you've lost the instructions, proceed as follows.

[Figure 4: Variance is a measure of spread of the data around the mean. Two distributions with the mean value marked on each: greater spread = larger variance; little spread = smaller variance.]

Calculate the variance ($s^2$) of each set of data using this formula:

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$

Work out $\sum x^2$ by squaring each individual piece of data and adding them up (see Table 3).

Work out $(\sum x)^2$ by adding up each piece of data (you've already done this to calculate the mean) and squaring the answer.


Table 3 Calculating $\sum x^2$.

Pint number    Mean time to pour a pint/minutes
               Dublin    Dublin^2    London    London^2
1              8.4       70.56       3.0       9.00
2              10.3      106.09      11.0      121.00
3              10.6      112.36      10.2      104.04
4              14.1      198.81      9.3       86.49
5              13.0      169.00      9.3       86.49
6              9.4       88.36       6.9       47.61
7              9.2       84.64       6.7       44.89
8              11.6      134.56      6.0       36.00
9              11.2      125.44      5.8       33.64
10             10.0      100.00      7.2       51.84
11             12.0      144.00      6.8       46.24
12             13.1      171.61      8.2       67.24
13             12.3      151.29      8.1       65.61
14             9.1       82.81       8.0       64.00
15             9.7       94.09       9.4       88.36
16             12.2      148.84      10.1      102.01
17             12.4      153.76      4.2       17.64
18             11.4      129.96      4.3       18.49
19             11.1      123.21      5.6       31.36
20             11.0      121.00      5.7       32.49
21             12.7      161.29      7.4       54.76
22             13.0      169.00      7.3       53.29
23             11.2      125.44      8.4       70.56
24             10.2      104.04      7.9       62.41
25             10.8      116.64      7.6       57.76
Sum            280.0     3186.80     184.4     1453.22
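The column totals and the variance formula can be checked in a few lines of Python. This is a sketch rather than part of the original sheet: the data are copied from Table 1, the function name variance_from_sums is an assumption made for illustration, and the standard library's statistics.variance (which also divides by n - 1) is used only as a cross-check.

```python
import statistics

# Pouring times in minutes, copied from Table 1.
dublin = [8.4, 10.3, 10.6, 14.1, 13.0, 9.4, 9.2, 11.6, 11.2, 10.0,
          12.0, 13.1, 12.3, 9.1, 9.7, 12.2, 12.4, 11.4, 11.1, 11.0,
          12.7, 13.0, 11.2, 10.2, 10.8]
london = [3.0, 11.0, 10.2, 9.3, 9.3, 6.9, 6.7, 6.0, 5.8, 7.2,
          6.8, 8.2, 8.1, 8.0, 9.4, 10.1, 4.2, 4.3, 5.6, 5.7,
          7.4, 7.3, 8.4, 7.9, 7.6]


def variance_from_sums(data):
    """Sample variance: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(data)
    sum_x = sum(data)                          # the 'sum of x' column total
    sum_x_squared = sum(x * x for x in data)   # the 'sum of x^2' column total
    return (sum_x_squared - sum_x ** 2 / n) / (n - 1)


for name, data in (("Dublin", dublin), ("London", london)):
    by_hand = variance_from_sums(data)
    built_in = statistics.variance(data)
    print(f"{name}: sum x = {sum(data):.1f}, "
          f"sum x^2 = {sum(x * x for x in data):.2f}, "
          f"variance = {by_hand:.2f} (statistics.variance: {built_in:.2f})")
```

Rounded to two decimal places, both routes give 2.12 for Dublin and 3.88 for London.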

Variance in Dublin:

$s^2 = \frac{3186.80 - \frac{(280.0)^2}{25}}{25 - 1} = 2.12$ minutes²

Variance in London:

$s^2 = \frac{1453.22 - \frac{(184.4)^2}{25}}{25 - 1} = 3.88$ minutes²

We now have all the numbers we need to calculate the value of our test statistic, t. It's nearly always during the grinding out of all those squared values and in adding them all up that things go wrong. You can reduce the possibility of calculation errors by using a calculator or an Excel spreadsheet.

Next, calculate the value of t using this formula:

$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

where:
$\bar{x}_1$ = mean of one set of data
$\bar{x}_2$ = mean of the other set of data
$s_1^2$ = variance of one set of data
$s_2^2$ = variance of the other set of data
$n_1$, $n_2$ = number of items in each set of data

$t = \frac{11.2 - 7.4}{\sqrt{\frac{2.12}{25} + \frac{3.88}{25}}} = 7.76$

That's the hard part over but now for the most important bit, what the t value tells us. The calculated t value is compared with a critical value obtained from a table of critical values of t. Table 4 is a simplified version of a table of critical values of t. It is important that you use the correct row, according to how many degrees of freedom your data have. The degrees of freedom is a way of taking into account the number of measurements collected; you work out the degrees of freedom using the formula $n_1 + n_2 - 2$. Look down the left side of the table until you find the correct number of degrees of freedom. Then move along the row until you reach the critical value at p = 0.05. In biology we usually work at a 5% significance level (p = 0.05 probability).

For the Guinness data the degrees of freedom are 25 + 25 - 2 = 48. You will note that the table does not provide a critical value for 48 degrees of freedom; the value lies between those for 40 and 60 degrees of freedom. If this situation occurs, use the row for the next lower number of degrees of freedom. In our example this is 40 degrees of freedom. The reason for this is that we are cautious, conservative t-testers (as opposed to wild, irresponsible, devil-may-care t-testers) and it's harder to beat the slightly bigger value for 40 degrees of freedom than the smaller value for 60 degrees of freedom. We can therefore put more reliance on our result. So the value from the table is 2.021 at the 5% significance level (p = 0.05).

With a t-test, if the calculated value is bigger than the critical value we reject the null hypothesis; if it is smaller than the critical value we accept the null hypothesis. Our calculated value of 7.76 is bigger (much bigger) than the critical value, 2.021, so we reject the null hypothesis and say that there is a significant difference between the two means at the 5% significance level (p = 0.05). If there were really no difference between the mean pouring times, a value of t this large would occur by chance less than 5% of the time, so if we did investigations like this a very large number of times we would expect to reject a true null hypothesis wrongly no more than about 5% of the time.
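The calculation of t and the final decision can be retraced in Python. This is a minimal sketch, not part of the original sheet: the variable names are assumptions, the first calculation uses the rounded means and variances quoted in the text, and the second repeats it from the unrounded column totals in Table 3.

```python
import math

n1 = n2 = 25

# Rounded values as quoted in the text.
mean_dublin, mean_london = 11.2, 7.4
var_dublin, var_london = 2.12, 3.88

t = (mean_dublin - mean_london) / math.sqrt(var_dublin / n1 + var_london / n2)

# The same calculation starting from the unrounded totals in Table 3.
exact_mean_dublin = 280.0 / n1
exact_mean_london = 184.4 / n2
exact_var_dublin = (3186.80 - 280.0 ** 2 / n1) / (n1 - 1)
exact_var_london = (1453.22 - 184.4 ** 2 / n2) / (n2 - 1)
t_unrounded = (exact_mean_dublin - exact_mean_london) / math.sqrt(
    exact_var_dublin / n1 + exact_var_london / n2)

degrees_of_freedom = n1 + n2 - 2
critical_value = 2.021  # Table 4, 40 degrees of freedom, p = 0.05

reject_null = abs(t) > critical_value

print(f"t from rounded values   = {t:.2f}")
print(f"t from unrounded values = {t_unrounded:.2f}")
print(f"degrees of freedom      = {degrees_of_freedom}")
print("Reject the null hypothesis" if reject_null
      else "Accept the null hypothesis")
```

Rounding the means and variances before the final step gives 7.76, as in the text; carrying the unrounded values through gives a slightly larger t of about 7.81. Either way t is far above 2.021, so the conclusion is the same.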


Table 4 Critical values of t.

Degrees of freedom    Critical value of t at 5% significance level (p = 0.05)
10                    2.228
11                    2.201
12                    2.179
13                    2.160
14                    2.145
15                    2.131
16                    2.120
17                    2.110
18                    2.101
19                    2.093
20                    2.086
21                    2.080
22                    2.074
23                    2.069
24                    2.064
25                    2.060
26                    2.056
27                    2.052
28                    2.048
29                    2.045
30                    2.042
40                    2.021
60                    2.000
120                   1.980
∞                     1.960
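The 'cautious t-tester' rule for degrees of freedom that fall between two rows can be written as a small lookup. This is a sketch under stated assumptions: only the rows of Table 4 from 30 degrees of freedom upwards are copied in, to keep it short, and the names CRITICAL_T and conservative_critical_t are hypothetical.

```python
import math

# Critical values of t at p = 0.05, copied from the last rows of Table 4.
CRITICAL_T = {
    30: 2.042,
    40: 2.021,
    60: 2.000,
    120: 1.980,
    math.inf: 1.960,
}


def conservative_critical_t(degrees_of_freedom):
    """Return the critical value from the nearest row at or below the
    requested degrees of freedom (the cautious, slightly bigger value)."""
    rows_at_or_below = [row for row in CRITICAL_T if row <= degrees_of_freedom]
    if not rows_at_or_below:
        raise ValueError("degrees of freedom below the smallest row copied here")
    return CRITICAL_T[max(rows_at_or_below)]


# The Guinness data: 25 + 25 - 2 = 48 degrees of freedom.
print(conservative_critical_t(48))
```

For 48 degrees of freedom this falls back to the 40 row and returns 2.021, matching the value used in the worked example.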