Contents

5 Central Tendency and Variation
  5.1 Introduction
  5.2 The Mode
    5.2.1 Mode for Ungrouped Data
    5.2.2 Mode for Grouped Data
  5.3 The Median
    5.3.1 Median for Ungrouped Data
    5.3.2 Median for Grouped Data - Discrete Variable
    5.3.3 Median for Grouped Data - Continuous Variable
  5.4 The Mean
    5.4.1 Mean for Ungrouped Data
    5.4.2 Mean for Grouped Data
  5.5 Uses of Measures of Centrality

Chapter 5

Central Tendency and Variation

5.1 Introduction

In order to examine, present and understand distributions, it is useful to organize the data presented in distributions into summary measures. The centre of a distribution and the variability in a distribution are the most common and often the most useful summary measures which can be provided concerning a distribution. Together these two sets of measures tell a great deal concerning the nature of the distribution, and how the members of the population are spread across the distribution.

The summary measure with which most people are familiar is the average, or in technical terms, the arithmetic mean. For example, an instructor may summarize the set of all grades in a class by calculating the average, or mean, grade. For a student, the grade point average is a measure which summarizes his or her performance. Each of these averages provides a quick idea of where the distribution of the set of grades is centred. This average, or arithmetic mean, is only one of several summary measures which can be used to describe the centre of a set of data. The other measures commonly used are the median and the mode.

Measures of variation are less commonly used than are measures of central tendency. Even so, measures of variation are very important for understanding how similar, or how varied, different members of a population are. The idea of a measure of variation is to present a single number which indicates whether the values of the variable are all quite similar or whether the values vary considerably. A measure of variation familiar to most people is the range, the numerical difference between the largest and the smallest value in a set of data. For example, if the range of prices for product A is from $15 to $25 at different stores, then this product would ordinarily be considered to have a considerable degree of variation from store to store. In contrast, if product B has a range of prices from $19 to $21, then this means a considerably smaller variation in price. The range for A is $10, and for B is $2, so that the variation in price of A is much greater than the variation in price of product B.

Other measures of variation are the interquartile range, the variance and the standard deviation. Each of these is important in statistical analysis, extensively used in research, but not commonly reported in the media. In order to carry out statistical analysis of data, it is necessary to become familiar with these measures of variation as well.

In addition to measures of central tendency and variation, statisticians use positional measures, indicating the percentage of the cases which are less than a particular value of the variable. Distributions can also be described as symmetrical or asymmetrical, and statisticians have devised various measures of skewness to indicate the extent of asymmetry of a distribution. Various other summary measures will be mentioned in this text, but measures of centrality and variation are the most important statistical measures.

This chapter presents the three common measures of central tendency, showing how to calculate them, and then discussing the different uses for each of these measures. Following this, measures of variation are presented and discussed. Some comments concerning the interpretation and use of these measures are also made.

Measures of Central Tendency. As the name indicates, these measures describe the centre of a distribution. These measures may be termed either measures of centrality or measures of central tendency. In the case of some data sets, the centre of the distribution may be very clear, and not subject to any question. In this case, each of the measures discussed here will be the same, and it does not matter which is used. In other cases, the centre of a distribution of a set of data may be much less clear cut, meaning that there are different views of which is the appropriate centre of a distribution.

The following sections show how to calculate the three commonly used measures of central tendency: mode, median and mean. Since the manner in which each is calculated differs depending on whether the data is grouped or ungrouped, the method of calculating each measure in each circumstance is shown. Following this, some guidelines concerning when each of the measures is to be used are given.

5.2 The Mode

The mode for a set of data is the most common value of the variable, that value of the variable which occurs most frequently. The mode is likely to be most useful when there is a single value of the variable which stands out, occurring much more frequently than any other value. Where there are several values of a variable, with each occurring frequently and a similar number of times, the mode is less useful.

This statistical use of the word mode corresponds to one of the uses of the same word in ordinary language. We sometimes refer to fashion as what is in mode, or in common use. This implies a particular fashion which is more common than others.

It will be seen that the type of scale or measurement is important in determining which measure of central tendency is to be used. For example, the median requires at least an ordinal scale, and the mean requires an interval or ratio scale. In contrast, the mode requires only a nominal scale. Since all scales are at least nominal, the mode of a set of data can always be calculated. That is, regardless of whether a variable is nominal, ordinal, interval or ratio, it will have a mode.

5.2.1 Mode for Ungrouped Data

For a set of data which is ungrouped, the mode is determined by counting the number of times each value of the variable occurs. The mode is then the value of the variable which occurs more frequently than any other value of the variable.

Definition 5.2.1 Mode of Ungrouped Data. For ungrouped data, the mode is the value of the variable which occurs most frequently.

When there is more than one value of the variable which occurs most frequently, then there is more than one mode. That is, the mode is not necessarily unique, and a set of data could possibly have two or more modes.

These situations could be referred to as bimodal or trimodal. Further, there may be no unique mode. That is, each value of the variable may occur an equal number of times. In this situation, each value could be considered to be a mode, but this is not particularly useful, so that this situation would ordinarily be reported as having no mode.

Example 5.2.1 Mode in a Small Sample

The example of Table 5.1 was briefly discussed in Chapter 4. This is a small random sample of 7 respondents selected from the data set of Appendix ??. For each of the variables, the mode is given in the last line of the table. A short discussion of these modes follows.

Case
No.    SEX     AGE  PARTY  GMP   THRS  CLASS      FIN
1      Female  32   NDP    1300  50    U. Middle  42,500
2      Female  33   NDP    1950  25    Working    27,500
3      Male    34   NDP    2500  40    Middle     37,500
4      Female  34   NDP    1850  40    Middle     52,500
5      Male    46   PC     5000  50    Middle     100,000
6      Female  34   NDP    700   16    Working    37,500
7      Female  53   LIB    3125  40    Middle     62,500
Mode   Female  34   NDP    --    40    Middle     37,500

Table 5.1: A Sample of 7 Respondents

For this sample, the mode for each of the variables can be determined by counting the frequency of occurrence of each value. For the variable SEX, there are 5 females and 2 males, so that the mode of SEX in this sample is female. For the variable AGE, the value 34 occurs three times, while each of the other ages occurs only once. This means that the modal age of the respondents in this sample is 34. The mode of political preference is NDP, with 5 respondents supporting the NDP and only one respondent supporting each of the other two parties. The mode of gross monthly pay, GMP, is not unique, with each value occurring only once. The mode of hours worked per week is 40, and of social class is middle class, with these values occurring most frequently.

Family income, FIN, has a mode of $37,500 since this income occurs twice, and each of the other values of family income occurs only once. While a family income of $37,500 is the mode, this may not be a particularly useful measure. This value just happens to occur twice, whereas each of the other values occurs once, only one less time.
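The counting procedure used in this example can be sketched in a few lines of Python. This is only an illustration: the function name modes and the dictionary sample are assumptions introduced here, and the convention of returning an empty list when every value occurs equally often follows the "no mode" convention described in Section 5.2.1.

```python
from collections import Counter

def modes(values):
    """Return all modal values, or [] when every value occurs equally often."""
    counts = Counter(values)
    top = max(counts.values())
    # With several distinct values all sharing the same count, report "no mode".
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return [v for v, c in counts.items() if c == top]

# The 7 respondents of Table 5.1, one list per variable.
sample = {
    "SEX": ["Female", "Female", "Male", "Female", "Male", "Female", "Female"],
    "AGE": [32, 33, 34, 34, 46, 34, 53],
    "PARTY": ["NDP", "NDP", "NDP", "NDP", "PC", "NDP", "LIB"],
    "GMP": [1300, 1950, 2500, 1850, 5000, 700, 3125],
    "THRS": [50, 25, 40, 40, 50, 16, 40],
    "CLASS": ["U. Middle", "Working", "Middle", "Middle", "Middle", "Working", "Middle"],
    "FIN": [42500, 27500, 37500, 52500, 100000, 37500, 62500],
}

table_modes = {name: modes(vals) for name, vals in sample.items()}
for name, found in table_modes.items():
    print(name, found if found else "no mode")
```

Returning a list rather than a single value allows bimodal or trimodal data to be reported without any special handling.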

Example 5.2.2 Mode from a Stem and Leaf Display

The ordered stem and leaf display of Figure 5.1 was given in Chapter 4. Once such a stem and leaf display has been produced, the mode can easily be determined by counting which value of the variable occurs most frequently. In this example, the value 40 occurs 20 times, a considerably larger number of times than any other value. Thus the mode of total hours worked for this sample of 50 Saskatchewan households is 40 hours worked per week.

0 | 3 4
1 | 0 6
2 | 3 5
3 | 0 0 2 5 5 5 6 7 7 7 7 7 7 7 8 9
4 | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 8
5 | 0 0 0 0 0 4

Figure 5.1: Ordered Stem and Leaf Display for Total Hours of Work

It may also be useful to note that the value which occurs next most frequently is 37 hours per week, occurring 7 times. This could be considered a secondary, although much less distinct, mode. Finally, the value of 50 hours worked per week occurs 5 times, and this might be considered to be a tertiary mode. Reporting these secondary or tertiary modes is not required when reporting the mode, but noting and reporting such secondary and tertiary modes may be useful in understanding the distribution of the variable. In the case of total hours worked, 40 hours per week is by far the most common, with 37 and 50 hours per week also being common hours worked. Together these three values account for 20 + 7 + 5 = 32 of the 50 values reported.

In the stem and leaf display of Example 4.5.3 and Figure 4.5, the mode of family income is 9 thousand dollars, with this value occurring 6 times. The next most common value is 11 thousand dollars, occurring 3 times. None of the other values of family income occurs more than twice.
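The primary, secondary and tertiary modes of total hours worked can be checked with a short Python sketch. The dictionary stem_leaf is a transcription of Figure 5.1 made for this illustration, with stems taken as tens of hours and leaves as units; the variable names are assumptions, not part of the original example.

```python
from collections import Counter

# Stems are tens of hours, leaves are single hours (transcribed from Figure 5.1).
stem_leaf = {
    0: [3, 4],
    1: [0, 6],
    2: [3, 5],
    3: [0, 0, 2, 5, 5, 5, 6, 7, 7, 7, 7, 7, 7, 7, 8, 9],
    4: [0] * 20 + [8, 8],
    5: [0, 0, 0, 0, 0, 4],
}

# Rebuild the 50 individual values from the display.
hours = [10 * stem + leaf for stem, leaves in stem_leaf.items() for leaf in leaves]

# The three most frequent values, each with its count.
top_three = Counter(hours).most_common(3)
covered = sum(count for _, count in top_three)

print(top_three)
print(covered, "of", len(hours), "values")
```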


5.2.2 Mode for Grouped Data

When data has been grouped, the mode is determined on the basis of the category or interval which contains the largest number of cases. If the variable is nominal, this is a straightforward process of examining the frequency distribution and determining which category has the largest number of cases.

Where the data is ordinal or interval, more care must be taken in determining the mode, because adjacent values of the variable may be grouped together. This may make some categories occur more frequently than others, as a result of the manner in which the data has been grouped. In the case of such grouped data, the mode occurs at the peak of the histogram, that is, where the density of occurrence of the variable is greatest.

Definition 5.2.2 Mode of Grouped Data - Nominal Scale or Equal Size Intervals. For grouped data where the scale is nominal, the mode is the value of the variable which occurs most frequently. Where an ordinal, interval or ratio scale has been grouped into intervals of equal width, the mode is the interval having the greatest frequency of occurrence. In the case of a continuous variable, the mode can be considered to be either the interval with the greatest frequency of occurrence, or the midpoint of this interval.

Political Preference       No. of Respondents
Liberal                     6
NDP                        24
Progressive Conservative   11
Undecided                   8
Would not vote              1
Total                      50

Table 5.2: Provincial Political Preference, 50 Regina Respondents

Example 5.2.3 Political Preference

The distribution of political preference of 50 respondents in Table 5.2 was originally given in Chapter 4. The variable here is political preference and this is measured on a nominal scale. The value of the variable which occurs most frequently is NDP, occurring 24 times, more than the number of times any other value occurs. Thus NDP is the modal value of political preference. Note that this is quite a distinct mode, in that this value occurs in 24 of the 50 cases.

With respect to elections, the mode is an important measure, because the party receiving the most votes wins the election. While political preference based on polls is not a sure indicator of how people will vote on election day, the mode of political preference is an important guide in determining the likely winner.
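For a nominal variable that has already been tabulated, finding the mode amounts to picking the category with the largest frequency. A minimal Python sketch using the counts of Table 5.2 follows; the variable names are assumptions made for this illustration.

```python
# Frequencies from Table 5.2 (50 Regina respondents).
preference = {
    "Liberal": 6,
    "NDP": 24,
    "Progressive Conservative": 11,
    "Undecided": 8,
    "Would not vote": 1,
}

total = sum(preference.values())
# The modal category is the key with the largest frequency.
modal_category = max(preference, key=preference.get)
share = preference[modal_category] / total

print(modal_category, preference[modal_category], "of", total)
```

The share of cases in the modal category (here 24 of 50) gives a rough indication of how distinct the mode is.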

Income in          Per Cent of Individuals of
dollars            Native Origin    All Origins
0-4,999                 30.7            19.0
5,000-9,999             23.5            20.1
10,000-14,999           13.4            13.8
15,000-19,999            9.4            11.3
20,000-24,999            7.4             9.4
25,000 and over         15.6            26.3
Total                  100.0           100.0
No. of individuals   253,980      17,121,000

Table 5.3: Distribution of Individuals with Income, Native and All Origins, Canada, 1985

Example 5.2.4 Income Distributions of People of Native and of All Origins

The distribution of income of people of native origin and of all origins in Canada in 1985, in Table 5.3, is taken from Table 3.4 of The Canadian Fact Book on Poverty 1989 by David P. Ross and Richard Shillington. The variable `income of individuals' is measured on an interval or ratio scale. With the exception of the last, open-ended interval, the data has been grouped into intervals of equal size, each interval representing $5,000 of income. The distributions in the table are both percentage distributions, so that the interval with the largest percentage will be the modal interval, that is, the interval with the largest number of cases.

For the income distribution of individuals of native origin, the interval which has the largest percentage of cases is $0-4,999 and this is the mode for this distribution. For individuals of all origins in Canada, the mode appears to be the $25,000 and over interval. However, this interval is much wider than the other intervals, and contains such a large percentage of the cases mainly because it is a wide open-ended interval. Based on this, the mode for this distribution would best be considered to be the interval $5,000-9,999, because this interval has the largest percentage of cases of those intervals of equal width.

In each of these two distributions, the mode could alternatively be reported as the respective midpoint of the interval. This would give a mode of approximately $2,500 for individuals of native origin, and $7,500 for individuals of all origins.

Definition 5.2.3 Mode of Grouped Data - Ordinal, Interval or Ratio Scale with Unequal Size Intervals. For grouped data where the scale is ordinal, interval or ratio, and where the data has been grouped into intervals of different widths, the mode is the interval with the greatest density of occurrence. Alternatively, the mode is the value of the variable where the histogram reaches its peak.
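Returning to Example 5.2.4, the modal intervals and midpoints reported there can be checked with a short Python sketch. The function modal_interval and the row layout are assumptions made for this illustration. The open-ended interval is set aside, as the example argues it should be, and the midpoint treats an interval such as 0-4,999 as running from 0 up to 5,000, which is also an assumption.

```python
# Each row: (lower bound, upper bound or None if open-ended,
#            per cent of native origin, per cent of all origins), from Table 5.3.
rows = [
    (0, 4999, 30.7, 19.0),
    (5000, 9999, 23.5, 20.1),
    (10000, 14999, 13.4, 13.8),
    (15000, 19999, 9.4, 11.3),
    (20000, 24999, 7.4, 9.4),
    (25000, None, 15.6, 26.3),
]

def modal_interval(rows, column):
    """Return the equal-width interval with the largest percentage, and its midpoint."""
    # Set aside the open-ended interval, whose width is not comparable.
    closed = [r for r in rows if r[1] is not None]
    best = max(closed, key=lambda r: r[column])
    # Assumption: 0-4,999 is treated as covering 0 up to 5,000.
    midpoint = (best[0] + best[1] + 1) / 2
    return (best[0], best[1]), midpoint

native_mode = modal_interval(rows, 2)
all_origins_mode = modal_interval(rows, 3)
print(native_mode)
print(all_origins_mode)
```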

Example 5.2.5 Canada Youth and Aids Study

The data in Table 5.4 is taken from Figure 6.6 of Alan J. C. King et al., Canada Youth and AIDS Study. This data comes from a 1988 study of 38,000 Canadian youth. The percentage distributions in Table 5.4 give the distributions of the number of sexual partners of university or college males and females who had had sexual intercourse at the time of the survey. The sample sizes on which these distributions are based are not given in the figure in the publication. The frequency distributions given here are presented with the categories used in the original publication.

Number of    Per Cent of Individuals Who Are:
Partners     Male    Female
1             23       36
2             12       17
3-5           29       26
6-10          17       14
11 +          19        7
Total        100      100

Table 5.4: Distribution of Number of Sexual Partners of College and University Respondents, by Gender

A quick glance at the frequency distributions might give the impression that the mode of the number of sexual partners is 3-5 for males and 1 for females. However, the intervals 3-5 and 6-10 represent more than one value of the variable, while the categories 1 and 2 each represent one value of the variable. For females, the most frequently occurring category is 1, and since that is already an interval representing a single value of the variable, it can be considered to be the mode.

For males, the densities of occurrence for the different intervals must be calculated in order to determine which value of the variable is the mode. The density for each of the first two categories is the percentage of cases in each category, since each of these already represents exactly 1 unit of the variable, `number of sexual partners'. The interval 3-5 represents 3 values of this variable, 3, 4 and 5. The density of occurrence here is thus 29/3 = 9.7. The density of the intervals 6-10 and 11 and over need not be calculated since the percentage of cases in each of these intervals can be seen to be lower than that of the 3-5 interval, and these last two intervals are even wider than the 3-5 interval.

As a result of these considerations, the mode of the number of sexual partners for males is also seen to be 1 partner. This category has 23 per cent of the cases, more than the 12 per cent having two partners, and also more than the average of 9.7 per cent of males having each of 3, 4 or 5 partners.
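The density argument of this example can be sketched in Python as follows. The function densities and the layout of categories are assumptions made for this illustration. Density is computed as the percentage of cases divided by the number of values of the variable covered by the category. The open-ended category 11 + is skipped, because its width is unknown and the example argues that its density must be lower than that of the 3-5 interval.

```python
# Each row: (label, number of values covered or None if open-ended,
#            per cent of males, per cent of females), from Table 5.4.
categories = [
    ("1", 1, 23, 36),
    ("2", 1, 12, 17),
    ("3-5", 3, 29, 26),
    ("6-10", 5, 17, 14),
    ("11 +", None, 19, 7),
]

def densities(categories, column):
    """Per cent of cases per unit of the variable, for each closed category."""
    result = {}
    for row in categories:
        label, width = row[0], row[1]
        if width is None:
            continue  # open-ended category: no width, so no density is computed
        result[label] = row[column] / width
    return result

male = densities(categories, 2)
female = densities(categories, 3)
male_mode = max(male, key=male.get)
female_mode = max(female, key=female.get)

print({label: round(d, 1) for label, d in male.items()})
print("male mode:", male_mode, "female mode:", female_mode)
```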

Example 5.2.6 Examples from Chapter 4

The correct histogram of Figure ?? gives a peak bar at socioeconomic status of 50-60. The mode of socioeconomic status is 50-60 or 55. Note though that the 40-50 category occurs almost as frequently.

The two distributions of wives' education, Figure ?? and Figure ??, have peaks at 11.5-12.5. The mode of years of education of wives of each income level is 12 years of education.

The distribution of farm size in Figure ?? shows an initial peak at the interval `under 10' acres. However, this is most likely to be an anomaly based on the rather odd grouping that was given in Table ??. Based on this table, one would have to conclude that the mode is under 10 acres. However, the rest of the distribution shows a distinct peak at 240-399 acres, and this would be a more meaningful mode to report. If the interval `under 10' were to be regrouped with the interval 10-69, this would produce an interval 0-69, of width 70 acres, and with density (593 + 1107)/70 = 24.3 farms per acre. Based on this, the interval with greatest density is then 240-399, and the modal farm size could be reported to be 240-399, or 320 acres. This would be more meaningful than reporting that the modal farm size in Saskatchewan is under 10 acres.

The distribution of the number of people per household in Table ?? might best be reported as yielding two modes. The number of households with 2 people is most common, with 286 households. The number of households with only 1 person, or with three persons, is distinctly less than this. However, a secondary peak is reached at 4 persons per household, with 223 households reporting this many people. The distribution has these two distinct peaks, with 2 being the mode, or the primary mode, and with 4 persons per household being a secondary mode. The two distinct peaks can clearly be seen in Figure ??.

Summary of the Mode. Since the mode requires only a nominal scale of measurement, and since all variables have at least a nominal scale, the mode can always be determined. All that is required is to count the values of the variable which occur in the sample, and determine which value occurs most frequently. This value is the mode. In the case of grouped data, the mode is either the value of the variable at the peak of the histogram, or the category or interval which occurs with greatest density. In the latter case, the midpoint of the interval with greatest density may be chosen as the mode.

The mode is most useful as a measure in two circumstances. First, if the category which occurs most frequently is all that needs to be known, then the mode gives this. In elections, all that matters is which party or candidate gets the most votes, and in this case the modal candidate is the winner. For purposes of supplying electrical power, power utilities need to be sure that they have sufficient capacity to meet peak power needs. The peak use of power could be considered to be the mode of power use. In circumstances of this sort, the mode is likely to be a very useful measure.

Second, the mode is useful when there is a very distinct peak of a distribution. Where many of the values occur almost an equal number of times, the mode may not be clearly defined, or may not be of much interest. Where a particular value occurs many more times than other values, then it is worthwhile to report this as the mode.

At the same time, the mode has several weaknesses. First, in no way does it take account of values of the variable other than the mode, and how frequently they occur. For example, in Figure ??, for wives with annual income of less than $5,000, 11 years of education is almost as common as 12 years of education, but the mode is 12. For Figure ??, the mode is much more clearly 12 years, with 11 years being much less common.

Second, in the case where the scale is ordinal, interval or ratio, the mode does not take advantage of the numerical values which the variable takes on. When determining the mode, the variable could just as well be nominal in all cases. Where a variable has numerical values on an ordinal scale, these values can be used to rank the values. If the numerical values have been measured on an interval scale, these values can be added and subtracted. This extra information provided by these numerical values can be used to provide other measures of central tendency. These are discussed in the following sections.

5.3 The Median

The median is a second measure of central tendency, and one which takes advantage of the possibility of ranking or ordering values of the variable from the smallest to the largest, or largest to the smallest value. If this ordering is carried out, the median is the middle value of the distribution, the value such that one half of the cases are on each side of this middle value. The median is thus the centre of the distribution in the sense that it splits the cases in half, with one half less than this centre and one half greater than this centre.

Definition 5.3.1 The median of a set of values of a variable is the value of the variable such that one half of the values are less than or equal to this value, and the other half of the values of the variable are greater than or equal to this value.

The median requires an ordinal scale of measurement. Since interval and ratio scales are also ordinal, the median can be determined for any variable which has an ordinal, interval or ratio scale of measurement. For variables which have a scale of measurement no more than nominal, the median cannot be meaningfully determined.

Since the median gives the central value of a set of values, the median is more properly a measure of central tendency than the mode. The mode could occur at one of the extremes of the distribution, if the most common value of the variable occurs near one end of the distribution. By definition, the median always will be at the centre of the values of the variable.

As with the mode, there are different methods of calculating the median depending on whether the data is ungrouped or grouped. These methods are discussed next, with examples of each method being provided.

5.3.1 Median for Ungrouped Data

In order to determine the median value for a distribution, it is necessary to begin with a variable which has an ordinal or higher level of measurement. The method used to determine the median is to begin by taking the values of the variable in the data set, and ranking these values in order. These values may be ranked either from the smallest to the largest, or largest to the smallest. It does not matter in which direction the values are ranked.

The total number of values in the data set is then counted, and one half of this total locates the central value. In order to determine the median, count the ordered values of the variable until you reach one half of the total values. This middle value is the median. If there are an odd number of values, there is a single central or median value of the variable. If there are an even number of total values, there will be two middle values. The median is then reported as either these two values, or the simple average of these two middle values. These procedures should become clearer in the following examples.

Example 5.3.1 Median in a Small Sample - Odd Number of Values

The small data set of 7 cases presented in Chapter 4 and in Table 5.1 shows how the median can be calculated. Only the variables AGE, GMP, THRS and FIN are clearly ordinal or higher level scales and thus have meaningful medians. If the variable CLASS is considered to be ordinal, then the median can also be determined for this variable.

The ages of the 7 respondents in this data set are 32, 33, 34, 34, 46, 34 and 53. Ranked in order from lowest to highest, these values are 32, 33, 34, 34, 34, 46 and 53. Note that where there is a value which occurs more than once, each occurrence of that value is listed. Since there are a total of 7 values here, with one half of this being 7/2 = 3.5, the median is the value such that 3.5 cases are less and 3.5 are greater. This value is the middle 34 of the list and the median for this data set is 34 years of age.

For GMP, the values in order are 700, 1300, 1850, 1950, 2500, 3125 and 5000. The middle value here is 1950. For THRS, total hours of work per week, the median is 40, the middle value of the set 16, 25, 40, 40, 40, 50, 50. Finally, for family income, FIN, the median is $42,500.
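The ranking procedure for ungrouped data can be sketched in Python and applied to the four numeric variables of Table 5.1. The function name median and the dictionary numeric are assumptions made for this illustration. For an even number of values, this sketch reports the simple average of the two middle values, which is one of the two conventions described in Section 5.3.1.

```python
def median(values):
    """Median of ungrouped data: rank the values, then take the middle one."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: a single middle value.
        return ordered[mid]
    # Even number of values: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

# Numeric variables for the 7 respondents of Table 5.1.
numeric = {
    "AGE": [32, 33, 34, 34, 46, 34, 53],
    "GMP": [1300, 1950, 2500, 1850, 5000, 700, 3125],
    "THRS": [50, 25, 40, 40, 50, 16, 40],
    "FIN": [42500, 27500, 37500, 52500, 100000, 37500, 62500],
}

medians = {name: median(vals) for name, vals in numeric.items()}
print(medians)
```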

If the variable CLASS, social class that the respondent considers himself or herself to be in, is considered to be ordinal, then the median can be determined for this variable. This is the case even though the variable does not have numbers but names. If these names are given in order, they are upper middle, middle, middle, middle, middle, working and working, where these values have been ranked from the highest to the lowest. For these 7 values, the median is middle class.

Example 5.3.2 Median in a Small Sample - Even Number of Values