INTRODUCTION TO BIOSTATISTICS
SECOND EDITION
Robert R. Sokal and F. James Rohlf
State University of New York at Stony Brook
DOVER PUBLICATIONS, INC.
Mineola, New York
Copyright
Copyright © 1969, 1973, 1981, 1987 by Robert R. Sokal and F. James Rohlf
All rights reserved.
Bibliographical Note
This Dover edition, first published in 2009, is an unabridged republication of the work originally published in 1969 by W. H. Freeman and Company, New York. The authors have prepared a new Preface for this edition.
Library of Congress Cataloging-in-Publication Data
Sokal, Robert R.
Introduction to Biostatistics / Robert R. Sokal and F. James Rohlf.
Dover ed.
p. cm.
Originally published: 2nd ed. New York: W. H. Freeman, 1969.
Includes bibliographical references and index.
ISBN-13: 978-0-486-46961-4
ISBN-10: 0-486-46961-1
1. Biometry. I. Rohlf, F. James, 1936- II. Title
QH323.5.S63 2009
570.1'5195 dc22
2008048052
Manufactured in the United States of America
Dover Publications, Inc., 31 East 2nd Street, Mineola, N.Y. 11501
to Julie and Janice
Contents
PREFACE TO THE DOVER EDITION
PREFACE
1. INTRODUCTION
1.1 Some definitions
1.2 The development of biostatistics
1.3 The statistical frame of mind
2. DATA IN BIOSTATISTICS
2.1 Samples and populations
2.2 Variables in biostatistics
2.3 Accuracy and precision of data
2.4 Derived variables
2.5 Frequency distributions
2.6 The handling of data
3. DESCRIPTIVE STATISTICS
3.1 The arithmetic mean
3.2 Other means
3.3 The median
3.4 The mode
3.5 The range
3.6 The standard deviation
3.7 Sample statistics and parameters
3.8 Practical methods for computing mean and standard deviation
3.9 The coefficient of variation
4. INTRODUCTION TO PROBABILITY DISTRIBUTIONS: THE BINOMIAL AND POISSON DISTRIBUTIONS
4.1 Probability, random sampling, and hypothesis testing
4.2 The binomial distribution
4.3 The Poisson distribution
5. THE NORMAL PROBABILITY DISTRIBUTION
5.1 Frequency distributions of continuous variables
5.2 Derivation of the normal distribution
5.3 Properties of the normal distribution
5.4 Applications of the normal distribution
5.5 Departures from normality: Graphic methods
6. ESTIMATION AND HYPOTHESIS TESTING
6.1 Distribution and variance of means
6.2 Distribution and variance of other statistics
6.3 Introduction to confidence limits
6.4 Student's t distribution
6.5 Confidence limits based on sample statistics
6.6 The chi-square distribution
6.7 Confidence limits for variances
6.8 Introduction to hypothesis testing
6.9 Tests of simple hypotheses employing the t distribution
6.10 Testing the hypothesis $H_0\colon \sigma^2 = \sigma_0^2$
7. INTRODUCTION TO ANALYSIS OF VARIANCE
7.1 The variances of samples and their means
7.2 The F distribution
7.3 The hypothesis $H_0\colon \sigma_1^2 = \sigma_2^2$
7.4 Heterogeneity among sample means
7.5 Partitioning the total sum of squares and degrees of freedom
7.6 Model I anova
7.7 Model II anova
8. SINGLE-CLASSIFICATION ANALYSIS OF VARIANCE
8.1 Computational formulas
8.2 Equal n
8.3 Unequal n
8.4 Two groups
8.5 Comparisons among means: Planned comparisons
8.6 Comparisons among means: Unplanned comparisons
9. TWO-WAY ANALYSIS OF VARIANCE
9.1 Two-way anova with replication
9.2 Two-way anova: Significance testing
9.3 Two-way anova without replication
10. ASSUMPTIONS OF ANALYSIS OF VARIANCE
10.1 The assumptions of anova
10.2 Transformations
10.3 Nonparametric methods in lieu of anova
11. REGRESSION
11.1 Introduction to regression
11.2 Models in regression
11.3 The linear regression equation
11.4 More than one value of Y for each value of X
11.5 Tests of significance in regression
11.6 The uses of regression
11.7 Residuals and transformations in regression
11.8 A nonparametric test for regression
12. CORRELATION
12.1 Correlation and regression
12.2 The product-moment correlation coefficient
12.3 Significance tests in correlation
12.4 Applications of correlation
12.5 Kendall's coefficient of rank correlation
13. ANALYSIS OF FREQUENCIES
13.1 Tests for goodness of fit: Introduction
13.2 Single-classification goodness of fit tests
13.3 Tests of independence: Two-way tables
APPENDIXES
A1 Mathematical appendix
A2 Statistical tables
BIBLIOGRAPHY
INDEX
Preface to the Dover Edition
We are pleased and honored to see the re-issue of the second edition of our Introduction to Biostatistics by Dover Publications. On reviewing the copy, we find there is little in it that needs changing for an introductory textbook of biostatistics for an advanced undergraduate or beginning graduate student. The book furnishes an introduction to most of the statistical topics such students are likely to encounter in their courses and readings in the biological and biomedical sciences.
The reader may wonder what we would change if we were to write this book anew. Because of the vast changes that have taken place in modalities of computation in the last twenty years, we would deemphasize computational formulas that were designed for pre-computer desk calculators (an age before spreadsheets and comprehensive statistical computer programs) and refocus the reader's attention to structural formulas that not only explain the nature of a given statistic, but are also less prone to rounding error in calculations performed by computers. In this spirit, we would omit the equation (3.8) on page 39 and draw the readers' attention to equation (3.7) instead. Similarly, we would use structural formulas in Boxes 3.1 and 3.2 on pages 41 and 42, respectively; on page 161 and in Box 8.1 on pages 163/164, as well as in Box 12.1 on pages 278/279.
Secondly, we would put more emphasis on permutation tests and resampling methods. Permutation tests and bootstrap estimates are now quite practical. We have found this approach to be not only easier for students to understand but in many cases preferable to the traditional parametric methods that are emphasized in this book.
Robert R. Sokal
F. James Rohlf
November 2008
Preface
The favorable reception that the first edition of this book received from teachers and students encouraged us to prepare a second edition. In this revised edition, we provide a thorough foundation in biological statistics for the undergraduate student who has a minimal knowledge of mathematics. We intend Introduction to Biostatistics to be used in comprehensive biostatistics courses, but it can also be adapted for short courses in medical and professional schools; thus, we include examples from the health-related sciences.
We have extracted most of this text from the more-inclusive second edition of our own Biometry. We believe that the proven pedagogic features of that book, such as its informal style, will be valuable here.
We have modified some of the features from Biometry; for example, in Introduction to Biostatistics we provide detailed outlines for statistical computations but we place less emphasis on the computations themselves. Why? Students in many undergraduate courses are not motivated to and have few opportunities to perform lengthy computations with biological research material; also, such computations can easily be made on electronic calculators and microcomputers. Thus, we rely on the course instructor to advise students on the best computational procedures to follow.
We present material in a sequence that progresses from descriptive statistics to fundamental distributions and the testing of elementary statistical hypotheses; we then proceed immediately to the analysis of variance and the familiar t test (which is treated as a special case of the analysis of variance and relegated to several sections of the book). We do this deliberately for two reasons: (1) since today's biologists all need a thorough foundation in the analysis of variance, students should become acquainted with the subject early in the course; and (2) if analysis of variance is understood early, the need to use the t distribution is reduced. (One would still want to use it for the setting of confidence limits and in a few other special situations.) All t tests can be carried out directly as analyses of variance, and the amount of computation of these analyses of variance is generally equivalent to that of t tests.
This larger second edition includes the Kolmogorov-Smirnov two-sample test, nonparametric regression, stem-and-leaf diagrams, hanging histograms, and the Bonferroni method of multiple comparisons. We have rewritten the chapter on the analysis of frequencies in terms of the G statistic rather than $X^2$, because the former has been shown to have more desirable statistical properties. Also, because of the availability of logarithm functions on calculators, the computation of the G statistic is now easier than that of the earlier chi-square test. Thus, we reorient the chapter to emphasize log-likelihood-ratio tests. We have also added new homework exercises.
We call special, double-numbered tables "boxes." They can be used as convenient guides for computation because they show the computational methods for solving various types of biostatistical problems. They usually contain all the steps necessary to solve a problem, from the initial setup to the final result. Thus, students familiar with material in the book can use them as quick summary reminders of a technique.
We found in teaching this course that we wanted students to be able to refer to the material now in these boxes. We discovered that we could not cover even half as much of our subject if we had to put this material on the blackboard during the lecture, and so we made up and distributed boxes and asked students to refer to them during the lecture. Instructors who use this book may wish to use the boxes in a similar manner.
We emphasize the practical applications of statistics to biology in this book; thus, we deliberately keep discussions of statistical theory to a minimum. Derivations are given for some formulas, but these are consigned to Appendix A1, where they should be studied and reworked by the student. Statistical tables to which the reader can refer when working through the methods discussed in this book are found in Appendix A2.
We are grateful to K. R. Gabriel, R. C. Lewontin, and M. Kabay for their extensive comments on the second edition of Biometry and to M. D. Morgan, E. Russek-Cohen, and M. Singh for comments on an early draft of this book. We also appreciate the work of our secretaries, Resa Chapey and Cheryl Daly, with preparing the manuscripts, and of Donna DiGiovanni, Patricia Rohlf, and Barbara Thomson with proofreading.
Robert R. Sokal
F. James Rohlf
INTRODUCTION TO BIOSTATISTICS
CHAPTER 1
Introduction
This chapter sets the stage for your study of biostatistics. In Section 1.1, we define the field itself. We then cast a necessarily brief glance at its historical development in Section 1.2. Then in Section 1.3 we conclude the chapter with a discussion of the attitudes that the person trained in statistics brings to biological research.
1.1 Some definitions
We shall define biostatistics as the application of statistical methods to the solution of biological problems. The biological problems of this definition are those arising in the basic biological sciences as well as in such applied areas as the health-related sciences and the agricultural sciences. Biostatistics is also called biological statistics or biometry.
The definition of biostatistics leaves us somewhat up in the air: "statistics" has not been defined. Statistics is a science well known by name even to the layman. The number of definitions you can find for it is limited only by the number of books you wish to consult. We might define statistics in its modern sense as the scientific study of numerical data based on natural phenomena. All parts of this definition are important and deserve emphasis:
Scientific study: Statistics must meet the commonly accepted criteria of validity of scientific evidence. We must always be objective in presentation and evaluation of data and adhere to the general ethical code of scientific methodology, or we may find that the old saying that "figures never lie, only statisticians do" applies to us.
Data: Statistics generally deals with populations or groups of individuals; hence it deals with quantities of information, not with a single datum. Thus, the measurement of a single animal or the response from a single biochemical test will generally not be of interest.
Numerical: Unless data of a study can be quantified in one way or another, they will not be amenable to statistical analysis. Numerical data can be measurements (the length or width of a structure or the amount of a chemical in a body fluid, for example) or counts (such as the number of bristles or teeth).
Natural phenomena: We use this term in a wide sense to mean not only all those events in animate and inanimate nature that take place outside the control of human beings, but also those evoked by scientists and partly under their control, as in experiments. Different biologists will concern themselves with different levels of natural phenomena; other kinds of scientists, with yet different ones. But all would agree that the chirping of crickets, the number of peas in a pod, and the age of a woman at menopause are natural phenomena. The heartbeat of rats in response to adrenalin, the mutation rate in maize after irradiation, or the incidence of morbidity in patients treated with vaccine may still be considered natural, even though scientists have interfered with the phenomenon through their intervention. The average biologist would not consider the number of stereo sets bought by persons in different states in a given year to be a natural phenomenon. Sociologists or human ecologists, however, might so consider it and deem it worthy of study. The qualification "natural phenomena" is included in the definition of statistics mostly to make certain that the phenomena studied are not arbitrary ones that are entirely under the will and control of the researcher, such as the number of animals employed in an experiment.
The word "statistics" is also used in another, though related, way. It can be the plural of the noun statistic, which refers to any one of many computed or estimated statistical quantities, such as the mean, the standard deviation, or the correlation coefficient. Each one of these is a statistic.
1.2 The development of biostatistics
Modern statistics appears to have developed from two sources as far back as the seventeenth century. The first source was political science; a form of statistics developed as a quantitative description of the various aspects of the affairs of a government or state (hence the term "statistics"). This subject also became known as political arithmetic. Taxes and insurance caused people to become interested in problems of censuses, longevity, and mortality. Such considerations assumed increasing importance, especially in England as the country prospered during the development of its empire. John Graunt (1620-1674) and William Petty (1623-1687) were early students of vital statistics, and others followed in their footsteps.
At about the same time, the second source of modern statistics developed: the mathematical theory of probability engendered by the interest in games of chance among the leisure classes of the time. Important contributions to this theory were made by Blaise Pascal (1623-1662) and Pierre de Fermat (1601-1665), both Frenchmen. Jacques Bernoulli (1654-1705), a Swiss, laid the foundation of modern probability theory in Ars Conjectandi. Abraham de Moivre (1667-1754), a Frenchman living in England, was the first to combine the statistics of his day with probability theory in working out annuity values and to approximate the important normal distribution through the expansion of the binomial.
A later stimulus for the development of statistics came from the science of astronomy, in which many individual observations had to be digested into a coherent theory. Many of the famous astronomers and mathematicians of the eighteenth century, such as Pierre Simon Laplace (1749-1827) in France and Karl Friedrich Gauss (1777-1855) in Germany, were among the leaders in this field. The latter's lasting contribution to statistics is the development of the method of least squares.
Perhaps the earliest important figure in biostatistic thought was Adolphe Quetelet (1796-1874), a Belgian astronomer and mathematician, who in his work combined the theory and practical methods of statistics and applied them to problems of biology, medicine, and sociology. Francis Galton (1822-1911), a cousin of Charles Darwin, has been called the father of biostatistics and eugenics. The inadequacy of Darwin's genetic theories stimulated Galton to try to solve the problems of heredity. Galton's major contribution to biology was his application of statistical methodology to the analysis of biological variation, particularly through the analysis of variability and through his study of regression and correlation in biological measurements. His hope of unraveling the laws of genetics through these procedures was in vain. He started with the most difficult material and with the wrong assumptions. However, his methodology has become the foundation for the application of statistics to biology.
Karl Pearson (1857-1936), at University College, London, became interested in the application of statistical methods to biology, particularly in the demonstration of natural selection. Pearson's interest came about through the influence of W. F. R. Weldon (1860-1906), a zoologist at the same institution. Weldon, incidentally, is credited with coining the term "biometry" for the type of studies he and Pearson pursued. Pearson continued in the tradition of Galton and laid the foundation for much of descriptive and correlational statistics.
The dominant figure in statistics and biometry in the twentieth century has been Ronald A. Fisher (1890-1962). His many contributions to statistical theory will become obvious even to the cursory reader of this book.
Statistics today is a broad and extremely active field whose applications touch almost every science and even the humanities. New applications for statistics are constantly being found, and no one can predict from what branch of statistics new applications to biology will be made.
1.3 The statistical frame of mind
A brief perusal of almost any biological journal reveals how pervasive the use of statistics has become in the biological sciences. Why has there been such a marked increase in the use of statistics in biology? Apparently, because biologists have found that the interplay of biological causal and response variables does not fit the classic mold of nineteenth-century physical science. In that century, biologists such as Robert Mayer, Hermann von Helmholtz, and others tried to demonstrate that biological processes were nothing but physicochemical phenomena. In so doing, they helped create the impression that the experimental methods and natural philosophy that had led to such dramatic progress in the physical sciences should be imitated fully in biology.
Many biologists, even to this day, have retained the tradition of strictly mechanistic and deterministic concepts of thinking (while physicists, interestingly enough, as their science has become more refined, have begun to resort to statistical approaches). In biology, most phenomena are affected by many causal factors, uncontrollable in their variation and often unidentifiable. Statistics is needed to measure such variable phenomena, to determine the error of measurement, and to ascertain the reality of minute but important differences.
A misunderstanding of these principles and relationships has given rise to the attitude of some biologists that if differences induced by an experiment, or observed by nature, are not clear on plain inspection (and therefore are in need of statistical analysis), they are not worth investigating. There are few legitimate fields of inquiry, however, in which, from the nature of the phenomena studied, statistical investigation is unnecessary.
Statistical thinking is not really different from ordinary disciplined scientific thinking, in which we try to quantify our observations. In statistics we express our degree of belief or disbelief as a probability rather than as a vague, general statement. For example, a statement that individuals of species A are larger than those of species B or that women suffer more often from disease X than do men is of a kind commonly made by biological and medical scientists. Such statements can and should be more precisely expressed in quantitative form.
In many ways the human mind is a remarkable statistical machine, absorbing many facts from the outside world, digesting these, and regurgitating them in simple summary form. From our experience we know certain events to occur frequently, others rarely. "Man smoking cigarette" is a frequently observed event, "Man slipping on banana peel," rare. We know from experience that Japanese are on the average shorter than Englishmen and that Egyptians are on the average darker than Swedes. We associate thunder with lightning almost always, flies with garbage cans in the summer frequently, but snow with the southern Californian desert extremely rarely. All such knowledge comes to us as a result of experience, both our own and that of others, which we learn about by direct communication or through reading. All these facts have been processed by that remarkable computer, the human brain, which furnishes an abstract. This abstract is constantly under revision, and though occasionally faulty and biased, it is on the whole astonishingly sound; it is our knowledge of the moment.
Although statistics arose to satisfy the needs of scientific research, the development of its methodology in turn affected the sciences in which statistics is applied. Thus, through positive feedback, statistics, created to serve the needs of natural science, has itself affected the content and methods of the biological sciences. To cite an example: Analysis of variance has had a tremendous effect in influencing the types of experiments researchers carry out. The whole field of quantitative genetics, one of whose problems is the separation of environmental from genetic effects, depends upon the analysis of variance for its realization, and many of the concepts of quantitative genetics have been directly built around the designs inherent in the analysis of variance.
CHAPTER 2
Data in Biostatistics
In Section 2.1 we explain the statistical meaning of the terms "sample" and "population," which we shall be using throughout this book. Then, in Section 2.2, we come to the types of observations that we obtain from biological research material; we shall see how these correspond to the different kinds of variables upon which we perform the various computations in the rest of this book. In Section 2.3 we discuss the degree of accuracy necessary for recording data and the procedure for rounding off figures. We shall then be ready to consider in Section 2.4 certain kinds of derived data frequently used in biological science (among them ratios and indices) and the peculiar problems of accuracy and distribution they present us. Knowing how to arrange data in frequency distributions is important because such arrangements give an overall impression of the general pattern of the variation present in a sample and also facilitate further computational procedures. Frequency distributions, as well as the presentation of numerical data, are discussed in Section 2.5. In Section 2.6 we briefly describe the computational handling of data.
2.1 Samples and populations
We shall now define a number of important terms necessary for an understanding of biological data. The data in biostatistics are generally based on individual observations. They are observations or measurements taken on the smallest sampling unit. These smallest sampling units frequently, but not necessarily, are also individuals in the ordinary biological sense. If we measure weight in 100 rats, then the weight of each rat is an individual observation; the hundred rat weights together represent the sample of observations, defined as a collection of individual observations selected by a specified procedure. In this instance, one individual observation (an item) is based on one individual in a biological sense, that is, one rat. However, if we had studied weight in a single rat over a period of time, the sample of individual observations would be the weights recorded on one rat at successive times. If we wish to measure temperature in a study of ant colonies, where each colony is a basic sampling unit, each temperature reading for one colony is an individual observation, and the sample of observations is the temperatures for all the colonies considered. If we consider an estimate of the DNA content of a single mammalian sperm cell to be an individual observation, the sample of observations may be the estimates of DNA content of all the sperm cells studied in one individual mammal.
We have carefully avoided so far specifying what particular variable was being studied, because the terms "individual observation" and "sample of observations" as used above define only the structure but not the nature of the data in a study. The actual property measured by the individual observations is the character, or variable. The more common term employed in general statistics is "variable." However, in biology the word "character" is frequently used synonymously. More than one variable can be measured on each smallest sampling unit. Thus, in a group of 25 mice we might measure the blood pH and the erythrocyte count. Each mouse (a biological individual) is the smallest sampling unit, blood pH and red cell count would be the two variables studied, the pH readings and cell counts are individual observations, and two samples of 25 observations (on pH and on erythrocyte count) would result. Or we might speak of a bivariate sample of 25 observations, each referring to a pH reading paired with an erythrocyte count.
Next we define population. The biological definition of this term is well known. It refers to all the individuals of a given species (perhaps of a given life-history stage or sex) found in a circumscribed area at a given time. In statistics, population always means the totality of individual observations about which inferences are to be made, existing anywhere in the world or at least within a definitely specified sampling area limited in space and time. If you take five men and study the number of leucocytes in their peripheral blood and you are prepared to draw conclusions about all men from this sample of five, then the population from which the sample has been drawn represents the leucocyte counts of all extant males of the species Homo sapiens. If, on the other hand, you restrict yourself to a more narrowly specified sample, such as five male Chinese, aged 20, and you are restricting your conclusions to this particular group, then the population from which you are sampling will be leucocyte numbers of all Chinese males of age 20.
A common misuse of statistical methods is to fail to define the statistical population about which inferences can be made. A report on the analysis of a sample from a restricted population should not imply that the results hold in general. The population in this statistical sense is sometimes referred to as the universe.
A population may represent variables of a concrete collection of objects or creatures, such as the tail lengths of all the white mice in the world, the leucocyte counts of all the Chinese men in the world of age 20, or the DNA content of all the hamster sperm cells in existence; or it may represent the outcomes of experiments, such as all the heartbeat frequencies produced in guinea pigs by injections of adrenalin. In cases of the first kind the population is generally finite. Although in practice it would be impossible to collect, count, and examine all hamster sperm cells, all Chinese men of age 20, or all white mice in the world, these populations are in fact finite. Certain smaller populations, such as all the whooping cranes in North America or all the recorded cases of a rare but easily diagnosed disease X, may well lie within reach of a total census. By contrast, an experiment can be repeated an infinite number of times (at least in theory). A given experiment, such as the administration of adrenalin to guinea pigs, could be repeated as long as the experimenter could obtain material and his or her health and patience held out. The sample of experiments actually performed is a sample from an infinite number that could be performed.
Some of the statistical methods to be developed later make a distinction between sampling from finite and from infinite populations. However, though populations are theoretically finite in most applications in biology, they are generally so much larger than samples drawn from them that they can be considered de facto infinite-sized populations.
2.2 Variables in biostatistics
Each biological discipline has its own set of variables, which may include conventional morphological measurements; concentrations of chemicals in body fluids; rates of certain biological processes; frequencies of certain events, as in genetics, epidemiology, and radiation biology; physical readings of optical or electronic machinery used in biological research; and many more.
We have already referred to biological variables in a general way, but we have not yet defined them. We shall define a variable as a property with respect to which individuals in a sample differ in some ascertainable way. If the property does not differ within a sample at hand or at least among the samples being studied, it cannot be of statistical interest. Length, height, weight, number of teeth, vitamin C content, and genotypes are examples of variables in ordinary, genetically and phenotypically diverse groups of organisms. Warm-bloodedness in a group of mammals is not, since mammals are all alike in this regard, although body temperature of individual mammals would, of course, be a variable. We can divide variables as follows:
Variables
    Measurement variables
        Continuous variables
        Discontinuous variables
    Ranked variables
    Attributes
Measurement variables are those measurements and counts that are expressed numerically. Measurement variables are of two kinds. The first kind consists of continuous variables, which at least theoretically can assume an infinite number of values between any two fixed points. For example, between the two length measurements 1.5 and 1.6 cm there are an infinite number of lengths that could be measured if one were so inclined and had a precise enough method of calibration. Any given reading of a continuous variable, such as a length of 1.57 mm, is therefore an approximation to the exact reading, which in practice is unknowable. Many of the variables studied in biology are continuous variables. Examples are lengths, areas, volumes, weights, angles, temperatures, periods of time, percentages, concentrations, and rates.
Contrasted with continuous variables are the discontinuous variables, also known as meristic or discrete variables. These are variables that have only certain fixed numerical values, with no intermediate values possible in between. Thus the number of segments in a certain insect appendage may be 4 or 5 or 6 but never 5½ or 4.3. Examples of discontinuous variables are numbers of a given structure (such as segments, bristles, teeth, or glands), numbers of offspring, numbers of colonies of microorganisms or animals, or numbers of plants in a given quadrat.
Some variables cannot be measured but at least can be ordered or ranked by their magnitude. Thus, in an experiment one might record the rank order of emergence of ten pupae without specifying the exact time at which each pupa emerged. In such cases we code the data as a ranked variable, the order of emergence. Special methods for dealing with such variables have been developed, and several are furnished in this book. By expressing a variable as a series of ranks, such as 1, 2, 3, 4, 5, we do not imply that the difference in magnitude between, say, ranks 1 and 2 is identical to or even proportional to the difference between ranks 2 and 3.
Variables that cannot be measured but must be expressed qualitatively are called attributes, or nominal variables. These are all properties, such as black or white, pregnant or not pregnant, dead or alive, male or female. When such attributes are combined with frequencies, they can be treated statistically. Of 80 mice, we may, for instance, state that four were black, two agouti, and the rest gray. When attributes are combined with frequencies into tables suitable for statistical analysis, they are referred to as enumeration data. Thus the enumeration data on color in mice would be arranged as follows:
Color                    Frequency
Black                        4
Agouti                       2
Gray                        74
Total number of mice        80
In some cases attributes can be changed into measurement variables if this is desired. Thus colors can be changed into wavelengths or color-chart values. Certain other attributes that can be ranked or ordered can be coded to become ranked variables. For example, three attributes referring to a structure as "poorly developed," "well developed," and "hypertrophied" could be coded 1, 2, and 3.
A term that has not yet been explained is variate. In this book we shall use it as a single reading, score, or observation of a given variable. Thus, if we have measurements of the length of the tails of five mice, tail length will be a continuous variable, and each of the five readings of length will be a variate. In this text we identify variables by capital letters, the most common symbol being Y. Thus Y may stand for tail length of mice. A variate will refer to a given length measurement; $Y_i$ is the measurement of tail length of the ith mouse, and $Y_4$ is the measurement of tail length of the fourth mouse in our sample.
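The enumeration data on coat color given above can be tallied directly from raw attribute records. The following is a minimal Python sketch, not part of the original text; the list `coat_colors` is a hypothetical set of records constructed to match the frequencies quoted for the 80 mice.

```python
from collections import Counter

# Hypothetical raw attribute records: one coat color per mouse (80 mice),
# built to match the frequencies quoted in the text.
coat_colors = ["black"] * 4 + ["agouti"] * 2 + ["gray"] * 74

# Combining an attribute with its frequencies gives enumeration data.
frequencies = Counter(coat_colors)
total = sum(frequencies.values())

print(f"{'Color':<10}{'Frequency':>10}")
for color in ("black", "agouti", "gray"):
    print(f"{color.capitalize():<10}{frequencies[color]:>10}")
print(f"{'Total':<10}{total:>10}")
```

A `Counter` is used here only because it maps each attribute value to its count in one step; any tallying method would serve equally well.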
Most continuous variables, however, are approximate. We mean by this that the exact value of the single measurement, the variate, is unknown and probably unknowable. The last digit of the measurement stated should imply precision; that is, it should indicate the limits on the measurement scale between which we believe the true measurement to lie. Thus, a length measurement of 12.3 mm implies that the true length of the structure lies somewhere between 12.25 and 12.35 mm. Exactly where between these implied limits the real length is we do not know. But where would a true measurement of 12.25 fall? Would it not equally likely fall in either of the two classes 12.2 and 12.3, clearly an unsatisfactory state of affairs? Such an argument is correct, but when we record a number as either 12.2 or 12.3, we imply that the decision whether to put it into the higher or lower class has already been taken. This decision was not taken arbitrarily, but presumably was based on the best available measurement. If the scale of measurement is so precise that a value of 12.25 would clearly have been recognized, then the measurement should have been recorded originally to four significant figures. Implied limits, therefore, always carry one more figure beyond the last significant one measured by the observer.
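The rule for implied limits can be sketched in a few lines of Python. This is an illustrative sketch rather than a procedure from the text: the function name `implied_limits` is an assumption, and the measurement is passed as a string so that the number of recorded decimal places is preserved.

```python
from decimal import Decimal

def implied_limits(recorded: str) -> tuple[Decimal, Decimal]:
    """Return the (lower, upper) implied limits of a recorded measurement.

    The measurement is given as text so that the number of recorded
    decimal places, which conveys its precision, is not lost.
    """
    value = Decimal(recorded)
    # One unit in the last recorded digit, e.g. 0.1 for "12.3".
    unit = Decimal(1).scaleb(value.as_tuple().exponent)
    half = unit / 2
    return value - half, value + half

for reading in ("12.3", "12.32"):
    low, high = implied_limits(reading)
    print(f"{reading} mm implies a true length between {low} and {high} mm")
```

`Decimal` is used instead of binary floating point so that limits such as 12.25 and 12.35 are represented exactly.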
Hence,itfollows
thatifwerecordthemeasurementas12.32,weareimplying thatthetruevalueliesbetween12.315and12.325.Unlessthisiswhatwemean, therewouldbenopointinaddingthelastdecimalfiguretoouroriginalmea surements.Ifwedoaddanotherfigure,wemustimplyanincreaseinprecision. Wesee,therefore,thataccuracyandprecisioninnumbersarenotabsolutecon cepts,butarerelative.Assumingthereisnobias,anumberbecomesincreasingly moreaccurateasweareabletowritemoresignificantfiguresforit(increaseits precision). Toillustratethisconceptoftherelativityofaccuracy,considerthe followingthreenumbers:
Impli"d/imits
Wemayimaginethesenumberstoberecordedmeasurementsofthesamestruc ture.Letusassumethatwehadextramundaneknowledgethatthetruelength ofthegivenstructurewas192.758units.Ifthatwereso,thethreemeasurements wouldincreaseinaccuracyfromthetopdown,astheintervalbetweentheir impliedlimitsdecreased.Youwillnotethattheimpliedlimitsofthetopmost measurementarewiderthanthoseoftheonebelowit,whichinturnarewider thanthoseofthethirdmeasurement. Meristicvariates,thoughordinarilyexact,mayberecordedapproximately whenlargenumbersareinvolved.Thuswhencountsarereportedtothenearest thousand,acountof36,000insectsinacubicmeterofsoil,forexample,implies thatthetruenumbervariessomewherefrom35,500to36,500insects. Tohowmanysignificantfiguresshouldwerecordmeasurements?Ifwearray \.''In'lnL-,,,h"F\rI.-1i""rnof1"\"\'--1nn;111111""frc\tYlthpinthp...r(Jf"4..:t
2.3Accuracyandprecisionofdata
"Accuracy"and"precision"areusedsynonymouslyineverydayspeech,butin statisticswedefine themmorerigorously.Accuracyistheclosenessolameasured or computedvallietoitstruelJalue.Precisio/listheclosenessolrepeatedmeasure ments.Abiased butsensitivescalemightyieldinaccuratebutpreciseweight.By chance,aninsensitivescalemightresultinanaccuratereading,whichwould, however,beimprecise,sincearepeatedweighingwouldbeunlikelytoyieldan equallyaccurateweight.Unlessthereisbiasinameasuringinstrument,precision willleadtoaccuracy.Weneedthereforemainlybeconcernedwiththeformer.
Precise
variatesareusually,butnotnecessarily,wholenumbers.Thus,when wecountfoureggsinanest,thereisnodoubtabouttheexactnumberofeggs in thenestifwehavecountedeorrectly;itis4,not3or5,andclearlyitcould notbe4plusorminusafractionalpart.Meristic,ordiscontinuous,variablesare generallymeasuredasexactnumbers.Seemingly,continuousvariablesderived frommeristiconescanundercertainconditionsalsobeexactnumbers.For instance,ratiosbetweenexactnumbersarcthemselvesalsoexact.Ifinacolony ofanimalsthereareIXfemalesand12males,theratiooffemalestomales(a 193
192.8
192.76192.5193.5
192.75192.85
192.755192.765
12CHAPTER2 /DATAINBIOSTATISTICS2.4/DERIVEDVARIABLES13
26.51\227
133.71375133.71
O.OJ7253Il.0372
O.OJ71530.0372
In16211\.000
17.3476317.3
one,aneasyruletorememberisthatthenumberofunitstepsfromthesmallest tothelargestmeasurementinanarrayshouldusuallybebetween30and300. Thus,ifwearemeasuringaseriesofshellstothenearestmillimeterandthe largestis8mmandthesmallestis4mmwide,thereareonlyfourunitsteps betweenthelargestandthesmallestmeasurement.Hence,weshouldmeasure ourshellstoonemoresignificantdecimalplace.Thenthetwoextrememeasure mentsmightbe8.2mmand4.1mm,with41unitstepsbetweenthem(counting thelastsignificantdigitastheunit);thiswouldbeanadequatenumberofunit steps.Thereasonforsucharuleisthatanerrorof1inthelastsignificantdigit ofareadingof4mmwouldconstituteaninadmissibleerrorof25%,butanerror ofIinthelastdigitof4.1islessthan2.5%.Similarly,ifwemeasuredtheheight ofthetallestofaseriesofplantsas173.2cmandthatoftheshortestofthese plantsas26.6em,thedifferencebetweentheselimitswouldcomprise1466unit steps(of0.1cm),whicharefartoomany.Itwouldthereforebeadvisableto recordtheheightstothenearestcentimeter.asfollows:173cmforthetallest and27cmfortheshortest.Thiswouldyield146unitsteps.Usingtherulewe havestatedforthenumberofunitsteps,weshallrecordtwoorthreedigitsfor mostmeasurements. Thelastdigitshouldalwaysbesignificant;thatis,itshouldimplyarange forthetruemeasurementoffromhalfa"unitstep"belowtohalfa"unitstep" abovetherecordedscore,asillustratedearlier.Thisappliestoalldigits,zero included.Zerosshouldthereforenotbewrittenattheendofapproximatenum berstotherightofthedecimalpointunlesstheyaremeanttobesignificant digits.Thus7.80mustimplythelimits7.795to7.805.If7.75to7.85isimplied, themeasurementshouldberecordedas7.8. 
Whenthenumberofsignificantdigitsistobereduced,wecarryoutthe processofrOll/utin?}ofrnumbers.Therulesforroundingoffareverysimple.A digittoberoundedofTisnotchangedifitisfollowedbyadigitlessthan5.If thedigittoberoundedoffisfollowedbyadigitgreaterthan5orby5followed byothernonzerodigits,itisincreasedby1.WhenthedigittoberoundedofT isfollowedbya5standingaloneora5followedbyzeros,itisunchangedifit isevenbutincreasedbyIifitisodd.Thereasonforthislastruleisthatwhen suehnumbersaresummedinalongseries,weshouldhaveasmanydigits raisedasarebeinglowered,ontheaverage;thesechangesshouldtherefore balanceoul.PracticetheaboverulesbyroundingofTthefollowingnumbersto theindicatednumberofsignificantdigits:
Num"erSiyrli/icarltdi"itsdesired
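The rounding rule stated above (a lone 5 leaves an even digit unchanged and raises an odd one) can be checked mechanically. The following is only a sketch in Python, not part of the original text; the function name `round_significant` is our own, and the numbers are passed as strings so that they are treated as exact decimals rather than binary approximations.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_significant(number: str, digits: int) -> Decimal:
    """Round a decimal string to `digits` significant digits, ties to the even digit."""
    value = Decimal(number)
    if value.is_zero():
        return value
    # adjusted() is the power of ten of the leading digit, e.g. 1 for 26.58.
    last_kept = value.adjusted() - digits + 1
    quantum = Decimal(1).scaleb(last_kept)  # e.g. 0.0001 when last_kept is -4
    return value.quantize(quantum, rounding=ROUND_HALF_EVEN)

# The practice numbers from the table above (18,316 written without the comma).
practice = [("26.58", 2), ("133.7137", 5), ("0.03725", 3),
            ("0.03715", 3), ("18316", 2), ("17.3476", 3)]
for number, digits in practice:
    print(number, digits, f"{round_significant(number, digits):f}")
```

The two cases 0.03725 and 0.03715 are the instructive ones: both round to 0.0372, the first because the 2 is even and stays, the second because the 1 is odd and is raised.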
Most pocket calculators or larger computers round off their displays using a different rule: they increase the preceding digit when the following digit is a 5 standing alone or with trailing zeros. However, since most of the machines usable for statistics also retain eight or ten significant figures internally, the accumulation of rounding errors is minimized. Incidentally, if two calculators give answers with slight differences in the final (least significant) digits, suspect a different number of significant digits in memory as a cause of the disagreement.
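Returning to the implied limits and the unit-step rule of this section, both follow directly from the place value of the last recorded digit. The sketch below is an illustration only; the helper names `unit_of`, `implied_limits`, and `unit_steps` are hypothetical, and `unit_steps` assumes that both extremes were recorded to the same number of decimal places.

```python
from decimal import Decimal

def unit_of(recorded: str) -> Decimal:
    """Size of one unit step: the place value of the last recorded digit."""
    return Decimal(1).scaleb(Decimal(recorded).as_tuple().exponent)

def implied_limits(recorded: str):
    """Half a unit step below and above the recorded value."""
    value = Decimal(recorded)
    half = unit_of(recorded) / 2
    return value - half, value + half

def unit_steps(smallest: str, largest: str) -> int:
    """Unit steps between two extremes recorded to the same precision (assumed)."""
    return int((Decimal(largest) - Decimal(smallest)) / unit_of(smallest))

def adequate(steps: int) -> bool:
    """The rule of thumb from the text: between 30 and 300 unit steps."""
    return 30 <= steps <= 300

print(implied_limits("12.3"))      # limits of a 12.3 mm reading
print(implied_limits("7.80"))      # a trailing zero narrows the limits
print(unit_steps("4", "8"), unit_steps("4.1", "8.2"))
print(unit_steps("26.6", "173.2"), unit_steps("27", "173"))
```

Passing the readings as strings matters here, because "7.80" and "7.8" are the same number but different measurements: the trailing zero is what carries the claim of precision.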
2.4 Derived variables

The majority of variables in biometric work are observations recorded as direct measurements or counts of biological material or as readings that are the output of various types of instruments. However, there is an important class of variables in biological research that we may call the derived or computed variables. These are generally based on two or more independently measured variables whose relations are expressed in a certain way. We are referring to ratios, percentages, concentrations, indices, rates, and the like.

A ratio expresses as a single value the relation that two variables have, one to the other. In its simplest form, a ratio is expressed as in 64:24, which may represent the number of wild-type versus mutant individuals, the number of males versus females, a count of parasitized individuals versus those not parasitized, and so on. These examples imply ratios based on counts. A ratio based on a continuous variable might be similarly expressed as 1.2:1.8, which may represent the ratio of width to length in a sclerite of an insect or the ratio between the concentrations of two minerals contained in water or soil. Ratios may also be expressed as fractions; thus, the two ratios above could be expressed as 64/24 and 1.2/1.8. However, for computational purposes it is more useful to express the ratio as a quotient. The two ratios cited would therefore be 2.666... and 0.666..., respectively. These are pure numbers, not expressed in measurement units of any kind. It is this form for ratios that we shall consider further.

Percentages are also a type of ratio. Ratios, percentages, and concentrations are basic quantities in much biological research, widely used and generally familiar.

An index is the ratio of the value of one variable to the value of a so-called standard one. A well-known example of an index in this sense is the cephalic index in physical anthropology. Conceived in the wide sense, an index could be the average of two measurements, either simply, such as 1/2 (length of A + length of B), or in weighted fashion, such as 1/3 [(2 x length of A) + length of B].

Rates are important in many experimental fields of biology. The amount of a substance liberated per unit weight or volume of biological material, weight gain per unit time, reproductive rates per unit population size and time (birth rates), and death rates would fall in this category.

The use of ratios and percentages is deeply ingrained in scientific thought. Often ratios may be the only meaningful way to interpret and understand certain types of biological problems. If the biological process being investigated operates on the ratio of the variables studied, one must examine this ratio to understand the process. Thus, Sinnott and Hammond (1935) found that inheritance of the shapes of squashes of the species Cucurbita pepo could be interpreted through a form index based on a length-width ratio, but not through the independent dimensions of shape. By similar methods of investigation, we should be able to find selection affecting body proportions to exist in the evolution of almost any organism.

There are several disadvantages to using ratios. First, they are relatively inaccurate. Let us return to the ratio 1.2/1.8 mentioned above and recall from the previous section that a measurement of 1.2 implies a true range of measurement of the variable from 1.15 to 1.25; similarly, a measurement of 1.8 implies a range from 1.75 to 1.85. We realize, therefore, that the true ratio may vary anywhere from 1.15/1.85 to 1.25/1.75, or from 0.622 to 0.714. We note a possible maximal error of 4.2% if 1.2 is an original measurement: (1.25 - 1.2)/1.2; the corresponding maximal error for the ratio is 7.0%: (0.714 - 0.667)/0.667. Furthermore, the best estimate of a ratio is not usually the midpoint between its possible ranges. Thus, in our example the midpoint between the implied limits is 0.668 and the ratio based on 1.2/1.8 is 0.666...; while this is only a slight difference, the discrepancy may be greater in other instances.

A second disadvantage to ratios and percentages is that they may not be approximately normally distributed (see Chapter 5) as required by many statistical tests. This difficulty can frequently be overcome by transformation of the variable (as discussed in Chapter 10). A third disadvantage of ratios is that in using them one loses information about the relationships between the two variables except for the information about the ratio itself.

FIGURE 2.1
Sampling from a population of birth weights of infants. A. A sample of 25. B. A sample of 100. C. A sample of 500. D. A sample of 2000. Abscissa: birth weight (oz); ordinate: frequency f.

2.5 Frequency distributions
If we were to sample a population of birth weights of infants, we could represent each sampled measurement by a point along an axis denoting magnitude of birth weight. This is illustrated in Figure 2.1A, for a sample of 25 birth weights. If we sample repeatedly from the population and obtain 100 birth weights, we shall probably have to place some of these points on top of other points in order to record them all correctly (Figure 2.1B). As we continue sampling additional hundreds and thousands of birth weights (Figure 2.1C and D), the assemblage of points will continue to increase in size but will assume a fairly definite shape. The outline of the mound of points approximates the distribution of the variable. Remember that a continuous variable such as birth weight can assume an infinity of values between any two points on the abscissa. The refinement of our measurements will determine how fine the number of recorded divisions between any two points along the axis will be.

The distribution of a variable is of considerable biological interest. If we find that the distribution is asymmetrical and drawn out in one direction, it tells us that there is, perhaps, selection that causes organisms to fall preferentially in one of the tails of the distribution, or possibly that the scale of measurement chosen is such as to bring about a distortion of the distribution. If, in a sample of immature insects, we discover that the measurements are bimodally distributed (with two peaks), this would indicate that the population is dimorphic. This means that different species or races may have become intermingled in our sample. Or the dimorphism could have arisen from the presence of both sexes or of different instars.

There are several characteristic shapes of frequency distributions. The most common is the symmetrical bell shape (approximated by the bottom graph in Figure 2.1), which is the shape of the normal frequency distribution discussed in Chapter 5. There are also skewed distributions (drawn out more at one tail than the other), L-shaped distributions as in Figure 2.2, U-shaped distributions, and others, all of which impart significant information about the relationships they represent. We shall have more to say about the implications of various types of distributions in later chapters and sections.

After researchers have obtained data in a given study, they must arrange the data in a form suitable for computation and interpretation. We may assume that variates are randomly ordered initially or are in the order in which the measurements have been taken. A simple arrangement would be an array of the data by order of magnitude. Thus, for example, the variates 7, 6, 5, 7, 8, 9, 6, 7, 4, 6, 7 could be arrayed in order of decreasing magnitude as follows: 9, 8, 7, 7, 7, 7, 6, 6, 6, 5, 4. Where there are some variates of the same value, such as the 6's and 7's in this fictitious example, a time-saving device might immediately have occurred to you, namely, to list a frequency for each of the recurring variates; thus: 9, 8, 7(4x), 6(3x), 5, 4. Such a shorthand notation is one way to represent a frequency distribution, which is simply an arrangement of the classes of variates with the frequency of each class indicated. Conventionally, a frequency distribution is stated in tabular form; for our example, this is done as follows:

Variable Y    Frequency f
9             1
8             1
7             4
6             3
5             1
4             1

The above is an example of a quantitative frequency distribution, since Y is clearly a measurement variable. However, arrays and frequency distributions need not be limited to such variables. We can make frequency distributions of attributes, called qualitative frequency distributions. In these, the various classes are listed in some logical or arbitrary order. For example, in genetics we might have a qualitative frequency distribution as follows:

Phenotype    f
A-           86
aa           32

This tells us that there are two classes of individuals, those identified by the A- phenotype, of which 86 were found, and those comprising the homozygote recessive aa, of which 32 were seen in the sample.

An example of a more extensive qualitative frequency distribution is given in Table 2.1, which shows the distribution of melanoma (a type of skin cancer) over body regions in men and women. This table tells us that the trunk and limbs are the most frequent sites for melanomas and that the buccal cavity, the rest of the gastrointestinal tract, and the genital tract are rarely afflicted by this type of cancer. We often encounter other examples of qualitative frequency distributions in ecology in the form of tables, or species lists, of the inhabitants of a sampled ecological area. Such tables catalog the inhabitants by species or at a higher taxonomic level and record the number of specimens observed for each. The arrangement of such tables is usually alphabetical, or it may follow a special convention, as in some botanical species lists.

TABLE 2.1
Two qualitative frequency distributions. Number of cases of skin cancer (melanoma) distributed over body regions of 4599 men and 4786 women.

                                   Observed frequency
Anatomic site                      Men      Women
Head and neck                       949      645
Trunk and limbs                    3243     3645
Buccal cavity                         8       11
Rest of gastrointestinal tract        5       21
Genital tract                        12       93
Eye                                 382      371
Total cases                        4599     4786

Source: Data from Lee.

A quantitative frequency distribution based on meristic variates is shown in Table 2.2. This is an example from plant ecology: the number of plants per quadrat sampled is listed at the left in the variable column; the observed frequency is shown at the right.

TABLE 2.2
A meristic frequency distribution. Number of plants of the sedge Carex flacca found in 500 quadrats.

No. of plants per quadrat, Y    Observed frequency, f
0                               181
1                               118
2                                97
3                                54
4                                32
5                                 9
6                                 5
7                                 3
8                                 1
Total                           500

Source: Data from Archibald (1950).

FIGURE 2.2
Bar diagram. Frequency of the sedge Carex flacca in 500 quadrats. Data from Table 2.2; originally from Archibald (1950). Abscissa: number of plants per quadrat; ordinate: frequency.

Quantitative frequency distributions based on a continuous variable are the most commonly employed frequency distributions; you should become thoroughly familiar with them. An example is shown in Box 2.1. It is based on 25 femur lengths measured in an aphid population. The 25 readings are shown at the top of Box 2.1 in the order in which they were obtained as measurements. (They could have been arrayed according to their magnitude.) The data are next set up in a frequency distribution. The variates increase in magnitude by unit steps of 0.1. The frequency distribution is prepared by entering each variate in turn on the scale and indicating a count by a conventional tally mark. When all of the items have been tallied in the corresponding class, the tallies are converted into numerals indicating frequencies in the next column. Their sum is indicated by Σf.

What have we achieved in summarizing our data? The original 25 variates are now represented by only 15 classes. We find that variates 3.6, 3.8, and 4.3 have the highest frequencies. However, we also note that there are several classes, such as 3.4 or 3.7, that are not represented by a single aphid. This gives the entire frequency distribution a drawn-out and scattered appearance. The reason for this is that we have only 25 aphids, too few to put into a frequency distribution with 15 classes. To obtain a more cohesive and smooth-looking distribution, we have to condense our data into fewer classes. This process is known as grouping of classes of frequency distributions; it is illustrated in Box 2.1 and described in the following paragraphs.

We should realize that grouping individual variates into classes of wider range is only an extension of the same process that took place when we obtained the initial measurement. Thus, as we have seen in Section 2.3, when we measure an aphid and record its femur length as 3.3 units, we imply thereby that the true measurement lies between 3.25 and 3.35 units, but that we were unable to measure to the second decimal place. In recording the measurement initially as 3.3 units, we estimated that it fell within this range. Had we estimated that it exceeded the value of 3.35, for example, we would have given it the next higher score, 3.4. Therefore, all the measurements between 3.25 and 3.35 were in fact grouped into the class identified by the class mark 3.3. Our class interval was 0.1 units. If we now wish to make wider class intervals, we are doing nothing but extending the range within which measurements are placed into one class.

Reference to Box 2.1 will make this process clear. We group the data twice in order to impress upon the reader the flexibility of the process. In the first example of grouping, the class interval has been doubled in width; that is, it has been made to equal 0.2 units. If we start at the lower end, the implied class limits will now be from 3.25 to 3.45, the limits for the next class from 3.45 to 3.65, and so forth.
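The tallying of the original frequency distribution described above can be sketched in a few lines of Python. This is an illustration only, not the authors' procedure; the readings are those listed in Box 2.1, stored here as integer tenths (an assumption of this sketch) so that class marks compare exactly.

```python
from collections import Counter

# The 25 aphid femur lengths of Box 2.1 in order of measurement,
# stored as integer tenths of a unit (3.8 -> 38).
readings = [38, 36, 43, 35, 43,
            33, 43, 39, 43, 38,
            39, 44, 38, 47, 36,
            41, 44, 45, 36, 38,
            44, 41, 36, 42, 39]

tally = Counter(readings)

# One class per unit step of 0.1 between the extremes, empty classes included.
distribution = {mark: tally[mark]
                for mark in range(min(readings), max(readings) + 1)}

for mark, f in distribution.items():
    lower, upper = (mark - 0.5) / 10, (mark + 0.5) / 10  # implied limits
    print(f"{lower:.2f}-{upper:.2f}  {mark / 10:.1f}  {'|' * f:<4}  {f}")

highest = max(distribution.values())
modal_classes = [mark / 10 for mark, f in distribution.items() if f == highest]
empty_classes = [mark / 10 for mark, f in distribution.items() if f == 0]
print(len(distribution), "classes; highest frequencies at", modal_classes)
print("classes without a single aphid:", empty_classes)
```

Listing the empty classes explicitly is what makes the drawn-out, scattered appearance of the ungrouped distribution visible: four of the 15 classes hold no aphid at all.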
Our next task is to find the class marks. This was quite simple in the frequency distribution shown at the left side of Box 2.1, in which the original measurements were used as class marks. However, now we are using a class interval twice as wide as before, and the class marks are calculated by taking the midpoint of the new class intervals. Thus, to find the class mark of the first class, we take the midpoint between 3.25 and 3.45, which turns out to be 3.35. We note that the class mark has one more decimal place than the original measurements. We should not now be led to believe that we have suddenly achieved greater precision. Whenever we designate a class interval whose last significant digit is even (0.2 in this case), the class mark will carry one more decimal place than the original measurements. On the right side of the table in Box 2.1 the data are grouped once again, using a class interval of 0.3. Because of the odd last significant digit, the class mark now shows as many decimal places as the original variates, the midpoint between 3.25 and 3.55 being 3.4.

Once the implied class limits and the class mark for the first class have been correctly found, the others can be written down by inspection without any special computation. Simply add the class interval repeatedly to each of the values. Thus, starting with the lower limit 3.25, by adding 0.2 we obtain 3.45, 3.65, 3.85, and so forth; similarly, for the class marks, we obtain 3.35, 3.55, 3.75, and so forth. It should be obvious that the wider the class intervals, the more compact the data become but also the less precise.

BOX 2.1
Preparation of frequency distribution and grouping into fewer classes with wider class intervals.

Twenty-five femur lengths of the aphid Pemphigus. Measurements are in mm x 10^-1.

Original measurements

3.8  3.6  4.3  3.5  4.3
3.3  4.3  3.9  4.3  3.8
3.9  4.4  3.8  4.7  3.6
4.1  4.4  4.5  3.6  3.8
4.4  4.1  3.6  4.2  3.9

Original frequency distribution

Implied limits    Y      Tally marks    f
3.25-3.35         3.3    |              1
3.35-3.45         3.4                   0
3.45-3.55         3.5    |              1
3.55-3.65         3.6    ||||           4
3.65-3.75         3.7                   0
3.75-3.85         3.8    ||||           4
3.85-3.95         3.9    |||            3
3.95-4.05         4.0                   0
4.05-4.15         4.1    ||             2
4.15-4.25         4.2    |              1
4.25-4.35         4.3    ||||           4
4.35-4.45         4.4    |||            3
4.45-4.55         4.5    |              1
4.55-4.65         4.6                   0
4.65-4.75         4.7    |              1
Σf                                      25

Grouping into 8 classes of interval 0.2

Implied limits    Class mark    Tally marks    f
3.25-3.45         3.35          |              1
3.45-3.65         3.55          |||||          5
3.65-3.85         3.75          ||||           4
3.85-4.05         3.95          |||            3
4.05-4.25         4.15          |||            3
4.25-4.45         4.35          ||||| ||       7
4.45-4.65         4.55          |              1
4.65-4.85         4.75          |              1
Σf                                             25

Grouping into 5 classes of interval 0.3

Implied limits    Class mark    Tally marks    f
3.25-3.55         3.4           ||             2
3.55-3.85         3.7           ||||| |||      8
3.85-4.15         4.0           |||||          5
4.15-4.45         4.3           ||||| |||      8
4.45-4.75         4.6           ||             2
Σf                                             25

Source: Data from R. R. Sokal.

Histogram of the original frequency distribution shown above and of the grouped distribution with 5 classes. Line below abscissa shows class marks for the grouped frequency distribution. Shaded bars represent original frequency distribution; hollow bars represent grouped distribution. Abscissa: Y (femur length, in units of 0.1 mm), with class marks 3.4, 3.7, 4.0, 4.3, 4.6 for the grouped distribution; ordinate: frequency f.

For a detailed account of the process of grouping, see Section 2.5.
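The two groupings in Box 2.1 can be reproduced by repeatedly adding the class interval to the first lower limit and taking midpoints as class marks. The sketch below is one possible way to do this in Python; the function `group` and the choice to store values in hundredths of a unit (so that limits such as 3.25 are exact integers) are assumptions of this example, not part of the box.

```python
from collections import Counter

# The 25 femur lengths of Box 2.1, in hundredths of a unit (3.8 -> 380).
readings = [380, 360, 430, 350, 430, 330, 430, 390, 430, 380,
            390, 440, 380, 470, 360, 410, 440, 450, 360, 380,
            440, 410, 360, 420, 390]

def group(values, lower_limit, interval):
    """Return rows (lower limit, upper limit, class mark, frequency), in hundredths."""
    # Index of the class each variate falls into, counted from the first class.
    counts = Counter((v - lower_limit) // interval for v in values)
    table = []
    for k in range(max(counts) + 1):
        lower = lower_limit + k * interval
        upper = lower + interval
        mark = (lower + upper) // 2  # midpoint of the implied class limits
        table.append((lower, upper, mark, counts[k]))
    return table

for interval in (20, 30):  # class intervals of 0.2 and 0.3 units
    print(f"class interval {interval / 100:.1f}")
    for lower, upper, mark, f in group(readings, 325, interval):
        print(f"  {lower / 100:.2f}-{upper / 100:.2f}  {mark / 100:<5g} {f}")
```

With the interval of 0.2 the class marks come out as 3.35, 3.55, and so on, carrying one more decimal place than the measurements; with 0.3 they are 3.4, 3.7, and so on, as the text explains.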
However, looking at the frequency distribution of aphid femur lengths in Box 2.1, we notice that the initial rather chaotic structure is being simplified by grouping. When we group the frequency distribution into five classes with a class interval of 0.3 units, it becomes notably bimodal (that is, it possesses two peaks of frequencies).

In setting up frequency distributions, from 12 to 20 classes should be established. This rule need not be slavishly adhered to, but it should be employed with some of the common sense that comes from experience in handling statistical data. The number of classes depends largely on the size of the sample studied. Samples of less than 40 or 50 should rarely be given as many as 12 classes, since that would provide too few frequencies per class. On the other hand, samples of several thousand may profitably be grouped into more than 20 classes. If the aphid data of Box 2.1 need to be grouped, they should probably not be grouped into more than 6 classes.

If the original data provide us with fewer classes than we think we should have, then nothing can be done if the variable is meristic, since this is the nature of the data in question. However, with a continuous variable a scarcity of classes would indicate that we probably had not made our measurements with sufficient precision. If we had followed the rules on number of significant digits for measurements stated in Section 2.3, this could not have happened.

Whenever we come up with more than the desired number of classes, grouping should be undertaken. When the data are meristic, the implied limits of continuous variables are meaningless. Yet with many meristic variables, such as a bristle number varying from a low of 13 to a high of 81, it would probably be wise to group the variates into classes, each containing several counts. This can best be done by using an odd number as a class interval so that the class mark representing the data will be a whole rather than a fractional number. Thus, if we were to group the bristle numbers 13, 14, 15, and 16 into one class, the class mark would have to be 14.5, a meaningless value in terms of bristle number. It would therefore be better to use a class ranging over 3 bristles or 5 bristles, giving the integral value 14 or 15 as a class mark.

Grouping data into frequency distributions was necessary when computations were done by pencil and paper. Nowadays even thousands of variates can be processed efficiently by computer without prior grouping. However, frequency distributions are still extremely useful as a tool for data analysis. This is especially true in an age in which it is all too easy for a researcher to obtain a numerical result from a computer program without ever really examining the data for outliers or for other ways in which the sample may not conform to the assumptions of the statistical methods.

Rather than using tally marks to set up a frequency distribution, as was done in Box 2.1, we can employ Tukey's stem-and-leaf display. This technique is an improvement, since it not only results in a frequency distribution of the variates of a sample but also permits easy checking of the variates and ordering them into an array (neither of which is possible with tally marks). This technique will therefore be useful in computing the median of a sample (see Section 3.3) and in computing various tests that require ordered arrays of the sample variates.

To learn how to construct a stem-and-leaf display, let us look ahead to Table 3.1 in the next chapter, which lists 15 blood neutrophil counts. The unordered measurements are as follows: 4.9, 4.6, 5.5, 9.1, 16.3, 12.7, 6.4, 7.1, 2.3, 3.6, 18.0, 3.7, 7.3, 4.4, and 9.8. To prepare a stem-and-leaf display, we scan the variates in the sample to discover the lowest and highest leading digit or digits. Next, we write down the entire range of leading digits in unit increments to the left of a vertical line (the "stem"), as shown in the accompanying illustration. We then put the next digit of the first variate (a "leaf") at that level of the stem corresponding to its leading digit(s). The first observation in our sample is 4.9. We therefore place a 9 next to the 4. The next variate is 4.6. It is entered by finding the stem level for the leading digit 4 and recording a 6 next to the 9 that is already there. Similarly, for the third variate, 5.5, we record a 5 next to the leading digit 5. We continue in this way until all 15 variates have been entered (as "leaves") in sequence along the appropriate leading digits of the stem. The completed array is the equivalent of a frequency distribution and has the appearance of a histogram or bar diagram (see the illustration). Moreover, it permits the efficient ordering of the variates. Thus, from the completed array it becomes obvious that the appropriate ordering of the 15 variates is 2.3, 3.6, 3.7, 4.4, 4.6, 4.9, 5.5, 6.4, 7.1, 7.3, 9.1, 9.8, 12.7, 16.3, 18.0. The median can easily be read off the stem-and-leaf display. It is clearly 6.4. For very large samples, stem-and-leaf displays may become awkward. In such cases a conventional frequency distribution as in Box 2.1 would be preferable.

Stem    Step 1    Step 2    Step 7    Completed array (Step 15)
 2 |                                  3
 3 |                                  67
 4 |    9         96        96        964
 5 |                        5         5
 6 |                        4         4
 7 |                                  13
 8 |
 9 |                        1         18
10 |
11 |
12 |                        7         7
13 |
14 |
15 |
16 |                        3         3
17 |
18 |                                  0

When the shape of a frequency distribution is of particular interest, we may wish to present the distribution in graphic form when discussing the results. This is generally done by means of frequency diagrams, of which there are two common types. For a distribution of meristic data we employ a bar diagram, as shown in Figure 2.2.
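Returning briefly to the stem-and-leaf display: the construction steps described earlier in this section (find the range of leading digits, then enter each next digit as a leaf in order of measurement) can be sketched as follows. The function name `stem_and_leaf` is our own, and the sketch assumes variates recorded to one decimal place, as the 15 neutrophil counts are.

```python
from collections import defaultdict

# The 15 blood neutrophil counts quoted in the text, in order of measurement.
counts = [4.9, 4.6, 5.5, 9.1, 16.3, 12.7, 6.4, 7.1,
          2.3, 3.6, 18.0, 3.7, 7.3, 4.4, 9.8]

def stem_and_leaf(values):
    """Map every stem in the observed range to its leaves, in order of entry."""
    leaves = defaultdict(list)
    for v in values:
        # Work in integer tenths so that 16.3 splits cleanly into stem 16, leaf 3.
        stem, leaf = divmod(round(v * 10), 10)
        leaves[stem].append(leaf)
    return {stem: leaves.get(stem, [])
            for stem in range(min(leaves), max(leaves) + 1)}

display = stem_and_leaf(counts)
for stem, row in display.items():
    print(f"{stem:>2} | {''.join(str(leaf) for leaf in row)}")

# Sorting the leaves within each stem yields the ordered array (in tenths).
ordered = [stem * 10 + leaf for stem, row in display.items() for leaf in sorted(row)]
median = ordered[len(ordered) // 2] / 10  # middle variate; the sample size is odd
print("median:", median)
```

Because the stems are already in order, only the few leaves within each stem need sorting, which is why the display makes ordering the variates so easy by hand.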
The abscissa of the bar diagram represents the variable (in our case, the number of plants per quadrat), and the ordinate represents the frequencies. The important point about such a diagram is that the bars do not touch each other, which indicates that the variable is not continuous. By contrast, continuous variables, such as the frequency distribution of the femur lengths of aphid stem mothers, are graphed as a histogram. In a histogram the width of each bar along the abscissa represents a class interval of the frequency distribution and the bars touch each other to show that the actual limits of the classes are contiguous. The midpoint of the bar corresponds to the class mark. At the bottom of Box 2.1 are shown histograms of the frequency distribution of the aphid data, ungrouped and grouped. The height of each bar represents the frequency of the corresponding class.

To illustrate that histograms are appropriate approximations to the continuous distributions found in nature, we may take a histogram and make the class intervals more narrow, producing more classes. The histogram would then clearly have a closer fit to a continuous distribution. We can continue this process until the class intervals become infinitesimal in width. At this point the histogram becomes the continuous distribution of the variable.

Occasionally the class intervals of a grouped continuous frequency distribution are unequal. For instance, in a frequency distribution of ages we might have more detail on the different ages of young individuals and less accurate identification of the ages of old individuals. In such cases, the class intervals for the older age groups would be wider, those for the younger age groups, narrower. In representing such data, the bars of the histogram are drawn with different widths.

Figure 2.3 shows another graphical mode of representation of a frequency distribution of a continuous variable (in this case, birth weight in infants). As we shall see later the shapes of distributions seen in such frequency polygons can reveal much about the biological situations affecting the given variable.

2.6 The handling of data

Data must be handled skillfully and expeditiously so that statistics can be practiced successfully. Readers should therefore acquaint themselves with the var-
In this book we ignore "pencil-and-paper" short-cut methods for computations, found in earlier textbooks of statistics, since we assume that the student has access to a calculator or a computer. Some statistical methods are very easy to use because special tables exist that provide answers for standard statistical problems; thus, almost no computation is involved. An example is Finney's table, a 2-by-2 contingency table containing small frequencies that is used for the test of independence (Pearson and Hartley, 1958, Table 38). For small problems, Finney's table can be used in place of Fisher's method of finding exact probabilities, which is very tedious. Other statistical techniques are so easy to carry out that no mechanical aids are needed. Some are inherently simple, such as the sign test (Section 10.3). Other methods are only approximate but can often serve the purpose adequately; for example, we may sometimes substitute an easy-to-evaluate median (defined in Section 3.3) for the mean (described in Sections 3.1 and 3.2) which requires computation.

We can use many new types of equipment to perform statistical computations, many more than we could have when Introduction to Biostatistics was first published. The once-standard electrically driven mechanical desk calculator has completely disappeared. Many new electronic devices, from small pocket calculators to larger desk-top computers, have replaced it. Such devices are so diverse that we will not try to survey the field here. Even if we did, the rate of advance in this area would be so rapid that whatever we might say would soon become obsolete.

We cannot really draw the line between the more sophisticated electronic calculators, on the one hand, and digital computers. There is no abrupt increase in capabilities between the more versatile programmable calculators and the simpler microcomputers, just as there is none as we progress from microcomputers to minicomputers and so on up to the large computers that one associates with the central computation center of a large university or research laboratory. All can perform computations automatically and be controlled by a set of detailed instructions prepared by the user. Most of these devices, including programmable small calculators, are adequate for all of the computations described in this book, even for large sets of data.

The material in this book consists of relatively standard statistical computations that are available in many statistical programs. BIOMstat* is a statistical software package that includes most of the statistical methods covered in this book.

The use of modern data processing procedures has one inherent danger. One can all too easily either feed in erroneous data or choose an inappropriate program. Users must select programs carefully to ensure that those programs perform the desired computations, give numerically reliable results, and are as free from error as possible. When using a program for the first time, one should test it using data from textbooks with which one is familiar. Some programs are notorious because the programmer has failed to guard against excessive rounding errors or other problems. Users of a program should carefully check the data being analyzed so that typing errors are not present. In addition, programs should help users identify and remove bad data values and should provide them with transformations so that they can make sure that their data satisfy the assumptions of various analyses.

* [...] These programs are compatible with Windows XP and Vista.

FIGURE 2.3
Frequency polygon. Birth weights of 9465 male infants. Chinese third-class patients in Singapore, 1950 and 1951. Data from Millis and Seng (1954). Abscissa: birth weight (in oz.).

Exercises
2.1 Round the following numbers to three significant figures: 106.55, 0.06819, 3.0495, 7815.01, 2.9149, and 20.1500. What are the implied limits before and after rounding? Round these same numbers to one decimal place. ANS. For the first value: 107; 106.545-106.555; 106.5-107.5; 106.6

2.2 Differentiate between the following pairs of terms and give an example of each. (a) Statistical and biological populations. (b) Var