
INTRODUCTION TO
BIOSTATISTICS

SECOND EDITION

Robert R. Sokal
and F. James Rohlf

State University of New York at Stony Brook

DOVER PUBLICATIONS, INC.
Mineola, New York

Copyright

Copyright © 1969, 1973, 1981, 1987 by Robert R. Sokal and F. James Rohlf
All rights reserved.

Bibliographical Note

This Dover edition, first published in 2009, is an unabridged republication of the work originally published in 1969 by W. H. Freeman and Company, New York. The authors have prepared a new Preface for this edition.

Library of Congress Cataloging-in-Publication Data

Sokal, Robert R.
Introduction to Biostatistics / Robert R. Sokal and F. James Rohlf.
Dover ed.
p. cm.
Originally published: 2nd ed. New York: W.H. Freeman, 1969.
Includes bibliographical references and index.
ISBN-13: 978-0-486-46961-4
ISBN-10: 0-486-46961-1
1. Biometry. I. Rohlf, F. James, 1936- II. Title.
QH323.5.S63 2009
570.1'5195-dc22
2008048052

Manufactured in the United States of America
Dover Publications, Inc., 31 East 2nd Street, Mineola, N.Y. 11501

to Julie and Janice

Contents

PREFACE TO THE DOVER EDITION xi
PREFACE xiii

1. INTRODUCTION 1
1.1 Some definitions 1
1.2 The development of biostatistics 2
1.3 The statistical frame of mind 4

2. DATA IN BIOSTATISTICS 6
2.1 Samples and populations 7
2.2 Variables in biostatistics 8
2.3 Accuracy and precision of data 10
2.4 Derived variables 13
2.5 Frequency distributions 14
2.6 The handling of data 24

3. DESCRIPTIVE STATISTICS 27
3.1 The arithmetic mean 28
3.2 Other means 31
3.3 The median 32
3.4 The mode 33
3.5 The range 34
3.6 The standard deviation 36
3.7 Sample statistics and parameters 37
3.8 Practical methods for computing mean and standard deviation 39
3.9 The coefficient of variation 43

4. INTRODUCTION TO PROBABILITY DISTRIBUTIONS: THE BINOMIAL AND POISSON DISTRIBUTIONS 46
4.1 Probability, random sampling, and hypothesis testing 48
4.2 The binomial distribution 54
4.3 The Poisson distribution 63

5. THE NORMAL PROBABILITY DISTRIBUTION 74
5.1 Frequency distributions of continuous variables 75
5.2 Derivation of the normal distribution 76
5.3 Properties of the normal distribution 78
5.4 Applications of the normal distribution 82
5.5 Departures from normality: Graphic methods 85

6. ESTIMATION AND HYPOTHESIS TESTING 93
6.1 Distribution and variance of means 94
6.2 Distribution and variance of other statistics 101
6.3 Introduction to confidence limits 103
6.4 Student's t distribution 106
6.5 Confidence limits based on sample statistics 109
6.6 The chi-square distribution 112
6.7 Confidence limits for variances 114
6.8 Introduction to hypothesis testing 115
6.9 Tests of simple hypotheses employing the t distribution 126
6.10 Testing the hypothesis H₀: σ² = σ₀² 129

7. INTRODUCTION TO ANALYSIS OF VARIANCE 133
7.1 The variances of samples and their means 134
7.2 The F distribution 138
7.3 The hypothesis H₀: σ₁² = σ₂² 143
7.4 Heterogeneity among sample means 143
7.5 Partitioning the total sum of squares and degrees of freedom 150
7.6 Model I anova 154
7.7 Model II anova 157

8. SINGLE-CLASSIFICATION ANALYSIS OF VARIANCE 160
8.1 Computational formulas 161
8.2 Equal n 162
8.3 Unequal n 165
8.4 Two groups 168
8.5 Comparisons among means: Planned comparisons 173
8.6 Comparisons among means: Unplanned comparisons 179

9. TWO-WAY ANALYSIS OF VARIANCE 185
9.1 Two-way anova with replication 186
9.2 Two-way anova: Significance testing 197
9.3 Two-way anova without replication 199

10. ASSUMPTIONS OF ANALYSIS OF VARIANCE 211
10.1 The assumptions of anova 212
10.2 Transformations 216
10.3 Nonparametric methods in lieu of anova 220

11. REGRESSION 230
11.1 Introduction to regression 231
11.2 Models in regression 233
11.3 The linear regression equation 235
11.4 More than one value of Y for each value of X 243
11.5 Tests of significance in regression 250
11.6 The uses of regression 257
11.7 Residuals and transformations in regression 259
11.8 A nonparametric test for regression 263

12. CORRELATION 267
12.1 Correlation and regression 268
12.2 The product-moment correlation coefficient 270
12.3 Significance tests in correlation 280
12.4 Applications of correlation 284
12.5 Kendall's coefficient of rank correlation 286

13. ANALYSIS OF FREQUENCIES 294
13.1 Tests for goodness of fit: Introduction 295
13.2 Single-classification goodness of fit tests 301
13.3 Tests of independence: Two-way tables 305

APPENDIXES 314
A1 Mathematical appendix 314
A2 Statistical tables 320

BIBLIOGRAPHY 349

INDEX 353

Preface to the Dover Edition

We are pleased and honored to see the re-issue of the second edition of our Introduction to Biostatistics by Dover Publications. On reviewing the copy, we find there is little in it that needs changing for an introductory textbook of biostatistics for an advanced undergraduate or beginning graduate student. The book furnishes an introduction to most of the statistical topics such students are likely to encounter in their courses and readings in the biological and biomedical sciences.

The reader may wonder what we would change if we were to write this book anew. Because of the vast changes that have taken place in modalities of computation in the last twenty years, we would deemphasize computational formulas that were designed for pre-computer desk calculators (an age before spreadsheets and comprehensive statistical computer programs) and refocus the reader's attention on structural formulas that not only explain the nature of a given statistic, but are also less prone to rounding error in calculations performed by computers. In this spirit, we would omit equation (3.8) on page 39 and draw the reader's attention to equation (3.7) instead.
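The rounding-error argument can be made concrete. The Python sketch below compares two ways of computing the sample variance. The structural form sums squared deviations from the mean. The desk-calculator shortcut computes the sum of squares as ΣY² − (ΣY)²/n. The function names and the data (four hypothetical readings on a large common offset) are illustrative assumptions, not material from the book's boxes.

```python
def variance_structural(values):
    """Sample variance from deviations about the mean (structural form)."""
    n = len(values)
    mean = sum(values) / n
    sum_sq_dev = sum((y - mean) ** 2 for y in values)
    return sum_sq_dev / (n - 1)


def variance_computational(values):
    """Sample variance from the shortcut sum(Y^2) - (sum Y)^2 / n."""
    n = len(values)
    total = sum(values)
    sum_of_squares = sum(y * y for y in values)
    return (sum_of_squares - total * total / n) / (n - 1)


# Hypothetical readings: deviations of -6, -3, 3, 6 about a mean of 1e9 + 10,
# so the exact sample variance is (36 + 9 + 9 + 36) / 3 = 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

print(variance_structural(data))     # 30.0
print(variance_computational(data))  # far from 30: the huge squares lose their low digits
```

On well-scaled data the two functions agree. The shortcut fails only when the squared values are too large for double precision to hold their low-order digits, and the difference of two nearly equal large numbers then cancels away the information that matters.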

Similarly, we would use structural formulas in Boxes 3.1 and 3.2 on pages 41 and 42, respectively; on page 161 and in Box 8.1 on pages 163/164, as well as in Box 12.1 on pages 278/279.

Secondly, we would put more emphasis on permutation tests and resampling methods. Permutation tests and bootstrap estimates are now quite practical.

We have found this approach to be not only easier for students to understand but in many cases preferable to the traditional parametric methods that are emphasized in this book.
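A small illustration of the permutation approach the authors recommend: the sketch below runs an exact two-sample permutation test on a difference between means. It reassigns the pooled observations to the two groups in every possible way and counts how often the reassigned difference is at least as large as the observed one. The data and the function name are hypothetical. Full enumeration is feasible only for small samples; with larger samples one would draw a random subset of reassignments instead.

```python
from itertools import combinations
from statistics import mean


def permutation_test(a, b):
    """Two-sided exact permutation test for a difference between two means.

    Every split of the pooled observations into groups of the original sizes
    is enumerated; the p-value is the fraction of splits whose absolute mean
    difference is at least the observed one.
    """
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    extreme = 0
    total = 0
    for chosen in combinations(range(len(pooled)), len(a)):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        diff = abs(mean(group_a) - mean(group_b))
        # Small tolerance so ties with the observed split are not lost to rounding.
        if diff >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical measurements on two small groups of animals.
treated = [12, 14, 15]
control = [8, 9, 11]
print(permutation_test(treated, control))  # 0.1 (2 of the 20 possible splits)
```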

Robert R. Sokal
F. James Rohlf
November 2008

Preface

The favorable reception that the first edition of this book received from teachers and students encouraged us to prepare a second edition. In this revised edition, we provide a thorough foundation in biological statistics for the undergraduate student who has a minimal knowledge of mathematics. We intend Introduction to Biostatistics to be used in comprehensive biostatistics courses, but it can also be adapted for short courses in medical and professional schools; thus, we include examples from the health-related sciences.

We have extracted most of this text from the more-inclusive second edition of our own Biometry. We believe that the proven pedagogic features of that book, such as its informal style, will be valuable here.

We have modified some of the features from Biometry; for example, in Introduction to Biostatistics we provide detailed outlines for statistical computations but we place less emphasis on the computations themselves. Why? Students in many undergraduate courses are not motivated to and have few opportunities to perform lengthy computations with biological research material; also, such computations can easily be made on electronic calculators and microcomputers. Thus, we rely on the course instructor to advise students on the best computational procedures to follow.

We present material in a sequence that progresses from descriptive statistics to fundamental distributions and the testing of elementary statistical hypotheses; we then proceed immediately to the analysis of variance and the familiar t test (which is treated as a special case of the analysis of variance and relegated to several sections of the book). We do this deliberately for two reasons: (1) since today's biologists all need a thorough foundation in the analysis of variance, students should become acquainted with the subject early in the course; and (2) if analysis of variance is understood early, the need to use the t distribution is reduced. (One would still want to use it for the setting of confidence limits and in a few other special situations.) All t tests can be carried out directly as analyses of variance, and the amount of computation of these analyses of variance is generally equivalent to that of t tests.

This larger second edition includes the Kolmogorov-Smirnov two-sample test, nonparametric regression, stem-and-leaf diagrams, hanging histograms, and the Bonferroni method of multiple comparisons. We have rewritten the chapter on the analysis of frequencies in terms of the G statistic rather than X², because the former has been shown to have more desirable statistical properties. Also, because of the availability of logarithm functions on calculators, the computation of the G statistic is now easier than that of the earlier chi-square test. Thus, we reorient the chapter to emphasize log-likelihood-ratio tests. We have also added new homework exercises.
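In its usual form the G statistic is twice the sum, over classes, of f ln(f / f̂), where f is an observed and f̂ an expected frequency. X² instead sums (f − f̂)² / f̂. The sketch below computes both for a hypothetical two-class goodness-of-fit problem. The function names are illustrative. The book's own worked procedures may add refinements, such as small-sample corrections, that this sketch omits.

```python
import math


def g_statistic(observed, expected):
    """Log-likelihood-ratio statistic G = 2 * sum(f * ln(f / f_hat))."""
    # Classes with an observed frequency of zero contribute nothing (f ln f -> 0).
    return 2.0 * sum(
        f * math.log(f / f_hat) for f, f_hat in zip(observed, expected) if f > 0
    )


def chi_square(observed, expected):
    """Pearson's X^2 = sum((f - f_hat)^2 / f_hat)."""
    return sum((f - f_hat) ** 2 / f_hat for f, f_hat in zip(observed, expected))


# Hypothetical example: 40 observations expected to split 1:1 between two classes.
observed = [30, 10]
expected = [20, 20]
print(round(g_statistic(observed, expected), 3))  # 10.465
print(chi_square(observed, expected))             # 10.0
```

The two statistics are usually close for moderate deviations. Both are referred to the chi-square distribution; in this two-class example with no estimated parameters, that distribution has one degree of freedom.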

We call special double-numbered tables "boxes." They can be used as convenient guides for computation because they show the computational methods for solving various types of biostatistical problems. They usually contain all the steps necessary to solve a problem, from the initial setup to the final result. Thus, students familiar with material in the book can use them as quick summary reminders of a technique.

We found in teaching this course that we wanted students to be able to refer to the material now in these boxes. We discovered that we could not cover even half as much of our subject if we had to put this material on the blackboard during the lecture, and so we made up and distributed boxes and asked students to refer to them during the lecture. Instructors who use this book may wish to use the boxes in a similar manner.

We emphasize the practical applications of statistics to biology in this book; thus, we deliberately keep discussions of statistical theory to a minimum. Derivations are given for some formulas, but these are consigned to Appendix A1, where they should be studied and reworked by the student. Statistical tables to which the reader can refer when working through the methods discussed in this book are found in Appendix A2.

We are grateful to K. R. Gabriel, R. C. Lewontin, and M. Kabay for their extensive comments on the second edition of Biometry and to M. D. Morgan, E. Russek-Cohen, and M. Singh for comments on an early draft of this book.

We also appreciate the work of our secretaries, Resa Chapey and Cheryl Daly, with preparing the manuscripts, and of Donna DiGiovanni, Patricia Rohlf, and Barbara Thomson with proofreading.

Robert R. Sokal
F. James Rohlf


CHAPTER 1

Introduction

This chapter sets the stage for your study of biostatistics. In Section 1.1, we define the field itself. We then cast a necessarily brief glance at its historical development in Section 1.2. Then in Section 1.3 we conclude the chapter with a discussion of the attitudes that the person trained in statistics brings to biological research.

1.1 Some definitions

We shall define biostatistics as the application of statistical methods to the solution of biological problems. The biological problems of this definition are those arising in the basic biological sciences as well as in such applied areas as the health-related sciences and the agricultural sciences. Biostatistics is also called biological statistics or biometry.

The definition of biostatistics leaves us somewhat up in the air: "statistics" has not been defined. Statistics is a science well known by name even to the layman. The number of definitions you can find for it is limited only by the number of books you wish to consult. We might define statistics in its modern sense as the scientific study of numerical data based on natural phenomena. All parts of this definition are important and deserve emphasis:

Scientific study: Statistics must meet the commonly accepted criteria of validity of scientific evidence. We must always be objective in presentation and evaluation of data and adhere to the general ethical code of scientific methodology, or we may find that the old saying that "figures never lie, only statisticians do" applies to us.

Data: Statistics generally deals with populations or groups of individuals; hence it deals with quantities of information, not with a single datum. Thus, measurement of a single animal or the response from a single biochemical test will generally not be of interest.

Numerical: Unless data of a study can be quantified in one way or another, they will not be amenable to statistical analysis. Numerical data can be measurements (the length or width of a structure or the amount of a chemical in a body fluid, for example) or counts (such as the number of bristles or teeth).
Natural phenomena: We use this term in a wide sense to mean not only all those events in animate and inanimate nature that take place outside the control of human beings, but also those evoked by scientists and partly under their control, as in experiments. Different biologists will concern themselves with different levels of natural phenomena; other kinds of scientists, with yet different ones. But all would agree that the chirping of crickets, the number of peas in a pod, and the age of a woman at menopause are natural phenomena. The heartbeat of rats in response to adrenalin, the mutation rate in maize after irradiation, or the incidence or morbidity in patients treated with vaccine may still be considered natural, even though scientists have interfered with the phenomenon through their intervention. The average biologist would not consider the number of stereo sets bought by persons in different states in a given year to be a natural phenomenon. Sociologists or human ecologists, however, might so consider it and deem it worthy of study. The qualification "natural phenomena" is included in the definition of statistics mostly to make certain that the phenomena studied are not arbitrary ones that are entirely under the will and control of the researcher, such as the number of animals employed in an experiment.

The word "statistics" is also used in another, though related, way. It can be the plural of the noun statistic, which refers to any one of many computed or estimated statistical quantities, such as the mean, the standard deviation, or the correlation coefficient. Each one of these is a statistic.
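A short illustration of the second sense of the word: each quantity named above is computed from a sample, and each result is a statistic. The sketch below computes three of them for a small set of hypothetical paired measurements. It uses the conventional definitions: the sample standard deviation with n − 1 in the denominator, and Pearson's product-moment correlation coefficient. The book develops these in later chapters. The data and function names are assumptions of this sketch.

```python
import math


def mean(values):
    """Arithmetic mean of a sample."""
    return sum(values) / len(values)


def standard_deviation(values):
    """Sample standard deviation with n - 1 in the denominator."""
    m = mean(values)
    return math.sqrt(sum((y - m) ** 2 for y in values) / (len(values) - 1))


def correlation(xs, ys):
    """Pearson product-moment correlation coefficient of paired samples."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired measurements (e.g., body length and weight) on five animals.
lengths = [10.0, 11.0, 12.0, 13.0, 14.0]
weights = [20.0, 22.0, 25.0, 26.0, 27.0]

print(mean(lengths))                # 12.0
print(standard_deviation(lengths))  # about 1.581
print(correlation(lengths, weights))
```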

1.2 The development of biostatistics

Modern statistics appears to have developed from two sources as far back as the seventeenth century. The first source was political science; a form of statistics developed as a quantitative description of the various aspects of the affairs of a government or state (hence the term "statistics"). This subject also became known as political arithmetic. Taxes and insurance caused people to become interested in problems of censuses, longevity, and mortality. Such considerations assumed increasing importance, especially in England as the country prospered during the development of its empire. John Graunt (1620-1674) and William Petty (1623-1687) were early students of vital statistics, and others followed in their footsteps.

At about the same time, the second source of modern statistics developed: the mathematical theory of probability engendered by the interest in games of chance among the leisure classes of the time. Important contributions to this theory were made by Blaise Pascal (1623-1662) and Pierre de Fermat (1601-1665), both Frenchmen. Jacques Bernoulli (1654-1705), a Swiss, laid the foundation of modern probability theory in Ars Conjectandi. Abraham de Moivre (1667-1754), a Frenchman living in England, was the first to combine the statistics of his day with probability theory in working out annuity values and to approximate the important normal distribution through the expansion of the binomial.

A later stimulus for the development of statistics came from the science of astronomy, in which many individual observations had to be digested into a coherent theory. Many of the famous astronomers and mathematicians of the eighteenth century, such as Pierre Simon Laplace (1749-1827) in France and Karl Friedrich Gauss (1777-1855) in Germany, were among the leaders in this field. The latter's lasting contribution to statistics is the development of the method of least squares.

Perhaps the earliest important figure in biostatistic thought was Adolphe Quetelet (1796-1874), a Belgian astronomer and mathematician, who in his work combined the theory and practical methods of statistics and applied them to problems of biology, medicine, and sociology. Francis Galton (1822-1911), a cousin of Charles Darwin, has been called the father of biostatistics and eugenics. The inadequacy of Darwin's genetic theories stimulated Galton to try to solve the problems of heredity. Galton's major contribution to biology was his application of statistical methodology to the analysis of biological variation, particularly through the analysis of variability and through his study of regression and correlation in biological measurements. His hope of unraveling the laws of genetics through these procedures was in vain. He started with the most difficult material and with the wrong assumptions. However, his methodology has become the foundation for the application of statistics to biology.

Karl Pearson (1857-1936), at University College, London, became interested in the application of statistical methods to biology, particularly in the demonstration of natural selection. Pearson's interest came about through the influence of W. F. R. Weldon (1860-1906), a zoologist at the same institution. Weldon, incidentally, is credited with coining the term "biometry" for the type of studies he and Pearson pursued. Pearson continued in the tradition of Galton and laid the foundation for much of descriptive and correlational statistics.

The dominant figure in statistics and biometry in the twentieth century has been Ronald A. Fisher (1890-1962). His many contributions to statistical theory will become obvious even to the cursory reader of this book.


Statistics today is a broad and extremely active field whose applications touch almost every science and even the humanities. New applications for statistics are constantly being found, and no one can predict from what branch of statistics new applications to biology will be made.

1.3 The statistical frame of mind

A brief perusal of almost any biological journal reveals how pervasive the use of statistics has become in the biological sciences. Why has there been such a marked increase in the use of statistics in biology? Apparently, because biologists have found that the interplay of biological causal and response variables does not fit the classic mold of nineteenth-century physical science. In that century, biologists such as Robert Mayer, Hermann von Helmholtz, and others tried to demonstrate that biological processes were nothing but physicochemical phenomena. In so doing, they helped create the impression that the experimental methods and natural philosophy that had led to such dramatic progress in the physical sciences should be imitated fully in biology.

Many biologists, even to this day, have retained the tradition of strictly mechanistic and deterministic concepts of thinking (while physicists, interestingly enough, as their science has become more refined, have begun to resort to statistical approaches). In biology, most phenomena are affected by many causal factors, uncontrollable in their variation and often unidentifiable. Statistics is needed to measure such variable phenomena, to determine the error of measurement, and to ascertain the reality of minute but important differences.

A misunderstanding of these principles and relationships has given rise to the attitude of some biologists that if differences induced by an experiment, or observed by nature, are not clear on plain inspection (and therefore are in need of statistical analysis), they are not worth investigating. There are few legitimate fields of inquiry, however, in which, from the nature of the phenomena studied, statistical investigation is unnecessary.

Statistical thinking is not really different from ordinary disciplined scientific thinking, in which we try to quantify our observations. In statistics we express our degree of belief or disbelief as a probability rather than as a vague, general statement. For example, a statement that individuals of species A are larger than those of species B or that women suffer more often from disease X than do men is of a kind commonly made by biological and medical scientists. Such statements can and should be more precisely expressed in quantitative form.

In many ways the human mind is a remarkable statistical machine, absorbing many facts from the outside world, digesting these, and regurgitating them in simple summary form. From our experience we know certain events to occur frequently, others rarely. "Man smoking cigarette" is a frequently observed event, "Man slipping on banana peel," rare. We know from experience that Japanese are on the average shorter than Englishmen and that Egyptians are on the average darker than Swedes. We associate thunder with lightning almost always, flies with garbage cans in the summer frequently, but snow with the southern Californian desert extremely rarely.

All such knowledge comes to us as a result of experience, both our own and that of others, which we learn about by direct communication or through reading. All these facts have been processed by that remarkable computer, the human brain, which furnishes an abstract. This abstract is constantly under revision, and though occasionally faulty and biased, it is on the whole astonishingly sound; it is our knowledge of the moment.

Although statistics arose to satisfy the needs of scientific research, the development of its methodology in turn affected the sciences in which statistics is applied. Thus, through positive feedback, statistics, created to serve the needs of natural science, has itself affected the content and methods of the biological sciences. To cite an example: Analysis of variance has had a tremendous effect in influencing the types of experiments researchers carry out. The whole field of quantitative genetics, one of whose problems is the separation of environmental from genetic effects, depends upon the analysis of variance for its realization, and many of the concepts of quantitative genetics have been directly built around the designs inherent in the analysis of variance.


CHAPTER 2

Data in Biostatistics

In Section 2.1 we explain the statistical meaning of the terms "sample" and "population," which we shall be using throughout this book. Then, in Section 2.2, we come to the types of observations that we obtain from biological research material; we shall see how these correspond to the different kinds of variables upon which we perform the various computations in the rest of this book. In Section 2.3 we discuss the degree of accuracy necessary for recording data and the procedure for rounding off figures. We shall then be ready to consider in Section 2.4 certain kinds of derived data frequently used in biological science, among them ratios and indices, and the peculiar problems of accuracy and distribution they present us. Knowing how to arrange data in frequency distributions is important because such arrangements give an overall impression of the general pattern of the variation present in a sample and also facilitate further computational procedures. Frequency distributions, as well as the presentation of numerical data, are discussed in Section 2.5. In Section 2.6 we briefly describe the computational handling of data.

2.1 Samples and populations

We shall now define a number of important terms necessary for an understanding of biological data. The data in biostatistics are generally based on individual observations. They are observations or measurements taken on the smallest sampling unit. These smallest sampling units frequently, but not necessarily, are also individuals in the ordinary biological sense. If we measure weight in 100 rats, then the weight of each rat is an individual observation; the hundred rat weights together represent the sample of observations, defined as a collection of individual observations selected by a specified procedure. In this instance, one individual observation (an item) is based on one individual in a biological sense, that is, one rat. However, if we had studied weight in a single rat over a period of time, the sample of individual observations would be the weights recorded on one rat at successive times. If we wish to measure temperature in a study of ant colonies, where each colony is a basic sampling unit, each temperature reading for one colony is an individual observation, and the sample of observations is the temperatures for all the colonies considered. If we consider an estimate of the DNA content of a single mammalian sperm cell to be an individual observation, the sample of observations may be the estimates of DNA content of all the sperm cells studied in one individual mammal.

We have carefully avoided so far specifying what particular variable was being studied, because the terms "individual observation" and "sample of observations" as used above define only the structure but not the nature of the data in a study. The actual property measured by the individual observations is the character, or variable. The more common term employed in general statistics is "variable." However, in biology the word "character" is frequently used synonymously.

More than one variable can be measured on each smallest sampling unit. Thus, in a group of 25 mice we might measure the blood pH and the erythrocyte count. Each mouse (a biological individual) is the smallest sampling unit, blood pH and red cell count would be the two variables studied, the pH readings and cell counts are individual observations, and two samples of 25 observations (on pH and on erythrocyte count) would result. Or we might speak of a bivariate sample of 25 observations, each referring to a pH reading paired with an erythrocyte count.

Next we define population. The biological definition of this term is well known. It refers to all the individuals of a given species (perhaps of a given life-history stage or sex) found in a circumscribed area at a given time. In statistics, population always means the totality of individual observations about which inferences are to be made, existing anywhere in the world or at least within a definitely specified sampling area limited in space and time.

If you take five men and study the number of leucocytes in their peripheral blood and you are prepared to draw conclusions about all men from this sample of five, then the population from which the sample has been drawn represents the leucocyte counts of all extant males of the species Homo sapiens. If, on the other hand, you restrict yourself to a more narrowly specified sample, such as five male Chinese, aged 20, and you are restricting your conclusions to this particular group, then the population from which you are sampling will be leucocyte numbers of all Chinese males of age 20.

A common misuse of statistical methods is to fail to define the statistical population about which inferences can be made. A report on the analysis of a sample from a restricted population should not imply that the results hold in general. The population in this statistical sense is sometimes referred to as the universe.

A population may represent variables of a concrete collection of objects or creatures, such as the tail lengths of all the white mice in the world, the leucocyte counts of all the Chinese men in the world of age 20, or the DNA content of all the hamster sperm cells in existence; or it may represent the outcomes of experiments, such as all the heartbeat frequencies produced in guinea pigs by injections of adrenalin. In cases of the first kind the population is generally finite. Although in practice it would be impossible to collect, count, and examine all hamster sperm cells, all Chinese men of age 20, or all white mice in the world, these populations are in fact finite. Certain smaller populations, such as all the whooping cranes in North America or all the recorded cases of a rare but easily diagnosed disease X, may well lie within reach of a total census. By contrast, an experiment can be repeated an infinite number of times (at least in theory).

A given experiment, such as the administration of adrenalin to guinea pigs, could be repeated as long as the experimenter could obtain material and his or her health and patience held out. The sample of experiments actually performed is a sample from an infinite number that could be performed.

Some of the statistical methods to be developed later make a distinction between sampling from finite and from infinite populations. However, though populations are theoretically finite in most applications in biology, they are generally so much larger than samples drawn from them that they can be considered de facto infinite-sized populations.
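One familiar example of such a distinction is the finite population correction; the book's later methods may not use this exact form. When n items are drawn without replacement from a population of N, the standard error of the sample mean is multiplied by √((N − n)/(N − 1)). The sketch below shows why a population much larger than the sample can be treated as infinite: the factor is then essentially 1. The function name is an illustrative assumption.

```python
import math


def finite_population_correction(n, population_size):
    """Factor sqrt((N - n) / (N - 1)) applied to the standard error of a mean
    when n items are drawn without replacement from a population of size N."""
    if not 1 <= n <= population_size:
        raise ValueError("need 1 <= n <= N")
    if population_size == 1:
        return 0.0  # the single item is the whole population; no sampling error
    return math.sqrt((population_size - n) / (population_size - 1))


# A sample of 5 from a small, fully countable population versus a huge one.
print(finite_population_correction(5, 20))         # noticeably below 1
print(finite_population_correction(5, 1_000_000))  # indistinguishable from 1 in practice
```

A complete census (n = N) gives a factor of 0, reflecting that a mean computed from the whole population has no sampling error.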

2.2 Variables in biostatistics

Each biological discipline has its own set of variables, which may include conventional morphological measurements; concentrations of chemicals in body fluids; rates of certain biological processes; frequencies of certain events, as in genetics, epidemiology, and radiation biology; physical readings of optical or electronic machinery used in biological research; and many more.

We have already referred to biological variables in a general way, but we have not yet defined them. We shall define a variable as a property with respect to which individuals in a sample differ in some ascertainable way. If the property does not differ within a sample at hand or at least among the samples being studied, it cannot be of statistical interest. Length, height, weight, number of teeth, vitamin C content, and genotypes are examples of variables in ordinary, genetically and phenotypically diverse groups of organisms. Warm-bloodedness in a group of mammals is not, since mammals are all alike in this regard, although body temperature of individual mammals would, of course, be a variable. We can divide variables as follows:

Variables
    Measurement variables
        Continuous variables
        Discontinuous variables
    Ranked variables
    Attributes

Measurementvariablesarethosemt'(/surements(/ndthatareexpressed numerically. Measurementvariablesareoftwokinds.Thefirstkindconsistsof continuousvariables,whichatleasttheoreticallycanassumeaninfinitenumber ofvaluesbetweenanytwofixedpoints.Forexample,betweenthetwolength measurements1.5and1.6emthereareaninfinitenumberoflengthsthatcould bemeasuredifoneweresoinclinedandhadapreciseenoughmethodof calibration.Anygivenreadingofacontinuousvariable,suchasalengthof

1.57mm,isthereforeanapproximationtotheexactreading,whichinpractice

isunknowable.Manyofthevariablesstudiedinbiologyarecontinuousvari ables. Examplesarelengths,areas,volumes.weights,angles,temperatures. periodsoftime.percentages.concentrations,andrates. ContrastedwithcontinuousvariablesarethediscontilluousIJllriahlt's.also knownasmeristicordiscretevilrilih/t's.Thesearevariablesthathaveonlycer tainfixed numericalvalues.withnointermediatevaluespossibleinbetween. Thusthenumberofsegmentsinacertaininsectappendagemaybe4or5or

6butnever51or4.3.Examplesofdiscontinuousvariahksarcnumbersofa

CHAPTER 2 / DATA IN BIOSTATISTICS

given structure (such as segments, bristles, teeth, or glands), numbers of offspring, numbers of colonies of microorganisms or animals, or numbers of plants in a given quadrat.

Some variables cannot be measured but at least can be ordered or ranked by their magnitude. Thus, in an experiment one might record the rank order of emergence of ten pupae without specifying the exact time at which each pupa emerged. In such cases we code the data as a ranked variable, the order of emergence. Special methods for dealing with such variables have been developed, and several are furnished in this book. By expressing a variable as a series of ranks, such as 1, 2, 3, 4, 5, we do not imply that the difference in magnitude between, say, ranks 1 and 2 is identical to or even proportional to the difference between ranks 2 and 3.

Variables that cannot be measured but must be expressed qualitatively are called attributes, or nominal variables. These are all properties, such as black or white, pregnant or not pregnant, dead or alive, male or female. When such attributes are combined with frequencies, they can be treated statistically. Of 80 mice, we may, for instance, state that four were black, two agouti, and the rest gray. When attributes are combined with frequencies into tables suitable for statistical analysis, they are referred to as enumeration data. Thus the enumeration data on color in mice would be arranged as follows:

Color                    Frequency
Black                        4
Agouti                       2
Gray                        74
Total number of mice        80

In some cases attributes can be changed into measurement variables if this is desired. Thus colors can be changed into wavelengths or color-chart values. Certain other attributes that can be ranked or ordered can be coded to become ranked variables. For example, three attributes referring to a structure as "poorly developed," "well developed," and "hypertrophied" could be coded 1, 2, and 3.

A term that has not yet been explained is variate. In this book we shall use it as a single reading, score, or observation of a given variable. Thus, if we have measurements of the length of the tails of five mice, tail length will be a continuous variable, and each of the five readings of length will be a variate. In this text we identify variables by capital letters, the most common symbol being Y. Thus Y may stand for tail length of mice. A variate will refer to a given length measurement; Y_i is the measurement of tail length of the ith mouse, and Y_4 is the measurement of tail length of the fourth mouse in our sample.
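To make enumeration data and ranked coding concrete, here is a minimal sketch in Python (our choice; the text itself contains no code). The individual mouse records are hypothetical, constructed only so that the totals match the example above, and the short list of structures coded as ranks is likewise invented for illustration.

```python
from collections import Counter

# Hypothetical raw records, one coat-color attribute per mouse, constructed
# to reproduce the totals in the text (4 black, 2 agouti, 74 gray).
mice = ["black"] * 4 + ["agouti"] * 2 + ["gray"] * 74

# Enumeration data: each attribute class paired with its frequency.
color_counts = Counter(mice)
for color in ("black", "agouti", "gray"):
    print(f"{color:<8}{color_counts[color]:>4}")
print(f"{'total':<8}{sum(color_counts.values()):>4}")

# Coding an orderable attribute as a ranked variable. The codes carry
# order only; they do not claim equal spacing between the categories.
DEVELOPMENT_RANK = {"poorly developed": 1, "well developed": 2, "hypertrophied": 3}

# Hypothetical observations of the structure in four specimens.
structures = ["well developed", "hypertrophied", "poorly developed", "well developed"]
ranks = [DEVELOPMENT_RANK[s] for s in structures]
print(ranks)  # [2, 3, 1, 2]
```

A `Counter` is simply a dictionary from class to frequency, which is exactly the structure of an enumeration table.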
2.3 Accuracy and precision of data

"Accuracy" and "precision" are used synonymously in everyday speech, but in statistics we define them more rigorously. Accuracy is the closeness of a measured or computed value to its true value. Precision is the closeness of repeated measurements. A biased but sensitive scale might yield inaccurate but precise weight. By chance, an insensitive scale might result in an accurate reading, which would, however, be imprecise, since a repeated weighing would be unlikely to yield an equally accurate weight. Unless there is bias in a measuring instrument, precision will lead to accuracy. We therefore need to be concerned mainly with the former.

Precise variates are usually, but not necessarily, whole numbers. Thus, when we count four eggs in a nest, there is no doubt about the exact number of eggs in the nest if we have counted correctly; it is 4, not 3 or 5, and clearly it could not be 4 plus or minus a fractional part. Meristic, or discontinuous, variables are generally measured as exact numbers. Seemingly continuous variables derived from meristic ones can under certain conditions also be exact numbers. For instance, ratios between exact numbers are themselves also exact. If in a colony of animals there are 18 females and 12 males, the ratio of females to males (a …

Most continuous variables, however, are approximate. We mean by this that the exact value of the single measurement, the variate, is unknown and probably unknowable. The last digit of the measurement stated should imply precision; that is, it should indicate the limits on the measurement scale between which we believe the true measurement to lie. Thus, a length measurement of 12.3 mm implies that the true length of the structure lies somewhere between 12.25 and 12.35 mm. Exactly where between these implied limits the real length is we do not know. But where would a true measurement of 12.25 fall? Would it not equally likely fall in either of the two classes 12.2 and 12.3, clearly an unsatisfactory state of affairs? Such an argument is correct, but when we record a number as either 12.2 or 12.3, we imply that the decision whether to put it into the higher or lower class has already been taken. This decision was not taken arbitrarily, but presumably was based on the best available measurement. If the scale of measurement is so precise that a value of 12.25 would clearly have been recognized, then the measurement should have been recorded originally to four significant figures. Implied limits, therefore, always carry one more figure beyond the last significant one measured by the observer.

Hence, it follows that if we record the measurement as 12.32, we are implying that the true value lies between 12.315 and 12.325. Unless this is what we mean, there would be no point in adding the last decimal figure to our original measurements. If we do add another figure, we must imply an increase in precision. We see, therefore, that accuracy and precision in numbers are not absolute concepts, but are relative. Assuming there is no bias, a number becomes increasingly accurate as we are able to write more significant figures for it (increase its precision). To illustrate this concept of the relativity of accuracy, consider the following three numbers:

Number      Implied limits
193         192.5-193.5
192.8       192.75-192.85
192.76      192.755-192.765

We may imagine these numbers to be recorded measurements of the same structure. Let us assume that we had extramundane knowledge that the true length of the given structure was 192.758 units. If that were so, the three measurements would increase in accuracy from the top down, as the interval between their implied limits decreased. You will note that the implied limits of the topmost measurement are wider than those of the one below it, which in turn are wider than those of the third measurement.

Meristic variates, though ordinarily exact, may be recorded approximately when large numbers are involved. Thus when counts are reported to the nearest thousand, a count of 36,000 insects in a cubic meter of soil, for example, implies that the true number varies somewhere from 35,500 to 36,500 insects.

To how many significant figures should we record measurements? If we array the sample by order of magnitude from the smallest individual to the largest

one, an easy rule to remember is that the number of unit steps from the smallest to the largest measurement in an array should usually be between 30 and 300. Thus, if we are measuring a series of shells to the nearest millimeter and the largest is 8 mm and the smallest is 4 mm wide, there are only four unit steps between the largest and the smallest measurement. Hence, we should measure our shells to one more significant decimal place. Then the two extreme measurements might be 8.2 mm and 4.1 mm, with 41 unit steps between them (counting the last significant digit as the unit); this would be an adequate number of unit steps. The reason for such a rule is that an error of 1 in the last significant digit of a reading of 4 mm would constitute an inadmissible error of 25%, but an error of 1 in the last digit of 4.1 is less than 2.5%. Similarly, if we measured the height of the tallest of a series of plants as 173.2 cm and that of the shortest of these plants as 26.6 cm, the difference between these limits would comprise 1466 unit steps (of 0.1 cm), which are far too many. It would therefore be advisable to record the heights to the nearest centimeter, as follows: 173 cm for the tallest and 27 cm for the shortest. This would yield 146 unit steps. Using the rule we have stated for the number of unit steps, we shall record two or three digits for most measurements.

The last digit should always be significant; that is, it should imply a range for the true measurement of from half a "unit step" below to half a "unit step" above the recorded score, as illustrated earlier. This applies to all digits, zero included. Zeros should therefore not be written at the end of approximate numbers to the right of the decimal point unless they are meant to be significant digits. Thus 7.80 must imply the limits 7.795 to 7.805. If 7.75 to 7.85 is implied, the measurement should be recorded as 7.8.

When the number of significant digits is to be reduced, we carry out the process of rounding off numbers. The rules for rounding off are very simple. A digit to be rounded off is not changed if it is followed by a digit less than 5. If the digit to be rounded off is followed by a digit greater than 5 or by 5 followed by other nonzero digits, it is increased by 1. When the digit to be rounded off is followed by a 5 standing alone or a 5 followed by zeros, it is unchanged if it is even but increased by 1 if it is odd. The reason for this last rule is that when such numbers are summed in a long series, we should have as many digits raised as are being lowered, on the average; these changes should therefore balance out. Practice the above rules by rounding off the following numbers to the indicated number of significant digits:

Number        Significant digits desired    Answer
26.51         2                             27
133.71375     5                             133.71
0.03725       3                             0.0372
0.03715       3                             0.0372
[illegible]   [illegible]                   [illegible]
17.3476       3                             17.3
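The rounding rules above can be checked mechanically. A minimal sketch, assuming Python's standard `decimal` module: its `ROUND_HALF_EVEN` mode matches the rule in the text (a trailing 5 standing alone leaves an even digit unchanged and raises an odd one). The helper name `round_sig` is ours, not the book's. Numbers are passed as strings so that values such as 0.03725 are held exactly; as binary floats they are not exact, which can silently change the outcome of a tie.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_sig(value: str, digits: int, rounding=ROUND_HALF_EVEN) -> Decimal:
    """Round a decimal string to `digits` significant digits.

    With the default ROUND_HALF_EVEN, a following 5 standing alone (or
    followed only by zeros) leaves an even digit unchanged and raises an
    odd digit by 1; a following 5 with other nonzero digits rounds up.
    """
    d = Decimal(value)
    # adjusted() is the exponent of the leading digit: 1 for 26.51, -2 for 0.03725.
    quantum = Decimal(1).scaleb(d.adjusted() - digits + 1)
    return d.quantize(quantum, rounding=rounding)


practice = [("26.51", 2, "27"), ("133.71375", 5, "133.71"),
            ("0.03725", 3, "0.0372"), ("0.03715", 3, "0.0372"),
            ("17.3476", 3, "17.3")]
for number, digits, answer in practice:
    result = round_sig(number, digits)
    print(f"{number:>10} -> {result}  (matches table: {result == Decimal(answer)})")

# The calculator convention discussed in this section rounds a lone 5 upward:
print(round_sig("0.03725", 3, ROUND_HALF_UP))  # 0.0373
```

Only the legible rows of the practice table are included. The single changed mode in the last line shows why two devices can disagree in the final digit.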

Most pocket calculators or larger computers round off their displays using a different rule: they increase the preceding digit when the following digit is a 5 standing alone or with trailing zeros. However, since most of the machines usable for statistics also retain eight or ten significant figures internally, the accumulation of rounding errors is minimized. Incidentally, if two calculators give answers with slight differences in the final (least significant) digits, suspect a different number of significant digits in memory as a cause of the disagreement.
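The implied limits and the 30-to-300 unit-step rule described earlier in this section can also be sketched in a few lines, again assuming Python's `decimal` module. The function names are illustrative. A recorded value is read as written, so trailing zeros count, and a count reported to the nearest thousand can be entered in exponent form ("36E3") so that its unit step is 1000.

```python
from decimal import Decimal


def implied_limits(recorded: str) -> tuple[Decimal, Decimal]:
    """Implied limits of a recorded value: half a unit step either side.

    The unit step is one unit in the last recorded digit, read from the
    string as written, so "7.80" and "7.8" give different limits.
    """
    d = Decimal(recorded)
    half_step = Decimal(5).scaleb(d.as_tuple().exponent - 1)
    return d - half_step, d + half_step


def unit_steps(smallest: str, largest: str) -> int:
    """Number of unit steps (of the last recorded digit) between two extremes."""
    lo, hi = Decimal(smallest), Decimal(largest)
    step = Decimal(1).scaleb(min(lo.as_tuple().exponent, hi.as_tuple().exponent))
    return int((hi - lo) / step)


for value in ["193", "192.8", "192.76", "7.80", "36E3"]:
    low, high = implied_limits(value)
    print(f"{value:>7}: {low:f} to {high:f}")

for smallest, largest in [("4", "8"), ("4.1", "8.2"), ("26.6", "173.2"), ("27", "173")]:
    n = unit_steps(smallest, largest)
    verdict = "adequate" if 30 <= n <= 300 else "too few" if n < 30 else "too many"
    print(f"{smallest} to {largest}: {n} unit steps ({verdict})")
```

The four pairs are the shell and plant examples from the text; they reproduce its counts of 4, 41, 1466, and 146 unit steps.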

2.4 Derived variables

The majority of variables in biometric work are observations recorded as direct measurements or counts of biological material or as readings that are the output of various types of instruments. However, there is an important class of variables in biological research that we may call the derived or computed variables. These are generally based on two or more independently measured variables whose relations are expressed in a certain way. We are referring to ratios, percentages, concentrations, indices, rates, and the like.

A ratio expresses as a single value the relation that two variables have, one to the other. In its simplest form, a ratio is expressed as in 64:24, which may represent the number of wild-type versus mutant individuals, the number of males versus females, a count of parasitized individuals versus those not parasitized, and so on. These examples imply ratios based on counts. A ratio based on a continuous variable might be similarly expressed as 1.2:1.8, which may represent the ratio of width to length in a sclerite of an insect or the ratio between the concentrations of two minerals contained in water or soil. Ratios may also be expressed as fractions; thus, the two ratios above could be expressed as 64/24 and 1.2/1.8. However, for computational purposes it is more useful to express the ratio as a quotient. The two ratios cited would therefore be 2.666... and 0.666..., respectively. These are pure numbers, not expressed in measurement

units of any kind. It is this form for ratios that we shall consider further. Percentages are also a type of ratio. Ratios, percentages, and concentrations are basic quantities in much biological research, widely used and generally familiar.

An index is the ratio of the value of one variable to the value of a so-called standard one. A well-known example of an index in this sense is the cephalic index in physical anthropology. Conceived in the wide sense, an index could be the average of two measurements, either simply, such as (1/2)(length of A + length of B), or in weighted fashion, such as (1/3)[(2 × length of A) + length of B].

Rates are important in many experimental fields of biology. The amount of a substance liberated per unit weight or volume of biological material, weight gain per unit time, reproductive rates per unit population size and time (birth rates), and death rates would fall in this category.

The use of ratios and percentages is deeply ingrained in scientific thought. Often ratios may be the only meaningful way to interpret and understand certain types of biological problems. If the biological process being investigated operates on the ratio of the variables studied, one must examine this ratio to understand the process. Thus, Sinnott and Hammond (1935) found that inheritance of the shapes of squashes of the species Cucurbita pepo could be interpreted through a form index based on a length-width ratio, but not through the independent dimensions of shape. By similar methods of investigation, we should be able to find selection affecting body proportions to exist in the evolution of almost any organism.

FIGURE 2.1 Sampling from a population of birth weights of infants. A. A sample of 25. B. A sample of 100. C. A sample of 500. D. A sample of 2000. (Abscissa: birth weight, in oz; ordinate: frequency f.)

There are several disadvantages to using ratios. First, they are relatively inaccurate. Let us return to the ratio 1.2/1.8 mentioned above and recall from the previous section that a measurement of 1.2 implies a true range of measurement of the variable from 1.15 to 1.25; similarly, a measurement of 1.8 implies a range from 1.75 to 1.85. We realize, therefore, that the true ratio may vary anywhere from 1.15/1.85 to 1.25/1.75, or from 0.622 to 0.714. We note a possible maximal error of 4.2% if 1.2 is an original measurement: (1.25 - 1.2)/1.2; the corresponding maximal error for the ratio is 7.0%: (0.714 - 0.667)/0.667. Furthermore, the best estimate of a ratio is not usually the midpoint between its possible ranges. Thus, in our example the midpoint between the implied limits is 0.668 and the ratio based on 1.2/1.8 is 0.666...; while this is only a slight difference, the discrepancy may be greater in other instances.
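The error propagation in this example can be reproduced directly. The sketch below (Python; the function name `ratio_limits` is ours) assumes both measurements are positive and recorded to the same unit step, so each true value lies within half a unit step of its recorded value.

```python
def ratio_limits(numerator, denominator, half_step):
    """Smallest and largest ratios consistent with the implied limits.

    Assumes positive measurements, each within +/- half_step of its true
    value. The ratio is smallest with the numerator at its lower limit and
    the denominator at its upper limit, and largest the other way round.
    """
    low = (numerator - half_step) / (denominator + half_step)
    high = (numerator + half_step) / (denominator - half_step)
    return low, high


num, den, half = 1.2, 1.8, 0.05
ratio = num / den
low, high = ratio_limits(num, den, half)
print(f"ratio {ratio:.3f}, limits {low:.3f} to {high:.3f}")
print(f"midpoint of the limits {(low + high) / 2:.3f}")
print(f"maximal error of 1.2: {half / num:.1%}")
print(f"maximal error of the ratio: {(high - ratio) / ratio:.1%}")
```

Computed without intermediate rounding, the maximal error of the ratio comes out near 7.1%; the 7.0% in the text results from first rounding the limits to 0.714 and 0.667. The limits (0.622, 0.714) and their midpoint (0.668) agree with the text.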

A second disadvantage to ratios and percentages is that they may not be approximately normally distributed (see Chapter 5) as required by many statistical tests. This difficulty can frequently be overcome by transformation of the variable (as discussed in Chapter 10). A third disadvantage of ratios is that in using them one loses information about the relationships between the two variables except for the information about the ratio itself.

2.5 Frequency distributions

Ifwewere

tosampleapopulationofbirthweightsofinfants,wecouldrepresent each sampledmeasurementbyapointalonganaxisdenotingmagnitudeof birthweight.ThisisillustratedinFigure2.1A,forasampleof25birthweights. Ifwe samplerepeatedlyfromthepopulationandobtain100birthweights,we shall probablyhavetoplacesomeofthesepointsontopofotherpointsin ordertoreeordthemallcorrectly(Figure2.1H).Aswecontinuesamplingad ditionalhundredsandthousandsofbirthweights(Figure2.ICand0),the assemblage ofpointswillcontinuetoincreaseinsizebutwillassumeafairly definiteshape. Theoutlineofthemoundofpointsapproximatesthedistribution ofthevariable.Rememberthatacontinuousvariablesuchasbirthweightcan assumeaninfinityofvaluesbetweenanytwopointsontheabscissa.Therefine mentofourmeasurementswilldeterminehowfinethenumberofrecorded divisionsbctweenanytwopointsalongtheaxiswillbe. Thedistributionofavariableisofconsiderablebiologicalinterest.Ifwe findthatthedislributioll isasymmetricalanddrawnoutinonedirection,ittells us thatthereis,perhaps,selectiollthatcausesorganismstofallpreferentially inoneofthetailsofthedistribution,orpossiblythatthescaleofmeasuremenl Theaboveisanexampleofaquantitativefrequencydistribution,sinceYis clearlya measurementvariable.However,arraysandfrequencydistributions neednotbelimitedtosuchvariables.Wecanmakefrequencydistributionsof attributes,calledqualitativefrequencydistributions.Inthese,thevariousclasses arelistedinsomelogicalorarbitraryorder.Forexample,ingeneticswemight haveaqualitativefrequencydistributionasfollows: 16 200
1:;0 100
oL-_(_)-....-'--......-,-; \'um),Profplants'1\\adrat

CHAPTER2 /DATAINBIOSTATISTICS

FIGURE2.2

Bardiagram.FrequencyofthesedgeCarex

ftaccain500quadrats.DatafromTable2.2; orginallyfromArchibald(1950).

2.5/FREQUENCYDISTRIBUTIONS

Variable

y 9 8 7 6 5 4

Frequellcy

f I I 4 3 I 1 17

TAIlU:2.1

Twoqualitativefrequencydistributions.Numhcrofcasesof skincancer(melanoma)distrihutedoverhodyregionsof

4599menand47Xt>women.

Thistellsusthattherearetwoclassesofindividuals,thoseidentifedbytheA phenotype,ofwhich86werefound,andthosecomprisingthehomozygotere cessive aa,ofwhich32wereseeninthesample. Anexampleofamoreextensivequalitativefrequencydistributionisgiven in Table2.1,whichshowsthedistributionofmelanoma(atypeofskincancer) overbodyregionsinmenandwomen.Thistabletellsusthatthetrunkand limbsarethemostfrequentsitesformelanomasandthatthebuccalcavity,the restofthegastrointestinaltract,andthegenitaltractarcrarelyatllictedbythis ()/Jsel'l'ed)i-e4u('IuT

MenWomen

I chosenissuchastobringaboutadistortionofthedistribution.If,inasample ofimmatureinsects,wediscoverthatthemeasurementsarebimodallydistrib uted(withtwopeaks),thiswouldindicatethatthepopulationisdimorphic. Thismeansthatdifferentspeciesorracesmayhavebecomeintermingledin oursample.Orthedimorphismcouldhavearisenfromthepresenceofboth sexesorofdifferentinstars. Thereareseveralcharacteristicshapesoffrequencydistributions.Themost commonisthesymmetricalbellshape(approximatedbythebottomgraphin Figure2.1),whichistheshapeofthenormalfrequencydistributiondiscussed inChapter5.Therearealsoskeweddistributions(drawnoutmoreatonetail thantheother),I.-shapeddistributionsasinFigure2.2,U-shapeddistributions, andothers,allofwhichimpartsignificantinformationahouttherelationships theyrepresent.Weshallhavemoretosayabouttheimplicationsofvarious typesofdistrihutionsinlaterchaptersandsections. After researchershaveobtaineddatainagivenstudy,theymustarrange thedatainaformsuitableforcomputationandinterpretation.Wemayassume thatvariatesarerandomlyorderedinitiallyorareintheorderinwhichthe measurementshavebeentaken.Asimplearrangementwouldbeanarmrof thedatahyorderofmagnitude.Thus.forexample,thevariates7,6,5,7,X,9,

6,7,4,6,7couldbearrayedinorderofdecreasingmagnitudeasfollows:9,X,

7.7, 7, 7,6, 6, 6,5,4.Wheretherean:somevariatesofthesamevalue.suchas

the6\andTsinthislictitillllSexample.atime-savingdevicemightimmediately haveoccurredtoyounamely.tolistafrequencyforeachoftherecurring variates;thus:9,X,7(4x).()(3xI,5,4.Suchashorthandnotatiollisonewayto representaFCII'h'IICI'disll'ihlllioll,whichissimplyanarrangementofthe ofvariateswiththefrequencyofI:achclassindicated.ConventIOnally,atre qUl:ncy distrihutiollISstall:dIIItabularform;forourexampk,thisisdOlleas follows:

Phenotype

.I A-86 aa32

Ana/om;csilt'

Ilcadandncck

TrunkandIimhs

Buccal

cavity

Rcstofgastr'lIntcslinaltracl

GcnitalIrael

Fyc

Totall:ascs

Sourct'.DatafrolllICL'(I

')4') .124.1 X 5 12 3X2

45')')

645
.1645 II 21
')3 371
47X6
18 CHAPTER2 /DATAINBIOSTATISTICS2.5/FREQUENCYDISTRIBUTIONS19

SouI'ce.DatafromArchibald(t950).

TABU:2.2

Ameristicfrequencydistribution.

Numberofplantsofthesedgeearn

.f/accafoundin500quadrats. typeofcancer.Weoftenencounterotherexamplesofqualitativefrequency distributionsinecologyintheformoftables,orspecieslists,oftheinhabitants ofasampledecologicalarea.Suchtablescatalogtheinhabitantsbyspeciesor atahighertaxonomiclevelandrecordthenumberofspecimensobservedfor each. Thearrangementofsuchtablesisusuallyalphabetical,oritmayfollow aspecial convention,asinsomebotanicalspecieslists. A quantitativefrequencydistributionbasedonmeristicvariatesisshown inTable2.2.Thisisanexamplefromplantecology:thenumberofplantsper quadratsampledislistedattheleftinthevariablecolumn;theobservedfre quencyisshownattheright. Quantitativefrequencydistributionsbasedonacontinuousvariablearc themostcommonlyemployedfrequencydistributions;youshouldbecome thoroughlyfamiliarwiththem.AnexampleisshowninBox2.1.Itisbasedon

25femurlengthsmeasuredinanaphidpopulation.The25readingsareshown

atthetopofBox2.1intheorderinwhichtheywereobtainedasmeasurements. (Theycouldhavebeenarrayedaccordingtotheirmagnitude.)Thedataare nextsetupinafrequencydistribution.Thevariatesincreaseinmagnitudeby unitsteps of0.1.Thefrequencydistributionispreparedbyenteringeachvariate inturnonthescaleandindicatingacountbyaconventionaltallymark.When alloftheitemshaveheentalliedinthecorrespondingclass,thetalliesarecon vertedintonumeralsindicatingfrequenciesinthenextcolumn.Theirsumis indicatedbyI.f. Whathaveweachievedinsummarizingourdata')Theoriginal25variates arcnowrepresentedbyonly15classes.Wefindthatvariates3.6, 3.8,and4.3 havethehighestfrequencies.However,we alsonotethattherearcseveralclasses, suchas3.4 or3.7.thatarcnotrepresentedbyasingleaphid.Thisgivesthe

No.ofplallts

perquadrat y o 1 2 3 4 5 6 7 8 Total

Observed

fi-equellcy f 181
118
97
54
32
9 5 3 1 500
entirefrequencydistributionadrawn-outandscatteredappearance.Thereason forthisisthatwehaveonly25aphids,toofewtoputintoafrequencydistribu tionwith15classes.Toobtainamorecohesiveandsmooth-lookingdistribu tion,wehavetocondenseourdataintofewerclasses.Thisprocessisknown asgroupin!}0(classesoffrequencydistributions;itisillustratedinBox2.1and describedinthefollowingparagraphs. Weshouldrealizethatgroupingindividualvariatesintoclassesofwider rangeisonlyanextensionofthesameprocessthattookplacewhenweobtained theinitialmeasurement.Thus,aswehaveseeninSection2.3,whenwemeasure anaphidandrecorditsfemurlengthas3.3units,weimplytherebythatthe truemeasurementliesbetween3.25and3.35units,butthatwewereunableto measuretotheseconddecimalplace.Inrecordingthemeasurementinitiallyas

3.3units,weestimatedthatitfellwithinthisrange.Hadweestimatedthatit

exceededthevalue of3.35,forexample,wewouldhavegivenitthenexthigher score,3.4.Therefore,all themeasurementsbetween3.25and3.35wereinfact groupedintotheclassidentifiedbytheclassmark3.3.Ourclassintervalwas

0.1units.Ifwenowwishtomakewiderclassintervals,wearedoingnothing

butextendingtherangewithinwhichmeasurementsarcplacedintooneclass.
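Tallying and grouping, as described above and carried out in Box 2.1, can be sketched as follows. This is a Python illustration under our own assumptions: the function names are hypothetical, measurements are handled as `Decimal` so that class limits such as 3.45 are exact, and no variate is allowed to fall exactly on a class limit (which holds when, as the text recommends, the limits carry one more decimal place than the measurements).

```python
from collections import Counter
from decimal import Decimal


def frequency_distribution(variates):
    """Tally variates and list (class, frequency) by decreasing magnitude."""
    return sorted(Counter(variates).items(), reverse=True)


def group(variates, lower_limit, interval, n_classes):
    """Group variates into n_classes classes of equal width.

    Returns (lower implied limit, upper implied limit, class mark, f) for
    each class. Assumes no variate falls exactly on a class limit.
    """
    lower, width = Decimal(lower_limit), Decimal(interval)
    values = [Decimal(y) for y in variates]
    table = []
    for k in range(n_classes):
        lo, hi = lower + k * width, lower + (k + 1) * width
        mark = (lo + hi) / 2  # class mark = midpoint of the implied limits
        f = sum(1 for y in values if lo < y < hi)
        table.append((lo, hi, mark, f))
    return table


# The fictitious example from the text.
print(frequency_distribution([7, 6, 5, 7, 8, 9, 6, 7, 4, 6, 7]))

# The 25 aphid femur lengths of Box 2.1 (units of 0.1 mm), kept as strings.
femurs = ("3.8 3.6 4.3 3.5 4.3 3.3 4.3 3.9 4.3 3.8 3.9 4.4 3.8 4.7 3.6 "
          "4.1 4.4 4.5 3.6 3.8 4.4 4.1 3.6 4.2 3.9").split()

for interval, n_classes in [("0.2", 8), ("0.3", 5)]:
    print(f"class interval {interval}:")
    for lo, hi, mark, f in group(femurs, "3.25", interval, n_classes):
        print(f"  {lo:.2f}-{hi:.2f}  mark {mark:.2f}  f = {f}")
```

With an interval of 0.1 starting at 3.25 the same function reproduces the original 15-class distribution; with 0.3 it gives the frequencies 2, 8, 5, 8, 2, the bimodal pattern noted in the text.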

Reference to Box 2.1 will make this process clear. We group the data twice in order to impress upon the reader the flexibility of the process. In the first example of grouping, the class interval has been doubled in width; that is, it has been made to equal 0.2 units. If we start at the lower end, the implied class limits will now be from 3.25 to 3.45, the limits for the next class from 3.45 to 3.65, and so forth.

Our next task is to find the class marks. This was quite simple in the frequency distribution shown at the left side of Box 2.1, in which the original measurements were used as class marks. However, now we are using a class interval twice as wide as before, and the class marks are calculated by taking the midpoint of the new class intervals. Thus, to find the class mark of the first class, we take the midpoint between 3.25 and 3.45, which turns out to be 3.35. We note that the class mark has one more decimal place than the original measurements. We should not now be led to believe that we have suddenly achieved greater precision. Whenever we designate a class interval whose last significant digit is even (0.2 in this case), the class mark will carry one more decimal place than the original measurements. On the right side of the table in Box 2.1 the data are grouped once again, using a class interval of 0.3. Because of the odd last significant digit, the class mark now shows as many decimal places as the original variates, the midpoint between 3.25 and 3.55 being 3.4.

Once the implied class limits and the class mark for the first class have been correctly found, the others can be written down by inspection without any special computation. Simply add the class interval repeatedly to each of the values. Thus, starting with the lower limit 3.25, by adding 0.2 we obtain 3.45, 3.65, 3.85, and so forth; similarly, for the class marks, we obtain 3.35, 3.55, 3.75, and so forth. It should be obvious that the wider the class intervals, the more compact the data become but also the less precise.

BOX 2.1
Preparation of frequency distribution and grouping into fewer classes with wider class intervals.
Twenty-five femur lengths of the aphid Pemphigus. Measurements are in mm × 10⁻¹.

Original measurements
3.8   3.6   4.3   3.5   4.3
3.3   4.3   3.9   4.3   3.8
3.9   4.4   3.8   4.7   3.6
4.1   4.4   4.5   3.6   3.8
4.4   4.1   3.6   4.2   3.9

Original frequency distribution
Implied limits     Y      f
3.25-3.35         3.3     1
3.35-3.45         3.4     0
3.45-3.55         3.5     1
3.55-3.65         3.6     4
3.65-3.75         3.7     0
3.75-3.85         3.8     4
3.85-3.95         3.9     3
3.95-4.05         4.0     0
4.05-4.15         4.1     2
4.15-4.25         4.2     1
4.25-4.35         4.3     4
4.35-4.45         4.4     3
4.45-4.55         4.5     1
4.55-4.65         4.6     0
4.65-4.75         4.7     1
                  Σf =   25

Grouping into 8 classes of interval 0.2
Implied limits     Class mark     f
3.25-3.45          3.35           1
3.45-3.65          3.55           5
3.65-3.85          3.75           4
3.85-4.05          3.95           3
4.05-4.25          4.15           3
4.25-4.45          4.35           7
4.45-4.65          4.55           1
4.65-4.85          4.75           1
                   Σf =          25

Grouping into 5 classes of interval 0.3
Implied limits     Class mark     f
3.25-3.55          3.4            2
3.55-3.85          3.7            8
3.85-4.15          4.0            5
4.15-4.45          4.3            8
4.45-4.75          4.6            2
                   Σf =          25

Source: Data from R. R. Sokal.

Histogram of the original frequency distribution shown above and of the grouped distribution with 5 classes. Line below abscissa shows class marks for the grouped frequency distribution. Shaded bars represent original frequency distribution; hollow bars represent grouped distribution. (Abscissa: Y, femur length, in units of 0.1 mm; ordinate: frequency f.)

For a detailed account of the process of grouping, see Section 2.5.

However, looking at the frequency distribution of aphid femur lengths in Box 2.1, we notice that the initial rather chaotic structure is being simplified by grouping. When we group the frequency distribution into five classes with a class interval of 0.3 units, it becomes notably bimodal (that is, it possesses two peaks of frequencies).

In setting up frequency distributions, from 12 to 20 classes should be established. This rule need not be slavishly adhered to, but it should be employed with some of the common sense that comes from experience in handling statistical data. The number of classes depends largely on the size of the sample studied. Samples of less than 40 or 50 should rarely be given as many as 12 classes, since that would provide too few frequencies per class. On the other hand, samples of several thousand may profitably be grouped into more than 20 classes. If the aphid data of Box 2.1 need to be grouped, they should probably not be grouped into more than 6 classes.

If the original data provide us with fewer classes than we think we should have, then nothing can be done if the variable is meristic, since this is the nature of the data in question. However, with a continuous variable a scarcity of classes would indicate that we probably had not made our measurements with sufficient precision. If we had followed the rules on number of significant digits for measurements stated in Section 2.3, this could not have happened.

Whenever we come up with more than the desired number of classes, grouping should be undertaken. When the data are meristic, the implied limits of continuous variables are meaningless. Yet with many meristic variables, such as a bristle number varying from a low of 13 to a high of 81, it would probably be wise to group the variates into classes, each containing several counts. This can best be done by using an odd number as a class interval so that the class mark representing the data will be a whole rather than a fractional number. Thus, if we were to group the bristle numbers 13, 14, 15, and 16 into one class, the class mark would have to be 14.5, a meaningless value in terms of bristle number. It would therefore be better to use a class ranging over 3 bristles or 5 bristles, giving the integral value 14 or 15 as a class mark.

Grouping data into frequency distributions was necessary when computations were done by pencil and paper. Nowadays even thousands of variates can be processed efficiently by computer without prior grouping. However, frequency distributions are still extremely useful as a tool for data analysis. This is especially true in an age in which it is all too easy for a researcher to obtain a numerical result from a computer program without ever really examining the data for outliers or for other ways in which the sample may not conform to the assumptions of the statistical methods.

Rather than using tally marks to set up a frequency distribution, as was done in Box 2.1, we can employ Tukey's stem-and-leaf display. This technique is an improvement, since it not only results in a frequency distribution of the variates of a sample but also permits easy checking of the variates and ordering them into an array (neither of which is possible with tally marks). This technique will therefore be useful in computing the median of a sample (see Section 3.3) and in computing various tests that require ordered arrays of the sample variates …

To learn how to construct a stem-and-leaf display, let us look ahead to Table 3.1 in the next chapter, which lists 15 blood neutrophil counts. The unordered measurements are as follows: 4.9, 4.6, 5.5, 9.1, 16.3, 12.7, 6.4, 7.1, 2.3, 3.6, 18.0, 3.7, 7.3, 4.4, and 9.8. To prepare a stem-and-leaf display, we scan the variates in the sample to discover the lowest and highest leading digit or digits. Next, we write down the entire range of leading digits in unit increments to the left of a vertical line (the "stem"), as shown in the accompanying illustration. We then put the next digit of the first variate (a "leaf") at that level of the stem corresponding to its leading digit(s). The first observation in our sample is 4.9. We therefore place a 9 next to the 4. The next variate is 4.6. It is entered by finding the stem level for the leading digit 4 and recording a 6 next to the 9 that is already there. Similarly, for the third variate, 5.5, we record a 5 next to the leading digit 5. We continue in this way until all 15 variates have been entered (as "leaves") in sequence along the appropriate leading digits of the stem. The completed array is the equivalent of a frequency distribution and has the appearance of a histogram or bar diagram (see the illustration). Moreover, it permits the efficient ordering of the variates. Thus, from the completed array it becomes obvious that the appropriate ordering of the 15 variates is 2.3, 3.6, 3.7, 4.4, 4.6, 4.9, 5.5, 6.4, 7.1, 7.3, 9.1, 9.8, 12.7, 16.3, 18.0. The median can easily be read off the stem-and-leaf display. It is clearly 6.4. For very large samples, stem-and-leaf displays may become awkward. In such cases a conventional frequency distribution as in Box 2.1 would be preferable.

Completed array (stem-and-leaf display of the 15 neutrophil counts):

 2 | 3
 3 | 67
 4 | 964
 5 | 5
 6 | 4
 7 | 13
 8 |
 9 | 18
10 |
11 |
12 | 7
13 |
14 |
15 |
16 | 3
17 |
18 | 0

When the shape of a frequency distribution is of particular interest, we may wish to present the distribution in graphic form when discussing the results. This is generally done by means of frequency diagrams, of which there are two common types. For a distribution of meristic data we employ a bar diagram, as in Figure 2.2.
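Both displays discussed here, the stem-and-leaf array and the bar diagram, are easy to produce as plain text. The following Python sketch is an illustration under stated assumptions: every neutrophil count is recorded to one decimal place (stem = digits before the point, leaf = the decimal digit), and the text bar diagram of Table 2.2 uses a hypothetical scale of one `#` per 5 quadrats, rounded up so that every nonzero class stays visible. Bars are printed on separate lines, echoing the rule that bars of a bar diagram do not touch.

```python
from collections import defaultdict
from statistics import median


def stem_and_leaf(variates):
    """Return the lines of a stem-and-leaf display.

    Leaves are appended in the order the variates are met, as in the hand
    method, and every stem from the lowest to the highest is listed, even
    if empty. Assumes values recorded to one decimal place.
    """
    leaves = defaultdict(list)
    for y in variates:
        stem, leaf = divmod(round(y * 10), 10)
        leaves[stem].append(str(leaf))
    low, high = min(leaves), max(leaves)
    return [f"{stem:>2} | {''.join(leaves[stem])}" for stem in range(low, high + 1)]


neutrophils = [4.9, 4.6, 5.5, 9.1, 16.3, 12.7, 6.4, 7.1, 2.3,
               3.6, 18.0, 3.7, 7.3, 4.4, 9.8]
print("\n".join(stem_and_leaf(neutrophils)))
print("ordered array:", sorted(neutrophils))
print("median:", median(neutrophils))  # middle (8th) of the 15 ordered variates


def text_bar_diagram(freqs, per_mark=5):
    """One line per class: one '#' per `per_mark` observations, rounded up."""
    return [f"{y:>2} {'#' * -(-f // per_mark)} {f}" for y, f in freqs.items()]


# Table 2.2: plants of Carex flacca per quadrat (Y) and frequency (f).
carex = {0: 181, 1: 118, 2: 97, 3: 54, 4: 32, 5: 9, 6: 5, 7: 3, 8: 1}
print("\n".join(text_bar_diagram(carex)))
```

The stem-and-leaf output reproduces the completed array given in the text, and the ordered array and median (6.4) match the values read off it there.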

In a bar diagram the abscissa represents the variable (in our case, the number of plants per quadrat), and the ordinate represents the frequencies. The important point about such a diagram is that the bars do not touch each other, which indicates that the variable is not continuous. By contrast, continuous variables, such as the frequency distribution of the femur lengths of aphid stem mothers, are graphed as a histogram. In a histogram the width of each bar along the abscissa represents a class interval of the frequency distribution and the bars touch each other to show that the actual limits of the classes are contiguous. The midpoint of the bar corresponds to the class mark. At the bottom of Box 2.1 are shown histograms of the frequency distribution of the aphid data, ungrouped and grouped. The height of each bar represents the frequency of the corresponding class.

To illustrate that histograms are appropriate approximations to the continuous distributions found in nature, we may take a histogram and make the class intervals more narrow, producing more classes. The histogram would then clearly have a closer fit to a continuous distribution. We can continue this process until the class intervals become infinitesimal in width. At this point the histogram becomes the continuous distribution of the variable.

Occasionally the class intervals of a grouped continuous frequency distribution are unequal. For instance, in a frequency distribution of ages we might have more detail on the different ages of young individuals and less accurate identification of the ages of old individuals. In such cases, the class intervals for the older age groups would be wider, those for the younger age groups, narrower. In representing such data, the bars of the histogram are drawn with different widths, and the area of each bar, rather than its height, is then made proportional to the frequency of its class.

Figure 2.3 shows another graphical mode of representation of a frequency distribution of a continuous variable (in this case, birth weight in infants). As we shall see later, the shapes of distributions seen in such frequency polygons can reveal much about the biological situations affecting the given variable.

2.6 The handling of data

Data must be handled skillfully and expeditiously so that statistics can be practiced successfully. Readers should therefore acquaint themselves with the various …


In this book we ignore "pencil-and-paper" short-cut methods for computations, found in earlier textbooks of statistics, since we assume that the student has access to a calculator or a computer. Some statistical methods are very easy to use because special tables exist that provide answers for standard statistical problems; thus, almost no computation is involved. An example is Finney's table, a 2-by-2 contingency table containing small frequencies that is used for the test of independence (Pearson and Hartley, 1958, Table 38). For small problems, Finney's table can be used in place of Fisher's method of finding exact probabilities, which is very tedious. Other statistical techniques are so easy to carry out that no mechanical aids are needed. Some are inherently simple, such as the sign test (Section 10.3). Other methods are only approximate but can often serve the purpose adequately; for example, we may sometimes substitute an easy-to-evaluate median (defined in Section 3.3) for the mean (described in Sections 3.1 and 3.2), which requires computation.

We can use many new types of equipment to perform statistical computations, many more than we could have when Introduction to Biostatistics was first published. The once-standard electrically driven mechanical desk calculator has completely disappeared. Many new electronic devices, from small pocket calculators to larger desk-top computers, have replaced it. Such devices are so diverse that we will not try to survey the field here. Even if we did, the rate of advance in this area would be so rapid that whatever we might say would soon become obsolete.

We cannot really draw the line between the more sophisticated electronic calculators, on the one hand, and digital computers. There is no abrupt increase in capabilities between the more versatile programmable calculators and the simpler microcomputers, just as there is none as we progress from microcomputers to minicomputers and so on up to the large computers that one associates with the central computation center of a large university or research laboratory. All can perform computations automatically and be controlled by a set of detailed instructions prepared by the user. Most of these devices, including programmable small calculators, are adequate for all of the computations described in this book, even for large sets of data.

The material in this book consists of relatively standard statistical computations that are available in many statistical programs. BIOMstat¹ is a statistical software package that includes most of the statistical methods covered in this book.

¹ […] These programs are compatible with Windows XP and Vista.

FIGURE 2.3 Frequency polygon. Birth weights of 9465 male infants. Chinese third-class patients in Singapore, 1950 and 1951. Data from Millis and Seng (1954).

The use of modern data processing procedures has one inherent danger. One can all too easily either feed in erroneous data or choose an inappropriate program. Users must select programs carefully to ensure that those programs perform the desired computations, give numerically reliable results, and are as free from error as possible. When using a program for the first time, one should test it using data from textbooks with which one is familiar. Some programs are notorious because the programmer has failed to guard against excessive rounding errors or other problems. Users of a program should carefully check the data being analyzed so that typing errors are not present. In addition, programs should help users identify and remove bad data values and should provide them with transformations so that they can make sure that their data satisfy the assumptions of various analyses.
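The checks recommended in the last paragraph can be illustrated with a short Python sketch (Python is our choice; the text names no particular software). It shows a classic floating-point rounding trap, tests a routine against a result the text already gives (the median 6.4 of the neutrophil counts of Section 2.5), and screens for typing errors. The helper `flag_out_of_range`, the sample entries, and the plausible range 3.0 to 5.0 are hypothetical and would have to be chosen for each variable.

```python
import math
from statistics import median

# Rounding error: ten additions of 0.1 in binary floating point do not
# give exactly 1.0; math.fsum tracks the lost low-order bits.
tenths = [0.1] * 10
print(sum(tenths), math.fsum(tenths))  # 0.9999999999999999 1.0

# Test a routine on textbook data whose answer is known in advance.
neutrophils = [4.9, 4.6, 5.5, 9.1, 16.3, 12.7, 6.4, 7.1, 2.3,
               3.6, 18.0, 3.7, 7.3, 4.4, 9.8]
assert median(neutrophils) == 6.4, "median routine disagrees with the text"


def flag_out_of_range(data, low, high):
    """Hypothetical screen for typing errors: (index, value) outside [low, high]."""
    return [(i, y) for i, y in enumerate(data) if not low <= y <= high]


# A femur length keyed as 38.0 instead of 3.8 is caught by a plausible range.
entries = [3.8, 3.6, 38.0, 3.5, 4.3]
print(flag_out_of_range(entries, 3.0, 5.0))  # [(2, 38.0)]
```

A range screen only catches gross errors; a mistyped 3.8 entered as 3.6 passes unnoticed, which is why checking the data themselves remains necessary.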

Exercises

2.1 Round the following numbers to three significant figures: 106.55, 0.06819, 3.0495, 7815.01, 2.9149, and 20.1500. What are the implied limits before and after rounding? Round these same numbers to one decimal place.
ANS. For the first value: 107; 106.545-106.555; 106.5-107.5; 106.6

2.2 Differentiate between the following pairs of terms and give an example of each. (a) Statistical and biological populations. (b) Var…