

STAT 201 Chapter 2

Exploring Data


Types of Variables

Variable: any characteristic that is observed for the subject. There are two types of variables: categorical and quantitative.

Categorical: Observations that belong to a set of categories.

-Examples: Hair color, gender, zip code, etc.

Quantitative: Observations that take on numerical values.

-Examples: Height, weight, income

Types of Variables

Quantitative: Observations that take on numerical values

-Discrete: measured by a whole number

Examples: Number of books, children, money, etc.

-Continuous: measured on an interval

Examples: Time, weight, distance

How to Compare Discrete and Continuous

If you think of time: going from 1 min to 2 min, we have to hit all of the times in between, e.g. 1.5 min or 1 min 30 sec.

If you think of weight: going from 150 lbs to 140 lbs, we have to pass through every weight between 140 and 150, e.g. 144 lbs.

If you think of the number of books or children, we jump from one number to the next; 2.5 books or 1.5 children means nothing.

Time and weight are continuous variables. Books and children are discrete variables.

How to Compare Discrete and Continuous

The big difference here is that we can keep coming up with smaller units in the continuous case, while in the discrete case we stop at some point. It should be noted that when we talk about continuous variables, we stop somewhere, so we are measuring them discretely for convenience (e.g. 100 miles to Columbia).

Data Type: Example

Let's consider a random sample of five residents of Columbia

Days: Number of days spent on workout weekly

Piercings: Number of body piercings

Gym: Do they go to the gym or not

Type: Do they lift, run, neither or both

Age: Age of person, in years

Gender: Male or Female


Days  Piercings  Gym  Type     Age  Gender
2     0          No   Neither  46   Female
3     1          Yes  run      21   Female
1     0          Yes  run      64   Male
6     2          Yes  Both     18   Female
0     0          No   Neither  19   Female

Data Type: Example

Which variables are Categorical?

Which variables are Quantitative(Discrete)?

Which variables are Quantitative(Continuous)?


Data Type: Example

Categorical: Gym, Type, Gender

Quantitative (Continuous): Days, Age

Quantitative (Discrete): Piercings


Categorical Summary: Frequency Table

Let's say we had 160 people in our sample instead of the 5 in the previous example, and we want to get a better look at the type of workout that a resident of Columbia has.

Type     Frequency  Relative Frequency
Lift     32
Run      64
Both     16
Neither  48
Total    160

Categorical Summary: Frequency Table

Type     Frequency  Relative Frequency
Lift     32         32/160 = 0.2
Run      64         64/160 = 0.4
Both     16         16/160 = 0.1
Neither  48         48/160 = 0.3
Total    160        160/160 = 1

Categorical Summary: Frequency Table

I think all of us would rather look at percentages than decimals, right?

Percentage= (Decimal*100)%

Type     Frequency  Relative Frequency
Lift     32         32/160 = 0.2 → 20%
Run      64         64/160 = 0.4 → 40%
Both     16         16/160 = 0.1 → 10%
Neither  48         48/160 = 0.3 → 30%
Total    160        160/160 = 1 → 100%

Categorical Summary: Frequency Table

Q: How many people work out with at least 1 type?

A: We can just add the frequencies:

32+64+16 = 112 people in our sample
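The frequency table arithmetic can be sketched in a few lines of Python. This is only an illustration using the counts from the table; the variable names are made up for the example and are not part of the course material.

```python
# Frequencies from the table (workout type for 160 Columbia residents).
frequency = {"Lift": 32, "Run": 64, "Both": 16, "Neither": 48}

total = sum(frequency.values())

# Relative frequency = category count / total count.
relative_frequency = {k: count / total for k, count in frequency.items()}

# Percentage = (decimal * 100)%.
percentage = {k: rel * 100 for k, rel in relative_frequency.items()}

# "At least 1 type" means everyone except the "Neither" group.
at_least_one = sum(count for k, count in frequency.items() if k != "Neither")

for k in frequency:
    print(f"{k:8}{frequency[k]:5}{relative_frequency[k]:6.1f}{percentage[k]:6.0f}%")
print("Total:", total)
print("At least one type:", at_least_one)
```

The relative frequencies always add up to 1, which is a quick way to check a table built by hand.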


English -This might be the Hardest Part!

At least x: x or any number greater

At most x: x or any number smaller

Less than x: any number smaller than x

More than x: any number larger than x

Between x and y: we will say any number larger than x and less than y, excluding x and y

Between 5 and 10 (whole numbers) = 6, 7, 8, 9
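One way to keep these phrases straight is to translate each one into a comparison. The sketch below does that in Python; the helper names (`at_least`, `at_most`, and so on) are hypothetical and chosen only for this illustration.

```python
def at_least(x, values):
    """x or any number greater (>=)."""
    return [v for v in values if v >= x]

def at_most(x, values):
    """x or any number smaller (<=)."""
    return [v for v in values if v <= x]

def less_than(x, values):
    """Any number smaller than x (<)."""
    return [v for v in values if v < x]

def more_than(x, values):
    """Any number larger than x (>)."""
    return [v for v in values if v > x]

def between(x, y, values):
    """Larger than x and less than y, excluding both endpoints."""
    return [v for v in values if x < v < y]

whole_numbers = list(range(0, 13))  # 0, 1, ..., 12

print(between(5, 10, whole_numbers))  # [6, 7, 8, 9]
print(at_least(10, whole_numbers))    # [10, 11, 12]
print(at_most(2, whole_numbers))      # [0, 1, 2]
```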

Categorical Summary: Pie Chart

Useful when there are a small number of categories

Categorical Summary: Bar Graph

Useful when there are many categories of the variable

Useful to compare groups

Quantitative Summary: Histograms

Good for large data and for showing the shape of the distribution

We will use these a lot!

Histogram vs. Bar Graph

With a bar graph, each column represents a group defined by a categorical variable. With a histogram, each column represents a group defined by a quantitative variable.

Quantitative Summary: Histogram Shapes

Quantitative Summary: Dot Plot

Useful for smaller datasets

Useful for finding outliers

I don't like these "dots" -histograms are almost always better

Quantitative Summary: Stem and Leaf

Retain actual data values

Box Plots

The box is created using the quartiles

The whiskers are created using the fences

The points are the outlying points, if there are any
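Here is a rough sketch of how the pieces of a box plot could be computed. The slides do not say how the fences are defined, so this assumes the common 1.5 × IQR rule, and it assumes the default quartile method of Python's `statistics.quantiles`; other software may give slightly different quartiles. The data are the yards-per-carry values used in the example later in these slides.

```python
import statistics

# Yards-per-carry data from the example later in the slides.
data = [0.2, 0.7, 1.1, 1.2, 1.8, 2.3, 9.8, 19.7]

# Quartiles: the box runs from Q1 to Q3, with a line at the median (Q2).
q1, q2, q3 = statistics.quantiles(data, n=4)

# ASSUMPTION: fences at 1.5 * IQR beyond the quartiles (not stated in the slides).
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points outside the fences are drawn individually as outlying points.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("Q1, median, Q3:", q1, q2, q3)
print("Fences:", lower_fence, upper_fence)
print("Outlying points:", outliers)
```

Under these assumptions the only outlying point is 19.7, which matches the outlier singled out in the example.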

Skewness in Boxplots

Right Skewed w/ Boxplots

Bell Shaped w/ Boxplots

Left Skewed w/ Boxplots

Remember: With graphs, if it's ugly it's probably not right

Much Better!

The Greek Letter Sigma in Math

Before the Sigma was famous for representing organizations on campus, it was used in mathematics

This is a mathematical operator just like "+".

This weird-looking E, used for summation, tells you to add everything up

The Greek Letter Sigma in Math

$\sum$ {1,2,3,4,5,6,7,8,9} = 1+2+3+4+5+6+7+8+9 = 45

This is easy, you could have learned this in first grade -don't make it harder than it actually is

You can add, I have faith in you
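If it helps, summation is the same thing as a loop that keeps a running total. A minimal Python sketch of the example above:

```python
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Sigma just means: start at zero and add every value in turn.
running_total = 0
for v in values:
    running_total += v

# Python's built-in sum() does the same thing in one step.
builtin_total = sum(values)

print(running_total)  # 45
print(builtin_total)  # 45
```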

Quantitative Summary: Mean

Mean (Average) -The mean is the sum of observations divided by the number of observations

Properties: Sensitive to outliers

$\bar{x} = \frac{\sum x}{n}$

x are the variable values for our sample

n is the size of the sample

Quantitative Summary: Median

Median -the median is the midpoint of the observations when they are ordered from smallest to largest

Properties: Resistant to outliers

In position .5(n+1) when the data are in ascending order

Example: Median

Position = .5*(n+1) = .5*(11+1) = 6th position

Median = 5

X Value   1  1  2  3  4  5  5  5  5  6   10
Position  1  2  3  4  5  6  7  8  9  10  11

Example: Median

Position = .5*(n+1) = .5*(8+1) = 4.5th position

Median = (1.2 + 1.8)/2 = 1.5

X Value   0.2  0.7  1.1  1.2  1.8  2.3  9.8  19.7
Position  1    2    3    4    5    6    7    8
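The position rule from the two examples above can be written out as a short function. This is a sketch for illustration; the function name `median_by_position` is made up here, and in practice Python's built-in `statistics.median` gives the same answers.

```python
def median_by_position(values):
    """Median via the .5 * (n + 1) position rule on the sorted data."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    position = 0.5 * (n + 1)   # 1-based position, e.g. 6 or 4.5
    lower = int(position)      # whole-number part of the position
    if position == lower:
        # Odd n: the position is a whole number, so take that observation.
        return ordered[lower - 1]
    # Even n: the position ends in .5, so average the two neighbors.
    return (ordered[lower - 1] + ordered[lower]) / 2

odd_example = [1, 1, 2, 3, 4, 5, 5, 5, 5, 6, 10]
even_example = [0.2, 0.7, 1.1, 1.2, 1.8, 2.3, 9.8, 19.7]

print(median_by_position(odd_example))   # 6th position
print(median_by_position(even_example))  # average of 4th and 5th positions
```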

Quantitative Summary: Mode

Mode -the mode is the observation that shows up the most in the data set.

A mode does not necessarily exist, for example when observations are tied for the most occurrences

Example: Mode

X = {.2, .7, 1.1, 1.2, 1.8, 2.3, 9.8, 19.7}

There is no mode; all observations are tied with one occurrence

X ={1, 1, 2, 3, 4, 5, 5, 5, 5, 6, 10}

Mode = 5 because 5 is the observation that occurred most.
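A small sketch of finding the mode by counting occurrences. It follows one reading of the slide, namely that there is no mode whenever the top count is tied; the function name `mode_or_none` is hypothetical.

```python
from collections import Counter

def mode_or_none(values):
    """Return the single most frequent observation, or None on a tie."""
    counts = Counter(values)
    highest = max(counts.values())
    winners = [v for v, c in counts.items() if c == highest]
    # ASSUMPTION: any tie for the highest count means "no mode".
    return winners[0] if len(winners) == 1 else None

print(mode_or_none([0.2, 0.7, 1.1, 1.2, 1.8, 2.3, 9.8, 19.7]))  # None
print(mode_or_none([1, 1, 2, 3, 4, 5, 5, 5, 5, 6, 10]))         # 5
```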

Quantitative Summary: Range

Range -The range is the difference between the maximum and minimum observations

Properties: easy to calculate, but relies on only two values, which may be outliers

Range = Maximum - Minimum

Quantitative Summary: Variance

Variance -the average squared deviation of each observation from the mean

The idea is that it measures the spread of the data about the mean

Properties: It is never negative and is only zero when all data points are equal

Quantitative Summary: Standard Deviation

Standard Deviation -the standard deviation is an adjusted average of each observation's distance from the mean

The idea is that it measures the spread of the data about the mean

Properties: The larger the value, the more spread or variability in the data; it is influenced by outliers and is never negative.

Standard Deviation = $\sqrt{\text{Variance}}$
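The three measures of spread can be computed step by step as follows. The variance formula did not survive in these slides, so this sketch assumes the usual sample version that divides by n - 1 (which is also what Python's `statistics.variance` uses). The data are the yards-per-carry values from the example that follows.

```python
import math

data = [0.2, 0.7, 1.1, 1.2, 1.8, 2.3, 9.8, 19.7]
n = len(data)

# Range = Maximum - Minimum
data_range = max(data) - min(data)

# Deviation of each observation from the mean, squared.
mean = sum(data) / n
squared_deviations = [(x - mean) ** 2 for x in data]

# ASSUMPTION: sample variance, dividing by n - 1 rather than n.
variance = sum(squared_deviations) / (n - 1)

# Standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print("Range:", round(data_range, 1))
print("Variance:", round(variance, 4))
print("Standard deviation:", round(std_dev, 4))
```

Notice how much of the variance comes from the single value 19.7; like the mean, these measures are influenced by outliers.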

Let's do an example!

Quantitative Summary: Example

X = distance (yards per carry for Marcus Lattimore) = {.2, .7, 1.1, 1.2, 1.8, 2.3, 9.8, 19.7}

What kind of data type is this?

We know that distance or length is a Continuous Quantitative variable, but we measure it discretely here by tenths of a yard

What type of graphs would be appropriate?

Dot plot, box plot, stem and leaf plot, or a histogram

Quantitative Summary: Example

Let's try a dot plot!

Our outlier is clear because it is highlighted and far away, but the graph is awkward and hard to read

Quantitative Summary: Example

Let's try a Stem and Leaf Plot!

Our outlier is clear because it is highlighted and far away, but the graph is awkward and hard, or at least annoying, to read

Quantitative Summary: Example

Let's try a histogram

This is better, but we still have some awkwardness with the gaps, and the outlier isn't as obvious here

Quantitative Summary: Example

Let's try a box plot

This is really the best choice

Our outlier is clearly shown

The rest of the graph is readable and not as awkward

Back to the example!

Quantitative Summary: Example

Mean: $\bar{x} = \frac{\sum x}{n}$

= (.2 + .7 + 1.1 + 1.2 + 1.8 + 2.3 + 9.8 + 19.7)/8 = 4.6

Median:

Position = .5(8+1) = 4.5th position

We take the average of the two middle values: (1.2 + 1.8)/2 = 1.5

Mode: there is no mode

Quantitative Summary: Example

After removing the outlier,

Mean: $\bar{x} = \frac{\sum x}{n}$

= (.2 + .7 + 1.1 + 1.2 + 1.8 + 2.3 + 9.8)/7 = 2.442857

Median:

Position = .5(7+1) = 4th position

= 1.2

Quantitative Summary: Example

Before Removing Outlier: Mean = 4.6, Median = 1.5

After Removing Outlier: Mean = 2.442857, Median = 1.2

Notice that the mean changes much more than the median. Remember that the median is resistant to outliers and the mean is not.