Lecture 2

Chapter 2
Exploring Data with Graphs and
Numerical Summaries

Learn ….
• The Different Types of Data
• The Use of Graphs to Describe Data
• The Numerical Methods of Summarizing Data
Agresti/Franklin Statistics, 1 of 63
Section 2.1
What are the Types of Data?
In Every Statistical Study:
• Questions are posed
• Characteristics are observed
Characteristics are Variables
A Variable is any characteristic that
is recorded for subjects in the study
Variation in Data

The terminology variable highlights
the fact that data values vary.
Example: Students in a
Statistics Class

Variables:
• Age
• GPA
• Major
• Smoking Status
•…
Data values are called
observations

Each observation can be:
• Quantitative
• Categorical
Categorical Variable

Each observation belongs to one of a set of
categories

Examples:
• Gender (Male or Female)
• Religious Affiliation (Catholic, Jewish, …)
• Place of residence (Apt, Condo, …)
• Belief in Life After Death (Yes or No)
Quantitative Variable

Observations take numerical values

Examples:
• Age
• Number of siblings
• Annual Income
• Number of years of education completed
Graphs and Numerical
Summaries

Describe the main features of a
variable

For Quantitative variables: key
features are center and spread

For Categorical variables: key feature
is the percentage in each of the
categories
Quantitative Variables

Discrete Quantitative Variables
and

Continuous Quantitative Variables
Discrete

A quantitative variable is discrete if its
possible values form a set of separate
numbers such as 0, 1, 2, 3, …
Examples of discrete
variables



Number of pets in a household
Number of children in a family
Number of foreign languages spoken
Continuous

A quantitative variable is continuous
if its possible values form an interval
Examples of Continuous
Variables




Height
Weight
Age
Amount of time it takes to complete
an assignment
Frequency Table

A method of organizing data

Lists all possible values for a variable
along with the number of
observations for each value
Example: Shark Attacks

What is the variable?

Is it categorical or quantitative?

How is the proportion for Florida
calculated?

How is the % for Florida calculated?
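Both questions come down to the same arithmetic for any category of a frequency table. As a general sketch (the symbols $f$ and $n$ are introduced here for illustration and do not appear on the slides), let $f$ be the number of observations in the category and $n$ the total number of observations:

```latex
\[
\text{proportion} = \frac{f}{n},
\qquad
\text{percentage} = 100 \times \frac{f}{n}
\]
```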
Example: Shark Attacks

Insights – what the data tells us about
shark attacks
Identify the following variable as
categorical or quantitative:
Choice of diet
(vegetarian or non-vegetarian):
a. Categorical
b. Quantitative
Identify the following variable as
categorical or quantitative:
Number of people you have known who have
been elected to political office:
a. Categorical
b. Quantitative
Identify the following variable as
discrete or continuous:
The number of people in line at a box office to
purchase theater tickets:
a. Continuous
b. Discrete
Identify the following variable as
discrete or continuous:
The weight of a dog:
a. Continuous
b. Discrete
Section 2.2
How Can We Describe Data Using
Graphical Summaries?
Graphs for Categorical Data

Pie Chart: A circle having a “slice of
pie” for each category

Bar Graph: A graph that displays a
vertical bar for each category
Example: Sources of Electricity Use
in the U.S. and Canada
Pie Chart
Bar Chart
Pie Chart vs. Bar Chart


Which graph do you prefer?
Why?
Graphs for Quantitative Data

Dot Plot: shows a dot for each
observation

Stem-and-Leaf Plot: portrays the
individual observations

Histogram: uses bars to portray the
data
Example: Sodium and Sugar
Amounts in Cereals
Dotplot for Sodium in Cereals

Sodium Data:
0 210 260 125 220 290 210 140 220 200 125
170 250 150 170 70 230 200 290 180
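A dot plot can be approximated in plain text by printing one dot per observation next to each distinct value. The sketch below is only an illustration: it turns the plot sideways (values run down the page instead of along a horizontal axis), and the function name `dot_plot_rows` is made up for this example.

```python
from collections import Counter

# Sodium values for the 20 cereals, copied from the slide.
sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200, 125,
          170, 250, 150, 170, 70, 230, 200, 290, 180]


def dot_plot_rows(values):
    """Return one text row per distinct value: the value, then one dot per observation."""
    counts = Counter(values)
    return [f"{value:>4} | {'.' * counts[value]}" for value in sorted(counts)]


# Print the sideways dot plot, smallest value first.
for row in dot_plot_rows(sodium):
    print(row)
```

Values that occur twice, such as 125 or 290, show up as two dots on the same row.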
Stem-and-Leaf Plot for
Sodium in Cereal
Sodium Data:
0 210 260 125 220 290 210 140 220 200 125
170 250 150 170 70 230 200 290 180
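A stem-and-leaf plot for these values can be sketched in a few lines. The split used here is an assumption, since the slide's plot did not survive extraction: the stem is everything but the last digit (`value // 10`) and the leaf is the last digit (`value % 10`). The function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

# Sodium values for the 20 cereals, copied from the slide.
sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200, 125,
          170, 250, 150, 170, 70, 230, 200, 290, 180]


def stem_and_leaf(values, base=10):
    """Map each stem (value // base) to its sorted list of leaves (value % base)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stems[value // base].append(value % base)
    return dict(stems)


plot = stem_and_leaf(sodium)
for stem in sorted(plot):
    leaves = "".join(str(leaf) for leaf in plot[stem])
    print(f"{stem:>2} | {leaves}")
```

Unlike a hand-drawn plot, this sketch skips stems that have no leaves. Every observation can still be read back from the display, for example stem 12 with leaves 5 and 5 stands for the two values of 125.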
Frequency Table
Sodium Data:
0 210 260 125 220 290 210 140 220 200 125
170 250 150 170 70 230 200 290 180
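For a quantitative variable with many distinct values, a frequency table usually groups the observations into intervals. The sketch below assumes intervals of width 40 starting at 0; that width is chosen for illustration and is not taken from the slide. The function name `frequency_table` is hypothetical, and the code assumes non-negative whole-number values.

```python
# Sodium values for the 20 cereals, copied from the slide.
sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200, 125,
          170, 250, 150, 170, 70, 230, 200, 290, 180]


def frequency_table(values, width):
    """Return rows (low, high, count, proportion, percent) for intervals of the given width."""
    n = len(values)
    rows = []
    start = 0
    while start <= max(values):
        # Count the observations that fall in [start, start + width).
        count = sum(start <= v < start + width for v in values)
        rows.append((start, start + width - 1, count, count / n, 100 * count / n))
        start += width
    return rows


table = frequency_table(sodium, 40)
for low, high, count, proportion, percent in table:
    print(f"{low:>3}-{high:<3}  {count:>2}  {proportion:.2f}  {percent:.0f}%")
```

The counts in the third column are the bar heights a histogram with the same intervals would show.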
Histogram for Sodium in Cereals
Which Graph?

Dot-plot and stem-and-leaf plot:
• More useful for small data sets
• Data values are retained

Histogram:
• More useful for large data sets
• Most compact display
• More flexibility in defining intervals
Shape of a Distribution

Overall pattern
• Clusters?
• Outliers?
• Symmetric?
• Skewed?
• Unimodal?
• Bimodal?
Symmetric or Skewed?
Example: Hours of TV Watching
Identify the minimum and maximum
sugar values:
a. 2 and 14
b. 1 and 3
c. 1 and 15
d. 0 and 16
Consider a data set containing IQ
scores for the general public:
What shape would you expect a histogram of
this data set to have?
a. Symmetric
b. Skewed to the left
c. Skewed to the right
d. Bimodal
Consider a data set of the scores of
students on a very easy exam in which most
score very well but a few score very poorly:
What shape would you expect a histogram of
this data set to have?
a. Symmetric
b. Skewed to the left
c. Skewed to the right
d. Bimodal
Section 2.3
How Can We Describe the Center of
Quantitative Data?
Mean

The sum of the observations
divided by the number of
observations
$\bar{x} = \dfrac{\sum x}{n}$
Median

The midpoint of the observations
when they are ordered from the
smallest to the largest (or from the
largest to the smallest)
Find the mean and median
CO2 pollution levels in the 8 largest nations, measured in
metric tons per person:
2.3 1.1 19.7 9.8 1.8 1.2 0.7 0.2
a. Mean = 4.6, Median = 1.5
b. Mean = 4.6, Median = 5.8
c. Mean = 1.5, Median = 4.6
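The two definitions can be checked against the eight CO2 values. This is a minimal sketch that follows the slide definitions directly; the helper names `mean_of` and `median_of` are made up for this example.

```python
# CO2 values (metric tons per person) copied from the slide.
co2 = [2.3, 1.1, 19.7, 9.8, 1.8, 1.2, 0.7, 0.2]


def mean_of(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


def median_of(values):
    """Midpoint of the ordered observations (average of the two middle ones if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


print(round(mean_of(co2), 1), round(median_of(co2), 1))
```

With eight observations there is no single middle value, so the median is the average of the fourth and fifth ordered values, 1.2 and 1.8.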
Outlier

An observation that falls well above
or below the overall set of data

The mean can be highly influenced by
an outlier

The median is resistant: not affected
by an outlier
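The difference in sensitivity can be seen with the CO2 values from the earlier exercise. The sketch below treats 19.7, the largest value, as the outlier; that choice is an assumption made for illustration, and the slides do not single it out.

```python
import statistics

# CO2 values (metric tons per person) from the earlier exercise.
co2 = [2.3, 1.1, 19.7, 9.8, 1.8, 1.2, 0.7, 0.2]

# Assumption for this example: 19.7 is treated as the outlier and dropped.
without_outlier = [x for x in co2 if x != 19.7]

mean_all = statistics.mean(co2)
mean_trimmed = statistics.mean(without_outlier)
median_all = statistics.median(co2)
median_trimmed = statistics.median(without_outlier)

print("mean:  ", round(mean_all, 2), "->", round(mean_trimmed, 2))
print("median:", round(median_all, 2), "->", round(median_trimmed, 2))
```

Dropping the single value cuts the mean nearly in half, while the median moves only from 1.5 to 1.2. In practice "resistant" means the median changes little, not that it can never change.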
Mode

The value that occurs most
frequently.

The mode is most often used with
categorical data
Section 2.4
How Can We Describe the Spread of
Quantitative Data?
Measuring Spread: Range

Range: difference between the largest
and smallest observations
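As a quick illustration, the range of the cereal sodium values from Section 2.2 takes one line to compute; the variable name `data_range` is chosen for this example.

```python
# Sodium values for the 20 cereals, from the Section 2.2 example.
sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200, 125,
          170, 250, 150, 170, 70, 230, 200, 290, 180]

# Range = largest observation minus smallest observation.
data_range = max(sodium) - min(sodium)
print(data_range)  # 290
```

Because it uses only the two most extreme observations, the range here is determined entirely by the cereal with 0 and the cereals with 290.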
Measuring Spread: Standard
Deviation

Creates a measure of variation by
summarizing the deviations of each
observation from the mean and
calculating an adjusted average of these
deviations
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
Empirical Rule
For bell-shaped data sets:

Approximately 68% of the observations fall
within 1 standard deviation of the mean

Approximately 95% of the observations fall
within 2 standard deviations of the mean

Approximately 100% of the observations fall
within 3 standard deviations of the mean
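One way to get a feel for the rule is to count how many observations actually fall within 1, 2, and 3 standard deviations of the mean. The sketch below does this for the cereal sodium values from Section 2.2. Those 20 values are not claimed to be bell-shaped, so the shares need not match 68%, 95%, and 100% exactly. The function name `share_within` is hypothetical.

```python
import statistics

# Sodium values for the 20 cereals, from the Section 2.2 example.
sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200, 125,
          170, 250, 150, 170, 70, 230, 200, 290, 180]


def share_within(values, k):
    """Proportion of observations within k sample standard deviations of the mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    inside = [x for x in values if abs(x - mean) <= k * s]
    return len(inside) / len(values)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(sodium, k):.0%}")
```

For this small data set the shares come out to 75%, 95%, and 100%, reasonably close to the rule even though the sample is small.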
Parameter and Statistic

A parameter is a numerical summary of
the population

A statistic is a numerical summary of a
sample taken from a population