Lesson 1 - 2
Describing Distributions with Numbers
(parts from Mr. Molesky’s Statmonkey website)
Knowledge Objectives
• What is meant by a resistant measure?
• Two reasons why we use squared deviations rather than just average deviations from the mean
• What is meant by “degrees of freedom”?
Construction Objectives
• Identify situations in which the mean is the most
appropriate measure of center and situations in
which the median is the most appropriate measure
• Given a data set:
– Find the quartiles
– Find the five-number summary
– Compute the mean and median as measures of center
– Compute the interquartile range (IQR)
– Use the 1.5 × IQR rule to identify outliers
– Compute the standard deviation and variance as measures of spread
Construction Objectives cont
• Identify situations in which the standard deviation is
the most appropriate measure of spread and
situations in which the interquartile range is the
most appropriate measure
• Explain the effect of a linear transformation of a data
set on the mean, median, and standard deviation of
the set
• Use numerical and graphical techniques to compare
two or more data sets
Vocabulary
• Mean – the average value
• Median – the middle value (in an ordered list)
• Resistant measure – a measure (statistic or parameter) that is
not sensitive to the influence of extreme observations
• Mode – the most frequent data value
• Range – difference between the largest and smallest
observations
• Pth percentile – the value such that p percent of the observations (in an ordered list) fall at or below it
• Quartile – multiples of the 25th percentile (Q1 – 25th; Q2 – 50th, or median; Q3 – 75th)
• Five number summary – the minimum, Q1, Median, Q3,
maximum
Vocabulary cont
• Boxplot – graphs the five number summary and any outliers
• Interquartile range (IQR) – where IQR = Q3 – Q1
• Outlier – a data value that lies outside the interval [Q1 – 1.5 × IQR, Q3 + 1.5 × IQR]
• Variance – the average of the squares of the deviations from
the mean
• Standard Deviation – the square root of the variance
• Degrees of freedom – the number of independent pieces of
information that are included in your measurement
• Linear transformation – changes the data in the form x_new = a + bx
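The effect of a linear transformation x_new = a + bx on the mean, median, and standard deviation can be checked numerically. The following is a minimal sketch, not part of the original lesson: the data values and the constants a and b are hypothetical, chosen only for illustration, and the standard deviation is the sample version (divisor n - 1) as computed by Python's statistics.stdev.

```python
import statistics

# Hypothetical data, chosen only for illustration (not from the lesson).
x = [2, 4, 4, 5, 10]
a, b = 32, 1.8  # hypothetical constants in x_new = a + b*x

x_new = [a + b * xi for xi in x]

mean_old, mean_new = statistics.mean(x), statistics.mean(x_new)
median_old, median_new = statistics.median(x), statistics.median(x_new)
# statistics.stdev is the sample standard deviation (divisor n - 1).
sd_old, sd_new = statistics.stdev(x), statistics.stdev(x_new)

# Measures of center follow the transformation: new = a + b * old.
print(mean_new, a + b * mean_old)
print(median_new, a + b * median_old)
# Spread is multiplied by |b|; adding a does not change it.
print(sd_new, abs(b) * sd_old)
```

Each printed pair should agree up to floating-point rounding: the mean and median are shifted by a and scaled by b, while the standard deviation is only scaled by |b|.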

Measures of Center
Numerical descriptions of a distribution begin with a measure of its “center”
If you could summarize the data with one number,
what would it be?
Mean: The “average” value of a dataset
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$$
Median: The “middle” value of an ordered dataset
Arrange the observations in order from minimum to maximum.
Locate the middle observation; if there are two middle observations, average them.
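The two definitions above can be written out directly. This is a small sketch for illustration; the function names mean and median and the sample lists are hypothetical, not taken from the lesson.

```python
def mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the ordered data; average the two middle values if n is even."""
    ordered = sorted(values)          # arrange from minimum to maximum
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd n: a single middle observation
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle ones


print(mean([3, 1, 2]))        # 2.0
print(median([3, 1, 2]))      # 2
print(median([4, 1, 3, 2]))   # 2.5
```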
Mean vs Median
The mean and the median are the most common
measures of center
If a distribution is perfectly symmetric,
the mean and the median are the same
The mean is not resistant to outliers
The mode, the data value that occurs the most often,
is a common measure of center for categorical data
You must decide which number is the most
appropriate description of the center...
Mean Median Applet
http://bcs.whfreeman.com/tps3e/content/cat_020/applets/meanmedian.html
Use the mean on symmetric data and
the median on skewed data or data with outliers
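A quick way to see why the mean is not resistant while the median is: add one extreme value and compare how far each measure moves. The numbers below are hypothetical, chosen only for illustration.

```python
import statistics

# Hypothetical values, for illustration only.
typical = [30, 32, 35, 38, 40]
with_outlier = typical + [400]   # one extreme observation added

mean_shift = statistics.mean(with_outlier) - statistics.mean(typical)
median_shift = statistics.median(with_outlier) - statistics.median(typical)

print(statistics.mean(typical), statistics.mean(with_outlier))
print(statistics.median(typical), statistics.median(with_outlier))
print(mean_shift, median_shift)
```

In this sketch the single extreme value pulls the mean up by roughly 60 units, while the median moves by only 1.5.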
Distributions Parameters
[Figure: left-skewed distribution with the mean, median, and mode marked]
Skewed Left: (tail to the left)
Mean < Median < Mode
Mean substantially smaller than median (tail pulls mean toward it)
Distributions Parameters
[Figure: symmetric distribution with the mean, median, and mode marked]
Symmetric:
Mean ≈ Median ≈ Mode
Mean roughly equal to median
Distributions Parameters
[Figure: right-skewed distribution with the mean, median, and mode marked]
Skewed Right: (tail to the right)
Mean > Median > Mode
Mean substantially greater than median (tail pulls mean toward it)
Central Measures Comparisons
• Mean
– Computation: μ = (∑xi) / N (population); x‾ = (∑xi) / n (sample)
– Interpretation: Center of gravity
– When to use: Data are quantitative and frequency distribution is roughly symmetric
• Median
– Computation: Arrange data in ascending order and divide the data set into half
– Interpretation: Divides into bottom 50% and top 50%
– When to use: Data are quantitative and frequency distribution is skewed
• Mode
– Computation: Tally data to determine most frequent observation
– Interpretation: Most frequent observation
– When to use: Data are categorical or the most frequent observation is the desired measure of central tendency
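For categorical data the mode is found by tallying, as described above. A minimal sketch, using a hypothetical list of hair colors that is not from the lesson:

```python
from collections import Counter

# Hypothetical categorical sample, for illustration only.
hair_colors = ["brown", "black", "brown", "blond", "brown", "black", "red"]

tally = Counter(hair_colors)                      # count each category
most_common_color, count = tally.most_common(1)[0]

print(tally)
print(most_common_color, count)
```

A mean or median would not make sense here, because the categories have no numeric value or natural order.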
Example 1
Which of the following measures of central tendency are resistant?
1. Mean: Not resistant
2. Median: Resistant
3. Mode: Resistant
Example 2
Given the following set of data:
70, 28, 56, 63, 56, 35, 51, 50,
48, 58, 46, 46, 48, 62, 39, 69,
53, 45, 56, 53, 52, 60, 32, 70,
66, 38, 44, 33, 48, 73, 60, 54,
36, 45, 51, 55, 49, 51, 44, 52
What is the mean?
51.125
What is the median?
51
What is the mode?
48, 51, 56
What is the shape of the distribution?
Symmetric (tri-modal)
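The answers to Example 2 can be reproduced with Python's standard statistics module. This is only a checking sketch; the variable names are illustrative, and the 40 values are the ones given in the example.

```python
import statistics

# The 40 observations from Example 2.
data = [
    70, 28, 56, 63, 56, 35, 51, 50,
    48, 58, 46, 46, 48, 62, 39, 69,
    53, 45, 56, 53, 52, 60, 32, 70,
    66, 38, 44, 33, 48, 73, 60, 54,
    36, 45, 51, 55, 49, 51, 44, 52,
]

mean_value = statistics.mean(data)
median_value = statistics.median(data)
# multimode returns every value tied for most frequent; sort for readability.
modes = sorted(statistics.multimode(data))

print(mean_value)    # 51.125
print(median_value)  # 51.0
print(modes)         # [48, 51, 56]
```

The mean and median nearly coincide, which is consistent with the roughly symmetric shape noted in the example.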
Example 3
Given the following types of data and sample sizes, list the measure of central tendency you would use and explain why.
• Hair color – sample of 50: mode; sample of 200: mode
• Height – sample of 50: mean; sample of 200: mean
• Weight – sample of 50: mean; sample of 200: mean
• Parent’s Income – sample of 50: median; sample of 200: median
• Number of Siblings – sample of 50: mean; sample of 200: mean
• Age – sample of 50: mean; sample of 200: mean
Does sample size affect your decision?
Not in this case, but a larger sample size might allow us to use the mean rather than the median.
Sample Data
Consider the following test scores for a small class:
75, 76, 82, 93, 45, 68, 74, 82, 91, 98
Plot the data and describe the SOCS:
Shape?
Outliers?
Center?
Spread?
What number best describes the “center”?
What number best describes the “spread”?
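The center, spread, and outlier questions for the test scores can be explored with a short sketch. Two assumptions are made here: all ten numbers on this slide, including 98, are taken as the scores, and the quartiles are computed as the medians of the lower and upper halves of the ordered data. Other software may use a different quartile convention and give slightly different values. The function name five_number_summary is hypothetical.

```python
import statistics

# Assumption: all ten numbers on the slide, including 98, are test scores.
scores = [75, 76, 82, 93, 45, 68, 74, 82, 91, 98]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), with quartiles as medians of the halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # observations below the median position
    upper = ordered[half + n % 2:]    # observations above the median position
    return (
        ordered[0],
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
        ordered[-1],
    )


minimum, q1, med, q3, maximum = five_number_summary(scores)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
# 1.5 x IQR rule: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
outliers = [s for s in scores if s < low_fence or s > high_fence]

print(minimum, q1, med, q3, maximum)  # 45 74 79.0 91 98
print(iqr, low_fence, high_fence)     # 17 48.5 116.5
print(outliers)                       # [45]
print(statistics.mean(scores))        # 78.4
```

Under these assumptions the score of 45 is flagged as an outlier, so the median and IQR would be the more appropriate descriptions of center and spread for this class.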
Day 1 Summary and Homework
• Summary
– Three characteristics must be used to describe
distributions (from histograms or similar charts)
• Shape (uniform, symmetric, bi-modal, etc)
• Center (mean, median, mode measures)
• Spread (variance – next lesson)
– Median is resistant to outliers; mean is not!
– Use Mean for symmetric data
– Use Median for skewed data (or data with outliers)
– Use Mode for categorical data
• Homework
– pg 74 – 75: problems 27-31