Introduction to Summary Statistics
Statistics
• The collection, evaluation, and interpretation of
data
• Statistical analysis of measurements can help
verify the quality of a design or process
Summary Statistics
Central Tendency
• “Center” of a distribution
– Mean, median, mode
Variation
• Spread of values around the center
– Range, standard deviation, interquartile range
Distribution
• Summary of the frequency of values
– Frequency tables, histograms, normal distribution
Mean
Central Tendency
• The mean is the sum of the values of a set
of data divided by the number of values in
that data set.
$\mu = \frac{\sum x_i}{N}$
Mean
Central Tendency
$\mu = \frac{\sum x_i}{N}$
$\mu$ = mean value
$x_i$ = individual data value
$\sum x_i$ = summation of all data values
$N$ = number of data values in the data set
Mean
Central Tendency
• Data Set
3 7 12 17 21 21 23 27 32 36 44
• Sum of the values = 243
• Number of values = 11
Mean = $\mu = \frac{\sum x_i}{N} = \frac{243}{11} \approx 22.09$
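The arithmetic on this slide can be checked with a short Python sketch. The variable names are illustrative and are not part of the original slides.

```python
# Data set from the slide
data = [3, 7, 12, 17, 21, 21, 23, 27, 32, 36, 44]

total = sum(data)     # summation of all data values
count = len(data)     # N, the number of data values
mean = total / count  # mu = (sum of x_i) / N

print(f"sum = {total}, N = {count}, mean = {mean:.2f}")
```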
Mode
Central Tendency
• Measure of central tendency
• The most frequently occurring value in a
set of data is the mode
• Symbol is M
Data Set:
27 17 12 7 21 44 23 3 36 32 21
Mode
Central Tendency
• The most frequently occurring value in a
set of data is the mode
Data Set:
3 7 12 17 21 21 23 27 32 36 44
Mode = M = 21
Mode
Central Tendency
• The most frequently occurring value in a
set of data is the mode.
• Bimodal Data Set: Two numbers of equal
frequency stand out
• Multimodal Data Set: If more than two
numbers of equal frequency stand out
Mode
Central Tendency
Determine the mode of
48, 63, 62, 49, 58, 2, 63, 5, 60, 59, 55
Mode = 63
Determine the mode of
48, 63, 62, 59, 58, 2, 63, 5, 60, 59, 55
Mode = 63 & 59 Bimodal
Determine the mode of
48, 63, 62, 59, 48, 2, 63, 5, 60, 59, 55
Mode = 63, 59, & 48
Multimodal
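The three exercises above can be reproduced with the Python standard library. This is a minimal sketch that assumes Python 3.8 or later, where `statistics.multimode` is available; the names `set_a`, `set_b`, and `set_c` are made up for illustration.

```python
from statistics import multimode

# The three data sets from the slide
set_a = [48, 63, 62, 49, 58, 2, 63, 5, 60, 59, 55]
set_b = [48, 63, 62, 59, 58, 2, 63, 5, 60, 59, 55]
set_c = [48, 63, 62, 59, 48, 2, 63, 5, 60, 59, 55]

for name, values in (("A", set_a), ("B", set_b), ("C", set_c)):
    # multimode returns every value that ties for the highest frequency
    modes = sorted(multimode(values), reverse=True)
    print(f"Set {name}: mode(s) = {modes}")
```

Note that when no value repeats, `multimode` returns every value in the data set, so a result with many modes is worth checking against the raw frequencies.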
Median
Central Tendency
• Measure of central tendency
• The median is the value that occurs in the
middle of a set of data that has been
arranged in numerical order
• Symbol is $\tilde{x}$, pronounced “x-tilde”
Median
Central Tendency
• The median is the value that occurs in the
middle of a set of data that has been
arranged in numerical order.
Data Set:
27 17 12 7 21 44 23 3 36 32 21
Median
Central Tendency
• In a data set that contains an odd number of
values, the median is the single middle value.
Data Set:
3 7 12 17 21 21 23 27 32 36 44
Median = 21
Median
Central Tendency
• For a data set that contains an even number of
values, the two middle values are averaged, and
the result is the median.
Data Set:
3 7 12 17 21 21 23 27 31 32 36 44
Median = (21 + 23) / 2 = 22
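Both median cases (odd and even number of values) can be sketched in a few lines of Python. The helper `middle_value` is a hypothetical name used only for this illustration; the standard library function `statistics.median` gives the same results.

```python
from statistics import median

def middle_value(values):
    """Return the median: sort first, then take the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: a single middle value exists
        return ordered[mid]
    # Even number of values: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_set = [27, 17, 12, 7, 21, 44, 23, 3, 36, 32, 21]        # 11 values
even_set = [3, 7, 12, 17, 21, 21, 23, 27, 31, 32, 36, 44]   # 12 values

print(middle_value(odd_set), median(odd_set))
print(middle_value(even_set), median(even_set))
```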
Range
Variation
• Measure of data variation.
• The range is the difference between the
largest and smallest values that occur in a
set of data.
• Symbol is R
Data Set:
3 7 12 17 21 21 23 27 32 36 44
Range = R = 44 – 3 = 41
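As a quick check, the range can be computed directly from the largest and smallest values. The name `data_range` is illustrative only.

```python
data = [3, 7, 12, 17, 21, 21, 23, 27, 32, 36, 44]

# Range = largest value minus smallest value
data_range = max(data) - min(data)
print(f"R = {max(data)} - {min(data)} = {data_range}")
```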
Standard Deviation
Variation
• Measure of data variation.
• The standard deviation is a measure of
the spread of data values.
– A larger standard deviation indicates a wider
spread in data values
Standard Deviation
Variation
$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$
$\sigma$ = standard deviation
$x_i$ = individual data value ($x_1, x_2, x_3, \ldots$)
$\mu$ = mean
$N$ = size of population
Standard Deviation
Variation
$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$
Procedure:
1. Calculate the mean, μ.
2. Subtract the mean from each value and
then square each difference.
3. Sum all squared differences.
4. Divide the summation by the size of the
population (number of data values), N.
5. Calculate the square root of the result.
Standard Deviation
Calculate the standard deviation for the data array
2, 5, 48, 49, 55, 58, 59, 60, 62, 63, 63
$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$
1. Calculate the mean.
$\mu = \frac{\sum x_i}{N} = \frac{524}{11} \approx 47.64$
2. Subtract the mean from each data value and square each
difference, $(x_i - \mu)^2$.
$(2 - 47.64)^2 = 2083.01$
$(5 - 47.64)^2 = 1818.17$
$(48 - 47.64)^2 = 0.13$
$(49 - 47.64)^2 = 1.85$
$(55 - 47.64)^2 = 54.17$
$(58 - 47.64)^2 = 107.33$
$(59 - 47.64)^2 = 129.05$
$(60 - 47.64)^2 = 152.77$
$(62 - 47.64)^2 = 206.21$
$(63 - 47.64)^2 = 235.93$
$(63 - 47.64)^2 = 235.93$
Standard Deviation
Variation
3. Sum all squared differences.
$\sum (x_i - \mu)^2$ = 2083.01 + 1818.17 + 0.13 + 1.85 + 54.17 +
107.33 + 129.05 + 152.77 + 206.21 +
235.93 + 235.93
= 5,024.55
4. Divide the summation by the number of data values.
$\frac{\sum (x_i - \mu)^2}{N} = \frac{5024.55}{11} \approx 456.78$
5. Calculate the square root of the result.
$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = \sqrt{456.78} \approx 21.4$
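The five-step procedure can be followed line by line in Python. This is a sketch with illustrative variable names; it uses the unrounded mean, so intermediate values can differ from the slide in the last decimal place, but the results agree at the precision shown.

```python
import math

data = [2, 5, 48, 49, 55, 58, 59, 60, 62, 63, 63]
n = len(data)

# Step 1: calculate the mean
mean = sum(data) / n

# Step 2: subtract the mean from each value and square each difference
squared_diffs = [(x - mean) ** 2 for x in data]

# Step 3: sum all squared differences
sum_sq = sum(squared_diffs)

# Step 4: divide by the size of the population, N
variance = sum_sq / n

# Step 5: take the square root of the result
sigma = math.sqrt(variance)

print(f"mean = {mean:.2f}, sum of squares = {sum_sq:.2f}")
print(f"variance = {variance:.2f}, sigma = {sigma:.1f}")
```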
A Note about Standard Deviation
• Two distinct calculations
– Population Standard Deviation
• The measure of the spread of data within a
population.
• Used when you have a data value for every
member of the entire population of interest.
– Sample Standard Deviation
• An estimate of the spread of data within a larger
population.
• Used when you do not have a data value for every
member of the entire population of interest.
• Uses a subset (sample) of the data to generalize
the results to the larger population.
A Note about Standard Deviation
Population Standard Deviation
$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$
$\sigma$ = population standard deviation
$x_i$ = individual data value ($x_1, x_2, x_3, \ldots$)
$\mu$ = population mean
$N$ = size of population
Sample Standard Deviation
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
$s$ = sample standard deviation
$x_i$ = individual data value ($x_1, x_2, x_3, \ldots$)
$\bar{x}$ = sample mean
$n$ = size of sample
Sample Standard Deviation
Variation
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
Procedure:
1. Calculate the sample mean, $\bar{x}$.
2. Subtract the mean from each value and
then square each difference.
3. Sum all squared differences.
4. Divide the summation by the number of
data values minus one, $n - 1$.
5. Calculate the square root of the result.
Sample Mean
Central Tendency
$\bar{x} = \frac{\sum x_i}{n}$
$\bar{x}$ = sample mean
$x_i$ = individual data value
$\sum x_i$ = summation of all data values
$n$ = number of data values in the sample
Sample Standard Deviation
Estimate the standard deviation for a population for which the
following data are a sample.
2, 5, 48, 49, 55, 58, 59, 60, 62, 63, 63
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
1. Calculate the sample mean.
$\bar{x} = \frac{\sum x_i}{n} = \frac{524}{11} \approx 47.64$
2. Subtract the sample mean from each data value and
square the difference, $(x_i - \bar{x})^2$.
$(2 - 47.64)^2 = 2083.01$
$(5 - 47.64)^2 = 1818.17$
$(48 - 47.64)^2 = 0.13$
$(49 - 47.64)^2 = 1.85$
$(55 - 47.64)^2 = 54.17$
$(58 - 47.64)^2 = 107.33$
$(59 - 47.64)^2 = 129.05$
$(60 - 47.64)^2 = 152.77$
$(62 - 47.64)^2 = 206.21$
$(63 - 47.64)^2 = 235.93$
$(63 - 47.64)^2 = 235.93$
Sample Standard Deviation
Variation
3. Sum all squared differences.
$\sum (x_i - \bar{x})^2$ = 2083.01 + 1818.17 + 0.13 + 1.85 + 54.17 +
107.33 + 129.05 + 152.77 + 206.21 +
235.93 + 235.93
= 5,024.55
4. Divide the summation by the number of sample data values
minus one.
$\frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{5024.55}{10} \approx 502.46$
5. Calculate the square root of the result.
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{502.46} \approx 22.4$
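The only difference between the two calculations is the divisor in step 4. The sketch below makes that explicit with a hypothetical helper, `standard_deviation`, and compares it with the standard library functions `statistics.stdev` (sample) and `statistics.pstdev` (population).

```python
import math
from statistics import pstdev, stdev

def standard_deviation(values, sample=True):
    """Follow the five-step procedure; divide by n - 1 for a sample, n for a population."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    divisor = n - 1 if sample else n
    return math.sqrt(sum_sq / divisor)

data = [2, 5, 48, 49, 55, 58, 59, 60, 62, 63, 63]

s = standard_deviation(data, sample=True)       # sample standard deviation
sigma = standard_deviation(data, sample=False)  # population standard deviation

print(f"s = {s:.1f} (statistics.stdev gives {stdev(data):.1f})")
print(f"sigma = {sigma:.1f} (statistics.pstdev gives {pstdev(data):.1f})")
```

Because the sample formula divides by a smaller number, s is always somewhat larger than σ for the same data, and the gap shrinks as the sample size grows.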
A Note about Standard Deviation
Population Standard Deviation
$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$
Sample Standard Deviation
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
As $n \to N$, $s \to \sigma$
A Note about Standard Deviation
Population Standard Deviation
$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$
Given the ACT score of every student in your class, use the
population standard deviation formula to find the standard
deviation of ACT scores in the class.
A Note about Standard Deviation
Sample Standard Deviation
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
Given the ACT scores of every student in your class, use the
sample standard deviation formula to estimate the standard
deviation of the ACT scores of all students at your school.
Histogram
Distribution
• A histogram is a common data distribution
chart that is used to show the frequency
with which specific values, or values within
ranges, occur in a set of data.
• An engineer might use a histogram to
show the variation of a dimension that
exists among a group of parts that are
intended to be identical.
Histogram
Distribution
• Large sets of data are often divided into a
limited number of groups. These groups
are called class intervals.
Class Intervals: -16 to -6, -5 to 5, 6 to 16
Histogram
Distribution
• The number of data elements in each
class interval is shown by the frequency,
which is plotted along the Y-axis of the graph.
[Figure: histogram with Frequency on the Y-axis and the class intervals -16 to -6, -5 to 5, and 6 to 16 on the X-axis]
Histogram
Distribution
Example
Data set: 1, 7, 15, 4, 8, 8, 5, 12, 10
Sorted: 1, 4, 5, 7, 8, 8, 10, 12, 15
Frequency by class interval:
1 to 5: 3
6 to 10: 4
11 to 15: 2
Histogram
Distribution
• The height of each bar in the chart
indicates the number of data elements, or
frequency of occurrence, within each
range
Data set: 1, 4, 5, 7, 8, 8, 10, 12, 15
[Figure: histogram with Frequency on the Y-axis and the class intervals 1 to 5, 6 to 10, and 11 to 15 on the X-axis]
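The bar heights for the example data set can be found by counting the values that fall in each class interval. This is a minimal sketch; the names `class_intervals` and `frequency` are illustrative, and the text bars stand in for the chart.

```python
data = [1, 7, 15, 4, 8, 8, 5, 12, 10]

# Class intervals from the slide, as inclusive (low, high) pairs
class_intervals = [(1, 5), (6, 10), (11, 15)]

# Frequency = number of data elements inside each class interval
frequency = {
    (low, high): sum(1 for x in data if low <= x <= high)
    for low, high in class_intervals
}

# Print a simple text histogram, one row per class interval
for (low, high), count in frequency.items():
    print(f"{low:>2} to {high:<2} | {'#' * count} ({count})")
```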
Histogram
Distribution
[Figure: histogram titled "Cube Side Length". X-axis: Length (in.), divided into class intervals from a minimum of 0.745 in. to a maximum of 0.760 in. Y-axis: Frequency (0 to 5)]
Dot Plot
Distribution
Data set: 0, 3, -1, -3, 3, 2, 1, 0, -1, 0, -1, 1, 2, -1, 1, -2, 1, 2, 1, 0, -2, -4, 0, 0
[Figure: dot plot of the data set on a number line from -6 to 6, with one dot per data value]
Dot Plot
Distribution
[Figure: dot plot of the same 24 data values on a number line from -6 to 6, with a Frequency axis (1, 3, 5) added]
Normal Distribution
Distribution
“Is the data distribution normal?”
• Translation: Is the histogram/dot plot bell-shaped?
– Does the greatest frequency of the data
values occur at about the mean value?
– Does the curve decrease on both sides away
from the mean?
– Is the curve symmetric about the mean?
Normal Distribution
Distribution
[Figure: bell-shaped curve. X-axis: Data Elements (-6 to 6). Y-axis: Frequency]
Normal Distribution
Distribution
Does the greatest frequency of the data values occur at about the
mean value?
[Figure: bell-shaped curve with the mean value marked. X-axis: Data Elements (-6 to 6). Y-axis: Frequency]
Normal Distribution
Distribution
Does the curve decrease on both sides away from the mean?
[Figure: bell-shaped curve with the mean value marked. X-axis: Data Elements (-6 to 6). Y-axis: Frequency]
Normal Distribution
Distribution
Is the curve symmetric about the mean?
[Figure: bell-shaped curve with the mean value marked. X-axis: Data Elements (-6 to 6). Y-axis: Frequency]
What if things are not equal?
Histogram Interpretation: Skewed (Non-Normal) Right
Normal Distribution
Distribution
If the data are normally distributed:
• About 68% of the observations fall within 1 standard deviation of
the mean.
• About 95% of the observations fall within 2 standard deviations of
the mean.
• About 99.7% of the observations fall within 3 standard deviations of
the mean.
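These percentages are rounded values that follow from the normal curve itself. The sketch below, which assumes Python 3.8 or later for `statistics.NormalDist`, computes the fraction of a normal distribution lying within k standard deviations of the mean; `fraction_within` is a hypothetical helper name.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k):.1%}")
```

The same fractions apply to any normal distribution, because distances are measured in units of its own standard deviation.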
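Normal Distribution Example
Distribution
Data from a sample of a larger population
Mean = $\bar{x}$ = 0.083
Standard Deviation = s = 1.77 (sample)
If the data are normally distributed, about 68% of the values fall within 1 standard deviation of the mean:
$\bar{x} + s$ = 0.08 + 1.77 = 1.85
$\bar{x} - s$ = 0.08 - 1.77 = -1.69
About 95% of the values fall within 2 standard deviations of the mean (2s = 3.54):
$\bar{x} + 2s$ = 0.08 + 3.54 = 3.62
$\bar{x} - 2s$ = 0.08 - 3.54 = -3.46
[Figure: normal distribution curve centered on $\bar{x}$ = 0.08, with the 68% band spanning -1.77 to +1.77 (1s) and the 95% band spanning -3.54 to +3.54 (2s) around the mean. X-axis: Data Elements]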
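The example statistics can be reproduced from the 24 values listed on the Dot Plot slides. That this is the data set behind the example is an assumption, although its mean and sample standard deviation match the values quoted here. The sketch also counts how many values actually fall inside each interval; the variable names are illustrative.

```python
from statistics import mean, stdev

# Values from the Dot Plot slides (assumed to be the sample used in this example)
data = [0, 3, -1, -3, 3, 2, 1, 0, -1, 0, -1, 1,
        2, -1, 1, -2, 1, 2, 1, 0, -2, -4, 0, 0]

x_bar = mean(data)  # sample mean
s = stdev(data)     # sample standard deviation (divides by n - 1)

# Count the values within 1 and 2 sample standard deviations of the mean
counts = {}
for k in (1, 2):
    low, high = x_bar - k * s, x_bar + k * s
    counts[k] = sum(1 for x in data if low <= x <= high)
    share = counts[k] / len(data)
    print(f"within {k}s: {low:.2f} to {high:.2f}, {counts[k]} of {len(data)} values ({share:.1%})")
```

With a sample this small, the observed shares only roughly track the 68% and 95% figures, which describe an ideal normal distribution.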